lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-22 21:09:58 +00:00

Author	SHA1	Message	Date
LuQQiu	e118c37228	ci: enable java auto release (#1602 ) Enable bump java pom.xml versions Enable auto java release when detect stable github release	2024-09-19 10:51:03 -07:00
LuQQiu	abeaae3d80	feat!: upgrade Lance to 0.18.0 (#1657 ) BREAKING CHANGE: default file format changed to Lance v2.0. Upgrade Lance to 0.18.0 Change notes: https://github.com/lancedb/lance/releases/tag/v0.18.0	2024-09-19 10:50:26 -07:00
Gagan Bhullar	b3c0227065	docs: hnsw documentation (#1640 ) PR closes #1627 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-19 10:32:46 -07:00
Will Jones	521e665f57	feat(rust): remote client write data endpoint (#1645 ) * Implements: * Add * Update * Delete * Merge-Insert --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-09-18 15:02:56 -07:00
Will Jones	ffb28dd4fc	feat(rust): remote endpoints for schema, version, count_rows (#1644 ) A handful of additional endpoints.	2024-09-16 08:19:25 -07:00
Lei Xu	32af962c0c	feat: fix creating empty table and creating table by a list of RecordBatch for remote python sdk (#1650 ) Closes #1637	2024-09-14 11:33:34 -07:00
Ayush Chaurasia	18484d0b6c	fix: allow pass optional args in colbert reranker (#1649 ) Fixes https://github.com/lancedb/lancedb/issues/1641	2024-09-14 11:18:09 -07:00
Lei Xu	c02ee3c80c	chore: make remote client a context manager (#1648 ) Allow `RemoteLanceDBClient` to be used as context manager	2024-09-13 22:08:48 -07:00
Rithik Kumar	dcd5f51036	docs: add understand embeddings v1 (#1643 ) Before getting started with managing embeddings. Let's understand embeddings (LanceDB way) ![Screenshot 2024-09-14 012144](https://github.com/user-attachments/assets/7c5435dc-5316-47e9-8d7d-9994ab13b93d)	2024-09-14 02:07:00 +05:30
Sayandip Dutta	9b8472850e	fix: unterminated string literal on table update (#1573 ) resolves #1429 (python) ```python - return f"'{value}'" + return f'"{value}"' ``` --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-13 12:32:59 -07:00
Sayandip Dutta	36d05ea641	fix: add appropriate QueryBuilder overloads to LanceTable.search (#1558 ) - Add overloads to Table.search to preserve the return type of the different QueryBuilder objects for LanceTable - Fix the fts_column type annotation by making it `Optional` resolves #1550 --------- Co-authored-by: sayandip-dutta <sayandip.dutta@nevaehtech.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-13 12:32:30 -07:00
LuQQiu	7ed86cadfb	feat(node): let NODE API region default to us-east-1 (#1631 ) Fixes #1622 To sync with python API	2024-09-13 11:48:57 -07:00
Will Jones	1c123b58d8	feat: implement Remote connection for LanceDB Rust (#1639 ) * Adding a simple test facility, which allows you to mock a single endpoint at a time with a closure. * Implementing all the database-level endpoints Table-level APIs will be done in a follow up PR. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-09-13 10:53:27 -07:00
BubbleCal	bf7d2d6fb0	docs: update FTS docs for JS SDK (#1634 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-09-13 05:48:29 -07:00
LuQQiu	c7732585bf	fix: support pyarrow input types (#1628 ) fixes #1625 Support pa.RecordBatch, pa.dataset.Dataset, pa.dataset.Scanner, pa.RecordBatchReader	2024-09-12 10:59:18 -07:00
Prashant Dixit	b3bf6386c3	docs: rag section in guide (#1619 ) This PR adds the RAG section in the Guides. It includes all the RAGs with code snippet and some advanced techniques which improves RAG.	2024-09-11 21:13:55 +05:30
BubbleCal	4b79db72bf	docs: improve the docs and API param name (#1629 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-09-11 10:18:29 +08:00
Lance Release	622a2922e2	Updating package-lock.json	2024-09-10 20:12:54 +00:00
Lance Release	c91221d710	Bump version: 0.10.0-beta.2 → 0.10.0 v0.10.0	2024-09-10 20:12:41 +00:00
Lance Release	56da5ebd13	Bump version: 0.10.0-beta.1 → 0.10.0-beta.2	2024-09-10 20:12:40 +00:00
Lance Release	64eb43229d	Bump version: 0.13.0-beta.2 → 0.13.0 python-v0.13.0	2024-09-10 20:12:35 +00:00
Lance Release	c31c92122f	Bump version: 0.13.0-beta.1 → 0.13.0-beta.2	2024-09-10 20:12:35 +00:00
Gagan Bhullar	205fc530cf	feat: expose hnsw indices (#1595 ) PR closes #1522 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-10 11:08:13 -07:00
BubbleCal	2bde5401eb	feat: support to build FTS without positions (#1621 )	2024-09-10 22:51:32 +08:00
Antonio Molner Domenech	a405847f9b	fix(python): remove unmaintained ratelimiter dependency (#1603 ) The `ratelimiter` package hasn't been updated in ages and is no longer maintained. This PR removes the dependency on `ratelimiter` and replaces it with a custom rate limiter implementation. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-09 12:35:53 -07:00
Gagan Bhullar	bcc19665ce	feat(nodejs): expose offset (#1620 ) PR closes #1555	2024-09-09 11:54:40 -07:00
Will Jones	2a6586d6fb	feat: add flag to enable faster manifest paths (#1612 ) The new V2 manifest path scheme makes discovering the latest version of a table constant time on object stores, regardless of the number of versions in the table. See benchmarks in the PR here: https://github.com/lancedb/lance/pull/2798 Closes #1583	2024-09-09 11:34:36 -07:00
James Wu	029b01bbbf	feat: enable phrase_query(bool) for hybrid search queries (#1578 ) first off, apologies for any folly since i'm new to contributing to lancedb. this PR is the continuation of [a discord thread](https://discord.com/channels/1030247538198061086/1030247538667827251/1278844345713299599): ## user story here's the lance db search query i'd like to run: ``` def search(phrase): logger.info(f'Searching for phrase: {phrase}') phrase_embedding = get_embedding(phrase) df = (table.search((phrase_embedding, phrase), query_type='hybrid') .limit(10).to_list()) logger.info(f'Success search with row count: {len(df)}') search('howdy (howdy)') search('howdy(howdy)') ``` the second search fails due to `ValueError: Syntax Error: howdy(howdy)` i saw on the [docs](https://lancedb.github.io/lancedb/fts/#phrase-queries-vs-terms-queries) that i can use `phrase_query()` to [enable a flag](https://github.com/lancedb/lancedb/blob/main/python/python/lancedb/query.py#L790-L792) to wrap the query in double quotes (as well as sanitize single quotes) prior to sending the query to search. this works for [normal FTS](https://lancedb.github.io/lancedb/fts/), but the command is unavailable on [hybrid search](https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/). ## changes i added `phrase_query()` function to `LanceHybridQueryBuilder` by propagating the call down to its `self._fts_query` object. i'm not too familiar with the codebase and am not sure if this is the best way to implement the functionality. feel free to riff on this PR or discard ## tests ``` (lancedb) JamesMPB:python james$ pwd /Users/james/src/lancedb/python (lancedb) JamesMPB:python james$ pytest python/tests/test_table.py python/tests/test_table.py ....................................... [100%] ====================================================== 39 passed, 1 warning in 2.23s ======================================================= ```	2024-09-07 08:58:05 +05:30
Will Jones	cd32944e54	feat: upgrade lance to v0.17.0 (#1608 ) Changelog: https://github.com/lancedb/lance/releases/tag/v0.17.0 Highlights: * You can do "phrase queries" by adding double quotes around phrases (multiple tokens) in FTS. Added follow ups in: https://github.com/lancedb/lancedb/issues/1611	2024-09-06 14:10:02 -07:00
Jon X	7eb3b52297	docs: added a blank line between a paragraph and a list block (#1604 ) Though the Markdown renders well on GitHub (GFM style?), `mkdocs` seems to require a blank line between a paragraph and a list block to render it correctly. See also the web page: https://lancedb.github.io/lancedb/concepts/index_hnsw/	2024-09-06 09:38:19 +05:30
BubbleCal	8dcd328dce	feat: support to create table from record batch iterator (#1593 )	2024-09-06 10:41:38 +08:00
Philip Zeyliger	1d61717d0e	docs: fix get_registry() usage (#1601 ) Docs used `get_registry.get(...)` whereas what works is `get_registry().get(...)`. Fixing the two instances I found. I tested the open clip version by trying it locally in a Jupyter notebook.	2024-09-06 01:48:24 +05:30
Lei Xu	4ee7225e91	ci: public java package (#1485 ) Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2024-09-05 11:48:48 -07:00
Rithik Kumar	2bc7dca3ca	docs: add changes to Embeddings-> Available models-> overview page (#1596 ) adding features and improvements to - Manage Embeddings page Before: ![Screenshot 2024-09-04 223743](https://github.com/user-attachments/assets/f1e116b5-6ebb-4d59-9d29-b20084998cd0) After: ![Screenshot 2024-09-05 214214](https://github.com/user-attachments/assets/8c94318e-68af-447e-97e1-8153860a2914) ![Screenshot 2024-09-05 213623](https://github.com/user-attachments/assets/55c82770-6df9-4bab-9c5c-1ea1552138de) ![Screenshot 2024-09-05 215931](https://github.com/user-attachments/assets/9bfac7d4-16a6-454e-801e-50789ff75261)	2024-09-05 22:19:08 +05:30
Gagan Bhullar	b24810a011	feat(python, rust): expose offset in query (#1556 ) PR is part of #1555	2024-09-05 08:33:07 -07:00
Jon X	2b8e872be0	docs: removed the unnecessary fence code tag (#1599 )	2024-09-05 14:40:38 +05:30
Ayush Chaurasia	03ef1dc081	feat: update default reranker to RRF (#1580 ) - Both LinearCombination (the current default) and RRF are pretty fast compared to model based rerankers. RRF is slightly faster. - In our tests RRF has also been slightly more accurate. This PR: - Makes RRF the default reranker - Removed duplicate docs for rerankers	2024-09-03 14:00:13 +05:30
Rithik Kumar	fde636ca2e	docs: fix links - quick start to embedding (#1591 )	2024-09-02 21:55:35 +05:30
Ayush Chaurasia	51966a84f5	docs: add multi-vector reranking, answerdotai and studies section (#1579 )	2024-08-31 04:09:14 +05:30
Rithik Kumar	38015ffa7c	docs: improve overall language on all example pages (#1582 ) Refine and improve the language clarity and quality across all example pages in the documentation to ensure better understanding and readability. --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-08-31 03:48:11 +05:30
Ayush Chaurasia	dc72ece847	feat!: better api for manual hybrid queries (#1575 ) Currently, the only documented way of performing hybrid search is by using embedding API and passing string queries that get automatically embedded. There are use cases where users might like to pass vectors and text manually instead. This ticket contains more information and historical context - https://github.com/lancedb/lancedb/issues/937 This breaks an undocumented pathway that allowed passing (vector, text) tuple queries which was intended to be temporary, so this is marked as a breaking change. For all practical purposes, this should not really impact most users ### usage ``` results = (table.search(query_type="hybrid") .vector(vector_query) .text(text_query) .limit(5) .to_pandas()) ```	2024-08-30 17:37:58 +05:30
BubbleCal	1521435193	fix: specify column to search for FTS (#1572 ) Before this, we ignored the `fts_columns` parameter. Since we currently support searching only one column, this could lead to an error when multiple columns have FTS indices. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-08-29 23:43:46 +08:00
Ayush Chaurasia	bfe8fccfab	docs: add hnsw docs (#1570 )	2024-08-29 15:16:27 +05:30
Rithik Kumar	6f6eb170a9	docs: revamp Python example: Overview page and remove redundant examples and notebooks (#1574 ) before: ![Screenshot 2024-08-29 131656](https://github.com/user-attachments/assets/81cb5d70-5dff-4e57-8bbe-3461327aed7d) After: ![Screenshot 2024-08-29 131715](https://github.com/user-attachments/assets/62109a37-7f66-4fd4-90ed-906a85472117) --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-08-29 13:48:10 +05:30
Rithik Kumar	dd1c16bbaf	docs: fix links, convert backslash to forward slash in mkdocs.yml (#1571 ) Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-08-28 16:07:57 +05:30
Gagan Bhullar	a76186ee83	fix(node): read consistency level fix (#1567 ) PR fixes #1565	2024-08-27 17:03:42 -07:00
Rithik Kumar	ae85008714	docs: revamp embedding models (#1568 ) before: ![Screenshot 2024-08-27 151525](https://github.com/user-attachments/assets/d4f8f2b9-37e6-4a31-b144-01b804019e11) After: ![Screenshot 2024-08-27 151550](https://github.com/user-attachments/assets/79fe7d27-8f14-4d80-9b41-a1e91f8c708f) --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-08-27 17:14:35 +05:30
Gagan Bhullar	a85f039352	fix(bug): limit fix (#1548 ) PR fixes #1151	2024-08-26 14:25:14 -07:00
Bill Chambers	9c25998110	docs: update serverless_lancedb_with_s3_and_lambda.md (#1559 )	2024-08-26 14:55:28 +05:30
Ayush Chaurasia	549ca51a8a	feat: add answerdotai rerankers support and minor improvements (#1560 ) This PR: - Adds missing license headers - Integrates with answerdotai Rerankers package - Updates ColbertReranker to subclass answerdotai package. This is done to keep backwards compatibility as some users might be used to importing ColbertReranker directly - Set `trust_remote_code` to `True` by default in CrossEncoder and sentence-transformer based rerankers	2024-08-26 13:25:10 +05:30
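Commit 03ef1dc081 makes reciprocal rank fusion (RRF) the default hybrid-search reranker. The following is a rough illustration of the technique itself. It is not LanceDB's reranker code. It is a minimal sketch that assumes each retriever (for example, vector search and FTS) returns a ranked list of document ids. It uses the commonly cited constant k = 60. Each document's fused score is the sum of 1 / (k + rank) over the lists it appears in, with 1-based ranks. The function name `reciprocal_rank_fusion` is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document ids with RRF (illustrative sketch).

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. Documents missing from a list get nothing from it.
    Returns (doc_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranked_ids in result_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # hypothetical vector-search ranking
    fts_hits = ["c", "a", "d"]     # hypothetical full-text-search ranking
    for doc_id, score in reciprocal_rank_fusion([vector_hits, fts_hits]):
        print(doc_id, round(score, 5))
```

RRF uses only ranks, never raw scores. It therefore needs no normalization between vector distances and BM25 scores. This is one reason it is cheap compared with model-based rerankers.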
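Commit a405847f9b replaces the unmaintained `ratelimiter` package with a custom rate limiter. The sketch below is not that implementation. It is a hypothetical sliding-window limiter (`RateLimiter`, `acquire`) that illustrates the general idea: allow at most `max_calls` calls in any `period`-second window, and sleep when the window is full. The clock and sleep functions are injectable so the limiter can be tested without real waiting. It is not thread-safe.

```python
import functools
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        if max_calls < 1 or period <= 0:
            raise ValueError("max_calls must be >= 1 and period must be > 0")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent acquisitions, oldest first

    def _evict(self, now):
        # Forget acquisitions that have fallen out of the sliding window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def acquire(self):
        """Sleep until another call is allowed, then record it."""
        now = self._clock()
        self._evict(now)
        while len(self._calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._evict(now)
        self._calls.append(now)

    def __call__(self, func):
        """Use the limiter as a decorator that rate-limits every call to `func`."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.acquire()
            return func(*args, **kwargs)
        return wrapper
```

For example, decorating a request function with `@RateLimiter(max_calls=10, period=1.0)` would cap it at ten calls per second within a single thread.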


1219 Commits