lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-23 05:19:58 +00:00

Author	SHA1	Message	Date
QianZhu	17c9e9afea	docs: add async examples to doc (#1941) - added sync and async tabs for python examples - moved python code to tests/docs --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-01-07 15:10:25 -08:00
BubbleCal	e70fd4fecc	feat: support IVF_FLAT, binary vectors and hamming distance (#1955) Binary vectors and Hamming distance work only with IVF_FLAT, so this PR introduces all three together. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-12-24 10:36:20 -08:00
Prashant Dixit	b3bf6386c3	docs: rag section in guide (#1619) This PR adds the RAG section to the Guides. It covers each RAG variant with code snippets, plus some advanced techniques that improve RAG.	2024-09-11 21:13:55 +05:30
Rithik Kumar	ae85008714	docs: revamp embedding models (#1568) Before: ![Screenshot 2024-08-27 151525](https://github.com/user-attachments/assets/d4f8f2b9-37e6-4a31-b144-01b804019e11) After: ![Screenshot 2024-08-27 151550](https://github.com/user-attachments/assets/79fe7d27-8f14-4d80-9b41-a1e91f8c708f) --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-08-27 17:14:35 +05:30
Lei Xu	5857cb4c6e	docs: add a section to describe scalar index (#1495)	2024-08-16 18:48:29 -07:00
Raghav Dixit	96914a619b	docs: llama-index integration (#1347) Updated API reference and usage for the llama-index integration.	2024-06-09 23:52:18 +05:30
Ayush Chaurasia	76fc16c7a1	docs: add retriever guide, address minor onboarding feedbacks & enhancement (#1326) - Tried to address some of the onboarding feedback listed in https://github.com/lancedb/lancedb/issues/1224 - Improve visibility of pydantic integration and embedding API. (Based on onboarding feedback: there are many ways to ingest data and define a schema, but it is unclear which to use for a specific use case.) - Add a guide that takes users through testing and improving retriever performance using built-in utilities like hybrid-search and reranking - Add some benchmarks for the above - Add missing cohere docs --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-06-08 06:25:31 +05:30
Raghav Dixit	c9d5475333	Documentation: Langchain Integration (#1297) Integration doc update	2024-05-13 10:19:33 -04:00
Ayush Chaurasia	44c03ebef3	docs: Update Reranking docs (#1213)	2024-04-11 15:20:00 +05:30
Will Jones	1d23af213b	feat: expose storage options in LanceDB (#1204) Exposes `storage_options` in LanceDB. This is provided for Python async, Node `lancedb`, and Node `vectordb` (and Rust of course). Python synchronous is omitted because it's not compatible with the PyArrow filesystems we use there currently. In the future, we will move the sync API to wrap the async one, and then it will get support for `storage_options`. 1. Fixes #1168 2. Closes #1165 3. Closes #1082 4. Closes #439 5. Closes #897 6. Closes #642 7. Closes #281 8. Closes #114 9. Closes #990 10. Deprecating `awsCredentials` and `awsRegion`. Users are encouraged to use `storageOptions` instead.	2024-04-10 10:12:04 -07:00
Ayush Chaurasia	510e8378bc	feat(python): hybrid search updates, examples, & latency benchmarks (#964) - Rename safe_import -> attempt_import_or_raise (closes https://github.com/lancedb/lancedb/pull/923) - Update docs - Add Notebook example (@changhiskhan you can use it for the talk. Comes with "open in colab" button) - Latency benchmark & results comparison, sanity check on real-world data - Updates the default openai model to gpt-4	2024-04-05 16:30:30 -07:00
Ayush Chaurasia	a41f7be88d	feat(python): Hybrid search & Reranker API (#824) based on https://github.com/lancedb/lancedb/pull/713 - The Reranker api can be plugged into vector only or fts only search but this PR doesn't do that (see example - https://txt.cohere.com/rerank/) ### Default reranker -- `LinearCombinationReranker(weight=0.7, fill=1.0)` ``` table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas() ``` ### Available rerankers LinearCombinationReranker ``` from lancedb.rerankers import LinearCombinationReranker # Same as default table.search("hello", query_type="hybrid").rerank( normalize="score", reranker=LinearCombinationReranker() ).to_pandas() # with custom params reranker = LinearCombinationReranker(weight=0.3, fill=1.0) table.search("hello", query_type="hybrid").rerank( normalize="score", reranker=reranker ).to_pandas() ``` Cohere Reranker ``` from lancedb.rerankers import CohereReranker # default model.. English and multi-lingual supported. See docstring for available custom params table.search("hello", query_type="hybrid").rerank( normalize="rank", # score or rank reranker=CohereReranker() ).to_pandas() ``` CrossEncoderReranker ``` from lancedb.rerankers import CrossEncoderReranker table.search("hello", query_type="hybrid").rerank( normalize="rank", reranker=CrossEncoderReranker() ).to_pandas() ``` ## Using custom Reranker ``` from lancedb.rerankers import Reranker class CustomReranker(Reranker): def rerank_hybrid(self, vector_results, fts_results): combined_res = self.merge_results(vector_results, fts_results) # or use custom combination logic # Custom rerank logic here return combined_res ``` - [x] Expand testing - [x] Make sure usage makes sense - [x] Run simple benchmarks for correctness (Seeing weird result from cohere reranker in the toy example) - Support diverse rerankers by default: - [x] Cross encoding - [x] Cohere - [x] Reciprocal Rank Fusion --------- Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com> Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>	2024-04-05 16:28:56 -07:00
Raghav Dixit	472344fcb3	feat(python): Embedding fn support for gte-mlx/gte-large (#873) Added testing and an example in the docstring; a separate PR in the recipe repo will add a RAG example. --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-04-05 16:28:56 -07:00
Prashanth Rao	4d5d748acd	docs: Updates and refactor (#683) This PR makes incremental changes to the documentation. * Closes #697 * Closes #698 - [x] Add dark mode - [x] Fix headers in navbar - [x] Add `extra.css` to customize navbar styles - [x] Customize fonts for prose/code blocks, navbar and admonitions - [x] Inspect all admonition boxes (remove redundant dropdowns) and improve clarity and readability - [x] Ensure that all images in the docs have white background (not transparent) to be viewable in dark mode - [x] Improve code formatting in code blocks to make them consistent with autoformatters (eslint/ruff) - [x] Add bolder weight to h1 headers - [x] Add diagram showing the difference between embedded (OSS) and serverless (Cloud) - [x] Fix [Creating an empty table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table) section: right now, the subheaders are not clickable. - [x] In critical data ingestion methods like `table.add` (among others), the type signature often does not match the actual code - [x] Proof-read each documentation section and rewrite as necessary to provide more context, use cases, and explanations so it reads less like reference documentation. This is especially important for CRUD and search sections since those are so central to the user experience. - [x] The section for [Adding data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table) only shows examples for pandas and iterables. We should include pydantic models, arrow tables, etc. - [x] Add conceptual tutorial for IVF-PQ index - [x] Clearly separate vector search, FTS and filtering sections so that these are easier to find - [x] Add docs on refine factor to explain its importance for recall. Closes #716 - [x] Add an FAQ page showing answers to commonly asked questions about LanceDB. Closes #746 - [x] Add simple polars example to the integrations section. Closes #756 and closes #153 - [ ] Add basic docs for the Rust API (more detailed API docs can come later). Closes #781 - [x] Add a section on the various storage options on local vs. cloud (S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782 - [x] Revamp filtering docs: add pre-filtering examples and redo headers and update content for SQL filters. Closes #783 and closes #784. - [x] Add docs for data management: compaction, cleaning up old versions and incremental indexing. Closes #785 - [ ] Add a benchmark section that also discusses some best practices. Closes #787 --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:27:12 -07:00
Lei Xu	b5e57ebce3	doc: add doc to use GPU for indexing (#611)	2024-04-05 16:22:59 -07:00
Ayush Chaurasia	1c42894918	[DOCS][PYTHON] Update embeddings API docs & Example (#516) This PR adds an overview of embeddings docs: - 2 ways to vectorize your data using lancedb - explicit & implicit - Explicit - manually vectorize your data using the `with_embeddings` function - Implicit - automatically vectorize your data as it comes by ingesting your embedding function details as table metadata - Multi-modal example w/ disappearing embedding function	2024-04-05 16:22:59 -07:00
Ayush Chaurasia	52fa7f5577	[Docs] Small typo fixes (#460)	2023-09-02 22:17:19 +05:30
Tevin Wang	b8f32d082f	Clean up docs testing - exclude by glob instead of by file (#450)	2023-08-24 07:30:37 +05:30
Ayush Chaurasia	74ef141b9c	[Docs] add Tables guide (#381) * Rename "Reference" -> "Guides" to create a distinction between the API reference and user-facing docs * Add all the various ways to create, add to and delete from a table Related - https://github.com/lancedb/lancedb/pull/391	2023-08-06 12:34:08 +05:30
Ayush Chaurasia	15f4787cc8	[Docs]: Add badges, CTA and updates examples (#358) <img width="1054" alt="Screenshot 2023-07-24 at 6 13 00 PM" src="https://github.com/lancedb/lancedb/assets/15766192/a263a17e-66d0-4591-adc7-b520aa5b23f6"> Is this a problem? Are we using metadata to track usage or something?	2023-07-26 16:35:46 +05:30
Tevin Wang	b731a6aed9	Add docs code testing & documentation syntax changes (#196) - Creates testing files `md_testing.py` and `md_testing.js` for testing Python and Node.js code in markdown files in the documentation. This listens for HTML tags as well: `<!--[language] code code code...-->` will create a set-up file to create some mock tables or to fulfill some assumptions in the documentation. - Creates a GitHub Actions workflow that triggers on every push/PR to `docs/**` - Modifies documentation so tests run (mostly indentation, some small syntax errors and some missing imports). A list of excluded files that we need to take a closer look at later on: ```javascript const excludedFiles = [ "../src/fts.md", "../src/embedding.md", "../src/examples/serverless_lancedb_with_s3_and_lambda.md", "../src/examples/serverless_qa_bot_with_modal_and_langchain.md", "../src/examples/youtube_transcript_bot_with_nodejs.md", ]; ``` Many of them can't be done because we need the OpenAI API key :(. `fts.md` has some issues with the library, I believe this is still experimental? Closes #170 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2023-06-28 11:07:26 -07:00

21 Commits