lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-27 15:12:53 +00:00

Author	SHA1	Message	Date
Gagan Bhullar	14677d7c18	fix: metric type inconsistency (#2122 ) PR fixes #2113 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-03-12 10:28:37 -07:00
Will Jones	7ac5f74c80	feat!: add variable store to embeddings registry (#2112 ) BREAKING CHANGE: embedding function implementations in Node need to now call `resolveVariables()` in their constructors and should not implement `toJSON()`. This tries to address the handling of secrets. In Node, they are currently lost. In Python, they are currently leaked into the table schema metadata. This PR introduces an in-memory variable store on the function registry. It also allows embedding function definitions to label certain config values as "sensitive", and the preprocessing logic will raise an error if users try to pass in hard-coded values. Closes #2110 Closes #521 --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-02-24 15:52:19 -08:00
Lei Xu	bd82e1f66d	feat(python): add support for Azure OpenAPI SDK (#1906 ) Closes #1699	2024-12-04 13:09:38 -08:00
fzowl	f2e3989831	docs: voyageai embedding in the index (#1813 ) The code to support VoyageAI embedding and rerank models was added in the https://github.com/lancedb/lancedb/pull/1799 PR. Some of the documentation changes were also made there; this PR adds the VoyageAI embedding doc link to the index page. These are my first PRs in lancedb, and while I checked the documentation/code structure, I might have missed something important. Please let me know if any changes are required!	2024-11-18 14:34:16 -08:00
Will Jones	0fd8a50bd7	ci(node): run examples in CI (#1796 ) This is done as setup for a PR that will fix the OpenAI dependency issue. * [x] FTS examples * [x] Setup mock openai * [x] Ran `npm audit fix` * [x] sentences embeddings test * [x] Double check formatting of docs examples	2024-11-13 11:10:56 -08:00
fzowl	cbbc07d0f5	feat: voyageai support (#1799 ) Adding VoyageAI embedding and rerank support	2024-11-09 00:51:20 +05:30
Akash Saravanan	d6b5054778	feat(python): add support for trust_remote_code in hf embeddings (#1712 ) Resolves #1709. Adds `trust_remote_code` as a parameter to the `TransformersEmbeddingFunction` class with a default of False. Updated relevant documentation with the same.	2024-10-01 01:06:28 +05:30
Rithik Kumar	dcd5f51036	docs: add understand embeddings v1 (#1643 ) Before getting started with managing embeddings, let's understand embeddings (the LanceDB way). ![Screenshot 2024-09-14 012144](https://github.com/user-attachments/assets/7c5435dc-5316-47e9-8d7d-9994ab13b93d)	2024-09-14 02:07:00 +05:30
Philip Zeyliger	1d61717d0e	docs: fix get_registry() usage (#1601 ) Docs used `get_registry.get(...)` whereas what works is `get_registry().get(...)`. Fixing the two instances I found. I tested the open clip version by trying it locally in a Jupyter notebook.	2024-09-06 01:48:24 +05:30
Rithik Kumar	2bc7dca3ca	docs: add changes to Embeddings-> Available models-> overview page (#1596 ) adding features and improvements to - Manage Embeddings page Before: ![Screenshot 2024-09-04 223743](https://github.com/user-attachments/assets/f1e116b5-6ebb-4d59-9d29-b20084998cd0) After: ![Screenshot 2024-09-05 214214](https://github.com/user-attachments/assets/8c94318e-68af-447e-97e1-8153860a2914) ![Screenshot 2024-09-05 213623](https://github.com/user-attachments/assets/55c82770-6df9-4bab-9c5c-1ea1552138de) ![Screenshot 2024-09-05 215931](https://github.com/user-attachments/assets/9bfac7d4-16a6-454e-801e-50789ff75261)	2024-09-05 22:19:08 +05:30
Rithik Kumar	ae85008714	docs: revamp embedding models (#1568 ) before: ![Screenshot 2024-08-27 151525](https://github.com/user-attachments/assets/d4f8f2b9-37e6-4a31-b144-01b804019e11) After: ![Screenshot 2024-08-27 151550](https://github.com/user-attachments/assets/79fe7d27-8f14-4d80-9b41-a1e91f8c708f) --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-08-27 17:14:35 +05:30
Ayush Chaurasia	a88e9bb134	docs: add lancedb embedding fcn on cloud docs (#1521 )	2024-08-09 07:21:04 +05:30
Robby	8d2ff7b210	feat(python): add watsonx embeddings to registry (#1486 ) Related issue: https://github.com/lancedb/lancedb/issues/1412 --------- Co-authored-by: Robby <h0rv@users.noreply.github.com>	2024-08-06 10:58:33 +05:30
Cory Grinstead	a062a92f6b	docs: custom embedding function for ts (#1479 )	2024-07-30 18:19:55 -05:00
inn-0	cc507ca766	docs: add missing whitespace before markdown table to fix rendering issue (#1471 ) ### Fix markdown table rendering issue This PR adds a missing whitespace before a markdown table in the documentation. This issue causes the table to not render properly in mkdocs, while it does render properly in GitHub's markdown viewer. #### Change Details: - Added a single line of whitespace before the markdown table to ensure proper rendering in mkdocs. #### Note: - I wasn't able to test this fix in the mkdocs environment, but it should be safe as it only involves adding whitespace which won't break anything. --- Cohere supports following input types: \| Input Type \| Description \| \|-------------------------\|---------------------------------------\| \| "`search_document`" \| Used for embeddings stored in a vector\| \| \| database for search use-cases. \| \| "`search_query`" \| Used for embeddings of search queries \| \| \| run against a vector DB \| \| "`semantic_similarity`" \| Specifies the given text will be used \| \| \| for Semantic Textual Similarity (STS) \| \| "`classification`" \| Used for embeddings passed through a \| \| \| text classifier. \| \| "`clustering`" \| Used for the embeddings run through a \| \| \| clustering algorithm \| Usage Example:	2024-07-24 22:26:28 +05:30
Magnus	dc609a337d	fix: added support for trust_remote_code (#1454 ) Closes #1285 Added trust_remote_code to the SentenceTransformerEmbeddings class. Defaults to `False`	2024-07-18 19:37:52 +05:30
Ayush Chaurasia	bb2e624ff0	docs: add fine tuning section in retriever guide and minor fixes (#1438 )	2024-07-11 17:34:29 +05:30
Cory Grinstead	31be9212da	docs(nodejs): add @lancedb/lancedb examples everywhere (#1411 ) Co-authored-by: Will Jones <willjones127@gmail.com>	2024-07-10 13:29:03 -05:00
Joan Fontanals	cef24801f4	docs: add jina reranker to index (#1427 ) PR to add JinaReranker documentation page to the rerankers index	2024-07-09 14:39:35 +05:30
Joan Fontanals	08d25c5a80	feat: add Jina integration in Python for Embedding and Reranker (#1424 ) Integration of Jina Embeddings and Rerankers through its API	2024-07-05 01:34:43 +05:30
Sidharth Rajaram	48f8d1b3b7	docs: addresses typos in HF embedding example docs (#1415 ) * `table.add` requires `data` parameter on the docs page regarding use of embedding models from HF * also changed the name of example class from `TextModel` to `Words` since that is what is used as parameter in the `db.create_table` call * Per https://lancedb.github.io/lancedb/python/python/#lancedb.table.Table.add	2024-07-01 12:14:17 +05:30
Ayush Chaurasia	72f339a0b3	docs: add note about embedding api not being available on cloud (#1371 )	2024-06-09 03:57:23 +05:30
Ayush Chaurasia	76fc16c7a1	docs: add retriever guide, address minor onboarding feedbacks & enhancement (#1326 ) - Tried to address some onboarding feedbacks listed in https://github.com/lancedb/lancedb/issues/1224 - Improve visibility of pydantic integration and embedding API. (Based on onboarding feedback - Many ways of ingesting data, defining schema but not sure what to use in a specific use-case) - Add a guide that takes users through testing and improving retriever performance using built-in utilities like hybrid-search and reranking - Add some benchmarks for the above - Add missing cohere docs --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-06-08 06:25:31 +05:30
asmith26	3850d5fb35	Add ollama embeddings function (#1263 ) Following the docs [here](https://lancedb.github.io/lancedb/python/python/#lancedb.embeddings.openai.OpenAIEmbeddings) I've been trying to use ollama embedding via the OpenAI API interface, but unfortunately I couldn't get it to work (possibly related to https://github.com/ollama/ollama/issues/2416) Given the popularity of ollama I thought it could be helpful to have a dedicated Ollama Embedding function in lancedb. Very much welcome any thought on this or my code etc. Thanks!	2024-05-13 13:09:19 +05:30
asmith26	df48454b70	Update embedding_functions.md (#1250 ) `clip.ndims` seems to be a function (I installed with `pip install open_clip_torch`).	2024-05-01 09:33:42 -07:00
Alex Kohler	c1a7d65473	chore: fix get_registry call in baai embeddings example (#1230 )	2024-04-20 07:25:16 +05:30
Ayush Chaurasia	b039765d50	docs : Embedding functions quickstart and minor fixes (#1217 )	2024-04-11 17:30:45 +05:30
Raghav Dixit	1c41a00d87	Embeddings: HF model hub support added via transformers (#1154 )	2024-04-05 16:34:56 -07:00
Ayush Chaurasia	b916f5f132	docs: Add all available HF/sentence transformers embedding models list (#1134 ) Solves - https://github.com/lancedb/lancedb/issues/968	2024-04-05 16:34:38 -07:00
Raghav Dixit	765569425c	doc updates (#1085 ) closes #1084	2024-04-05 16:32:15 -07:00
Ivan Leo	89ce417452	Update default_embedding_functions.md (#1073 ) Added a small bit of documentation for the `dim` feature which is provided by the new `text-embedding-3` model series that allows users to shorten an embedding. Happy to discuss a bit on the phrasing but I struggled quite a bit with getting it to work so wanted to help others who might want to use the newer model too	2024-04-05 16:31:53 -07:00
Louis Guitton	7f9ef0d329	Fix default_embedding_functions.md (#1043 ) typo and broken table	2024-04-05 16:31:36 -07:00
Chang She	484a121866	doc: improve embedding functions documentation (#983 ) Got some user feedback that the `implicit` / `explicit` distinction is confusing. Instead I was thinking we would just deprecate the `with_embeddings` API and then organize working with embeddings into 3 buckets: 1. manually generate embeddings 2. use a provided embedding function 3. define your own custom embedding function	2024-04-05 16:30:40 -07:00
Ayush Chaurasia	510e8378bc	feat(python): hybrid search updates, examples, & latency benchmarks (#964 ) - Rename safe_import -> attempt_import_or_raise (closes https://github.com/lancedb/lancedb/pull/923) - Update docs - Add Notebook example (@changhiskhan you can use it for the talk. Comes with "open in colab" button) - Latency benchmark & results comparison, sanity check on real-world data - Updates the default openai model to gpt-4	2024-04-05 16:30:30 -07:00
Ayush Chaurasia	545a03d7f9	feat(python): Aws Bedrock embeddings integration (#822 ) Supports amazon titan, cohere english & cohere multi-lingual base models.	2024-04-05 16:28:56 -07:00
Prashanth Rao	4d5d748acd	docs: Updates and refactor (#683 ) This PR makes incremental changes to the documentation. * Closes #697 * Closes #698 - [x] Add dark mode - [x] Fix headers in navbar - [x] Add `extra.css` to customize navbar styles - [x] Customize fonts for prose/code blocks, navbar and admonitions - [x] Inspect all admonition boxes (remove redundant dropdowns) and improve clarity and readability - [x] Ensure that all images in the docs have white background (not transparent) to be viewable in dark mode - [x] Improve code formatting in code blocks to make them consistent with autoformatters (eslint/ruff) - [x] Add bolder weight to h1 headers - [x] Add diagram showing the difference between embedded (OSS) and serverless (Cloud) - [x] Fix [Creating an empty table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table) section: right now, the subheaders are not clickable. - [x] In critical data ingestion methods like `table.add` (among others), the type signature often does not match the actual code - [x] Proof-read each documentation section and rewrite as necessary to provide more context, use cases, and explanations so it reads less like reference documentation. This is especially important for CRUD and search sections since those are so central to the user experience. - [x] The section for [Adding data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table) only shows examples for pandas and iterables. We should include pydantic models, arrow tables, etc. - [x] Add conceptual tutorial for IVF-PQ index - [x] Clearly separate vector search, FTS and filtering sections so that these are easier to find - [x] Add docs on refine factor to explain its importance for recall. Closes #716 - [x] Add an FAQ page showing answers to commonly asked questions about LanceDB. Closes #746 - [x] Add simple polars example to the integrations section. 
Closes #756 and closes #153 - [ ] Add basic docs for the Rust API (more detailed API docs can come later). Closes #781 - [x] Add a section on the various storage options on local vs. cloud (S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782 - [x] Revamp filtering docs: add pre-filtering examples and redo headers and update content for SQL filters. Closes #783 and closes #784. - [x] Add docs for data management: compaction, cleaning up old versions and incremental indexing. Closes #785 - [ ] Add a benchmark section that also discusses some best practices. Closes #787 --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:27:12 -07:00
Ayush Chaurasia	2f72d5138e	feat(python): Add gemini text embedding function (#806 ) Named it Gemini-text for now. Not sure how complicated it will be to support both text and multimodal embeddings under the same class "gemini", but it's not something to worry about for now, I guess.	2024-04-05 16:25:52 -07:00
Chris	6698376f02	Minor Fixes to Ingest Embedding Functions Docs (#777 ) Addressed minor typos and grammatical issues to improve readability --------- Co-authored-by: Christopher Correa <chris.correa@gmail.com>	2024-04-05 16:24:47 -07:00
Vladimir Varankin	2fd829296e	Minor corrections for docs of embedding_functions (#780 ) In addition to #777, this pull request fixes more typos in the documentation for "Ingest Embedding Functions".	2024-04-05 16:24:47 -07:00
Bengsoon Chuah	e3ba5b2402	Add relevant imports for each step (#764 ) I found that it was quite incoherent to have to read through the documentation and having to search which submodule that each class should be imported from. For example, it is cumbersome to have to navigate to another documentation page to find out that `EmbeddingFunctionRegistry` is from `lancedb.embeddings`	2024-04-05 16:24:47 -07:00
elliottRobinson	3ab4b335c3	Update default_embedding_functions.md (#744 ) Modify some grammar, punctuation, and spelling errors.	2024-04-05 16:24:47 -07:00
Ayush Chaurasia	088792c821	[Docs]: Add Instructor embeddings and rate limit handler docs (#651 )	2024-04-05 16:23:49 -07:00
Ayush Chaurasia	1c42894918	[DOCS][PYTHON] Update embeddings API docs & Example (#516 ) This PR adds an overview of embeddings docs: - 2 ways to vectorize your data using lancedb - explicit & implicit - Explicit - manually vectorize your data using the `with_embeddings` function - Implicit - automatically vectorize your data as it comes by ingesting your embedding function details as table metadata - Multi-modal example w/ disappearing embedding function	2024-04-05 16:22:59 -07:00

43 Commits