Added the ability to specify `tokenizer_name` when creating a full text
search index using tantivy. This enables the use of language-specific
stemming.
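A minimal sketch of how this might be used from the Python API (the column name and the `en_stem` tokenizer choice are illustrative):

```python
import lancedb

db = lancedb.connect("data/sample-lancedb")
tbl = db.open_table("my_table")

# Build the tantivy-based FTS index with an English stemming tokenizer
# instead of the default one.
tbl.create_fts_index("text", tokenizer_name="en_stem", replace=True)
```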
Also updated the [guide on full text
search](https://lancedb.github.io/lancedb/fts/) with a short section on
choosing a tokenizer.
Fixes #1315
- Tried to address some of the onboarding feedback listed in
https://github.com/lancedb/lancedb/issues/1224
- Improve the visibility of the Pydantic integration and the embedding API
(based on onboarding feedback: there are many ways of ingesting data and
defining a schema, but it isn't clear which to use for a given use case).
See the sketch after this list.
- Add a guide that takes users through testing and improving retriever
performance using built-in utilities like hybrid search and reranking
- Add some benchmarks for the above
- Add missing Cohere docs
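As a sketch of the Pydantic + embedding API path this feedback points at (the `sentence-transformers` registry entry and the table name are just examples):

```python
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector

# Pick an embedding function from the registry; vectors are filled in on ingest.
model = get_registry().get("sentence-transformers").create()

class Document(LanceModel):
    text: str = model.SourceField()                      # column to embed
    vector: Vector(model.ndims()) = model.VectorField()  # generated automatically

db = lancedb.connect("data/sample-lancedb")
tbl = db.create_table("docs", schema=Document)
tbl.add([{"text": "hello world"}, {"text": "goodbye world"}])
```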
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
This PR changes the release process. Some parts are now more complex, and
other parts I've simplified.
## Simplifications
* Combined `Create Release Commit` and `Create Python Release Commit`
into a single workflow. By default, it does a release of all packages,
but you can still choose to make just a Python or just Node/Rust release
through the arguments. This will make it rarer that we create a Node
release but forget about Python or vice-versa.
* Releases are automatically generated once a tag is pushed. This
eliminates the manual step of creating the release.
* Release notes are automatically generated and changes are categorized
based on the PR labels.
* Removed the use of `LANCEDB_RELEASE_TOKEN` in favor of just using
`GITHUB_TOKEN` where it wasn't necessary. In the one place it is
necessary, I left a comment as to why it is.
* Reused the version in `python/Cargo.toml` so we don't have two
different versions in Python LanceDB.
## New changes
* We now can create `preview` / `beta` releases. By default `Create
Release Commit` will create a preview release, but you can select a
"stable" release type and it will create a full stable release.
* For Python, pre-releases go to fury.io instead of PyPI.
* `bump2version` was deprecated, so I upgraded to `bump-my-version`. This
also seems to have better support for semantic versioning with pre-releases.
* `ci` changes will now be shown in the changelog, allowing changes like
this to be visible to users. `chore` is still hidden.
## Versioning
**NOTE**: unlike how it is in the lance repo right now, the version in `main`
is the last one released, including beta versions.
---------
Co-authored-by: Lance Release <lance-dev@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Exposes `storage_options` in LanceDB. This is provided for Python async,
Node `lancedb`, and Node `vectordb` (and Rust, of course). The synchronous
Python API is omitted because it isn't compatible with the PyArrow
filesystems we currently use there. In the future, we will move the sync
API to wrap the async one, and then it will get support for
`storage_options` as well.
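A rough sketch of the async Python usage (the bucket URI and the specific option keys shown are assumptions; the accepted keys follow the underlying object-store configuration):

```python
import asyncio

import lancedb

async def main():
    # Pass credentials and other object-store settings per connection.
    db = await lancedb.connect_async(
        "s3://my-bucket/lancedb",
        storage_options={
            "region": "us-east-1",
            "aws_access_key_id": "<key>",
            "aws_secret_access_key": "<secret>",
        },
    )
    print(await db.table_names())

asyncio.run(main())
```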
1. Fixes #1168
2. Closes #1165
3. Closes #1082
4. Closes #439
5. Closes #897
6. Closes #642
7. Closes #281
8. Closes #114
9. Closes #990
10. Deprecates `awsCredentials` and `awsRegion`. Users are encouraged
to use `storageOptions` instead.
We aren't yet ready to switch over the examples, since almost all JS
examples rely on embeddings and we haven't yet ported those over.
However, this makes it possible for those who are interested to start
using `@lancedb/lancedb`.
This PR adds support for passing through a set of ordering fields at
index time (unsigned ints that tantivy can use as fast fields) that you
can sort your results on at query time. This is useful for cases where
you want to get related hits, e.g. by keyword, but order those hits by
some other score, such as popularity.
For example, search for song descriptions that match "sad AND jazz AND 1920"
and then order the results by number of times played. Example usage can be
seen in the fts tests.
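A rough sketch in Python of how this could look (the `ordering_field_names` / `ordering_field_name` parameter names and the column names are my assumptions here; the fts tests are the authoritative reference):

```python
import lancedb

db = lancedb.connect("data/sample-lancedb")
tbl = db.open_table("songs")

# Register `play_count` (an unsigned int column) as a tantivy fast field at index time.
tbl.create_fts_index("description", ordering_field_names=["play_count"])

# Match on keywords, but rank the hits by play count rather than relevance score.
results = (
    tbl.search("sad AND jazz AND 1920", ordering_field_name="play_count")
    .limit(10)
    .to_list()
)
```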
---------
Co-authored-by: Nat Roth <natroth@Nats-MacBook-Pro.local>
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Added a small bit of documentation for the `dim` feature, which is
provided by the new `text-embedding-3` model series and allows users to
shorten an embedding.
Happy to discuss the phrasing, but I struggled quite a bit with getting
it to work, so I wanted to help others who might want to use the newer
models too.
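For reference, a hedged sketch of how the shortened embeddings could be requested through the embedding registry (assuming the OpenAI embedding function exposes `dim` for the `text-embedding-3` models):

```python
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector

# Ask for 256-dimensional embeddings instead of the model's full output size.
openai = get_registry().get("openai").create(name="text-embedding-3-large", dim=256)

class Doc(LanceModel):
    text: str = openai.SourceField()
    vector: Vector(openai.ndims()) = openai.VectorField()  # ndims() reflects dim=256
```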
The renaming of `vectordb` to `lancedb` broke the [quick start
docs](https://lancedb.github.io/lancedb/basic/#__tabbed_5_3) (they point
to a non-existent directory). This PR fixes the code snippets and the
paths in the docs page.
Additionally, more fixes related to the indexing docs are below 👇🏽.
This PR adds the same consistency semantics that were added in #828. It
*does not* add the same lazy-loading of tables, since that breaks some
existing tests.
This closes #998.
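Assuming this exposes the same `read_consistency_interval` knob as #828, usage would look roughly like this (shown from the Python API for illustration):

```python
from datetime import timedelta

import lancedb

# Re-check for newer table versions at most every 5 seconds;
# timedelta(0) checks on every read, while the default leaves refreshes manual.
db = lancedb.connect("data/sample-lancedb", read_consistency_interval=timedelta(seconds=5))
tbl = db.open_table("my_table")
```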
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
I think this should work. Need to deploy it to be sure, as it can't be
tested locally. Can be tested here.
Two things about this solution:
* All pages have the same meta tag, i.e. the LanceDB banner
* If needed, we can automatically use the first image of each page and
generate meta tags using the ultralytics mkdocs plugin that we made for
this purpose - https://github.com/ultralytics/mkdocs
Got some user feedback that the `implicit` / `explicit` distinction is
confusing.
Instead, I was thinking we would just deprecate the `with_embeddings` API
and then organize working with embeddings into 3 buckets:
1. manually generate embeddings
2. use a provided embedding function
3. define your own custom embedding function
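For bucket 3, a minimal sketch of a custom embedding function (the registry name and the toy embedding logic are placeholders):

```python
from lancedb.embeddings import TextEmbeddingFunction, register

@register("my-embedder")
class MyEmbedder(TextEmbeddingFunction):
    """Toy text embedder; replace generate_embeddings with a real model call."""

    def ndims(self):
        return 3

    def generate_embeddings(self, texts):
        # One fixed-size vector per input text.
        return [[float(len(t)), 0.0, 1.0] for t in texts]
```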
- Fixed typos and added some clarity to the hybrid search docs
- Changed the "Airbnb" casing to match the [official company
name](https://en.wikipedia.org/wiki/Airbnb) (the "bnb" shouldn't be
capitalized), and the text in the document aligns with this
- Fixed headers in the nav bar
- Rename `safe_import` -> `attempt_import_or_raise` (closes
https://github.com/lancedb/lancedb/pull/923)
- Update docs
- Add notebook example (@changhiskhan you can use it for the talk. Comes
with an "Open in Colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default OpenAI model to gpt-4