417 Commits

Author SHA1 Message Date
Ayush Chaurasia
b039765d50 docs : Embedding functions quickstart and minor fixes (#1217) 2024-04-11 17:30:45 +05:30
Prashanth Rao
d155e82723 [docs] Fix broken links and clarify language in integrations docs (#1209)
This PR does the following:

- Fixes broken/outdated URLs
- Adds clarity to the way DuckDB/LanceDB integration works via Arrow
2024-04-11 15:32:08 +05:30
Ayush Chaurasia
44c03ebef3 docs : Update Reranking docs (#1213) 2024-04-11 15:20:00 +05:30
Will Jones
1d23af213b feat: expose storage options in LanceDB (#1204)
Exposes `storage_options` in LanceDB. This is provided for Python async,
Node `lancedb`, and Node `vectordb` (and Rust of course). Python
synchronous is omitted because it's not compatible with the PyArrow
filesystems we use there currently. In the future, we will move the sync
API to wrap the async one, and then it will get support for
`storage_options`.

1. Fixes #1168
2. Closes #1165
3. Closes #1082
4. Closes #439
5. Closes #897
6. Closes #642
7. Closes #281
8. Closes #114
9. Closes #990
10. Deprecating `awsCredentials` and `awsRegion`. Users are encouraged
to use `storageOptions` instead.
2024-04-10 10:12:04 -07:00
Pranav Maddi
2b132a0bef Fix markdown formatting (#1188) 2024-04-05 16:35:10 -07:00
Raghav Dixit
1c41a00d87 Embeddings: HF model hub support added via transformers (#1154) 2024-04-05 16:34:56 -07:00
eduardjbotha
f749b8808f SQL Documentation includes DataFusion functions (#1179)
Show that it is possible to use the DataFusion functions in the `WHERE`
clause.

Co-authored-by: Eduard Botha <eduard.botha@inovex.de>
2024-04-05 16:34:50 -07:00
Lei Xu
7e5a54b76a chore: add social link footer (#1177) 2024-04-05 16:34:50 -07:00
Weston Pace
e21b56293c docs: add a reference to @lancedb/lance in the docs (#1166)
We aren't yet ready to switch over the examples since almost all JS
examples rely on embeddings and we haven't yet ported those over.
However, this makes it possible for those that are interested to start
using `@lancedb/lancedb`
2024-04-05 16:34:39 -07:00
Bert
bb179981dd added new logo to vercel example gif (#1158) 2024-04-05 16:34:38 -07:00
Bert
2e1f1c6d5d New logo on docs site (#1157) 2024-04-05 16:34:38 -07:00
Ayush Chaurasia
b916f5f132 docs: Add all available HF/sentence transformers embedding models list (#1134)
Solves -  https://github.com/lancedb/lancedb/issues/968
2024-04-05 16:34:38 -07:00
Weston Pace
f97c7dad8c docs: add the async python API to the docs (#1156) 2024-04-05 16:34:37 -07:00
Pranav Maddi
479289dd38 Adds a Ask LanceDB button to docs. (#1150)
This links out to the new [asklancedb.com](https://asklancedb.com) page.

Screenshots of the change:

![Quick start - LanceDB · 10 20am ·
03-22](https://github.com/lancedb/lancedb/assets/2371511/c45ba893-fc74-4957-bdd3-3712b351aff3)
![Quick start -
LanceDB](https://github.com/lancedb/lancedb/assets/2371511/d4762eb6-52af-4fd5-857e-3ed280716999)
2024-04-05 16:33:37 -07:00
natcharacter
f6e9f8e3f4 Order by field support FTS (#1132)
This PR adds support for passing through a set of ordering fields at
index time (unsigned ints that tantivity can use as fast_fields) that at
query time you can sort your results on. This is useful for cases where
you want to get related hits, i.e by keyword, but order those hits by
some other score, such as popularity.

I.e search for songs descriptions that match on "sad AND jazz AND 1920"
and then order those by number of times played. Example usage can be
seen in the fts tests.

---------

Co-authored-by: Nat Roth <natroth@Nats-MacBook-Pro.local>
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-04-05 16:33:36 -07:00
Weston Pace
0fe0976a0e docs: add links to rust SDK docs, remove references to rust SDK being unstable / experimental (#1131) 2024-04-05 16:33:05 -07:00
vincent d warmerdam
85a9ef472f Unhide Pydantic guides in Docs (#1122)
@wjones127 after fixing https://github.com/lancedb/lancedb/issues/1112 I
noticed something else on the docs. There's an odd chunk of the docs
missing
[here](https://lancedb.github.io/lancedb/guides/tables/#from-a-polars-dataframe).
I can see the heading, but after clicking it the contents don't show.

![CleanShot 2024-03-15 at 23 40
17@2x](https://github.com/lancedb/lancedb/assets/1019791/04784b19-0200-4c3f-ae17-7a8f871ef9bd)

Apon inspection it was a markdown issue, one tab too many on a whole
segment.

This PR fixes it. It looks like this now and the sections appear again:

![CleanShot 2024-03-15 at 23 42
32@2x](https://github.com/lancedb/lancedb/assets/1019791/c5aaec4c-1c37-474d-9fb0-641f4cf52626)
2024-04-05 16:32:47 -07:00
vincent d warmerdam
b9afc01cfd Explain vonoroi seed initalisation (#1114)
This PR fixes https://github.com/lancedb/lancedb/issues/1112. It turned
out that K-means is currently used internally, so I figured adding that
context to the docs would be nice.
2024-04-05 16:32:31 -07:00
Raghav Dixit
765569425c doc updates (#1085)
closes #1084
2024-04-05 16:32:15 -07:00
Ivan Leo
89ce417452 Update default_embedding_functions.md (#1073)
Added a small bit of documentation for the `dim` feature which is
provided by the new `text-embedding-3` model series that allows users to
shorten an embedding.

Happy to discuss a bit on the phrasing but I struggled quite a bit with
getting it to work so wanted to help others who might want to use the
newer model too
2024-04-05 16:31:53 -07:00
Chang She
10089481c0 doc(python): document the method in fts (#982)
Co-authored-by: prrao87 <prrao87@gmail.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-04-05 16:31:45 -07:00
Chang She
a7dbe933dc chore(python): use pypi tantivy to speed up CI (#987) 2024-04-05 16:31:36 -07:00
Prashanth Rao
f9c244e608 [docs]: Fix issues with Rust code snippets in "quick start" (#1047)
The renaming of `vectordb` to `lancedb` broke the [quick start
docs](https://lancedb.github.io/lancedb/basic/#__tabbed_5_3) (it's
pointing to a non-existent directory). This PR fixes the code snippets
and the paths in the docs page.

Additionally, more fixes related to indexing docs below 👇🏽.
2024-04-05 16:31:36 -07:00
Louis Guitton
7f9ef0d329 Fix default_embedding_functions.md (#1043)
typo and broken table
2024-04-05 16:31:36 -07:00
Chang She
a3761f4209 doc: fix langchain link (#1053) 2024-04-05 16:31:36 -07:00
Will Jones
c5b0934bfb feat(node): add read_consistency_interval to Node and Rust (#1002)
This PR adds the same consistency semantics as was added in #828. It
*does not* add the same lazy-loading of tables, since that breaks some
existing tests.

This closes #998.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-04-05 16:30:40 -07:00
Ayush Chaurasia
538d0320f7 Docs: add meta tags (#1006) 2024-04-05 16:30:40 -07:00
Johannes Kolbe
32bfb68ac3 apply fixes for notebook (#989) 2024-04-05 16:30:40 -07:00
Ayush Chaurasia
bc871169f0 docs: Add meta tag for image preview (#988)
I think this should work. Need to deploy it to be sure as it can be
tested locally. Can be tested here.

2 things about this solution:
* All pages have a same meta tag, i.e, lancedb banner
* If needed, we can automatically use the first image of each page and
generate meta tags using the ultralytics mkdocs plugin that we did for
this purpose - https://github.com/ultralytics/mkdocs
2024-04-05 16:30:40 -07:00
Chang She
3fc835e124 doc: update navigation links for embedding functions (#986) 2024-04-05 16:30:40 -07:00
Chang She
484a121866 doc: improve embedding functions documentation (#983)
Got some user feedback that the `implicit` / `explicit` distinction is
confusing.
Instead I was thinking we would just deprecate the `with_embeddings` API
and then organize working with embeddings into 3 buckets:

1. manually generate embeddings
2. use a provided embedding function
3. define your own custom embedding function
2024-04-05 16:30:40 -07:00
Will Jones
f84a4855ca docs: show DuckDB with dataset, not table (#974)
Using datasets is preferred way to allow filter and projection pushdown,
as well as aggregated larger-than-memory tables.
2024-04-05 16:30:40 -07:00
Ayush Chaurasia
aecafa6479 docs: Minimal reranking evaluation benchmarks (#977) 2024-04-05 16:30:40 -07:00
Prashanth Rao
b014c24e66 [docs]: Fix typos and clarity in hybrid search docs (#966)
- Fixed typos and added some clarity to the hybrid search docs
- Changed "Airbnb" case to be as per the [official company
name](https://en.wikipedia.org/wiki/Airbnb) (the "bnb" shouldn't be
capitalized", and the text in the document aligns with this
- Fixed headers in nav bar
2024-04-05 16:30:30 -07:00
Ayush Chaurasia
f78fe721db docs: Add setup cell for colab example (#965) 2024-04-05 16:30:30 -07:00
Ayush Chaurasia
510e8378bc feat(python): hybrid search updates, examples, & latency benchmarks (#964)
- Rename safe_import -> attempt_import_or_raise (closes
https://github.com/lancedb/lancedb/pull/923)
- Update docs
- Add Notebook example (@changhiskhan you can use it for the talk. Comes
with "open in colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default openai model to gpt-4
2024-04-05 16:30:30 -07:00
Nitish Sharma
2c3f982f4f Minor updates to FAQ (#935)
Based on discussion over discord, adding minor updates to the FAQ
section about benchmarks, practical data size and concurrency in LanceDB
2024-04-05 16:29:58 -07:00
Ayush Chaurasia
d07817a562 feat(python): Reranker DX improvements (#904)
- Most users might not know how to use `QueryBuilder` object. Instead we
should just pass the string query.
- Add new rerankers: Colbert, openai
2024-04-05 16:29:58 -07:00
QianZhu
b2efd0da53 fix hybrid search example (#922) 2024-04-05 16:29:13 -07:00
QianZhu
2e75b16403 make it explicit about the vector column data type (#916)
<img width="837" alt="Screenshot 2024-02-01 at 4 23 34 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/4f0f5c5a-2a24-4b00-aad1-ef80a593d964">
[
<img width="838" alt="Screenshot 2024-02-01 at 4 26 03 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/ca073bc8-b518-4be3-811d-8a7184416f07">
](url)

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-04-05 16:29:05 -07:00
Weston Pace
4eb819072a feat: upgrade to lance 0.9.11 and expose merge_insert (#906)
This adds the python bindings requested in #870 The javascript/rust
bindings will be added in a future PR.
2024-04-05 16:29:05 -07:00
QianZhu
1f2eafca75 arrow table/f16 example (#907) 2024-04-05 16:29:05 -07:00
Will Jones
d5be6c7a05 docs: provide AWS S3 cleanup and permissions advice (#903)
Adding some more quick advice for how to setup AWS S3 with LanceDB.

---------

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-04-05 16:28:56 -07:00
Ayush Chaurasia
a41f7be88d feat(python): Hybrid search & Reranker API (#824)
based on https://github.com/lancedb/lancedb/pull/713
- The Reranker api can be plugged into vector only or fts only search
but this PR doesn't do that (see example -
https://txt.cohere.com/rerank/)


### Default reranker -- `LinearCombinationReranker(weight=0.7,
fill=1.0)`

```
table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas()
```
### Available rerankers
LinearCombinationReranker
```
from lancedb.rerankers import LinearCombinationReranker

# Same as default 
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=LinearCombinationReranker()
                                     ).to_pandas()

# with custom params
reranker = LinearCombinationReranker(weight=0.3, fill=1.0)
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=reranker
                                     ).to_pandas()
```

Cohere Reranker
```
from lancedb.rerankers import CohereReranker

# default model.. English and multi-lingual supported. See docstring for available custom params
table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank",  # score or rank
                                      reranker=CohereReranker()
                                     ).to_pandas()

```

CrossEncoderReranker

```
from lancedb.rerankers import CrossEncoderReranker

table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank", 
                                      reranker=CrossEncoderReranker()
                                     ).to_pandas()

```

## Using custom Reranker
```
from lancedb.reranker import Reranker

class CustomReranker(Reranker):
    def rerank_hybrid(self, vector_result, fts_result):
           combined_res = self.merge_results(vector_results, fts_results) # or use custom combination logic
           # Custom rerank logic here
           
           return combined_res
```

- [x] Expand testing
- [x] Make sure usage makes sense
- [x] Run simple benchmarks for correctness (Seeing weird result from
cohere reranker in the toy example)
- Support diverse rerankers by default:
- [x] Cross encoding
- [x] Cohere
- [x] Reciprocal Rank Fusion

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-04-05 16:28:56 -07:00
Prashanth Rao
ecbbe185c7 Fix image bgcolor (#891)
Minor fix to change the background color for an image in the docs. It's
now readable in both light and dark modes (earlier version made it
impossible to read in dark mode).
2024-04-05 16:28:56 -07:00
Ayush Chaurasia
b326bf2ef6 doc: Add documentation chatbot for LanceDB (#890)
<img width="1258" alt="Screenshot 2024-01-29 at 10 05 52 PM"
src="https://github.com/lancedb/lancedb/assets/15766192/7c108fde-e993-415c-ad01-72010fd5fe31">
2024-04-05 16:28:56 -07:00
Raghav Dixit
472344fcb3 feat(python): Embedding fn support for gte-mlx/gte-large (#873)
have added testing and an example in the docstring, will be pushing a
separate PR in recipe repo for rag example

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-04-05 16:28:56 -07:00
Lei Xu
911d063237 doc: fix js example of create index (#886) 2024-04-05 16:28:56 -07:00
Lei Xu
12e776821a doc: use snippet for rust code example and make sure rust examples run through CI (#885) 2024-04-05 16:28:56 -07:00
Chang She
1d0578ce25 doc(rust): minor fixes for Rust quick start. (#878) 2024-04-05 16:28:56 -07:00