This exposes the `LANCEDB_LOG` environment variable in Node, so that
users can now turn on logging.
In addition, this fixes a bug where only the top-level error from Rust
was being shown. This PR makes sure the full error chain is included in
the error message. In the future, we will improve this so the error chain
is set on the [cause](https://nodejs.org/api/errors.html#errorcause)
property of JS errors: https://github.com/lancedb/lancedb/issues/1779
Fixes #1774
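
The idea of folding a full error chain into one message can be illustrated with a small Python analogue. This is only a sketch: the PR itself changes the Rust/Node bindings, and the helper name `format_error_chain` and the "Caused by" separator are hypothetical choices made for this example, not part of lancedb.

```python
def format_error_chain(exc: BaseException) -> str:
    """Join an exception and each of its explicit causes into one message."""
    parts = []
    seen = set()  # guard against a (rare) cycle in the cause chain
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        parts.append(f"{type(current).__name__}: {current}")
        # `raise ... from err` stores the underlying error on __cause__
        current = current.__cause__
    return "\nCaused by: ".join(parts)


def open_table():
    # Hypothetical failure: a low-level error wrapped by a higher-level one
    try:
        raise FileNotFoundError("manifest not found")
    except FileNotFoundError as err:
        raise RuntimeError("failed to open table") from err


try:
    open_table()
except RuntimeError as err:
    message = format_error_chain(err)
    print(message)
```

Without the loop over `__cause__`, only "failed to open table" would be reported, which mirrors the bug described above.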
BREAKING CHANGE: default tokenizer no longer does stemming or stop-word
removal. Users should explicitly turn that option on in the future.
- upgrade lance to 0.19.1
- update the FTS docs
- update the FTS API
Upstream change notes:
https://github.com/lancedb/lance/releases/tag/v0.19.1
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
- Tried to address some of the onboarding feedback listed in
https://github.com/lancedb/lancedb/issues/1224
- Improve visibility of the pydantic integration and embedding API.
(Based on onboarding feedback: there are many ways of ingesting data and
defining a schema, but it is not clear which to use for a specific use case)
- Add a guide that takes users through testing and improving retriever
performance using built-in utilities like hybrid-search and reranking
- Add some benchmarks for the above
- Add missing cohere docs
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
We aren't yet ready to switch over the examples since almost all JS
examples rely on embeddings and we haven't yet ported those over.
However, this makes it possible for those that are interested to start
using `@lancedb/lancedb`
I think this should work. Need to deploy it to be sure, as it can't be
tested locally. Can be tested here.
2 things about this solution:
* All pages have the same meta tag, i.e., the lancedb banner
* If needed, we can automatically use the first image of each page and
generate meta tags using the ultralytics mkdocs plugin that we built for
this purpose - https://github.com/ultralytics/mkdocs
- Fixed typos and added some clarity to the hybrid search docs
- Changed "Airbnb" case to be as per the [official company
name](https://en.wikipedia.org/wiki/Airbnb) (the "bnb" shouldn't be
capitalized), so the text in the document now aligns with this
- Fixed headers in nav bar
- Rename safe_import -> attempt_import_or_raise (closes
https://github.com/lancedb/lancedb/pull/923)
- Update docs
- Add Notebook example (@changhiskhan you can use it for the talk. Comes
with "open in colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default openai model to gpt-4
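
For readers unfamiliar with the renamed helper, the sketch below shows what an "attempt import or raise" utility typically looks like. The signature, the `mitigation` parameter, and the error wording are assumptions made for illustration; they are not copied from the lancedb source.

```python
import importlib
from typing import Optional


def attempt_import_or_raise(module: str, mitigation: Optional[str] = None):
    """Import `module`, or raise an ImportError that says what to install.

    `mitigation` is a hypothetical parameter: the package name to suggest
    when it differs from the module name.
    """
    try:
        return importlib.import_module(module)
    except ImportError as err:
        package = mitigation or module
        raise ImportError(
            f"Could not import {module}. Please install {package} "
            f"(for example: pip install {package})"
        ) from err


# Usage: the stdlib json module is always available, so this succeeds
json = attempt_import_or_raise("json")
print(json.dumps({"ok": True}))
```

The new name makes it explicit that the call raises on failure rather than silently returning a placeholder.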
Based on https://github.com/lancedb/lancedb/pull/713
- The Reranker API can be plugged into vector-only or FTS-only search,
but this PR doesn't do that (see example -
https://txt.cohere.com/rerank/)
### Default reranker -- `LinearCombinationReranker(weight=0.7, fill=1.0)`
```
table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas()
```
### Available rerankers
LinearCombinationReranker
```
from lancedb.rerankers import LinearCombinationReranker

# Same as default
table.search("hello", query_type="hybrid").rerank(
    normalize="score",
    reranker=LinearCombinationReranker()
).to_pandas()

# With custom params
reranker = LinearCombinationReranker(weight=0.3, fill=1.0)
table.search("hello", query_type="hybrid").rerank(
    normalize="score",
    reranker=reranker
).to_pandas()
```
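
To give an intuition for what `weight` and `fill` control, here is a minimal, library-free sketch of a weighted linear combination. It is not the lancedb implementation: it assumes both inputs are already normalized to distance-like values in [0, 1] (lower is better), and that `fill` is the value substituted when a document is missing from one of the two result sets. The function name `linear_combination` is hypothetical.

```python
def linear_combination(vector_dist, fts_dist, weight=0.7, fill=1.0):
    """Blend two {doc_id: normalized distance} dicts into a relevance ranking.

    Assumption: distances are in [0, 1], lower is better, and `fill` is the
    penalty used when a document appears in only one result set.
    """
    combined = {}
    for doc_id in set(vector_dist) | set(fts_dist):
        v = vector_dist.get(doc_id, fill)
        f = fts_dist.get(doc_id, fill)
        # Convert the weighted distance into a relevance score (higher is better)
        combined[doc_id] = 1.0 - (weight * v + (1.0 - weight) * f)
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


vector_dist = {"a": 0.1, "b": 0.4}  # toy vector-search distances
fts_dist = {"b": 0.2, "c": 0.0}     # toy full-text-search distances
ranked = linear_combination(vector_dist, fts_dist)
print(ranked)
```

In this toy input, "b" ranks first because it appears in both result sets, while "a" and "c" each pay the `fill` penalty for the search they are missing from.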
Cohere Reranker
```
from lancedb.rerankers import CohereReranker

# Default model. English and multilingual models are supported.
# See the docstring for available custom params.
table.search("hello", query_type="hybrid").rerank(
    normalize="rank",  # "score" or "rank"
    reranker=CohereReranker()
).to_pandas()
```
CrossEncoderReranker
```
from lancedb.rerankers import CrossEncoderReranker

table.search("hello", query_type="hybrid").rerank(
    normalize="rank",
    reranker=CrossEncoderReranker()
).to_pandas()
```
## Using custom Reranker
```
from lancedb.rerankers import Reranker

class CustomReranker(Reranker):
    def rerank_hybrid(self, vector_results, fts_results):
        # Merge both result sets, or use custom combination logic instead
        combined_res = self.merge_results(vector_results, fts_results)
        # Custom rerank logic here
        return combined_res
```
- [x] Expand testing
- [x] Make sure usage makes sense
- [x] Run simple benchmarks for correctness (Seeing weird result from
cohere reranker in the toy example)
- Support diverse rerankers by default:
  - [x] Cross encoding
  - [x] Cohere
  - [x] Reciprocal Rank Fusion
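
For reference, Reciprocal Rank Fusion from the checklist above can be sketched in a few lines of plain Python. This shows the general technique only and is not taken from the lancedb code; the function name and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


vector_ranking = ["a", "b", "c"]  # toy vector-search order
fts_ranking = ["b", "c", "d"]     # toy full-text-search order
fused = reciprocal_rank_fusion([vector_ranking, fts_ranking])
print(fused)
```

Because it only uses ranks, this approach needs no score normalization, which is the main reason to prefer it over a weighted score combination.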
---------
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>