Commit Graph

17 Commits

Author SHA1 Message Date
Ayush Chaurasia
eb31d95fef feat(python): hybrid search updates, examples, & latency benchmarks (#964)
- Rename safe_import -> attempt_import_or_raise (closes
https://github.com/lancedb/lancedb/pull/923)
- Update docs
- Add Notebook example (@changhiskhan you can use it for the talk. Comes
with "open in colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default openai model to gpt-4
2024-02-13 17:58:39 +05:30
Ayush Chaurasia
3ffed89793 feat(python): Hybrid search & Reranker API (#824)
based on https://github.com/lancedb/lancedb/pull/713
- The Reranker api can be plugged into vector only or fts only search
but this PR doesn't do that (see example -
https://txt.cohere.com/rerank/)


### Default reranker -- `LinearCombinationReranker(weight=0.7,
fill=1.0)`

```
table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas()
```
### Available rerankers
LinearCombinationReranker
```
from lancedb.rerankers import LinearCombinationReranker

# Same as default 
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=LinearCombinationReranker()
                                     ).to_pandas()

# with custom params
reranker = LinearCombinationReranker(weight=0.3, fill=1.0)
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=reranker
                                     ).to_pandas()
```

Cohere Reranker
```
from lancedb.rerankers import CohereReranker

# default model.. English and multi-lingual supported. See docstring for available custom params
table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank",  # score or rank
                                      reranker=CohereReranker()
                                     ).to_pandas()

```

CrossEncoderReranker

```
from lancedb.rerankers import CrossEncoderReranker

table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank", 
                                      reranker=CrossEncoderReranker()
                                     ).to_pandas()

```

## Using custom Reranker
```
from lancedb.reranker import Reranker

class CustomReranker(Reranker):
    def rerank_hybrid(self, vector_result, fts_result):
           combined_res = self.merge_results(vector_results, fts_results) # or use custom combination logic
           # Custom rerank logic here
           
           return combined_res
```

- [x] Expand testing
- [x] Make sure usage makes sense
- [x] Run simple benchmarks for correctness (Seeing weird result from
cohere reranker in the toy example)
- Support diverse rerankers by default:
- [x] Cross encoding
- [x] Cohere
- [x] Reciprocal Rank Fusion

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-01-30 19:10:33 +05:30
Raghav Dixit
d1a7257810 feat(python): Embedding fn support for gte-mlx/gte-large (#873)
have added testing and an example in the docstring, will be pushing a
separate PR in recipe repo for rag example

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-01-30 11:21:57 +05:30
Lei Xu
22b9eceb12 chore: convert all js doc test to use snippet. (#881) 2024-01-28 11:39:25 -08:00
Lei Xu
5f62302614 doc: use code snippet for typescript examples (#880)
The typescript code is in a fully function file, that will be run via the CI.
2024-01-27 22:52:37 -08:00
Prashanth Rao
119b928a52 docs: Updates and refactor (#683)
This PR makes incremental changes to the documentation.

* Closes #697 
* Closes #698

## Chores
- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.

## Restructure/new content 
- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756
and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices.
Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-01-19 00:18:37 +05:30
Lei Xu
6fb539b5bf doc: add doc to use GPU for indexing (#611) 2023-10-30 15:25:00 -07:00
Ayush Chaurasia
7dfb555fea [DOCS][PYTHON] Update embeddings API docs & Example (#516)
This PR adds an overview of embeddings docs:
- 2 ways to vectorize your data using lancedb - explicit & implicit
- explicit - manually vectorize your data using `wit_embedding` function
- Implicit - automatically vectorize your data as it comes by ingesting
your embedding function details as table metadata
- Multi-modal example w/ disappearing embedding function
2023-10-14 07:56:07 +05:30
Lei Xu
a26c8f3316 feat: use GPU for index creation. (#540)
Bump lance to 0.8.3 to include GPU training

---------

Co-authored-by: Rob Meng <rob.xu.meng@gmail.com>
2023-10-05 20:49:00 -07:00
Ayush Chaurasia
52fa7f5577 [Docs] Small typo fixes (#460) 2023-09-02 22:17:19 +05:30
Tevin Wang
b8f32d082f Clean up docs testing - exclude by glob instead of by file (#450) 2023-08-24 07:30:37 +05:30
Tevin Wang
8f7264f81d [Documentation Code Testing] temp fix for nodejs docs test hang (#404) 2023-08-06 13:13:35 -07:00
Ayush Chaurasia
74ef141b9c [Docs] add Tables guide (#381)
* Rename "Reference" -> "Guides" to create distinction b/w api reference
and user facing docs
* Add all the various ways to create, add and delete from table

Related - https://github.com/lancedb/lancedb/pull/391
2023-08-06 12:34:08 +05:30
Ayush Chaurasia
15f4787cc8 [Docs]: Add badges, CTA and updates examples (#358)
<img width="1054" alt="Screenshot 2023-07-24 at 6 13 00 PM"
src="https://github.com/lancedb/lancedb/assets/15766192/a263a17e-66d0-4591-adc7-b520aa5b23f6">
Is this a problem? Are we using metadata to track usage or something?
2023-07-26 16:35:46 +05:30
Tevin Wang
ad18826579 [Documentation Code Testing] build node sdk in release (#307) 2023-07-14 12:46:48 -07:00
Tevin Wang
39479dcf8e fix sha error in npm (#236)
Currently getting a `npm ERR! code EINTEGRITY` on merge, need to fix
asap.


https://stackoverflow.com/questions/75905223/github-action-npm-install-give-code-eintegrity-integrity-checksum-failed
2023-06-29 09:31:23 -07:00
Tevin Wang
b731a6aed9 Add docs code testing & documentation syntax changes (#196)
- Creates testing files `md_testing.py` and `md_testing.js` for testing
python and nodejs code in markdown files in the documentation
This listens for HTML tags as well: `<!--[language] code code
code...-->` will create a set-up file to create some mock tables or to
fulfill some assumptions in the documentation.
- Creates a github action workflow that triggers every push/pr to
`docs/**`
- Modifies documentation so tests run (mostly indentation, some small
syntax errors and some missing imports)

A list of excluded files that we need to take a closer look at later on:
```javascript
const excludedFiles = [
  "../src/fts.md",
  "../src/embedding.md",
  "../src/examples/serverless_lancedb_with_s3_and_lambda.md",
  "../src/examples/serverless_qa_bot_with_modal_and_langchain.md",
  "../src/examples/youtube_transcript_bot_with_nodejs.md",
];
```
Many of them can't be done because we need the OpenAI API key :(.
`fts.md` has some issues with the library, I believe this is still
experimental?

Closes #170

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2023-06-28 11:07:26 -07:00