Commit Graph

258 Commits

Author SHA1 Message Date
BubbleCal
2bde5401eb feat: support to build FTS without positions (#1621) 2024-09-10 22:51:32 +08:00
Jon X
7eb3b52297 docs: added a blank line between a paragraph and a list block (#1604)
The markdown renders well on GitHub (GFM style?), but it seems that a
blank line is required between a paragraph and a list block for it to
render well with `mkdocs`.

see also the web page:
https://lancedb.github.io/lancedb/concepts/index_hnsw/
2024-09-06 09:38:19 +05:30
Philip Zeyliger
1d61717d0e docs: fix get_registry() usage (#1601)
Docs used `get_registry.get(...)` whereas what works is
`get_registry().get(...)`. Fixing the two instances I found. I tested
the open clip version by trying it locally in a Jupyter notebook.
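
The difference between the two spellings can be illustrated with a stand-in registry. This is only a sketch: `Registry`, `_REGISTRY`, and the `get_registry` defined below are hypothetical stand-ins written for this example, not LanceDB's code, and the stored value is a placeholder string.

```python
# Hypothetical stand-in for a registry accessor; not LanceDB's implementation.
class Registry:
    def __init__(self):
        # Placeholder mapping from a model name to a made-up class name.
        self._functions = {"open-clip": "OpenClipEmbeddings"}

    def get(self, name):
        # Look up a registered embedding function by name.
        return self._functions[name]


_REGISTRY = Registry()


def get_registry():
    # Accessor function: it must be *called* to obtain the registry instance.
    return _REGISTRY


# Wrong: `get_registry` is a plain function object, which has no `.get` attribute.
try:
    get_registry.get("open-clip")
    failed = False
except AttributeError:
    failed = True

# Right: call the accessor first, then look up the name on the returned registry.
model_name = get_registry().get("open-clip")
```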
2024-09-06 01:48:24 +05:30
Rithik Kumar
2bc7dca3ca docs: add changes to Embeddings-> Available models-> overview page (#1596)
adding features and improvements to - Manage Embeddings page

Before:
![Screenshot 2024-09-04
223743](https://github.com/user-attachments/assets/f1e116b5-6ebb-4d59-9d29-b20084998cd0)

After:

![Screenshot 2024-09-05
214214](https://github.com/user-attachments/assets/8c94318e-68af-447e-97e1-8153860a2914)

![Screenshot 2024-09-05
213623](https://github.com/user-attachments/assets/55c82770-6df9-4bab-9c5c-1ea1552138de)

![Screenshot 2024-09-05
215931](https://github.com/user-attachments/assets/9bfac7d4-16a6-454e-801e-50789ff75261)
2024-09-05 22:19:08 +05:30
Jon X
2b8e872be0 docs: removed the unnecessary fence code tag (#1599) 2024-09-05 14:40:38 +05:30
Ayush Chaurasia
03ef1dc081 feat: update default reranker to RRF (#1580)
- Both LinearCombination (the current default) and RRF are pretty fast
compared to model based rerankers. RRF is slightly faster.
- In our tests RRF has also been slightly more accurate.

This PR:
- Makes RRF the default reranker
- Removed duplicate docs for rerankers
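
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the idea can be sketched in a few lines of plain Python. This is a generic illustration of the technique, not the reranker class shipped in this PR; the function name, the sample document ids, and the constant `k=60` (a commonly used value) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]  # made-up ids, e.g. from a vector search
fts_hits = ["b", "d", "a"]     # made-up ids, e.g. from a full-text search
fused = reciprocal_rank_fusion([vector_hits, fts_hits])
```

Because only ranks are used, the two result lists do not need comparable score scales, which is one reason rank fusion is cheap compared to model-based rerankers.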
2024-09-03 14:00:13 +05:30
Rithik Kumar
fde636ca2e docs: fix links - quick start to embedding (#1591) 2024-09-02 21:55:35 +05:30
Ayush Chaurasia
51966a84f5 docs: add multi-vector reranking, answerdotai and studies section (#1579) 2024-08-31 04:09:14 +05:30
Rithik Kumar
38015ffa7c docs: improve overall language on all example pages (#1582)
Refine and improve the language clarity and quality across all example
pages in the documentation to ensure better understanding and
readability.

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-08-31 03:48:11 +05:30
Ayush Chaurasia
dc72ece847 feat!: better api for manual hybrid queries (#1575)
Currently, the only documented way of performing hybrid search is by
using embedding API and passing string queries that get automatically
embedded. There are use cases where users might like to pass vectors and
text manually instead.
This ticket contains more information and historical context -
https://github.com/lancedb/lancedb/issues/937

This breaks an undocumented pathway that allowed passing (vector, text)
tuple queries, which was intended to be temporary, so this is marked as a
breaking change. For all practical purposes, this should not impact most
users.

### usage
```
# vector_query: a query embedding; text_query: the full-text search string
results = (
    table.search(query_type="hybrid")
    .vector(vector_query)
    .text(text_query)
    .limit(5)
    .to_pandas()
)
```
2024-08-30 17:37:58 +05:30
Ayush Chaurasia
bfe8fccfab docs: add hnsw docs (#1570) 2024-08-29 15:16:27 +05:30
Rithik Kumar
6f6eb170a9 docs: revamp Python example: Overview page and remove redundant examples and notebooks (#1574)
before:
![Screenshot 2024-08-29
131656](https://github.com/user-attachments/assets/81cb5d70-5dff-4e57-8bbe-3461327aed7d)

After:
![Screenshot 2024-08-29
131715](https://github.com/user-attachments/assets/62109a37-7f66-4fd4-90ed-906a85472117)

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-08-29 13:48:10 +05:30
Rithik Kumar
ae85008714 docs: revamp embedding models (#1568)
before:
![Screenshot 2024-08-27
151525](https://github.com/user-attachments/assets/d4f8f2b9-37e6-4a31-b144-01b804019e11)

After:
![Screenshot 2024-08-27
151550](https://github.com/user-attachments/assets/79fe7d27-8f14-4d80-9b41-a1e91f8c708f)

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-08-27 17:14:35 +05:30
Bill Chambers
9c25998110 docs: update serverless_lancedb_with_s3_and_lambda.md (#1559) 2024-08-26 14:55:28 +05:30
Ayush Chaurasia
549ca51a8a feat: add answerdotai rerankers support and minor improvements (#1560)
This PR:
- Adds missing license headers
- Integrates with answerdotai Rerankers package
- Updates ColbertReranker to subclass answerdotai package. This is done
to keep backwards compatibility as some users might be used to importing
ColbertReranker directly
- Sets `trust_remote_code` to `True` by default in CrossEncoder and
sentence-transformer based rerankers
2024-08-26 13:25:10 +05:30
Rithik Kumar
632007d0e2 docs: add recommender system example (#1561)
before:
![Screenshot 2024-08-24
230216](https://github.com/user-attachments/assets/cc8a810a-b032-45d7-b086-b2ef0720dc16)

After:
![Screenshot 2024-08-24
230228](https://github.com/user-attachments/assets/eaa1dc31-ac7f-4b81-aa79-b4cf94f0cbd5)

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-08-25 12:30:30 +05:30
rahuljo
6ad5553eca docs: add dlt-lancedb integration page (#1551)
Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com>
2024-08-22 15:18:49 +05:30
Rithik Kumar
758c82858f docs: add AI agent example (#1553)
before:
![Screenshot 2024-08-21
225014](https://github.com/user-attachments/assets/e5b05586-87c5-4739-a4df-2d6cd0704ba5)

After:
![Screenshot 2024-08-21
225029](https://github.com/user-attachments/assets/504959db-f560-49b2-9492-557e9846a793)

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-08-22 00:54:05 +05:30
Rithik Kumar
0cbc9cd551 docs: add evaluation example (#1552)
before:
![Screenshot 2024-08-21
194228](https://github.com/user-attachments/assets/68d96658-7579-4934-85af-e8c898b64660)

After:
![Screenshot 2024-08-21
195258](https://github.com/user-attachments/assets/81ddb9cd-cb93-47fc-a121-ff82701fd11f)

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-08-21 20:37:04 +05:30
Rithik Kumar
21014cab45 docs: add chatbot example and improve quality of other examples (#1544) 2024-08-17 12:35:33 +05:30
Lei Xu
5857cb4c6e docs: add a section to describe scalar index (#1495) 2024-08-16 18:48:29 -07:00
Rithik Kumar
09ce6c5bb5 docs: add vector search example (#1543) 2024-08-16 21:30:45 +05:30
Lei Xu
b2317c904d feat: create bitmap and label list scalar index using python async api (#1529)
* Expose the `bitmap` and `LabelList` scalar index types via the Rust and
async Python APIs
* Add documentation
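
As background, the general idea behind a bitmap index can be sketched in plain Python. This is a conceptual illustration only, not how Lance implements it; the helper names and the sample column are made up for the example.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose set bits are the matching row ids."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def rows(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


colors = ["red", "blue", "red", "green", "blue"]  # made-up low-cardinality column
index = build_bitmap_index(colors)

# A filter such as `color IN ('red', 'green')` becomes a bitwise OR of two bitmaps.
red_or_green = rows(index["red"] | index["green"])
```

Bitmaps of this kind tend to suit columns with few distinct values, since each distinct value needs its own bitmap.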
2024-08-11 09:16:11 -07:00
Ayush Chaurasia
a88e9bb134 docs: add lancedb embedding fcn on cloud docs (#1521) 2024-08-09 07:21:04 +05:30
BubbleCal
f9d5fa88a1 feat!: migrate FTS from tantivy to lance-index (#1483)
Lance now supports FTS, so add it into lancedb Python, TypeScript and
Rust SDKs.

For Python, we still use tantivy-based FTS by default because the Lance
FTS index currently lacks some features of tantivy.

For Python:
- Support creating a Lance-based FTS index
- Support specifying columns for full text search (only available for
the Lance-based FTS index)

For TypeScript:
- Change the search method so that it can accept both string and vector
- Support full text search

For Rust:
- Support full text search

The others:
- Update the FTS doc

BREAKING CHANGE:
- For Python, this renames the score column attached to FTS results from
"score" to "_score". This could be a breaking change for users who rely
on the scores.
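
Code that reads the score column could be made tolerant of the rename with a small helper. This is a sketch under assumptions: `fts_score` is a hypothetical helper, and the rows are made-up dictionaries standing in for search results.

```python
def fts_score(row):
    """Return the FTS relevance score from a result row (a dict).

    Prefers the new "_score" column and falls back to the old "score" column,
    so the same code handles results produced before and after the rename.
    """
    if "_score" in row:
        return row["_score"]
    return row["score"]


old_row = {"text": "hello", "score": 1.5}   # made-up row using the old column name
new_row = {"text": "hello", "_score": 1.5}  # made-up row using the new column name
```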

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-08-08 15:33:15 +08:00
Rithik Kumar
a62f661d90 docs: revamp example docs (#1512)
Before: 
![Screenshot 2024-08-07
015834](https://github.com/user-attachments/assets/b817f846-78b3-4d6f-b4a0-dfa3f4d6be87)

After:
![Screenshot 2024-08-07
015852](https://github.com/user-attachments/assets/53370301-8c40-45f8-abe3-32f9d051597e)
![Screenshot 2024-08-07
015934](https://github.com/user-attachments/assets/63cdd038-32bb-4b3e-b9c4-1389d2754014)
![Screenshot 2024-08-07
015941](https://github.com/user-attachments/assets/70388680-9c2b-49ef-ba00-2bb015988214)
![Screenshot 2024-08-07
015949](https://github.com/user-attachments/assets/76335a33-bb6f-473c-896f-447320abcc25)

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-08-07 03:56:59 +05:30
Robby
8d2ff7b210 feat(python): add watsonx embeddings to registry (#1486)
Related issue: https://github.com/lancedb/lancedb/issues/1412

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
2024-08-06 10:58:33 +05:30
Rithik Kumar
d297da5a7e docs: update examples docs (#1488)
Testing Workflow with my first PR.
Before:
![Screenshot 2024-08-01
183326](https://github.com/user-attachments/assets/83d22101-8bbf-4b18-81e4-f740e605727a)

After:
![Screenshot 2024-08-01
183333](https://github.com/user-attachments/assets/a5e4cd2c-c524-4009-81d5-75b2b0361f83)
2024-08-01 18:54:45 +05:30
Cory Grinstead
a062a92f6b docs: custom embedding function for ts (#1479) 2024-07-30 18:19:55 -05:00
Ayush Chaurasia
513926960d docs: add rrf docs and update reranking notebook with Jina reranker results (#1474)
- RRF reranker
- Jina Reranker results

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-07-25 22:29:46 +05:30
inn-0
cc507ca766 docs: add missing whitespace before markdown table to fix rendering issue (#1471)
### Fix markdown table rendering issue

This PR adds a missing whitespace before a markdown table in the
documentation. This issue causes the table to not render properly in
mkdocs, while it does render properly in GitHub's markdown viewer.

#### Change Details:
- Added a single line of whitespace before the markdown table to ensure
proper rendering in mkdocs.

#### Note:
- I wasn't able to test this fix in the mkdocs environment, but it
should be safe as it only involves adding whitespace which won't break
anything.


---


Cohere supports the following input types:

| Input Type              | Description                                                                 |
|-------------------------|-----------------------------------------------------------------------------|
| "`search_document`"     | Used for embeddings stored in a vector database for search use-cases.       |
| "`search_query`"        | Used for embeddings of search queries run against a vector DB               |
| "`semantic_similarity`" | Specifies the given text will be used for Semantic Textual Similarity (STS) |
| "`classification`"      | Used for embeddings passed through a text classifier.                       |
| "`clustering`"          | Used for the embeddings run through a clustering algorithm                  |

Usage Example:
2024-07-24 22:26:28 +05:30
Lei Xu
c9c61eb060 docs: expose merge_insert doc for remote python SDK (#1464)
The `merge_insert` API does not show up on
[`RemoteTable`](https://lancedb.github.io/lancedb/python/saas-python/#lancedb.remote.table.RemoteTable)
today.

* Also bump the `ruff` version
2024-07-22 10:48:16 -07:00
Cory Grinstead
69295548cc docs: minor updates for js migration guides (#1451)
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-07-22 10:26:49 -07:00
Cory Grinstead
2276b114c5 docs: add installation note about yarn (#1459)
I noticed that setting up a simple project with
[Yarn](https://yarnpkg.com/) failed because, unlike other package
managers (npm, pnpm, bun), Yarn does not automatically resolve peer
dependencies, so I added a quick note about it in the installation guide.
2024-07-19 18:48:24 -05:00
Magnus
dc609a337d fix: added support for trust_remote_code (#1454)
Closes #1285 

Added `trust_remote_code` to the `SentenceTransformerEmbeddings` class.
It defaults to `False`.
2024-07-18 19:37:52 +05:30
Cory Grinstead
7ae327242b docs: update migration.md (#1445) 2024-07-15 18:20:23 -05:00
Ayush Chaurasia
bb2e624ff0 docs: add fine tuning section in retriever guide and minor fixes (#1438) 2024-07-11 17:34:29 +05:30
Cory Grinstead
31be9212da docs(nodejs): add @lancedb/lancedb examples everywhere (#1411)
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-07-10 13:29:03 -05:00
Joan Fontanals
cef24801f4 docs: add jina reranker to index (#1427)
PR to add JinaReranker documentation page to the rerankers index
2024-07-09 14:39:35 +05:30
Joan Fontanals
08d25c5a80 feat: add Jina integration in Python for Embedding and Reranker (#1424)
Integration of Jina Embeddings and Rerankers through its API
2024-07-05 01:34:43 +05:30
Raghav Dixit
a5ff623443 docs: update integration docs & fixed links (#1423)
1. Updated langchain docs. 
2. Minor update to llamaindex doc.
3. Added notebook examples and linked them correctly
2024-07-03 21:50:33 +05:30
Ayush Chaurasia
ccded130ed docs: add reranking example (#1416) 2024-07-01 19:42:38 +05:30
Sidharth Rajaram
48f8d1b3b7 docs: addresses typos in HF embedding example docs (#1415)
* `table.add` requires `data` parameter on the docs page regarding use
of embedding models from HF
* also changed the name of example class from `TextModel` to `Words`
since that is what is used as parameter in the `db.create_table` call
* Per
https://lancedb.github.io/lancedb/python/python/#lancedb.table.Table.add
2024-07-01 12:14:17 +05:30
Will Jones
865ed99881 feat: dynamodb commit store support (#1410)
This allows users to specify URIs like:

```
s3+ddb://my_bucket/path?ddbTableName=myCommitTable
```

and it will support concurrent writes in S3.
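
The pieces of such a URI can be inspected with Python's standard library. This sketch only shows how the example URI above decomposes; how LanceDB itself parses the URI is not shown in this commit message, and the variable names are chosen for illustration.

```python
from urllib.parse import parse_qs, urlsplit

uri = "s3+ddb://my_bucket/path?ddbTableName=myCommitTable"
parts = urlsplit(uri)

scheme = parts.scheme            # storage scheme with the DynamoDB suffix
bucket = parts.netloc            # S3 bucket name
prefix = parts.path.lstrip("/")  # path inside the bucket
# The DynamoDB table used as the commit store is passed as a query parameter.
ddb_table = parse_qs(parts.query)["ddbTableName"][0]
```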

* [x] Add dynamodb integration tests
* [x] Add modifications to get it working in Python sync API
* [x] Added section in documentation describing how to configure.

Closes #534

---------

Co-authored-by: universalmind303 <cory.grinstead@gmail.com>
2024-06-28 09:30:36 -07:00
Lei Xu
d6485f1215 docs: add openapi rest api page (#1413) 2024-06-27 21:32:34 -07:00
Thomas J. Fan
a866b78a31 docs: fixes polars formatting in docs (#1400)
Currently, the whole polars section is formatted as a code block:
https://lancedb.github.io/lancedb/guides/tables/#from-a-polars-dataframe

This PR fixes the formatting.
2024-06-25 08:46:16 -07:00
josca42
0fe844034d feat: enable stemming (#1356)
Added the ability to specify `tokenizer_name` when creating a full text
search index using tantivy. This enables the use of language-specific
stemming.

Also updated the [guide on full text
search](https://lancedb.github.io/lancedb/fts/) with a short section on
choosing a tokenizer.

Fixes #1315
2024-06-20 14:23:55 -07:00
Raghav Dixit
96914a619b docs: llama-index integration (#1347)
Updated the API reference and usage for the llama index integration.
2024-06-09 23:52:18 +05:30
Ayush Chaurasia
72f339a0b3 docs: add note about embedding api not being available on cloud (#1371) 2024-06-09 03:57:23 +05:30
Ayush Chaurasia
5e30648f45 docs: fix example path (#1367) 2024-06-07 19:40:50 -07:00