Commit Graph

974 Commits

Author SHA1 Message Date
Prashanth Rao
b014c24e66 [docs]: Fix typos and clarity in hybrid search docs (#966)
- Fixed typos and added some clarity to the hybrid search docs
- Changed "Airbnb" case to be as per the [official company
name](https://en.wikipedia.org/wiki/Airbnb) (the "bnb" shouldn't be
capitalized", and the text in the document aligns with this
- Fixed headers in nav bar
2024-04-05 16:30:30 -07:00
Will Jones
68115f1369 fix: wrap in BigInt to avoid upstream bug (#962)
Closes #960
2024-04-05 16:30:30 -07:00
Ayush Chaurasia
f78fe721db docs: Add setup cell for colab example (#965) 2024-04-05 16:30:30 -07:00
Ayush Chaurasia
510e8378bc feat(python): hybrid search updates, examples, & latency benchmarks (#964)
- Rename safe_import -> attempt_import_or_raise (closes
https://github.com/lancedb/lancedb/pull/923)
- Update docs
- Add Notebook example (@changhiskhan you can use it for the talk. Comes
with "open in colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default openai model to gpt-4
2024-04-05 16:30:30 -07:00
Will Jones
1045af6c09 chore: fix clippy lints (#963) 2024-04-05 16:30:30 -07:00
QianZhu
7afcfca10d Qian/make vector col optional (#950)
remote SDK tests were completed through lancedb_integtest
2024-04-05 16:30:29 -07:00
Will Jones
88205aba64 fix(node): statically link lzma (#961)
Fixes #956

Same changes as https://github.com/lancedb/lance/pull/1934
2024-04-05 16:30:10 -07:00
Weston Pace
da47938a43 chore: use a bigger runner for NPM publish jobs on aarch64 to avoid OOM (#955) 2024-04-05 16:30:06 -07:00
Lance Release
03e705c14c Bump version: 0.4.8 → 0.4.9 2024-04-05 16:29:58 -07:00
Lance Release
a7e60a4c3f [python] Bump version: 0.5.3 → 0.5.4 2024-04-05 16:29:58 -07:00
Weston Pace
e12bdc78bb chore: bump lance version to 0.9.15 (#949) 2024-04-05 16:29:58 -07:00
Weston Pace
41ccb48160 feat: add support for filter during merge insert when matched (#948)
Closes #940
2024-04-05 16:29:58 -07:00
QianZhu
069ad267bd added error msg to SaaS APIs (#852)
1. improved error msg for SaaS create_table and create_index

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-04-05 16:29:58 -07:00
Weston Pace
138fc3f66b feat: add a filterable count_rows to all the lancedb APIs (#913)
A `count_rows` method that takes a filter was recently added to
`LanceTable`. This PR adds it everywhere else except `RemoteTable` (that
will come soon).
2024-04-05 16:29:58 -07:00
Nitish Sharma
2c3f982f4f Minor updates to FAQ (#935)
Based on discussion over discord, adding minor updates to the FAQ
section about benchmarks, practical data size and concurrency in LanceDB
2024-04-05 16:29:58 -07:00
Ayush Chaurasia
d07817a562 feat(python): Reranker DX improvements (#904)
- Most users might not know how to use `QueryBuilder` object. Instead we
should just pass the string query.
- Add new rerankers: Colbert, openai
2024-04-05 16:29:58 -07:00
Will Jones
39cc2fd62b feat(python): add read_consistency_interval argument (#828)
This PR refactors how we handle read consistency: does the `LanceTable`
class always pick up modifications to the table made by other instance
or processes. Users have three options they can set at the connection
level:

1. (Default) `read_consistency_interval=None` means it will not check at
all. Users can call `table.checkout_latest()` to manually check for
updates.
2. `read_consistency_interval=timedelta(0)` means **always** check for
updates, giving strong read consistency.
3. `read_consistency_interval=timedelta(seconds=20)` means check for
updates every 20 seconds. This is eventual consistency, a compromise
between the two options above.

There is now an explicit difference between a `LanceTable` that tracks
the current version and one that is fixed at a historical version. We
now enforce that users cannot write if they have checked out an old
version. They are instructed to call `checkout_latest()` before calling
the write methods.

Since `conn.open_table()` doesn't have a parameter for version, users
will only get fixed references if they call `table.checkout()`.

The difference between these two can be seen in the repr: Table that are
fixed at a particular version will have a `version` displayed in the
repr. Otherwise, the version will not be shown.

```python
>>> table
LanceTable(connection=..., name="my_table")
>>> table.checkout(1)
>>> table
LanceTable(connection=..., name="my_table", version=1)
```

I decided to not create different classes for these states, because I
think we already have enough complexity with the Cloud vs OSS table
references.

Based on #812
2024-04-05 16:29:57 -07:00
Ayush Chaurasia
0f00cd0097 feat(python): add support new openai embedding functions (#912)
@PrashantDixit0

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-04-05 16:29:13 -07:00
Lei Xu
84edf56995 chore: add global cargo config to enable minimal cpu target (#925)
* Closes #895 
* Fix cargo clippy
2024-04-05 16:29:13 -07:00
QianZhu
b2efd0da53 fix hybrid search example (#922) 2024-04-05 16:29:13 -07:00
Lance Release
c101e9deed [python] Bump version: 0.5.2 → 0.5.3 2024-04-05 16:29:13 -07:00
Ayush Chaurasia
a24e16f753 fix: revert safe_import_pandas usage (#921) 2024-04-05 16:29:13 -07:00
Lance Release
eb1f02919a Bump version: 0.4.7 → 0.4.8 2024-04-05 16:29:05 -07:00
Lance Release
c8f92c2987 [python] Bump version: 0.5.1 → 0.5.2 2024-04-05 16:29:05 -07:00
Weston Pace
9d115bd507 chore: bump pylance version to latest in pyproject.toml (#918) 2024-04-05 16:29:05 -07:00
Weston Pace
18f7bad3dd feat: add merge_insert to the node and rust APIs (#915) 2024-04-05 16:29:05 -07:00
QianZhu
2e75b16403 make it explicit about the vector column data type (#916)
<img width="837" alt="Screenshot 2024-02-01 at 4 23 34 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/4f0f5c5a-2a24-4b00-aad1-ef80a593d964">
[
<img width="838" alt="Screenshot 2024-02-01 at 4 26 03 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/ca073bc8-b518-4be3-811d-8a7184416f07">
](url)

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-04-05 16:29:05 -07:00
Bert
3c544582f6 fix: add request retry to python client (#917)
Adds capability to the remote python SDK to retry requests (fixes #911)

This can be configured through environment:
- `LANCE_CLIENT_MAX_RETRIES`= total number of retries. Set to 0 to
disable retries. default = 3
- `LANCE_CLIENT_CONNECT_RETRIES` = number of times to retry request in
case of TCP connect failure. default = 3
- `LANCE_CLIENT_READ_RETRIES` = number of times to retry request in case
of HTTP request failure. default = 3
- `LANCE_CLIENT_RETRY_STATUSES` = http statuses for which the request
will be retried. passed as comma separated list of ints. default `500,
502, 503`
- `LANCE_CLIENT_RETRY_BACKOFF_FACTOR` = controls time between retry
requests. see
[here](23f2287eb5/src/urllib3/util/retry.py (L141-L146)).
default = 0.25

Only read requests will be retried:
- list table names
- query
- describe table
- list table indices

This does not add retry capabilities for writes as it could possibly
cause issues in the case where the retried write isn't idempotent. For
example, in the case where the LB times-out the request but the server
completes the request anyway, we might not want to blindly retry an
insert request.
2024-04-05 16:29:05 -07:00
Weston Pace
f602e07f99 docs: add cleanup_old_versions and compact_files to Table for documentation purposes (#900)
Closes #819
2024-04-05 16:29:05 -07:00
Weston Pace
4eb819072a feat: upgrade to lance 0.9.11 and expose merge_insert (#906)
This adds the python bindings requested in #870 The javascript/rust
bindings will be added in a future PR.
2024-04-05 16:29:05 -07:00
Lei Xu
bd2d187538 ci: bump to new version of python action to use node 20 gIthub action runtime (#909)
Github action is deprecating old node-16 runtime.
2024-04-05 16:29:05 -07:00
JacobLinCool
f308a0ffdb fix the repo link on npm, add links for homepage and bug report (#910)
- fix the repo link on npm
- add links for homepage and bug report
2024-04-05 16:29:05 -07:00
QianZhu
1f2eafca75 arrow table/f16 example (#907) 2024-04-05 16:29:05 -07:00
Lance Release
567c5f6d01 Bump version: 0.4.6 → 0.4.7 2024-04-05 16:28:56 -07:00
Lei Xu
8e139012e2 fix(node): pass AWS credentials to db level operations (#908)
Passed the following tests

```ts
const keyId = process.env.AWS_ACCESS_KEY_ID;
const secretKey = process.env.AWS_SECRET_ACCESS_KEY;
const sessionToken = process.env.AWS_SESSION_TOKEN;
const region = process.env.AWS_REGION;

const db = await lancedb.connect({
  uri: "s3://bucket/path",
  awsCredentials: {
    accessKeyId: keyId,
    secretKey: secretKey,
    sessionToken: sessionToken,
  },
  awsRegion: region,
} as lancedb.ConnectionOptions);

  console.log(await db.createTable("test", [{ vector: [1, 2, 3] }]));
  console.log(await db.tableNames());
  console.log(await db.dropTable("test"))
```
2024-04-05 16:28:56 -07:00
Will Jones
d5be6c7a05 docs: provide AWS S3 cleanup and permissions advice (#903)
Adding some more quick advice for how to setup AWS S3 with LanceDB.

---------

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-04-05 16:28:56 -07:00
Abraham Lopez
5a12224a02 chore: update JS/TS example in README (#898)
- The JS/TS library actually expects named parameters via an object in
`.createTable()` rather than individual arguments
- Added example on how to search rows by criteria without a vector
search. TS type of `.search()` currently has the `query` parameter as
non-optional so we have to pass undefined for now.
2024-04-05 16:28:56 -07:00
Lei Xu
a617ad35ff ci: change apple silicon runner to free OSS macos-14 target (#901) 2024-04-05 16:28:56 -07:00
Raghav Dixit
61bf688e5b chore(python): GTE embedding function model name update (#902)
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-04-05 16:28:56 -07:00
Ayush Chaurasia
a41f7be88d feat(python): Hybrid search & Reranker API (#824)
based on https://github.com/lancedb/lancedb/pull/713
- The Reranker api can be plugged into vector only or fts only search
but this PR doesn't do that (see example -
https://txt.cohere.com/rerank/)


### Default reranker -- `LinearCombinationReranker(weight=0.7,
fill=1.0)`

```
table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas()
```
### Available rerankers
LinearCombinationReranker
```
from lancedb.rerankers import LinearCombinationReranker

# Same as default 
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=LinearCombinationReranker()
                                     ).to_pandas()

# with custom params
reranker = LinearCombinationReranker(weight=0.3, fill=1.0)
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=reranker
                                     ).to_pandas()
```

Cohere Reranker
```
from lancedb.rerankers import CohereReranker

# default model.. English and multi-lingual supported. See docstring for available custom params
table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank",  # score or rank
                                      reranker=CohereReranker()
                                     ).to_pandas()

```

CrossEncoderReranker

```
from lancedb.rerankers import CrossEncoderReranker

table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank", 
                                      reranker=CrossEncoderReranker()
                                     ).to_pandas()

```

## Using custom Reranker
```
from lancedb.reranker import Reranker

class CustomReranker(Reranker):
    def rerank_hybrid(self, vector_result, fts_result):
           combined_res = self.merge_results(vector_results, fts_results) # or use custom combination logic
           # Custom rerank logic here
           
           return combined_res
```

- [x] Expand testing
- [x] Make sure usage makes sense
- [x] Run simple benchmarks for correctness (Seeing weird result from
cohere reranker in the toy example)
- Support diverse rerankers by default:
- [x] Cross encoding
- [x] Cohere
- [x] Reciprocal Rank Fusion

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-04-05 16:28:56 -07:00
Prashanth Rao
ecbbe185c7 Fix image bgcolor (#891)
Minor fix to change the background color for an image in the docs. It's
now readable in both light and dark modes (earlier version made it
impossible to read in dark mode).
2024-04-05 16:28:56 -07:00
Ayush Chaurasia
b326bf2ef6 doc: Add documentation chatbot for LanceDB (#890)
<img width="1258" alt="Screenshot 2024-01-29 at 10 05 52 PM"
src="https://github.com/lancedb/lancedb/assets/15766192/7c108fde-e993-415c-ad01-72010fd5fe31">
2024-04-05 16:28:56 -07:00
Raghav Dixit
472344fcb3 feat(python): Embedding fn support for gte-mlx/gte-large (#873)
have added testing and an example in the docstring, will be pushing a
separate PR in recipe repo for rag example

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-04-05 16:28:56 -07:00
Ayush Chaurasia
bca80939c2 chore(python): Temporarily extend remote connection timeout (#888)
Context - https://etoai.slack.com/archives/C05NC5YSW5V/p1706371205883149
2024-04-05 16:28:56 -07:00
Lei Xu
911d063237 doc: fix js example of create index (#886) 2024-04-05 16:28:56 -07:00
Lei Xu
12e776821a doc: use snippet for rust code example and make sure rust examples run through CI (#885) 2024-04-05 16:28:56 -07:00
Lei Xu
c6e5eb0398 fix: fix doc build to include the source snippet correctly (#883) 2024-04-05 16:28:56 -07:00
Chang She
1d0578ce25 doc(rust): minor fixes for Rust quick start. (#878) 2024-04-05 16:28:56 -07:00
Lei Xu
e7fdb931de chore: convert all js doc test to use snippet. (#881) 2024-04-05 16:28:56 -07:00
Lei Xu
d811b89de2 doc: use code snippet for typescript examples (#880)
The typescript code is in a fully function file, that will be run via the CI.
2024-04-05 16:28:56 -07:00