Commit Graph

775 Commits

Author SHA1 Message Date
Lance Release
82936c77ef [python] Bump version: 0.5.3 → 0.5.4 python-v0.5.4 2024-02-09 22:56:45 +00:00
Weston Pace
dddcddcaf9 chore: bump lance version to 0.9.15 (#949) 2024-02-09 14:55:44 -08:00
Weston Pace
a9727eb318 feat: add support for filter during merge insert when matched (#948)
Closes #940
2024-02-09 10:26:14 -08:00
QianZhu
48d55bf952 added error msg to SaaS APIs (#852)
1. improved error msg for SaaS create_table and create_index

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-02-09 10:07:47 -08:00
Weston Pace
d2e71c8b08 feat: add a filterable count_rows to all the lancedb APIs (#913)
A `count_rows` method that takes a filter was recently added to
`LanceTable`. This PR adds it everywhere else except `RemoteTable` (that
will come soon).
2024-02-08 09:40:29 -08:00
Nitish Sharma
f53aace89c Minor updates to FAQ (#935)
Based on discussion over discord, adding minor updates to the FAQ
section about benchmarks, practical data size and concurrency in LanceDB
2024-02-07 20:49:25 -08:00
Ayush Chaurasia
d982ee934a feat(python): Reranker DX improvements (#904)
- Most users might not know how to use `QueryBuilder` object. Instead we
should just pass the string query.
- Add new rerankers: Colbert, openai
2024-02-06 13:59:31 +05:30
Will Jones
57605a2d86 feat(python): add read_consistency_interval argument (#828)
This PR refactors how we handle read consistency: does the `LanceTable`
class always pick up modifications to the table made by other instance
or processes. Users have three options they can set at the connection
level:

1. (Default) `read_consistency_interval=None` means it will not check at
all. Users can call `table.checkout_latest()` to manually check for
updates.
2. `read_consistency_interval=timedelta(0)` means **always** check for
updates, giving strong read consistency.
3. `read_consistency_interval=timedelta(seconds=20)` means check for
updates every 20 seconds. This is eventual consistency, a compromise
between the two options above.

## Table reference state

There is now an explicit difference between a `LanceTable` that tracks
the current version and one that is fixed at a historical version. We
now enforce that users cannot write if they have checked out an old
version. They are instructed to call `checkout_latest()` before calling
the write methods.

Since `conn.open_table()` doesn't have a parameter for version, users
will only get fixed references if they call `table.checkout()`.

The difference between these two can be seen in the repr: Table that are
fixed at a particular version will have a `version` displayed in the
repr. Otherwise, the version will not be shown.

```python
>>> table
LanceTable(connection=..., name="my_table")
>>> table.checkout(1)
>>> table
LanceTable(connection=..., name="my_table", version=1)
```

I decided to not create different classes for these states, because I
think we already have enough complexity with the Cloud vs OSS table
references.

Based on #812
2024-02-05 08:12:19 -08:00
Ayush Chaurasia
738511c5f2 feat(python): add support new openai embedding functions (#912)
@PrashantDixit0

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-02-04 18:19:42 -08:00
Lei Xu
0b0f42537e chore: add global cargo config to enable minimal cpu target (#925)
* Closes #895 
* Fix cargo clippy
2024-02-04 14:21:27 -08:00
QianZhu
e412194008 fix hybrid search example (#922) 2024-02-03 09:26:32 +05:30
Lance Release
a9088224c5 [python] Bump version: 0.5.2 → 0.5.3 python-v0.5.3 2024-02-03 03:04:04 +00:00
Ayush Chaurasia
688c57a0d8 fix: revert safe_import_pandas usage (#921) 2024-02-02 18:57:13 -08:00
Lance Release
12a98deded Updating package-lock.json 2024-02-02 22:37:23 +00:00
Lance Release
e4bb042918 Updating package-lock.json 2024-02-02 21:57:07 +00:00
Lance Release
04e1662681 Bump version: 0.4.7 → 0.4.8 v0.4.8 2024-02-02 21:56:57 +00:00
Lance Release
ce2242e06d [python] Bump version: 0.5.1 → 0.5.2 python-v0.5.2 2024-02-02 21:33:02 +00:00
Weston Pace
778339388a chore: bump pylance version to latest in pyproject.toml (#918) 2024-02-02 13:32:12 -08:00
Weston Pace
7f8637a0b4 feat: add merge_insert to the node and rust APIs (#915) 2024-02-02 13:16:51 -08:00
QianZhu
09cd08222d make it explicit about the vector column data type (#916)
<img width="837" alt="Screenshot 2024-02-01 at 4 23 34 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/4f0f5c5a-2a24-4b00-aad1-ef80a593d964">
[
<img width="838" alt="Screenshot 2024-02-01 at 4 26 03 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/ca073bc8-b518-4be3-811d-8a7184416f07">
](url)

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-02-02 09:02:02 -08:00
Bert
a248d7feec fix: add request retry to python client (#917)
Adds capability to the remote python SDK to retry requests (fixes #911)

This can be configured through environment:
- `LANCE_CLIENT_MAX_RETRIES`= total number of retries. Set to 0 to
disable retries. default = 3
- `LANCE_CLIENT_CONNECT_RETRIES` = number of times to retry request in
case of TCP connect failure. default = 3
- `LANCE_CLIENT_READ_RETRIES` = number of times to retry request in case
of HTTP request failure. default = 3
- `LANCE_CLIENT_RETRY_STATUSES` = http statuses for which the request
will be retried. passed as comma separated list of ints. default `500,
502, 503`
- `LANCE_CLIENT_RETRY_BACKOFF_FACTOR` = controls time between retry
requests. see
[here](23f2287eb5/src/urllib3/util/retry.py (L141-L146)).
default = 0.25

Only read requests will be retried:
- list table names
- query
- describe table
- list table indices

This does not add retry capabilities for writes as it could possibly
cause issues in the case where the retried write isn't idempotent. For
example, in the case where the LB times-out the request but the server
completes the request anyway, we might not want to blindly retry an
insert request.
2024-02-02 11:27:29 -05:00
Weston Pace
cc9473a94a docs: add cleanup_old_versions and compact_files to Table for documentation purposes (#900)
Closes #819
2024-02-01 15:06:00 -08:00
Weston Pace
d77e95a4f4 feat: upgrade to lance 0.9.11 and expose merge_insert (#906)
This adds the python bindings requested in #870 The javascript/rust
bindings will be added in a future PR.
2024-02-01 11:36:29 -08:00
Lei Xu
62f053ac92 ci: bump to new version of python action to use node 20 gIthub action runtime (#909)
Github action is deprecating old node-16 runtime.
2024-02-01 11:36:03 -08:00
JacobLinCool
34e10caad2 fix the repo link on npm, add links for homepage and bug report (#910)
- fix the repo link on npm
- add links for homepage and bug report
2024-01-31 21:07:11 -08:00
QianZhu
f5726e2d0c arrow table/f16 example (#907) 2024-01-31 14:41:28 -08:00
Lance Release
12b4fb42fc Updating package-lock.json 2024-01-31 21:18:24 +00:00
Lance Release
1328cd46f1 Updating package-lock.json 2024-01-31 20:29:38 +00:00
Lance Release
0c940ed9f8 Bump version: 0.4.6 → 0.4.7 v0.4.7 2024-01-31 20:29:28 +00:00
Lei Xu
5f59e51583 fix(node): pass AWS credentials to db level operations (#908)
Passed the following tests

```ts
const keyId = process.env.AWS_ACCESS_KEY_ID;
const secretKey = process.env.AWS_SECRET_ACCESS_KEY;
const sessionToken = process.env.AWS_SESSION_TOKEN;
const region = process.env.AWS_REGION;

const db = await lancedb.connect({
  uri: "s3://bucket/path",
  awsCredentials: {
    accessKeyId: keyId,
    secretKey: secretKey,
    sessionToken: sessionToken,
  },
  awsRegion: region,
} as lancedb.ConnectionOptions);

  console.log(await db.createTable("test", [{ vector: [1, 2, 3] }]));
  console.log(await db.tableNames());
  console.log(await db.dropTable("test"))
```
2024-01-31 12:05:01 -08:00
Will Jones
8d0ea29f89 docs: provide AWS S3 cleanup and permissions advice (#903)
Adding some more quick advice for how to setup AWS S3 with LanceDB.

---------

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-01-31 09:24:54 -08:00
Abraham Lopez
b9468bb980 chore: update JS/TS example in README (#898)
- The JS/TS library actually expects named parameters via an object in
`.createTable()` rather than individual arguments
- Added example on how to search rows by criteria without a vector
search. TS type of `.search()` currently has the `query` parameter as
non-optional so we have to pass undefined for now.
2024-01-30 11:09:45 -08:00
Lei Xu
a42df158a3 ci: change apple silicon runner to free OSS macos-14 target (#901) 2024-01-30 11:05:42 -08:00
Raghav Dixit
9df6905d86 chore(python): GTE embedding function model name update (#902)
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-01-30 23:56:29 +05:30
Ayush Chaurasia
3ffed89793 feat(python): Hybrid search & Reranker API (#824)
based on https://github.com/lancedb/lancedb/pull/713
- The Reranker api can be plugged into vector only or fts only search
but this PR doesn't do that (see example -
https://txt.cohere.com/rerank/)


### Default reranker -- `LinearCombinationReranker(weight=0.7,
fill=1.0)`

```
table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas()
```
### Available rerankers
LinearCombinationReranker
```
from lancedb.rerankers import LinearCombinationReranker

# Same as default 
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=LinearCombinationReranker()
                                     ).to_pandas()

# with custom params
reranker = LinearCombinationReranker(weight=0.3, fill=1.0)
table.search("hello", query_type="hybrid").rerank(
                                      normalize="score", 
                                      reranker=reranker
                                     ).to_pandas()
```

Cohere Reranker
```
from lancedb.rerankers import CohereReranker

# default model.. English and multi-lingual supported. See docstring for available custom params
table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank",  # score or rank
                                      reranker=CohereReranker()
                                     ).to_pandas()

```

CrossEncoderReranker

```
from lancedb.rerankers import CrossEncoderReranker

table.search("hello", query_type="hybrid").rerank(
                                      normalize="rank", 
                                      reranker=CrossEncoderReranker()
                                     ).to_pandas()

```

## Using custom Reranker
```
from lancedb.reranker import Reranker

class CustomReranker(Reranker):
    def rerank_hybrid(self, vector_result, fts_result):
           combined_res = self.merge_results(vector_results, fts_results) # or use custom combination logic
           # Custom rerank logic here
           
           return combined_res
```

- [x] Expand testing
- [x] Make sure usage makes sense
- [x] Run simple benchmarks for correctness (Seeing weird result from
cohere reranker in the toy example)
- Support diverse rerankers by default:
- [x] Cross encoding
- [x] Cohere
- [x] Reciprocal Rank Fusion

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-01-30 19:10:33 +05:30
Prashanth Rao
f150768739 Fix image bgcolor (#891)
Minor fix to change the background color for an image in the docs. It's
now readable in both light and dark modes (earlier version made it
impossible to read in dark mode).
2024-01-30 16:50:29 +05:30
Ayush Chaurasia
b432ecf2f6 doc: Add documentation chatbot for LanceDB (#890)
<img width="1258" alt="Screenshot 2024-01-29 at 10 05 52 PM"
src="https://github.com/lancedb/lancedb/assets/15766192/7c108fde-e993-415c-ad01-72010fd5fe31">
2024-01-30 11:24:57 +05:30
Raghav Dixit
d1a7257810 feat(python): Embedding fn support for gte-mlx/gte-large (#873)
have added testing and an example in the docstring, will be pushing a
separate PR in recipe repo for rag example

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-01-30 11:21:57 +05:30
Ayush Chaurasia
5c5e23bbb9 chore(python): Temporarily extend remote connection timeout (#888)
Context - https://etoai.slack.com/archives/C05NC5YSW5V/p1706371205883149
2024-01-29 17:34:33 +05:30
Lei Xu
e5796a4836 doc: fix js example of create index (#886) 2024-01-28 17:02:36 -08:00
Lei Xu
b9c5323265 doc: use snippet for rust code example and make sure rust examples run through CI (#885) 2024-01-28 14:30:30 -08:00
Lei Xu
e41a52863a fix: fix doc build to include the source snippet correctly (#883) 2024-01-28 11:55:58 -08:00
Chang She
13acc8a480 doc(rust): minor fixes for Rust quick start. (#878) 2024-01-28 11:40:52 -08:00
Lei Xu
22b9eceb12 chore: convert all js doc test to use snippet. (#881) 2024-01-28 11:39:25 -08:00
Lei Xu
5f62302614 doc: use code snippet for typescript examples (#880)
The typescript code is in a fully function file, that will be run via the CI.
2024-01-27 22:52:37 -08:00
Ayush Chaurasia
d84e0d1db8 feat(python): Aws Bedrock embeddings integration (#822)
Supports amazon titan, cohere english & cohere multi-lingual base
models.
2024-01-28 02:04:15 +05:30
Lei Xu
ac94b2a420 chore: upgrade lance, pylance and datafusion (#879) 2024-01-27 12:31:38 -08:00
Lei Xu
b49bc113c4 chore: add one rust SDK e2e example (#876)
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-01-26 22:41:20 -08:00
Lei Xu
77b5b1cf0e doc: update quick start for full rust example (#872) 2024-01-26 16:19:43 -08:00
Lei Xu
e910809de0 chore: bump github actions to v4 due to GHA warnings of node version deprecation (#874) 2024-01-26 15:52:47 -08:00