Commit Graph

926 Commits

Author SHA1 Message Date
Lance Release
fb26f31beb [python] Bump version: 0.6.6 → 0.6.7 python-v0.6.7 2024-04-04 23:43:04 +00:00
Lance Release
7c138c54c4 Updating package-lock.json 2024-04-04 21:40:08 +00:00
Lance Release
e9011b71b1 Bump version: 0.4.15 → 0.4.16 v0.4.16 2024-04-04 21:39:58 +00:00
Will Jones
1b605ecc3b chore: upgrade to lance-0.10.9 (#1192) 2024-04-04 14:39:24 -07:00
QianZhu
bcc879b74a add a default value for search.limit to be consistent with python sdk (#1191)
Changed the default value of `search.limit` to 10.
2024-04-04 12:22:10 -07:00
Bert
fad0b76159 ensure table names are uri encoded for tables (#1189)
This prevents an issue where users can do something like:
```js
db.createTable('my-table#123123')
```
The server has logic to reject the '#' character in table names, but
currently this is returned as a 404 error because the request routes to
`/v1/my-table#123123/create`, and `#123123/create` is not parsed as part
of the path.
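A minimal Python sketch of the failure and the fix. The actual change is in the Node.js client; this only mirrors the idea using the standard library. `table_path` is a hypothetical helper and `example.com` is a placeholder host. Percent-encoding the name turns `#` into `%23`, so the whole name stays inside one path segment instead of starting a URL fragment.

```python
from urllib.parse import quote, urlsplit

def table_path(name: str, action: str) -> str:
    """Build a hypothetical /v1/<table>/<action> path with the table name percent-encoded."""
    # safe="" also encodes "/", so the name can never span more than one path segment
    return f"/v1/{quote(name, safe='')}/{action}"

# Unencoded: everything after "#" is treated as the URL fragment, not the path
raw = urlsplit("https://example.com/v1/my-table#123123/create")
print(raw.path)      # /v1/my-table
print(raw.fragment)  # 123123/create

# Encoded: the server receives the full name and can reject it with a proper error
print(table_path("my-table#123123", "create"))  # /v1/my-table%23123123/create
```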
2024-04-04 10:48:07 -07:00
Will Jones
8364d589ab feat: ship fp16kernels in Python wheels (#1148)
Same deal as https://github.com/lancedb/lance/pull/2098
2024-04-04 09:33:34 -07:00
Lei Xu
8687735bea chore: bump to 0.10.8 (#1187) 2024-04-03 16:52:32 -07:00
QianZhu
f0cd43da69 bug: fix the return value of countRows (#1186) 2024-04-03 16:31:49 -07:00
Lei Xu
7b954c7e3e chore: bump lance version (#1185)
Bump lance version to `0.10.7`
2024-04-03 14:46:05 -07:00
Bert
2579f29a92 fix error decoding in nodejs client (#1184)
fixes: #1183
2024-04-03 10:24:51 -04:00
QianZhu
7562b0fad1 remote count_rows need to return the number (#1181) 2024-04-02 13:12:22 -07:00
eduardjbotha
83b6b0d28a SQL Documentation includes DataFusion functions (#1179)
Show that it is possible to use the DataFusion functions in the `WHERE`
clause.

Co-authored-by: Eduard Botha <eduard.botha@inovex.de>
2024-04-02 07:49:48 -07:00
Lei Xu
46e95f2c4c chore: add social link footer (#1177) 2024-04-01 22:09:27 -07:00
Lei Xu
73810b4410 chore: pass str instead of String to build table names (#1178) 2024-04-01 21:31:07 -07:00
Lance Release
09280bc54a Updating package-lock.json 2024-04-02 03:03:07 +00:00
Lance Release
5603f1e57f Updating package-lock.json 2024-04-02 02:28:04 +00:00
QianZhu
1d67615cff feat: add filterable countRows to remote API (#1169) 2024-04-01 14:31:15 -07:00
Lance Release
05f484b716 [python] Bump version: 0.6.5 → 0.6.6 python-v0.6.6 2024-04-01 19:09:01 +00:00
Lance Release
7e92aa657a Updating package-lock.json 2024-04-01 18:36:25 +00:00
Lance Release
e5f40a4b09 Bump version: 0.4.14 → 0.4.15 v0.4.15 2024-04-01 18:36:13 +00:00
Weston Pace
6779c1c192 chore: bump lance version to 0.10.6 (#1175) 2024-04-01 11:35:47 -07:00
Bert
e0bf6d9bd0 Update LanceDB Logo in README (#1167)
<img width="1034" alt="image"
src="https://github.com/lancedb/lancedb/assets/5846846/5b8aa53c-4d93-4c0e-bed4-80c238b319ba">
2024-03-29 10:10:43 -04:00
Weston Pace
67f041be91 docs: add a reference to @lancedb/lance in the docs (#1166)
We aren't yet ready to switch over the examples since almost all JS
examples rely on embeddings and we haven't yet ported those over.
However, this makes it possible for those that are interested to start
using `@lancedb/lancedb`.
2024-03-29 04:55:03 -07:00
Will Jones
d388ef2f55 ci: fix name collision in npm artifacts for vectordb (#1164)
Fixes #1163
2024-03-28 14:07:27 -05:00
Weston Pace
e52dc877e3 chore: add nodejs to bumpversion (#1161)
The previous release failed to release nodejs because the nodejs version
wasn't bumped. This should fix that.
2024-03-28 08:54:32 -07:00
Weston Pace
ca4fdf5499 chore: fix clippy (#1162) 2024-03-28 08:54:17 -07:00
Bert
0e9ad764b0 added new logo to vercel example gif (#1158) 2024-03-26 16:25:36 -04:00
Bert
cae0348c51 New logo on docs site (#1157) 2024-03-26 20:50:13 +05:30
Ayush Chaurasia
e9e0a37ca8 docs: Add all available HF/sentence transformers embedding models list (#1134)
Solves https://github.com/lancedb/lancedb/issues/968
2024-03-26 19:04:09 +05:30
Weston Pace
c37a28abbd docs: add the async python API to the docs (#1156) 2024-03-26 07:54:16 -05:00
Lance Release
98c1e635b3 Updating package-lock.json 2024-03-25 20:38:37 +00:00
Lance Release
9992b927fd Updating package-lock.json 2024-03-25 15:43:00 +00:00
Lance Release
80d501011c Bump version: 0.4.13 → 0.4.14 v0.4.14 2024-03-25 15:42:49 +00:00
Weston Pace
6e3a9d08e0 feat: add publish step for nodejs (#1155)
This will start publishing `@lancedb/lancedb` with the new nodejs
package on our releases.
2024-03-25 11:23:30 -04:00
Pranav Maddi
268d8e057b Adds an Ask LanceDB button to docs. (#1150)
This links out to the new [asklancedb.com](https://asklancedb.com) page.

Screenshots of the change:

![Quick start - LanceDB · 10 20am · 03-22](https://github.com/lancedb/lancedb/assets/2371511/c45ba893-fc74-4957-bdd3-3712b351aff3)
![Quick start - LanceDB](https://github.com/lancedb/lancedb/assets/2371511/d4762eb6-52af-4fd5-857e-3ed280716999)
2024-03-23 01:09:44 +05:30
Bert
dfc518b8fb Node SDK Client middleware for HTTP Requests (#1130)
Adds client-side middleware to the LanceDB Node SDK to instrument HTTP
requests.

Example - adding `x-request-id` request header:
```js
import * as lancedb from 'vectordb'

class HttpMiddleware {
    constructor({ requestId }) {
        this.requestId = requestId
    }

    // Called for each remote HTTP request: add the header, then pass the request on
    onRemoteRequest(req, next) {
        req.headers['x-request-id'] = this.requestId
        return next(req)
    }
}

const db = await lancedb.connect({
  uri: 'db://remote-123',
  apiKey: 'sk_...',
})

const tables = await db.withMiddleware(new HttpMiddleware({ requestId: '123' })).tableNames();
```

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-03-22 11:58:05 -04:00
QianZhu
98acf34ae8 remove warnings (#1147) 2024-03-21 14:49:01 -07:00
Lei Xu
25988d23cd chore: validate table name (#1146)
Closes #1129
2024-03-21 14:46:13 -07:00
Lance Release
c0dd98c798 [python] Bump version: 0.6.4 → 0.6.5 python-v0.6.5 2024-03-21 19:53:38 +00:00
Lei Xu
ee73a3bcb8 chore: bump lance to 0.10.5 (#1145) 2024-03-21 12:53:02 -07:00
QianZhu
c07989ac29 fix nodejs test (#1141)
The error message for a query with the wrong vector dimension changed,
so this change is needed for the Node.js tests to pass.
2024-03-21 07:21:39 -07:00
QianZhu
8f7ef26f5f better error msg for query vector with wrong dim (#1140) 2024-03-20 21:01:05 -07:00
Ishani Ghose
e14f079fe2 feat: add to_batches API #805 (#1048)
SDK: Python

Description: Exposes the PyArrow batch API during query execution. This
is relevant when there is no vector search query, the dataset is large,
and the filtered result is larger than memory.
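The idea behind a batch API can be sketched in plain Python: filter rows lazily and hand them out in fixed-size batches, so at most one batch is held in memory at a time. This is an illustration only; per the description, the real API exposes PyArrow batches, and `iter_filtered_batches` and the sample rows are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def iter_filtered_batches(
    rows: Iterable[T], predicate: Callable[[T], bool], batch_size: int
) -> Iterator[List[T]]:
    """Yield rows matching `predicate` in lists of at most `batch_size`, consuming `rows` lazily."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    matching = (row for row in rows if predicate(row))
    while True:
        batch = list(islice(matching, batch_size))
        if not batch:
            return
        yield batch

# A generator stands in for a large scan: rows are produced on demand, never all at once
rows = ({"id": i, "price": i % 7} for i in range(20))
for batch in iter_filtered_batches(rows, lambda r: r["price"] > 4, batch_size=3):
    print([r["id"] for r in batch])  # [5, 6, 12] then [13, 19]
```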

---------

Co-authored-by: Ishani Ghose <isghose@amazon.com>
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-03-20 13:38:06 -07:00
Weston Pace
7d790bd9e7 feat: introduce ArrowNative wrapper struct for adding data that is already a RecordBatchReader (#1139)
In 2de226220b I added a new `IntoArrow` trait for adding data into a table.
Unfortunately, it seems my approach for implementing the trait for
"things that are already record batch readers" was flawed. This PR
corrects that flaw and, conveniently, removes the need to box readers at
all (though it is ok if you do).
2024-03-20 13:28:17 -07:00
natcharacter
dbdd0a7b4b Order by field support FTS (#1132)
This PR adds support for passing through a set of ordering fields at
index time (unsigned ints that tantivy can use as fast fields) that you
can sort your results on at query time. This is useful for cases where
you want to get related hits, e.g. by keyword, but order those hits by
some other score, such as popularity.

For example, search for song descriptions that match "sad AND jazz AND 1920"
and then order those hits by number of times played. Example usage can be
seen in the FTS tests.
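A plain-Python sketch of the query-time behavior described above, assuming whitespace tokenization and a hypothetical `plays` field standing in for a fast field. It does not use tantivy or the LanceDB FTS API; it only shows the two steps: match on keywords, then order the matches by another field.

```python
from typing import Dict, List

def search_ordered(docs: List[Dict], terms: List[str], order_by: str, limit: int = 10) -> List[Dict]:
    """Keep docs whose text contains every term (AND query), then sort by a numeric field, descending."""
    wanted = [t.lower() for t in terms]
    hits = [d for d in docs if all(t in d["text"].lower().split() for t in wanted)]
    # Keyword matching decides *which* docs qualify; the ordering field decides their rank
    return sorted(hits, key=lambda d: d[order_by], reverse=True)[:limit]

songs = [
    {"text": "sad jazz ballad from 1920", "plays": 120},
    {"text": "happy jazz tune from 1920", "plays": 900},
    {"text": "sad jazz lament recorded 1920", "plays": 450},
    {"text": "sad blues from 1930", "plays": 700},
]
for song in search_ordered(songs, ["sad", "jazz", "1920"], order_by="plays"):
    print(song["plays"], song["text"])
```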

---------

Co-authored-by: Nat Roth <natroth@Nats-MacBook-Pro.local>
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-03-20 01:27:37 -07:00
Chang She
befb79c5f9 feat(python): support writing huggingface dataset and dataset dict (#1110)
A HuggingFace `Dataset` is written as Arrow batches.
For a `DatasetDict`, all splits are written with a "split" column appended.

- [x] what if the dataset schema already has a `split` column
- [x] add unit tests
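A minimal sketch of the `DatasetDict` behavior, with plain lists of dicts standing in for Hugging Face splits. `rows_with_split` is hypothetical, and the commit does not describe how a pre-existing `split` column is handled; this sketch simply raises an error in that case.

```python
from typing import Dict, Iterator, List

def rows_with_split(dataset_dict: Dict[str, List[dict]]) -> Iterator[dict]:
    """Yield every row of every split, tagged with the name of the split it came from."""
    for split_name, rows in dataset_dict.items():
        for row in rows:
            if "split" in row:
                # This sketch's own choice: refuse to silently overwrite an existing column
                raise ValueError(f"row in split {split_name!r} already has a 'split' column")
            # Build a new dict so the caller's rows are not mutated
            yield {**row, "split": split_name}

splits = {"train": [{"text": "a"}, {"text": "b"}], "test": [{"text": "c"}]}
for row in rows_with_split(splits):
    print(row)
```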
2024-03-20 00:22:03 -07:00
Ayush Chaurasia
0a387a5429 feat(python): Support reranking for vector and fts (#1103)
solves https://github.com/lancedb/lancedb/issues/1086

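Usage: reranking with FTS:
```python
import lancedb
from lancedb.rerankers import CohereReranker

db = lancedb.connect("data/sample-lancedb")
# `Schema` is assumed to be a LanceModel whose `vector` field is filled by an embedding function
retriever = db.create_table("fine-tuning", schema=Schema, mode="overwrite")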
pylist = [{"text": "Carson City is the capital city of the American state of Nevada. At the  2010 United States Census, Carson City had a population of 55,274."},
          {"text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan."},
        {"text": "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas."},
        {"text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. "},
        {"text": "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."},
        {"text": "North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck."},
        ]
retriever.add(pylist)
retriever.create_fts_index("text", replace=True)

query = "What is the capital of the United States?"
reranker = CohereReranker(return_score="all")
print(retriever.search(query, query_type="fts").limit(10).to_pandas())
print(retriever.search(query, query_type="fts").rerank(reranker=reranker).limit(10).to_pandas())
```
Results
```
                                                text                                             vector     score
0  Capital punishment (the death penalty) has exi...  [0.099975586, 0.047943115, -0.16723633, -0.183...  0.729602
1  Charlotte Amalie is the capital and largest ci...  [-0.021255493, 0.03363037, -0.027450562, -0.17...  0.678046
2  The Commonwealth of the Northern Mariana Islan...  [0.3684082, 0.30493164, 0.004600525, -0.049407...  0.671521
3  Carson City is the capital city of the America...  [0.13989258, 0.14990234, 0.14172363, 0.0546569...  0.667898
4  Washington, D.C. (also known as simply Washing...  [-0.0090408325, 0.42578125, 0.3798828, -0.3574...  0.653422
5  North Dakota is a state in the United States. ...  [0.55859375, -0.2109375, 0.14526367, 0.1634521...  0.639346
                                                text                                             vector     score  _relevance_score
0  Washington, D.C. (also known as simply Washing...  [-0.0090408325, 0.42578125, 0.3798828, -0.3574...  0.653422          0.979977
1  The Commonwealth of the Northern Mariana Islan...  [0.3684082, 0.30493164, 0.004600525, -0.049407...  0.671521          0.299105
2  Capital punishment (the death penalty) has exi...  [0.099975586, 0.047943115, -0.16723633, -0.183...  0.729602          0.284874
3  Carson City is the capital city of the America...  [0.13989258, 0.14990234, 0.14172363, 0.0546569...  0.667898          0.089614
4  North Dakota is a state in the United States. ...  [0.55859375, -0.2109375, 0.14526367, 0.1634521...  0.639346          0.063832
5  Charlotte Amalie is the capital and largest ci...  [-0.021255493, 0.03363037, -0.027450562, -0.17...  0.678046          0.041462
```

## Vector Search usage:
```python
query = "What is the capital of the United States?"
reranker = CohereReranker(return_score="all")
print(retriever.search(query).limit(10).to_pandas())
print(retriever.search(query).rerank(reranker=reranker, query=query).limit(10).to_pandas()) # <-- Note: passing extra string query here
```

Results
```
                                                text                                             vector  _distance
0  Capital punishment (the death penalty) has exi...  [0.099975586, 0.047943115, -0.16723633, -0.183...  39.728973
1  Washington, D.C. (also known as simply Washing...  [-0.0090408325, 0.42578125, 0.3798828, -0.3574...  41.384884
2  Carson City is the capital city of the America...  [0.13989258, 0.14990234, 0.14172363, 0.0546569...  55.220200
3  Charlotte Amalie is the capital and largest ci...  [-0.021255493, 0.03363037, -0.027450562, -0.17...  58.345654
4  The Commonwealth of the Northern Mariana Islan...  [0.3684082, 0.30493164, 0.004600525, -0.049407...  60.060867
5  North Dakota is a state in the United States. ...  [0.55859375, -0.2109375, 0.14526367, 0.1634521...  64.260544
                                                text                                             vector  _distance  _relevance_score
0  Washington, D.C. (also known as simply Washing...  [-0.0090408325, 0.42578125, 0.3798828, -0.3574...  41.384884          0.979977
1  The Commonwealth of the Northern Mariana Islan...  [0.3684082, 0.30493164, 0.004600525, -0.049407...  60.060867          0.299105
2  Capital punishment (the death penalty) has exi...  [0.099975586, 0.047943115, -0.16723633, -0.183...  39.728973          0.284874
3  Carson City is the capital city of the America...  [0.13989258, 0.14990234, 0.14172363, 0.0546569...  55.220200          0.089614
4  North Dakota is a state in the United States. ...  [0.55859375, -0.2109375, 0.14526367, 0.1634521...  64.260544          0.063832
5  Charlotte Amalie is the capital and largest ci...  [-0.021255493, 0.03363037, -0.027450562, -0.17...  58.345654          0.041462
```
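To make the reranking step concrete, here is a toy reranker in plain Python. It is not `CohereReranker`: it scores each candidate by the fraction of query tokens it contains (a hypothetical lexical stand-in for Cohere's model), sorts by that score, and stores it in a `_relevance_score` column, mirroring the output above. A lexical scorer like this cannot tell "capital punishment" from "capital city", which is one reason to use a learned reranker.

```python
import re
from typing import Dict, List, Set

def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_rerank(query: str, results: List[Dict]) -> List[Dict]:
    """Toy reranker: score by query-token overlap, then sort by that score (highest first)."""
    q = tokens(query)
    denom = len(q) or 1  # avoid dividing by zero for an empty query
    scored = [{**r, "_relevance_score": len(q & tokens(r["text"])) / denom} for r in results]
    # The first-stage order (vector distance or FTS score) is ignored; only the new score counts
    return sorted(scored, key=lambda r: r["_relevance_score"], reverse=True)

candidates = [
    {"text": "Capital punishment has existed in the United States"},
    {"text": "Washington, D.C. is the capital of the United States"},
    {"text": "Bismarck is the capital of North Dakota"},
]
for row in overlap_rerank("What is the capital of the United States?", candidates):
    print(f"{row['_relevance_score']:.2f}", row["text"])
```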
2024-03-19 22:20:31 +05:30
Weston Pace
5a173e1d54 fix: fix compile error in example caused by merge conflict (#1135) 2024-03-19 08:55:15 -07:00
Weston Pace
51bdbcad98 feat: change DistanceType to be independent thing instead of reusing lance_linalg (#1133)
This PR originated from a request to add `Serialize` / `Deserialize` to
`lance_linalg::distance::DistanceType`. However, that is a strange
request for `lance_linalg` which shouldn't really have to worry about
`Serialize` / `Deserialize`. The problem is that `lancedb` is re-using
`DistanceType` and things in `lancedb` do need to worry about
`Serialize`/`Deserialize` (because `lancedb` needs to support remote
client).

On the bright side, separating the two types allows us to document the
distance type independently and allows `lance_linalg` to make changes to
`DistanceType` in the future without having to worry about backwards
compatibility concerns.
2024-03-19 07:27:51 -07:00