Compare commits

...

48 Commits

Author SHA1 Message Date
Lance Release
8bcdc81fd3 [python] Bump version: 0.4.4 → 0.5.0 2024-01-18 01:53:15 +00:00
Chang She
39e14c70c5 chore(python): turn off lazy frame ingestion (#821) 2024-01-16 19:11:16 -08:00
Chang She
af8263af94 feat(python): allow the entire table to be converted to a polars dataframe (#814) 2024-01-15 15:49:16 -08:00
Chang She
be4ab9eef3 feat(python): add exist_ok option to create table (#813)
This mimics CREATE TABLE IF NOT EXISTS behavior.
We add a `db.create_table(..., exist_ok=True)` parameter.
By default it is set to False, so trying to create
a table with the same name raises an exception.
If set to True, it simply opens the table if it
already exists. If you pass in a schema, it will
be checked against the existing table to make sure
you get what you want. If you pass in data, it will
NOT be added to the existing table.
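A minimal sketch of the new flag (table name and data are illustrative):
```python
import lancedb

db = lancedb.connect("~/.lancedb")
data = [{"vector": [3.1, 4.1], "item": "foo"}]
tbl = db.create_table("my_table", data)

# A second call with exist_ok=True opens the existing table instead of
# raising; the data passed here is NOT appended.
tbl = db.create_table("my_table", data, exist_ok=True)
```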
2024-01-15 11:09:18 -08:00
Ayush Chaurasia
184d2bc969 chore(python): get rid of Pydantic deprecation warning in embedding fcn (#816)
```
UserWarning: Valid config keys have changed in V2:
* 'keep_untouched' has been renamed to 'ignored_types'
warnings.warn(message, UserWarning)
```
2024-01-15 12:19:51 +05:30
Anton Shevtsov
ff6f005336 Add openai api key not found help (#815)
This pull request adds a check for the presence of the environment variable
`OPENAI_API_KEY` and removes an unused parameter from the
`retry_with_exponential_backoff` function.
2024-01-15 02:44:09 +05:30
Chang She
49333e522c feat(python): basic polars integration (#811)
We should now be able to directly ingest polars dataframes and return
results as polars dataframes.


![image](https://github.com/lancedb/lancedb/assets/759245/828b1260-c791-45f1-a047-aa649575e798)
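A rough sketch of the intended round trip (names are illustrative; the exact result-side method is an assumption based on this PR's description):
```python
import lancedb
import polars as pl

db = lancedb.connect("~/.lancedb")
df = pl.DataFrame({"vector": [[3.1, 4.1], [5.9, 26.5]], "item": ["foo", "bar"]})
tbl = db.create_table("pl_table", df)  # ingest a polars DataFrame directly

# assumption: query results can be materialized back into polars
res = tbl.search([3.0, 4.0]).limit(1).to_polars()
```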
2024-01-13 16:38:16 -08:00
Ayush Chaurasia
4568df422d feat(python): Add gemini text embedding function (#806)
Named it Gemini-text for now. Not sure how complicated it will be to
support both text and multimodal embeddings under the same class
"gemini", but it's not something to worry about for now.
2024-01-12 22:38:55 -08:00
Lance Release
986891db98 Updating package-lock.json 2024-01-11 22:21:42 +00:00
Lance Release
036bf02901 Updating package-lock.json 2024-01-11 21:34:04 +00:00
Lance Release
4e31f0cc7a Bump version: 0.4.2 → 0.4.3 2024-01-11 21:33:55 +00:00
Lance Release
0a16e29b93 [python] Bump version: 0.4.3 → 0.4.4 2024-01-11 21:29:00 +00:00
Will Jones
cf7d7a19f5 upgrade lance (#809) 2024-01-11 13:28:10 -08:00
Lei Xu
fe2fb91a8b chore: remove black as dependency (#808)
We use `ruff` in CI and dev workflow now.
2024-01-11 10:58:49 -08:00
Chang She
81af350d85 feat(node): align incoming data to table schema (#802) 2024-01-10 16:44:00 -08:00
Sebastian Law
99adfe065a use requests instead of aiohttp for underlying http client (#803)
Instead of starting and stopping the current thread's event loop on
every HTTP call, just make the HTTP call.
2024-01-10 00:07:50 -05:00
Chang She
277406509e chore(python): add docstring for limit behavior (#800)
Closes #796
2024-01-09 20:20:13 -08:00
Chang She
63411b4d8b feat(python): add phrase query option for fts (#798)
addresses #797 

Problem: tantivy does not expose an option to explicitly perform a phrase query.

Proposed solution here: 

1. Add a `.phrase_query()` option
2. Under the hood, LanceDB takes care of wrapping the input in quotes
and replacing nested double quotes with single quotes

I've also filed an upstream issue, if they support phrase queries
natively then we can get rid of our manual custom processing here.
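A short sketch of the two spellings (the explicit option is what this PR adds; exact signature per the PR description):
```python
# manual quoting: the syntax checker skips what's inside double quotes
tbl.search('"they could have been dogs OR cats"').to_list()

# new explicit option: LanceDB wraps the input in quotes under the hood
tbl.search("they could have been dogs OR cats").phrase_query().to_list()
```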
2024-01-09 19:41:31 -08:00
Chang She
d998f80b04 feat(python): add count_rows with filter option (#801)
Closes #795
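A quick sketch of the call shape (the filter string is illustrative):
```python
tbl.count_rows()                       # total number of rows
tbl.count_rows(filter="meta = 'foo'")  # rows matching a SQL-style filter
```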
2024-01-09 19:33:03 -08:00
Chang She
629379a532 fix(rust): not sure why clippy is suddenly unhappy (#794)
Should fix the error on top of main.


https://github.com/lancedb/lancedb/actions/runs/7457190471/job/20288985725
2024-01-09 19:27:38 -08:00
Chang She
99ba5331f0 feat(python): support new style optional syntax (#793) 2024-01-09 07:03:29 -08:00
Chang She
121687231c chore(python): document phrase queries in fts (#788)
closes #769 

Add unit test and documentation on using quotes to perform a phrase
query
2024-01-08 21:49:31 -08:00
Chang She
ac40d4b235 feat(node): support table.schema for LocalTable (#789)
Close #773 

We pass an empty table over IPC, so we don't need to manually deal with
serde. Then we just return the schema attribute from the empty table.

---------

Co-authored-by: albertlockett <albert.lockett@gmail.com>
2024-01-08 21:12:48 -08:00
Lei Xu
c5a52565ac chore: bump lance to 0.9.5 (#790) 2024-01-07 19:27:47 -08:00
Chang She
b0a88a7286 feat(python): Set heap size to get faster fts indexing performance (#762)
By default tantivy-py uses a 128MB heap size. We change the default to 1GB
and allow the user to customize this.

Locally this makes `test_fts.py` run 10x faster.
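The knob this exposes, as documented in the FTS guide below:
```python
# configure a 512MB heap instead of the new 1GB default
heap = 1024 * 1024 * 512
table.create_fts_index(["text1", "text2"], writer_heap_size=heap, replace=True)
```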
2024-01-07 15:15:13 -08:00
lucasiscovici
d41d849e0e raise exception if fts index does not exist (#776)
raise exception if fts index does not exist

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-01-07 14:34:04 -08:00
sudhir
bf5202f196 Make examples work with current version of Openai api's (#779)
These examples don't work because of changes in the OpenAI API from
version 1 onward.
2024-01-07 14:27:56 -08:00
Chris
8be2861061 Minor Fixes to Ingest Embedding Functions Docs (#777)
Addressed minor typos and grammatical issues to improve readability

---------

Co-authored-by: Christopher Correa <chris.correa@gmail.com>
2024-01-07 14:27:40 -08:00
Vladimir Varankin
0560e3a0e5 Minor corrections for docs of embedding_functions (#780)
In addition to #777, this pull request fixes more typos in the
documentation for "Ingest Embedding Functions".
2024-01-07 14:26:35 -08:00
QianZhu
b83fbfc344 small bug fix for example code in SaaS JS doc (#770) 2024-01-04 14:30:34 -08:00
Chang She
60b22d84bf chore(python): handle NaN input in fts ingestion (#763)
If the input text is None, Tantivy raises an error
complaining it cannot add a NoneType. We handle this
upstream so `None` values are not added to the document.
If all of the indexed fields are None then we skip
this document.
2024-01-04 11:45:12 -08:00
Bengsoon Chuah
7d55a94efd Add relevant imports for each step (#764)
I found it quite incoherent to have to read through the documentation
while searching for which submodule each class should be imported
from.

For example, it is cumbersome to have to navigate to another
documentation page to find out that `EmbeddingFunctionRegistry` is from
`lancedb.embeddings`
2024-01-04 11:15:42 -08:00
QianZhu
4d8e401d34 SaaS JS API sdk doc (#740)
Co-authored-by: Aidan <64613310+aidangomar@users.noreply.github.com>
2024-01-03 16:24:21 -08:00
Chang She
684eb8b087 feat(js): support list of string input (#755)
Add support for adding lists of string input (e.g., list of categorical
labels)

Follow-up items: #757 #758
2024-01-02 20:55:33 -08:00
Lance Release
4e3b82feaa Updating package-lock.json 2023-12-30 03:16:41 +00:00
Lance Release
8e248a9d67 Updating package-lock.json 2023-12-30 00:53:51 +00:00
Lance Release
065ffde443 Bump version: 0.4.1 → 0.4.2 2023-12-30 00:53:30 +00:00
Lance Release
c3059dc689 [python] Bump version: 0.4.2 → 0.4.3 2023-12-30 00:52:54 +00:00
Lei Xu
a9caa5f2d4 chore: bump pylance to 0.9.2 (#754) 2023-12-29 16:39:45 -08:00
Xin Hao
8411c36b96 docs: fix link (#752) 2023-12-29 15:33:24 -08:00
Chang She
7773bda7ee feat(python): first cut batch queries for remote api (#753)
Issue separate requests under the hood and concatenate the results.
2023-12-29 15:33:03 -08:00
Lance Release
392777952f [python] Bump version: 0.4.1 → 0.4.2 2023-12-29 00:19:21 +00:00
Chang She
7e75e50d3a chore(python): update embedding API to use openai 1.6.1 (#751)
The API has changed significantly; notably, `openai.Embedding.create` no
longer exists.
https://github.com/openai/openai-python/discussions/742

Update the OpenAI embedding function and put a minimum on the openai sdk
version.
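A sketch of the new call shape (mirroring the notebook updates below; the multi-record return is a slight generalization):
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_func(c):
    rs = client.embeddings.create(input=c, model="text-embedding-ada-002")
    return [record.embedding for record in rs.data]
```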
2023-12-28 15:05:57 -08:00
Chang She
4b8af261a3 feat: add timezone handling for datetime in pydantic (#578)
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones; right now, it silently
coerces to the timezone given in the column regardless of whether the
input had a matching timezone. This is probably not the right
behavior. Though we could instead make the user do the validation in
the pydantic model rather than at the pyarrow conversion layer.
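The annotation itself (see the timezone validator example added to the tables guide below):
```python
from datetime import datetime
from lancedb.pydantic import LanceModel
from pydantic import Field

class TestModel(LanceModel):
    # the "tz" extra is carried through to the pyarrow timestamp type
    dt_with_tz: datetime = Field(json_schema_extra={"tz": "America/New_York"})
```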
2023-12-28 11:02:56 -08:00
Chang She
c8728d4ca1 feat(python): add post filtering for full text search (#739)
Closes #721 

FTS returns results as a pyarrow table. Pyarrow tables have a
`filter` method, but it does not take SQL filter strings (only pyarrow
compute expressions). Instead, we do one of two things to support
`tbl.search("keywords").where("foo=5").limit(10).to_arrow()`:

- Default path: if duckdb is available, use duckdb to execute the SQL
  filter string on the pyarrow table.
- Backup path: otherwise, write the pyarrow table to a lance dataset and
  then do `to_table(filter=<filter>)`.

Neither is ideal. 
Default path has two issues:
1. requires installing an extra library (duckdb)
2. duckdb mangles some fields (like fixed size list => list)

Backup path incurs a latency penalty (~20ms on ssd) to write the
resultset to disk.

In the short term, once #676 is addressed, we can write the dataset to
"memory://" instead of disk; this makes the post filter evaluate much
quicker (ETA next week).

In the longer term, we'd like to be able to evaluate the filter string
on the pyarrow Table directly, one possibility being that we use
Substrait to generate pyarrow compute expressions from a SQL string. Or if
there's enough progress on pyarrow, it could support Substrait
expressions directly (no ETA).
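The supported call shape, for reference:
```python
tbl.search("keywords").where("foo=5").limit(10).to_arrow()
```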

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2023-12-27 09:31:04 -08:00
Aidan
446f837335 fix: createIndex index cache size (#741) 2023-12-27 09:25:13 -08:00
Chang She
8f9ad978f5 feat(python): support list of list fields from pydantic schema (#747)
For object detection, each row may correspond to an image and each image
can have multiple bounding boxes of x-y coordinates. This means that a
`bbox` field is potentially "list of list of float". This adds support
in our pydantic-pyarrow conversion for nested lists.
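A minimal sketch of such a schema (field names and dimensions are illustrative):
```python
from typing import List

from lancedb.pydantic import LanceModel, Vector

class Detection(LanceModel):
    vector: Vector(512)
    # one image row can carry many boxes: a list of [x0, y0, x1, y1] lists
    bbox: List[List[float]]
```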
2023-12-27 09:10:09 -08:00
Lance Release
0df38341d5 Updating package-lock.json 2023-12-26 17:21:51 +00:00
53 changed files with 2167 additions and 577 deletions

View File

@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.4.1
+current_version = 0.4.3
commit = True
message = Bump version: {current_version} → {new_version}
tag = True

View File

@@ -88,6 +88,9 @@ jobs:
      cd docs/test
      node md_testing.js
  - name: Test
+   env:
+     LANCEDB_URI: ${{ secrets.LANCEDB_URI }}
+     LANCEDB_DEV_API_KEY: ${{ secrets.LANCEDB_DEV_API_KEY }}
    run: |
      cd docs/test/node
      for d in *; do cd "$d"; echo "$d".js; node "$d".js; cd ..; done

View File

@@ -74,7 +74,7 @@ jobs:
    run: |
      pip install -e .[tests]
      pip install tantivy@git+https://github.com/quickwit-oss/tantivy-py#164adc87e1a033117001cf70e38c82a53014d985
-     pip install pytest pytest-mock black
+     pip install pytest pytest-mock
  - name: Run tests
    run: pytest -m "not slow" -x -v --durations=30 tests
  pydantic1x:

View File

@@ -5,10 +5,10 @@ exclude = ["python"]
resolver = "2"
[workspace.dependencies]
-lance = { "version" = "=0.9.1", "features" = ["dynamodb"] }
+lance = { "version" = "=0.9.6", "features" = ["dynamodb"] }
-lance-index = { "version" = "=0.9.1" }
+lance-index = { "version" = "=0.9.6" }
-lance-linalg = { "version" = "=0.9.1" }
+lance-linalg = { "version" = "=0.9.6" }
-lance-testing = { "version" = "=0.9.1" }
+lance-testing = { "version" = "=0.9.6" }
# Note that this one does not include pyarrow
arrow = { version = "49.0.0", optional = false }
arrow-array = "49.0"

View File

@@ -149,6 +149,7 @@ nav:
  - OSS Python API: python/python.md
  - SaaS Python API: python/saas-python.md
  - Javascript API: javascript/modules.md
+ - SaaS Javascript API: javascript/saas-modules.md
  - LanceDB Cloud↗: https://noteforms.com/forms/lancedb-mailing-list-cloud-kty1o5?notionforms=1&utm_source=notionforms
extra_css:

View File

@@ -164,6 +164,7 @@ You can further filter the elements returned by a search using a where clause.
const results_2 = await table
    .search(Array(1536).fill(1.2))
    .where("id != '1141'")
+   .limit(2)
    .execute()
```
@@ -187,6 +188,7 @@ You can select the columns returned by the query using a select clause.
const results_3 = await table
    .search(Array(1536).fill(1.2))
    .select(["id"])
+   .limit(2)
    .execute()
```

View File

@@ -67,7 +67,7 @@ We'll cover the basics of using LanceDB on your local machine in this section.
!!! warning
    If the table already exists, LanceDB will raise an error by default.
-   If you want to overwrite the table, you can pass in `mode="overwrite"`
+   If you want to make sure you overwrite the table, pass in `mode="overwrite"`
    to the `createTable` function.
=== "Javascript"

View File

@@ -118,6 +118,42 @@ texts = [{"text": "Capitalism has been dominant in the Western world since the e
tbl.add(texts)
```
## Gemini Embedding Function
With Google's Gemini, you can represent text (words, sentences, and blocks of text) in a vectorized form, making it easier to compare and contrast embeddings. For example, two texts that share a similar subject matter or sentiment should have similar embeddings, which can be identified through mathematical comparison techniques such as cosine similarity. For more on how and why you should use embeddings, refer to the Embeddings guide.
The Gemini Embedding Model API supports various task types:
| Task Type | Description |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| "`retrieval_query`" | Specifies the given text is a query in a search/retrieval setting. |
| "`retrieval_document`" | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title but is automatically proided by Embeddings API |
| "`semantic_similarity`" | Specifies the given text will be used for Semantic Textual Similarity (STS). |
| "`classification`" | Specifies that the embeddings will be used for classification. |
| "`clusering`" | Specifies that the embeddings will be used for clustering. |
Usage Example:
```python
import lancedb
import pandas as pd
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry
model = get_registry().get("gemini-text").create()
class TextModel(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()
df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
db = lancedb.connect("~/.lancedb")
tbl = db.create_table("test", schema=TextModel, mode="overwrite")
tbl.add(df)
rs = tbl.search("hello").limit(1).to_pandas()
```
## Multi-modal embedding functions
Multi-modal embedding functions allow you to query your table using both images and text.

View File

@@ -1,13 +1,14 @@
-Representing multi-modal data as vector embeddings is becoming a standard practice. Embedding functions themselves be thought of as a part of the processing pipeline that each request(input) has to be passed through. After initial setup these components are not expected to change for a particular project.
+Representing multi-modal data as vector embeddings is becoming a standard practice. Embedding functions themselves can be thought of as a part of the processing pipeline that each request (input) has to be passed through. After initial setup these components are not expected to change for a particular project.
-This is main motivation behind our new embedding functions API, that allow you simply set it up once and the table remembers it, effectively making the **embedding functions disappear in the background** so you don't have to worry about modelling and simply focus on the DB aspects of VectorDB.
+Our new embedding functions API allows you to simply set it up once and the table remembers it, effectively making the **embedding functions disappear in the background** so you don't have to worry about modelling and can simply focus on the DB aspects of VectorDB.
You can simply follow these steps and forget about the details of your embedding functions as long as you don't intend to change it.
### Step 1 - Define the embedding function
We have some pre-defined embedding functions in the global registry, with more coming soon. Here's an implementation of CLIP as an example.
```
+from lancedb.embeddings import EmbeddingFunctionRegistry
registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()
@@ -15,9 +16,11 @@ clip = registry.get("open-clip").create()
You can also define your own embedding function by implementing the `EmbeddingFunction` abstract base interface. It subclasses Pydantic Model, which can be utilized to write complex schemas simply, as we'll see next!
### Step 2 - Define the Data Model or Schema
-Our embedding function from the previous section abstracts away all the details about the models and dimensions required to define the schema. You can simply set a feild as **source** or **vector** column. Here's how
+Our embedding function from the previous section abstracts away all the details about the models and dimensions required to define the schema. You can simply set a field as the **source** or **vector** column. Here's how:
```python
+from lancedb.pydantic import LanceModel, Vector
class Pets(LanceModel):
    vector: Vector(clip.ndims) = clip.VectorField()
    image_uri: str = clip.SourceField()
@@ -30,11 +33,13 @@ class Pets(LanceModel):
Now that we have chosen/defined our embedding function and the schema, we can create the table:
```python
+import lancedb
db = lancedb.connect("~/lancedb")
table = db.create_table("pets", schema=Pets)
```
-That's it! We have ingested all the information needed to embed source and query inputs. We can now forget about the model and dimension details and start to build or VectorDB
+That's it! We have ingested all the information needed to embed source and query inputs. We can now forget about the model and dimension details and start to build our VectorDB.
### Step 4 - Ingest lots of data and run vector search!
Now you can just add the data and it'll be vectorized automatically:
@@ -52,29 +57,32 @@ result = table.search("dog")
Let's query an image:
```python
+from pathlib import Path
p = Path("path/to/images/samoyed_100.jpg")
query_image = Image.open(p)
table.search(query_image)
```
### Rate limit Handling
-`EmbeddingFunction` class wraps the calls for source and query embedding generation inside a rate limit handler that retries the requests with exponential backoff after successive failures. By default the maximum retires is set to 7. You can tune it by setting it to a different number or disable it by setting it to 0.
+The `EmbeddingFunction` class wraps the calls for source and query embedding generation inside a rate limit handler that retries the requests with exponential backoff after successive failures. By default the maximum retries is set to 7. You can tune it by setting it to a different number, or disable it by setting it to 0. Example:
-Example
-----
```python
clip = registry.get("open-clip").create() # Defaults to 7 max retries
clip = registry.get("open-clip").create(max_retries=10) # Increase max retries to 10
clip = registry.get("open-clip").create(max_retries=0) # Retries disabled
-````
+```
NOTE:
-Embedding functions can also fail due to other errors that have nothing to do with rate limits. This is why the error is also logged.
+Embedding functions can also fail due to other errors that have nothing to do with rate limits. This is why the errors are also logged.
### A little fun with Pydantic
-LanceDB is integrated with PyDantic. Infact we've used the integration in the above example to define the schema. It is also being used behing the scene by the embdding function API to ingest useful information as table metadata.
+LanceDB is integrated with Pydantic. In fact, we've used the integration in the above example to define the schema. It is also being used behind the scenes by the embedding function API to ingest useful information as table metadata.
-You can also use it for adding utility operations in the schema. For example, in our multi-modal example, you can search images using text or another image. Let us define a utility function to plot the image.
+You can also use it for adding utility operations in the schema. For example, in our multi-modal example, you can search images using text or another image. Let's define a utility function to plot the image.
```python
+from lancedb.pydantic import LanceModel, Vector
class Pets(LanceModel):
    vector: Vector(clip.ndims) = clip.VectorField()
    image_uri: str = clip.SourceField()
@@ -83,7 +91,8 @@ class Pets(LanceModel):
    def image(self):
        return Image.open(self.image_uri)
```
-Now, you can covert your search results to pydantic model and use this property.
+Now, you can convert your search results to the Pydantic model and use this property.
```python
rs = table.search(query_image).limit(3).to_pydantic(Pets)
@@ -92,4 +101,4 @@ rs[2].image
![](../assets/dog_clip_output.png)
-Now that you've the basic idea about LanceDB embedding function, let us now dive deeper into the API that you can use to implement your own embedding functions!
+Now that you have the basic idea about LanceDB embedding functions, let us dive deeper into the API that you can use to implement your own embedding functions!

View File

@@ -29,8 +29,9 @@ uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
-           data=[{"vector": [3.1, 4.1], "text": "Frodo was a happy puppy"},
-                 {"vector": [5.9, 26.5], "text": "There are several kittens playing"}])
+           data=[{"vector": [3.1, 4.1], "text": "Frodo was a happy puppy", "meta": "foo"},
+                 {"vector": [5.9, 26.5], "text": "Sam was a loyal puppy", "meta": "bar"},
+                 {"vector": [15.9, 6.5], "text": "There are several kittens playing"}])
```
@@ -64,10 +65,51 @@ table.create_fts_index(["text1", "text2"])
Note that the search API call does not change - you can search over all indexed columns at once.
## Filtering
Currently the LanceDB full text search feature supports *post-filtering*, meaning filters are
applied on top of the full text search results. This can be invoked via the familiar
`where` syntax:
```python
table.search("puppy").limit(10).where("meta='foo'").to_list()
```
## Syntax
For full-text search you can perform either a phrase query like "the old man and the sea",
or a structured search query like "(Old AND Man) AND Sea".
Double quotes are used to disambiguate.
For example:
If you intended "they could have been dogs OR cats" as a phrase query, this actually
raises a syntax error since `OR` is a recognized operator. If you make `or` lower case,
this avoids the syntax error. However, it is cumbersome to have to remember what will
conflict with the query syntax. Instead, if you search using
`table.search('"they could have been dogs OR cats"')`, then the syntax checker avoids
checking inside the quotes.
## Configurations
By default, LanceDB configures a 1GB heap size limit for creating the index. You can
reduce this if running on a smaller node, or increase this for faster performance while
indexing a larger corpus.
```python
# configure a 512MB heap size
heap = 1024 * 1024 * 512
table.create_fts_index(["text1", "text2"], writer_heap_size=heap, replace=True)
```
## Current limitations
1. Currently we do not yet support incremental writes.
   If you add data after fts index creation, it won't be reflected
   in search results until you do a full reindex.
2. We currently only support local filesystem paths for the fts index.
This is a tantivy limitation. We've implemented an object store plugin
but there's no way in tantivy-py to specify that it should be used.

View File

@@ -31,13 +31,23 @@ This guide will show how to create tables, insert data into them, and update the
```
!!! info "Note"
-   If the table already exists, LanceDB will raise an error by default. If you want to overwrite the table, you can pass in mode="overwrite" to the createTable function.
+   If the table already exists, LanceDB will raise an error by default.
`create_table` supports an optional `exist_ok` parameter. When set to True
and the table exists, then it simply opens the existing table. The data you
passed in will NOT be appended to the table in that case.
```python
db.create_table("name", data, exist_ok=True)
```
Sometimes you want to make sure that you start fresh. If you want to
overwrite the table, you can pass in mode="overwrite" to the createTable function.
```python ```python
db.create_table("name", data, mode="overwrite") db.create_table("name", data, mode="overwrite")
``` ```
### From pandas DataFrame ### From pandas DataFrame
```python ```python
@@ -118,6 +128,84 @@ This guide will show how to create tables, insert data into them, and update the
table = db.create_table(table_name, schema=Content) table = db.create_table(table_name, schema=Content)
``` ```
#### Nested schemas
Sometimes your data model may contain nested objects.
For example, you may want to store the document string
and the document source name as a nested Document object:
```python
class Document(BaseModel):
    content: str
    source: str
```
This can be used as the type of a LanceDB table column:
```python
class NestedSchema(LanceModel):
    id: str
    vector: Vector(1536)
    document: Document

tbl = db.create_table("nested_table", schema=NestedSchema, mode="overwrite")
```
This creates a struct column called "document" that has two subfields
called "content" and "source":
```
In [28]: tbl.schema
Out[28]:
id: string not null
vector: fixed_size_list<item: float>[1536] not null
child 0, item: float
document: struct<content: string not null, source: string not null> not null
child 0, content: string not null
child 1, source: string not null
```
#### Validators
Note that neither pydantic nor pyarrow automatically validates that input data
is of the *correct* timezone, but this is easy to add as a custom field validator:
```python
from datetime import datetime
from zoneinfo import ZoneInfo
from lancedb.pydantic import LanceModel
from pydantic import Field, field_validator, ValidationError, ValidationInfo
tzname = "America/New_York"
tz = ZoneInfo(tzname)
class TestModel(LanceModel):
    dt_with_tz: datetime = Field(json_schema_extra={"tz": tzname})

    @field_validator('dt_with_tz')
    @classmethod
    def tz_must_match(cls, dt: datetime) -> datetime:
        assert dt.tzinfo == tz
        return dt

ok = TestModel(dt_with_tz=datetime.now(tz))

try:
    TestModel(dt_with_tz=datetime.now(ZoneInfo("Asia/Shanghai")))
    assert 0 == 1, "this should raise ValidationError"
except ValidationError:
    print("A ValidationError was raised.")
```
When you run this code it should print "A ValidationError was raised."
#### Pydantic custom types
LanceDB does NOT yet support converting pydantic custom types. If this is something you need,
please file a feature request on the [LanceDB Github repo](https://github.com/lancedb/lancedb/issues/new).
### Using Iterators / Writing Large Datasets
It is recommended to use iterators to add large datasets in batches when creating your table in one go. Unlike manually adding batches using `table.add()`, this does not create multiple versions of your dataset.
@@ -153,7 +241,7 @@ This guide will show how to create tables, insert data into them, and update the
You can also use iterators of other types like Pandas dataframes or Pylists directly in the above example.
## Creating Empty Table
-You can also create empty tables in python. Initialize it with schema and later ingest data into it.
+You can create empty tables in python. Initialize one with a schema and ingest data into it later.
```python
import lancedb

View File

@@ -0,0 +1,226 @@
[vectordb](../README.md) / [Exports](../saas-modules.md) / RemoteConnection
# Class: RemoteConnection
A connection to a remote LanceDB database. The class RemoteConnection implements interface Connection
## Implements
- [`Connection`](../interfaces/Connection.md)
## Table of contents
### Constructors
- [constructor](RemoteConnection.md#constructor)
### Methods
- [createTable](RemoteConnection.md#createtable)
- [tableNames](RemoteConnection.md#tablenames)
- [openTable](RemoteConnection.md#opentable)
- [dropTable](RemoteConnection.md#droptable)
## Constructors
### constructor
**new RemoteConnection**(`client`, `dbName`)
#### Parameters
| Name | Type |
| :------ | :------ |
| `client` | `HttpLancedbClient` |
| `dbName` | `string` |
#### Defined in
[remote/index.ts:37](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L37)
## Methods
### createTable
**createTable**(`name`, `data`, `mode?`): `Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
Creates a new Table and initializes it with new data.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the Table |
| `mode?` | [`WriteMode`](../enums/WriteMode.md) | The write mode to use when creating the table. |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Implementation of
[Connection](../interfaces/Connection.md).[createTable](../interfaces/Connection.md#createtable)
#### Defined in
[remote/index.ts:75](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L75)
**createTable**(`name`, `data`, `mode`, `embeddings`): `Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `data` | `Record`<`string`, `unknown`\>[] |
| `mode` | [`WriteMode`](../enums/WriteMode.md) |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use on this Table |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Implementation of
Connection.createTable
#### Defined in
[remote/index.ts:231](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L231)
___
### dropTable
**dropTable**(`name`): `Promise`<`void`\>
Drop an existing table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table to drop. |
#### Returns
`Promise`<`void`\>
#### Implementation of
[Connection](../interfaces/Connection.md).[dropTable](../interfaces/Connection.md#droptable)
#### Defined in
[remote/index.ts:131](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L131)
___
### openTable
**openTable**(`name`): `Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
Open a table in the database.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Implementation of
[Connection](../interfaces/Connection.md).[openTable](../interfaces/Connection.md#opentable)
#### Defined in
[remote/index.ts:65](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L65)
**openTable**<`T`\>(`name`, `embeddings`): `Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
Open a table in the database.
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use on this Table |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Implementation of
Connection.openTable
#### Defined in
[remote/index.ts:66](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L66)
**openTable**<`T`\>(`name`, `embeddings?`): `Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `embeddings?` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Implementation of
Connection.openTable
#### Defined in
[remote/index.ts:67](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L67)
___
### tableNames
**tableNames**(`pageToken?`, `limit?`): `Promise`<`string`[]\>
Get the names of all tables in the database, with pagination.
#### Parameters
| Name | Type |
| :------ | :------ |
| `pageToken` | `string` |
| `limit` | `number` |
#### Returns
`Promise`<`string`[]\>
#### Implementation of
[Connection](../interfaces/Connection.md).[tableNames](../interfaces/Connection.md#tablenames)
#### Defined in
[remote/index.ts:60](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L60)

View File

@@ -0,0 +1,76 @@
[vectordb](../README.md) / [Exports](../saas-modules.md) / RemoteQuery
# Class: Query<T\>
A builder for nearest neighbor queries for LanceDB.
## Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
## Table of contents
### Constructors
- [constructor](RemoteQuery.md#constructor)
### Properties
- [\_embeddings](RemoteQuery.md#_embeddings)
- [\_query](RemoteQuery.md#_query)
- [\_name](RemoteQuery.md#_name)
- [\_client](RemoteQuery.md#_client)
### Methods
- [execute](RemoteQuery.md#execute)
## Constructors
### constructor
**new Query**<`T`\>(`name`, `client`, `query`, `embeddings?`)
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `client` | `HttpLancedbClient` |
| `query` | `T` |
| `embeddings?` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> |
#### Defined in
[remote/index.ts:137](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L137)
## Methods
### execute
**execute**<`T`\>(): `Promise`<`T`[]\>
Execute the query and return the results as an Array of Objects
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `Record`<`string`, `unknown`\> |
#### Returns
`Promise`<`T`[]\>
#### Defined in
[remote/index.ts:143](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L143)

View File

@@ -0,0 +1,355 @@
[vectordb](../README.md) / [Exports](../saas-modules.md) / RemoteTable
# Class: RemoteTable<T\>
A LanceDB Table is a collection of Records. Each Record has one or more vector fields.
## Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
## Implements
- [`Table`](../interfaces/Table.md)<`T`\>
## Table of contents
### Constructors
- [constructor](RemoteTable.md#constructor)
### Properties
- [\_name](RemoteTable.md#_name)
- [\_client](RemoteTable.md#_client)
- [\_embeddings](RemoteTable.md#_embeddings)
### Accessors
- [name](RemoteTable.md#name)
### Methods
- [add](RemoteTable.md#add)
- [countRows](RemoteTable.md#countrows)
- [createIndex](RemoteTable.md#createindex)
- [delete](RemoteTable.md#delete)
- [listIndices](RemoteTable.md#listindices)
- [indexStats](RemoteTable.md#indexstats)
- [overwrite](RemoteTable.md#overwrite)
- [search](RemoteTable.md#search)
- [schema](RemoteTable.md#schema)
- [update](RemoteTable.md#update)
## Constructors
### constructor
**new RemoteTable**<`T`\>(`client`, `name`)
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
#### Parameters
| Name | Type |
| :------ | :------ |
| `client` | `HttpLancedbClient` |
| `name` | `string` |
#### Defined in
[remote/index.ts:186](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L186)
**new RemoteTable**<`T`\>(`client`, `name`, `embeddings`)
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `client` | `HttpLancedbClient` | |
| `name` | `string` | |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use when interacting with this table |
#### Defined in
[remote/index.ts:187](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L187)
## Accessors
### name
`get` **name**(): `string`
#### Returns
`string`
#### Implementation of
[Table](../interfaces/Table.md).[name](../interfaces/Table.md#name)
#### Defined in
[remote/index.ts:194](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L194)
## Methods
### add
**add**(`data`): `Promise`<`number`\>
Insert records into this Table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Implementation of
[Table](../interfaces/Table.md).[add](../interfaces/Table.md#add)
#### Defined in
[remote/index.ts:293](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L293)
___
### countRows
**countRows**(): `Promise`<`number`\>
Returns the number of rows in this table.
#### Returns
`Promise`<`number`\>
#### Implementation of
[Table](../interfaces/Table.md).[countRows](../interfaces/Table.md#countrows)
#### Defined in
[remote/index.ts:290](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L290)
___
### createIndex
**createIndex**(`metric_type`, `column`, `index_cache_size`): `Promise`<`any`\>
Create an ANN index on a vector column of this Table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `metric_type` | `string` | distance metric type, L2 or cosine or dot |
| `column` | `string` | the name of the column to be indexed |
| `index_cache_size` | `number` | the size of the index cache |
#### Returns
`Promise`<`any`\>
#### Implementation of
[Table](../interfaces/Table.md).[createIndex](../interfaces/Table.md#createindex)
#### Defined in
[remote/index.ts:249](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L249)
___
### delete
**delete**(`filter`): `Promise`<`void`\>
Delete rows from this table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `filter` | `string` | A filter in the same format used by a sql WHERE clause. |
#### Returns
`Promise`<`void`\>
#### Implementation of
[Table](../interfaces/Table.md).[delete](../interfaces/Table.md#delete)
#### Defined in
[remote/index.ts:295](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L295)
___
### overwrite
**overwrite**(`data`): `Promise`<`number`\>
Insert records into this Table, replacing its contents.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Implementation of
[Table](../interfaces/Table.md).[overwrite](../interfaces/Table.md#overwrite)
#### Defined in
[remote/index.ts:231](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L231)
___
### search
**search**(`query`): [`Query`](Query.md)<`T`\>
Creates a search query to find the nearest neighbors of the given search term
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `query` | `T` | The query search term |
#### Returns
[`Query`](Query.md)<`T`\>
#### Implementation of
[Table](../interfaces/Table.md).[search](../interfaces/Table.md#search)
#### Defined in
[remote/index.ts:209](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L209)
___
### update
**update**(`args`): `Promise`<`void`\>
Update anywhere from zero to all rows, depending on how many match the where clause.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `args` | `UpdateArgs` or `UpdateSqlArgs` | The update arguments |
#### Returns
`Promise`<`void`\>
#### Implementation of
[Table](../interfaces/Table.md).[update](../interfaces/Table.md#update)
#### Defined in
[remote/index.ts:299](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L299)
___
### schema
**schema**(): `Promise`<`any`\>
Get the schema of the table
#### Returns
`Promise`<`any`\>
#### Implementation of
[Table](../interfaces/Table.md).[schema](../interfaces/Table.md#schema)
#### Defined in
[remote/index.ts:198](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L198)
___
### listIndices
**listIndices**(): `Promise`<`any`\>
List the indices of the table
#### Returns
`Promise`<`any`\>
#### Implementation of
[Table](../interfaces/Table.md).[listIndices](../interfaces/Table.md#listIndices)
#### Defined in
[remote/index.ts:319](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L319)
___
### indexStats
**indexStats**(`indexUuid`): `Promise`<`void`\>
Get the number of indexed and unindexed rows in the table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `indexUuid` | `string` | the uuid of the index |
#### Returns
A `Promise` resolving to `numIndexedRows` and `numUnindexedRows`
#### Implementation of
[Table](../interfaces/Table.md).[indexStats](../interfaces/Table.md#indexStats)
#### Defined in
[remote/index.ts:328](https://github.com/lancedb/lancedb/blob/main/node/src/remote/index.ts#L328)

View File

@@ -0,0 +1,92 @@
# Table of contents
## Installation
```bash
npm install vectordb
```
This will download the appropriate native library for your platform. We currently
support x86_64 Linux, aarch64 Linux, Intel MacOS, and ARM (M1/M2) MacOS. We do not
yet support Windows or musl-based Linux (such as Alpine Linux).
## Classes
- [RemoteConnection](classes/RemoteConnection.md)
- [RemoteTable](classes/RemoteTable.md)
- [RemoteQuery](classes/RemoteQuery.md)
## Methods
- [add](classes/RemoteTable.md#add)
- [countRows](classes/RemoteTable.md#countrows)
- [createIndex](classes/RemoteTable.md#createindex)
- [createTable](classes/RemoteConnection.md#createtable)
- [delete](classes/RemoteTable.md#delete)
- [dropTable](classes/RemoteConnection.md#droptable)
- [listIndices](classes/RemoteTable.md#listindices)
- [indexStats](classes/RemoteTable.md#indexstats)
- [openTable](classes/RemoteConnection.md#opentable)
- [overwrite](classes/RemoteTable.md#overwrite)
- [schema](classes/RemoteTable.md#schema)
- [search](classes/RemoteTable.md#search)
- [tableNames](classes/RemoteConnection.md#tablenames)
- [update](classes/RemoteTable.md#update)
## Example code
```javascript
const lancedb = require('vectordb');
const { Schema, Field, Int32, Float32, Utf8, FixedSizeList } = require("apache-arrow/Arrow.node")
// connect to a remote DB
const devApiKey = process.env.LANCEDB_DEV_API_KEY
const dbURI = process.env.LANCEDB_URI
const db = await lancedb.connect({
uri: dbURI, // replace dbURI with your project, e.g. "db://your-project-name"
  apiKey: devApiKey, // replace devApiKey with your api key
region: "us-east-1-dev"
});
// create a new table
const tableName = "my_table_000"
const data = [
{ id: 1, vector: [0.1, 1.0], item: "foo", price: 10.0 },
{ id: 2, vector: [3.9, 0.5], item: "bar", price: 20.0 }
]
const schema = new Schema(
[
new Field('id', new Int32()),
new Field('vector', new FixedSizeList(2, new Field('float32', new Float32()))),
new Field('item', new Utf8()),
new Field('price', new Float32())
]
)
const table = await db.createTable({
name: tableName,
schema,
}, data)
// list the tables
const tableNames_1 = await db.tableNames('')
// add some data and search should be okay
const newData = [
{ id: 3, vector: [10.3, 1.9], item: "test1", price: 30.0 },
{ id: 4, vector: [6.2, 9.2], item: "test2", price: 40.0 }
]
await table.add(newData)
// create the index for the table
await table.createIndex({
metric_type: "L2",
column: "vector"
})
let result = await table.search([2.8, 4.3]).select(["vector", "price"]).limit(1).execute()
// update the data
await table.update({
where: "id == 1",
values: { item: "foo1" }
})
// drop the table
await db.dropTable(tableName)
```

View File

@@ -44,15 +44,14 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import openai\n", "from openai import OpenAI\n",
"import os\n", "import os\n",
"\n", "\n",
"# Configuring the environment variable OPENAI_API_KEY\n", "# Configuring the environment variable OPENAI_API_KEY\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n", "if \"OPENAI_API_KEY\" not in os.environ:\n",
" # OR set the key here as a variable\n", " os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n",
" openai.api_key = \"sk-...\"\n", "client = OpenAI()\n",
" \n", "assert len(client.models.list().data) > 0"
"assert len(openai.Model.list()[\"data\"]) > 0"
] ]
}, },
{ {

View File

@@ -27,11 +27,11 @@
"output_type": "stream", "output_type": "stream",
"text": [ "text": [
"\n", "\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.0\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.1.1\u001B[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.1\u001b[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"\n", "\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.0\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.1.1\u001B[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.1\u001b[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n" "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
] ]
} }
], ],
@@ -206,15 +206,16 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import openai\n", "from openai import OpenAI\n",
"import os\n", "import os\n",
"\n", "\n",
"# Configuring the environment variable OPENAI_API_KEY\n", "# Configuring the environment variable OPENAI_API_KEY\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n", "if \"OPENAI_API_KEY\" not in os.environ:\n",
" # OR set the key here as a variable\n", " # OR set the key here as a variable\n",
" openai.api_key = \"sk-...\"\n", " os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n",
" \n", " \n",
"assert len(openai.Model.list()[\"data\"]) > 0" "client = OpenAI()\n",
"assert len(client.models.list().data) > 0"
] ]
}, },
{ {
@@ -234,8 +235,8 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"def embed_func(c): \n", "def embed_func(c): \n",
" rs = openai.Embedding.create(input=c, engine=\"text-embedding-ada-002\")\n", " rs = client.embeddings.create(input=c, model=\"text-embedding-ada-002\")\n",
" return [record[\"embedding\"] for record in rs[\"data\"]]" " return [rs.data[0].embedding]"
] ]
}, },
{ {
@@ -536,9 +537,8 @@
     ],
     "source": [
     "def complete(prompt):\n",
-    "    # query text-davinci-003\n",
-    "    res = openai.Completion.create(\n",
-    "        engine='text-davinci-003',\n",
+    "    res = client.completions.create(\n",
+    "        model='text-davinci-003',\n",
     "        prompt=prompt,\n",
     "        temperature=0,\n",
     "        max_tokens=400,\n",
@@ -547,7 +547,7 @@
     "        presence_penalty=0,\n",
     "        stop=None\n",
     "    )\n",
-    "    return res['choices'][0]['text'].strip()\n",
+    "    return res.choices[0].text\n",
     "\n",
     "# check that it works\n",
     "query = \"who was the 12th person on the moon and when did they land?\"\n",

View File

@@ -7,7 +7,7 @@ LanceDB integrates with Pydantic for schema inference, data ingestion, and query
LanceDB supports creating an Apache Arrow Schema from a
[Pydantic BaseModel](https://docs.pydantic.dev/latest/api/main/#pydantic.main.BaseModel)
-via [pydantic_to_schema()](python.md##lancedb.pydantic.pydantic_to_schema) method.
+via the [pydantic_to_schema()](python.md#lancedb.pydantic.pydantic_to_schema) method.
::: lancedb.pydantic.pydantic_to_schema

node/package-lock.json (generated, 594 changes)

View File

@@ -1,12 +1,12 @@
{
   "name": "vectordb",
-  "version": "0.4.0",
+  "version": "0.4.3",
   "lockfileVersion": 2,
   "requires": true,
   "packages": {
     "": {
       "name": "vectordb",
-      "version": "0.4.0",
+      "version": "0.4.3",
       "cpu": [
         "x64",
         "arm64"
@@ -18,9 +18,9 @@
         "win32"
       ],
       "dependencies": {
-        "@apache-arrow/ts": "^12.0.0",
+        "@apache-arrow/ts": "^14.0.2",
         "@neon-rs/load": "^0.0.74",
-        "apache-arrow": "^12.0.0",
+        "apache-arrow": "^14.0.2",
         "axios": "^1.4.0"
       },
       "devDependencies": {
@@ -53,39 +53,59 @@
         "uuid": "^9.0.0"
       },
       "optionalDependencies": {
-        "@lancedb/vectordb-darwin-arm64": "0.4.0",
-        "@lancedb/vectordb-darwin-x64": "0.4.0",
-        "@lancedb/vectordb-linux-arm64-gnu": "0.4.0",
-        "@lancedb/vectordb-linux-x64-gnu": "0.4.0",
-        "@lancedb/vectordb-win32-x64-msvc": "0.4.0"
+        "@lancedb/vectordb-darwin-arm64": "0.4.3",
+        "@lancedb/vectordb-darwin-x64": "0.4.3",
+        "@lancedb/vectordb-linux-arm64-gnu": "0.4.3",
+        "@lancedb/vectordb-linux-x64-gnu": "0.4.3",
+        "@lancedb/vectordb-win32-x64-msvc": "0.4.3"
       }
     },
+    "node_modules/@75lb/deep-merge": {
+      "version": "1.1.1",
+      "resolved": "https://registry.npmjs.org/@75lb/deep-merge/-/deep-merge-1.1.1.tgz",
+      "integrity": "sha512-xvgv6pkMGBA6GwdyJbNAnDmfAIR/DfWhrj9jgWh3TY7gRm3KO46x/GPjRg6wJ0nOepwqrNxFfojebh0Df4h4Tw==",
+      "dependencies": {
+        "lodash.assignwith": "^4.2.0",
+        "typical": "^7.1.1"
+      },
+      "engines": {
+        "node": ">=12.17"
+      }
+    },
+    "node_modules/@75lb/deep-merge/node_modules/typical": {
+      "version": "7.1.1",
+      "resolved": "https://registry.npmjs.org/typical/-/typical-7.1.1.tgz",
+      "integrity": "sha512-T+tKVNs6Wu7IWiAce5BgMd7OZfNYUndHwc5MknN+UHOudi7sGZzuHdCadllRuqJ3fPtgFtIH9+lt9qRv6lmpfA==",
+      "engines": {
+        "node": ">=12.17"
+      }
     },
     "node_modules/@apache-arrow/ts": {
-      "version": "12.0.0",
-      "resolved": "https://registry.npmjs.org/@apache-arrow/ts/-/ts-12.0.0.tgz",
-      "integrity": "sha512-ArJ3Fw5W9RAeNWuyCU2CdjL/nEAZSVDG1p3jz/ZtLo/q3NTz2w7HUCOJeszejH/5alGX+QirYrJ5c6BW++/P7g==",
+      "version": "14.0.2",
+      "resolved": "https://registry.npmjs.org/@apache-arrow/ts/-/ts-14.0.2.tgz",
+      "integrity": "sha512-CtwAvLkK0CZv7xsYeCo91ml6PvlfzAmAJZkRYuz2GNBwfYufj5SVi0iuSMwIMkcU/szVwvLdzORSLa5PlF/2ug==",
       "dependencies": {
         "@types/command-line-args": "5.2.0",
         "@types/command-line-usage": "5.0.2",
-        "@types/node": "18.14.5",
+        "@types/node": "20.3.0",
         "@types/pad-left": "2.1.1",
         "command-line-args": "5.2.1",
-        "command-line-usage": "6.1.3",
-        "flatbuffers": "23.3.3",
+        "command-line-usage": "7.0.1",
+        "flatbuffers": "23.5.26",
         "json-bignum": "^0.0.3",
         "pad-left": "^2.1.0",
-        "tslib": "^2.5.0"
+        "tslib": "^2.5.3"
       }
     },
     "node_modules/@apache-arrow/ts/node_modules/@types/node": {
-      "version": "18.14.5",
-      "resolved": "https://registry.npmjs.org/@types/node/-/node-18.14.5.tgz",
-      "integrity": "sha512-CRT4tMK/DHYhw1fcCEBwME9CSaZNclxfzVMe7GsO6ULSwsttbj70wSiX6rZdIjGblu93sTJxLdhNIT85KKI7Qw=="
+      "version": "20.3.0",
+      "resolved": "https://registry.npmjs.org/@types/node/-/node-20.3.0.tgz",
+      "integrity": "sha512-cumHmIAf6On83X7yP+LrsEyUOf/YlociZelmpRYaGFydoaPdxdt80MAbu6vWerQT2COCp2nPvHdsbD7tHn/YlQ=="
     },
     "node_modules/@apache-arrow/ts/node_modules/tslib": {
-      "version": "2.5.0",
-      "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.5.0.tgz",
-      "integrity": "sha512-336iVw3rtn2BUK7ORdIAHTyxHGRIHVReokCR3XjbckJMK7ms8FysBfhLR8IXnAgy7T0PTPNBWKiH514FOW/WSg=="
+      "version": "2.6.2",
+      "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.2.tgz",
+      "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q=="
     },
     "node_modules/@cargo-messages/android-arm-eabi": {
       "version": "0.0.160",
@@ -317,9 +337,9 @@
       }
     },
     "node_modules/@lancedb/vectordb-darwin-arm64": {
-      "version": "0.4.0",
-      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.4.0.tgz",
-      "integrity": "sha512-cP6zGtBWXEcJHCI4uLNIP5ILtRvexvwmL8Uri1dnHG8dT8g12Ykug3BHO6Wt6wp/xASd2jJRIF/VAJsN9IeP1A==",
+      "version": "0.4.3",
+      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.4.3.tgz",
+      "integrity": "sha512-47CvvSaV1EdUsFEpXUJApTk+hMzAhCxVizipCFUlXCgcmzpCDL86wNgJij/X9a+j6zADhIX//Lsu0qd/an/Bpw==",
       "cpu": [
         "arm64"
       ],
@@ -329,9 +349,9 @@
       ]
     },
     "node_modules/@lancedb/vectordb-darwin-x64": {
-      "version": "0.4.0",
-      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.4.0.tgz",
-      "integrity": "sha512-ig0gV5ol1sFe2lb1HOatK0rizyj9I91WbnH79i7OdUl3nAQIcWm70CnxrPLtx0DS2NTGh2kFJbYCWcaUlu6YfA==",
+      "version": "0.4.3",
+      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.4.3.tgz",
+      "integrity": "sha512-UlZZv8CmJIuRJNJG+Y1VmFsGyPR8W/72Q5EwgMMsSES6zpMQ9pNdBDWhL3UGX6nMRgnbprkwYiWJ3xHhJvtqtw==",
       "cpu": [
         "x64"
       ],
@@ -341,9 +361,9 @@
       ]
     },
     "node_modules/@lancedb/vectordb-linux-arm64-gnu": {
-      "version": "0.4.0",
-      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.4.0.tgz",
-      "integrity": "sha512-gMXIDT2kriAPDwWIRKXdaTCNdOeFGEok1S9Y30AOruHXddW1vCIo4JNJIYbBqHnwAeI4wI3ae6GRCFaf1UxO3g==",
+      "version": "0.4.3",
+      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.4.3.tgz",
+      "integrity": "sha512-L6NVJr/lKEd8+904FzZNpT8BGQMs2cHNYbGJMIaVvGnMiIJgKAFKtOyGtdDjoe1xRZoEw21yjRGksGbnRO5wHQ==",
       "cpu": [
         "arm64"
       ],
@@ -353,9 +373,9 @@
       ]
     },
     "node_modules/@lancedb/vectordb-linux-x64-gnu": {
-      "version": "0.4.0",
-      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.4.0.tgz",
-      "integrity": "sha512-ZQ3lDrDSz1IKdx/mS9Lz08agFO+OD5oSFrrcFNCoT1+H93eS1mCLdmCoEARu3jKbx0tMs38l5J9yXZ2QmJye3w==",
+      "version": "0.4.3",
+      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.4.3.tgz",
+      "integrity": "sha512-OBx3WF3pK0xNfFJeErmuD9R2QWLa3XdeZspyTsIrQmBDeKj3HKh8y7Scpx4NH5Y09+9JNqRRKRZN7OqWTYhITg==",
       "cpu": [
         "x64"
       ],
@@ -365,9 +385,9 @@
       ]
     },
     "node_modules/@lancedb/vectordb-win32-x64-msvc": {
-      "version": "0.4.0",
-      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.4.0.tgz",
-      "integrity": "sha512-toNcNwBRE1sdsSf5hr7W8QiqZ33csc/knVEek4CyvYkZHJGh4Z6WI+DJUIASo5wzUez4TX7qUPpRPL9HuaPMCg==",
+      "version": "0.4.3",
+      "resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.4.3.tgz",
+      "integrity": "sha512-n9IvR81NXZKnSN91mrgeXbEyCiGM+YLJpOgbdHoEtMP04VDnS+iSU4jGOtQBKErvWeCJQaGFQ9qzdcVchpRGyw==",
       "cpu": [
         "x64"
       ],
@@ -866,7 +886,6 @@
       "version": "4.3.0",
       "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz",
"integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==", "integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==",
"dev": true,
"dependencies": { "dependencies": {
"color-convert": "^2.0.1" "color-convert": "^2.0.1"
}, },
@@ -891,34 +910,34 @@
} }
}, },
"node_modules/apache-arrow": { "node_modules/apache-arrow": {
"version": "12.0.0", "version": "14.0.2",
"resolved": "https://registry.npmjs.org/apache-arrow/-/apache-arrow-12.0.0.tgz", "resolved": "https://registry.npmjs.org/apache-arrow/-/apache-arrow-14.0.2.tgz",
"integrity": "sha512-uI+hnZZsGfNJiR/wG8j5yPQuDjmOHx4hZpkA743G4x3TlFrCpA3MMX7KUkIOIw0e/CwZ8NYuaMzaQsblA47qVA==", "integrity": "sha512-EBO2xJN36/XoY81nhLcwCJgFwkboDZeyNQ+OPsG7bCoQjc2BT0aTyH/MR6SrL+LirSNz+cYqjGRlupMMlP1aEg==",
"dependencies": { "dependencies": {
"@types/command-line-args": "5.2.0", "@types/command-line-args": "5.2.0",
"@types/command-line-usage": "5.0.2", "@types/command-line-usage": "5.0.2",
"@types/node": "18.14.5", "@types/node": "20.3.0",
"@types/pad-left": "2.1.1", "@types/pad-left": "2.1.1",
"command-line-args": "5.2.1", "command-line-args": "5.2.1",
"command-line-usage": "6.1.3", "command-line-usage": "7.0.1",
"flatbuffers": "23.3.3", "flatbuffers": "23.5.26",
"json-bignum": "^0.0.3", "json-bignum": "^0.0.3",
"pad-left": "^2.1.0", "pad-left": "^2.1.0",
"tslib": "^2.5.0" "tslib": "^2.5.3"
}, },
"bin": { "bin": {
"arrow2csv": "bin/arrow2csv.js" "arrow2csv": "bin/arrow2csv.js"
} }
}, },
"node_modules/apache-arrow/node_modules/@types/node": { "node_modules/apache-arrow/node_modules/@types/node": {
"version": "18.14.5", "version": "20.3.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.14.5.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-20.3.0.tgz",
"integrity": "sha512-CRT4tMK/DHYhw1fcCEBwME9CSaZNclxfzVMe7GsO6ULSwsttbj70wSiX6rZdIjGblu93sTJxLdhNIT85KKI7Qw==" "integrity": "sha512-cumHmIAf6On83X7yP+LrsEyUOf/YlociZelmpRYaGFydoaPdxdt80MAbu6vWerQT2COCp2nPvHdsbD7tHn/YlQ=="
}, },
"node_modules/apache-arrow/node_modules/tslib": { "node_modules/apache-arrow/node_modules/tslib": {
"version": "2.5.0", "version": "2.6.2",
"resolved": "https://registry.npmjs.org/tslib/-/tslib-2.5.0.tgz", "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.2.tgz",
"integrity": "sha512-336iVw3rtn2BUK7ORdIAHTyxHGRIHVReokCR3XjbckJMK7ms8FysBfhLR8IXnAgy7T0PTPNBWKiH514FOW/WSg==" "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q=="
}, },
"node_modules/arg": { "node_modules/arg": {
"version": "4.1.3", "version": "4.1.3",
@@ -1170,7 +1189,6 @@
"version": "4.1.2", "version": "4.1.2",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-4.1.2.tgz", "resolved": "https://registry.npmjs.org/chalk/-/chalk-4.1.2.tgz",
"integrity": "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA==", "integrity": "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA==",
"dev": true,
"dependencies": { "dependencies": {
"ansi-styles": "^4.1.0", "ansi-styles": "^4.1.0",
"supports-color": "^7.1.0" "supports-color": "^7.1.0"
@@ -1182,11 +1200,24 @@
"url": "https://github.com/chalk/chalk?sponsor=1" "url": "https://github.com/chalk/chalk?sponsor=1"
} }
}, },
"node_modules/chalk-template": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/chalk-template/-/chalk-template-0.4.0.tgz",
"integrity": "sha512-/ghrgmhfY8RaSdeo43hNXxpoHAtxdbskUHjPpfqUWGttFgycUhYPGx3YZBCnUCvOa7Doivn1IZec3DEGFoMgLg==",
"dependencies": {
"chalk": "^4.1.2"
},
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://github.com/chalk/chalk-template?sponsor=1"
}
},
"node_modules/chalk/node_modules/supports-color": { "node_modules/chalk/node_modules/supports-color": {
"version": "7.2.0", "version": "7.2.0",
"resolved": "https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz", "resolved": "https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz",
"integrity": "sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==", "integrity": "sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==",
"dev": true,
"dependencies": { "dependencies": {
"has-flag": "^4.0.0" "has-flag": "^4.0.0"
}, },
@@ -1245,7 +1276,6 @@
"version": "2.0.1", "version": "2.0.1",
"resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz", "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz",
"integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==", "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==",
"dev": true,
"dependencies": { "dependencies": {
"color-name": "~1.1.4" "color-name": "~1.1.4"
}, },
@@ -1256,8 +1286,7 @@
"node_modules/color-name": { "node_modules/color-name": {
"version": "1.1.4", "version": "1.1.4",
"resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz", "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz",
"integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==", "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA=="
"dev": true
}, },
"node_modules/combined-stream": { "node_modules/combined-stream": {
"version": "1.0.8", "version": "1.0.8",
@@ -1285,97 +1314,33 @@
} }
}, },
"node_modules/command-line-usage": { "node_modules/command-line-usage": {
"version": "6.1.3", "version": "7.0.1",
"resolved": "https://registry.npmjs.org/command-line-usage/-/command-line-usage-6.1.3.tgz", "resolved": "https://registry.npmjs.org/command-line-usage/-/command-line-usage-7.0.1.tgz",
"integrity": "sha512-sH5ZSPr+7UStsloltmDh7Ce5fb8XPlHyoPzTpyyMuYCtervL65+ubVZ6Q61cFtFl62UyJlc8/JwERRbAFPUqgw==", "integrity": "sha512-NCyznE//MuTjwi3y84QVUGEOT+P5oto1e1Pk/jFPVdPPfsG03qpTIl3yw6etR+v73d0lXsoojRpvbru2sqePxQ==",
"dependencies": { "dependencies": {
"array-back": "^4.0.2", "array-back": "^6.2.2",
"chalk": "^2.4.2", "chalk-template": "^0.4.0",
"table-layout": "^1.0.2", "table-layout": "^3.0.0",
"typical": "^5.2.0" "typical": "^7.1.1"
}, },
"engines": { "engines": {
"node": ">=8.0.0" "node": ">=12.20.0"
}
},
"node_modules/command-line-usage/node_modules/ansi-styles": {
"version": "3.2.1",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-3.2.1.tgz",
"integrity": "sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==",
"dependencies": {
"color-convert": "^1.9.0"
},
"engines": {
"node": ">=4"
} }
}, },
"node_modules/command-line-usage/node_modules/array-back": { "node_modules/command-line-usage/node_modules/array-back": {
"version": "4.0.2", "version": "6.2.2",
"resolved": "https://registry.npmjs.org/array-back/-/array-back-4.0.2.tgz", "resolved": "https://registry.npmjs.org/array-back/-/array-back-6.2.2.tgz",
"integrity": "sha512-NbdMezxqf94cnNfWLL7V/im0Ub+Anbb0IoZhvzie8+4HJ4nMQuzHuy49FkGYCJK2yAloZ3meiB6AVMClbrI1vg==", "integrity": "sha512-gUAZ7HPyb4SJczXAMUXMGAvI976JoK3qEx9v1FTmeYuJj0IBiaKttG1ydtGKdkfqWkIkouke7nG8ufGy77+Cvw==",
"engines": { "engines": {
"node": ">=8" "node": ">=12.17"
}
},
"node_modules/command-line-usage/node_modules/chalk": {
"version": "2.4.2",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz",
"integrity": "sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
"dependencies": {
"ansi-styles": "^3.2.1",
"escape-string-regexp": "^1.0.5",
"supports-color": "^5.3.0"
},
"engines": {
"node": ">=4"
}
},
"node_modules/command-line-usage/node_modules/color-convert": {
"version": "1.9.3",
"resolved": "https://registry.npmjs.org/color-convert/-/color-convert-1.9.3.tgz",
"integrity": "sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==",
"dependencies": {
"color-name": "1.1.3"
}
},
"node_modules/command-line-usage/node_modules/color-name": {
"version": "1.1.3",
"resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.3.tgz",
"integrity": "sha512-72fSenhMw2HZMTVHeCA9KCmpEIbzWiQsjN+BHcBbS9vr1mtt+vJjPdksIBNUmKAW8TFUDPJK5SUU3QhE9NEXDw=="
},
"node_modules/command-line-usage/node_modules/escape-string-regexp": {
"version": "1.0.5",
"resolved": "https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-1.0.5.tgz",
"integrity": "sha512-vbRorB5FUQWvla16U8R/qgaFIya2qGzwDrNmCZuYKrbdSUMG6I1ZCGQRefkRVhuOkIGVne7BQ35DSfo1qvJqFg==",
"engines": {
"node": ">=0.8.0"
}
},
"node_modules/command-line-usage/node_modules/has-flag": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/has-flag/-/has-flag-3.0.0.tgz",
"integrity": "sha512-sKJf1+ceQBr4SMkvQnBDNDtf4TXpVhVGateu0t918bl30FnbE2m4vNLX+VWe/dpjlb+HugGYzW7uQXH98HPEYw==",
"engines": {
"node": ">=4"
}
},
"node_modules/command-line-usage/node_modules/supports-color": {
"version": "5.5.0",
"resolved": "https://registry.npmjs.org/supports-color/-/supports-color-5.5.0.tgz",
"integrity": "sha512-QjVjwdXIt408MIiAqCX4oUKsgU2EqAGzs2Ppkm4aQYbjm+ZEWEcW4SfFNTr4uMNZma0ey4f5lgLrkB0aX0QMow==",
"dependencies": {
"has-flag": "^3.0.0"
},
"engines": {
"node": ">=4"
} }
}, },
"node_modules/command-line-usage/node_modules/typical": { "node_modules/command-line-usage/node_modules/typical": {
"version": "5.2.0", "version": "7.1.1",
"resolved": "https://registry.npmjs.org/typical/-/typical-5.2.0.tgz", "resolved": "https://registry.npmjs.org/typical/-/typical-7.1.1.tgz",
"integrity": "sha512-dvdQgNDNJo+8B2uBQoqdb11eUCE1JQXhvjC/CZtgvZseVd5TYMXnq0+vuUemXbd/Se29cTaUuPX3YIc2xgbvIg==", "integrity": "sha512-T+tKVNs6Wu7IWiAce5BgMd7OZfNYUndHwc5MknN+UHOudi7sGZzuHdCadllRuqJ3fPtgFtIH9+lt9qRv6lmpfA==",
"engines": { "engines": {
"node": ">=8" "node": ">=12.17"
} }
}, },
"node_modules/concat-map": { "node_modules/concat-map": {
@@ -1451,14 +1416,6 @@
"node": ">=6" "node": ">=6"
} }
}, },
"node_modules/deep-extend": {
"version": "0.6.0",
"resolved": "https://registry.npmjs.org/deep-extend/-/deep-extend-0.6.0.tgz",
"integrity": "sha512-LOHxIOaPYdHlJRtCQfDIVZtfw/ufM8+rVj649RIHzcm/vGwQRXFt6OPqIFWsm2XEMrNIEtWR64sY1LEKD2vAOA==",
"engines": {
"node": ">=4.0.0"
}
},
"node_modules/deep-is": { "node_modules/deep-is": {
"version": "0.1.4", "version": "0.1.4",
"resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz", "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz",
@@ -2237,9 +2194,9 @@
} }
}, },
"node_modules/flatbuffers": { "node_modules/flatbuffers": {
"version": "23.3.3", "version": "23.5.26",
"resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-23.3.3.tgz", "resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-23.5.26.tgz",
"integrity": "sha512-jmreOaAT1t55keaf+Z259Tvh8tR/Srry9K8dgCgvizhKSEr6gLGgaOJI2WFL5fkOpGOGRZwxUrlFn0GCmXUy6g==" "integrity": "sha512-vE+SI9vrJDwi1oETtTIFldC/o9GsVKRM+s6EL0nQgxXlYV1Vc4Tk30hj4xGICftInKQKj1F3up2n8UbIVobISQ=="
}, },
"node_modules/flatted": { "node_modules/flatted": {
"version": "3.2.7", "version": "3.2.7",
@@ -2535,7 +2492,6 @@
"version": "4.0.0", "version": "4.0.0",
"resolved": "https://registry.npmjs.org/has-flag/-/has-flag-4.0.0.tgz", "resolved": "https://registry.npmjs.org/has-flag/-/has-flag-4.0.0.tgz",
"integrity": "sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ==", "integrity": "sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ==",
"dev": true,
"engines": { "engines": {
"node": ">=8" "node": ">=8"
} }
@@ -3048,6 +3004,11 @@
"url": "https://github.com/sponsors/sindresorhus" "url": "https://github.com/sponsors/sindresorhus"
} }
}, },
"node_modules/lodash.assignwith": {
"version": "4.2.0",
"resolved": "https://registry.npmjs.org/lodash.assignwith/-/lodash.assignwith-4.2.0.tgz",
"integrity": "sha512-ZznplvbvtjK2gMvnQ1BR/zqPFZmS6jbK4p+6Up4xcRYA7yMIwxHCfbTcrYxXKzzqLsQ05eJPVznEW3tuwV7k1g=="
},
"node_modules/lodash.camelcase": { "node_modules/lodash.camelcase": {
"version": "4.3.0", "version": "4.3.0",
"resolved": "https://registry.npmjs.org/lodash.camelcase/-/lodash.camelcase-4.3.0.tgz", "resolved": "https://registry.npmjs.org/lodash.camelcase/-/lodash.camelcase-4.3.0.tgz",
@@ -3668,14 +3629,6 @@
"node": ">=8.10.0" "node": ">=8.10.0"
} }
}, },
"node_modules/reduce-flatten": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/reduce-flatten/-/reduce-flatten-2.0.0.tgz",
"integrity": "sha512-EJ4UNY/U1t2P/2k6oqotuX2Cc3T6nxJwsM0N0asT7dhrtH1ltUxDn4NalSYmPE2rCkVpcf/X6R0wDwcFpzhd4w==",
"engines": {
"node": ">=6"
}
},
"node_modules/regexp.prototype.flags": { "node_modules/regexp.prototype.flags": {
"version": "1.5.0", "version": "1.5.0",
"resolved": "https://registry.npmjs.org/regexp.prototype.flags/-/regexp.prototype.flags-1.5.0.tgz", "resolved": "https://registry.npmjs.org/regexp.prototype.flags/-/regexp.prototype.flags-1.5.0.tgz",
@@ -3965,6 +3918,14 @@
"source-map": "^0.6.0" "source-map": "^0.6.0"
} }
}, },
"node_modules/stream-read-all": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/stream-read-all/-/stream-read-all-3.0.1.tgz",
"integrity": "sha512-EWZT9XOceBPlVJRrYcykW8jyRSZYbkb/0ZK36uLEmoWVO5gxBOnntNTseNzfREsqxqdfEGQrD8SXQ3QWbBmq8A==",
"engines": {
"node": ">=10"
}
},
"node_modules/string-width": { "node_modules/string-width": {
"version": "4.2.3", "version": "4.2.3",
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz", "resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
@@ -4082,33 +4043,39 @@
} }
}, },
"node_modules/table-layout": { "node_modules/table-layout": {
"version": "1.0.2", "version": "3.0.2",
"resolved": "https://registry.npmjs.org/table-layout/-/table-layout-1.0.2.tgz", "resolved": "https://registry.npmjs.org/table-layout/-/table-layout-3.0.2.tgz",
"integrity": "sha512-qd/R7n5rQTRFi+Zf2sk5XVVd9UQl6ZkduPFC3S7WEGJAmetDTjY3qPN50eSKzwuzEyQKy5TN2TiZdkIjos2L6A==", "integrity": "sha512-rpyNZYRw+/C+dYkcQ3Pr+rLxW4CfHpXjPDnG7lYhdRoUcZTUt+KEsX+94RGp/aVp/MQU35JCITv2T/beY4m+hw==",
"dependencies": { "dependencies": {
"array-back": "^4.0.1", "@75lb/deep-merge": "^1.1.1",
"deep-extend": "~0.6.0", "array-back": "^6.2.2",
"typical": "^5.2.0", "command-line-args": "^5.2.1",
"wordwrapjs": "^4.0.0" "command-line-usage": "^7.0.0",
"stream-read-all": "^3.0.1",
"typical": "^7.1.1",
"wordwrapjs": "^5.1.0"
},
"bin": {
"table-layout": "bin/cli.js"
}, },
"engines": { "engines": {
"node": ">=8.0.0" "node": ">=12.17"
} }
}, },
"node_modules/table-layout/node_modules/array-back": { "node_modules/table-layout/node_modules/array-back": {
"version": "4.0.2", "version": "6.2.2",
"resolved": "https://registry.npmjs.org/array-back/-/array-back-4.0.2.tgz", "resolved": "https://registry.npmjs.org/array-back/-/array-back-6.2.2.tgz",
"integrity": "sha512-NbdMezxqf94cnNfWLL7V/im0Ub+Anbb0IoZhvzie8+4HJ4nMQuzHuy49FkGYCJK2yAloZ3meiB6AVMClbrI1vg==", "integrity": "sha512-gUAZ7HPyb4SJczXAMUXMGAvI976JoK3qEx9v1FTmeYuJj0IBiaKttG1ydtGKdkfqWkIkouke7nG8ufGy77+Cvw==",
"engines": { "engines": {
"node": ">=8" "node": ">=12.17"
} }
}, },
"node_modules/table-layout/node_modules/typical": { "node_modules/table-layout/node_modules/typical": {
"version": "5.2.0", "version": "7.1.1",
"resolved": "https://registry.npmjs.org/typical/-/typical-5.2.0.tgz", "resolved": "https://registry.npmjs.org/typical/-/typical-7.1.1.tgz",
"integrity": "sha512-dvdQgNDNJo+8B2uBQoqdb11eUCE1JQXhvjC/CZtgvZseVd5TYMXnq0+vuUemXbd/Se29cTaUuPX3YIc2xgbvIg==", "integrity": "sha512-T+tKVNs6Wu7IWiAce5BgMd7OZfNYUndHwc5MknN+UHOudi7sGZzuHdCadllRuqJ3fPtgFtIH9+lt9qRv6lmpfA==",
"engines": { "engines": {
"node": ">=8" "node": ">=12.17"
} }
}, },
"node_modules/temp": { "node_modules/temp": {
@@ -4553,23 +4520,11 @@
"dev": true "dev": true
}, },
"node_modules/wordwrapjs": { "node_modules/wordwrapjs": {
"version": "4.0.1", "version": "5.1.0",
"resolved": "https://registry.npmjs.org/wordwrapjs/-/wordwrapjs-4.0.1.tgz", "resolved": "https://registry.npmjs.org/wordwrapjs/-/wordwrapjs-5.1.0.tgz",
"integrity": "sha512-kKlNACbvHrkpIw6oPeYDSmdCTu2hdMHoyXLTcUKala++lx5Y+wjJ/e474Jqv5abnVmwxw08DiTuHmw69lJGksA==", "integrity": "sha512-JNjcULU2e4KJwUNv6CHgI46UvDGitb6dGryHajXTDiLgg1/RiGoPSDw4kZfYnwGtEXf2ZMeIewDQgFGzkCB2Sg==",
"dependencies": {
"reduce-flatten": "^2.0.0",
"typical": "^5.2.0"
},
"engines": { "engines": {
"node": ">=8.0.0" "node": ">=12.17"
}
},
"node_modules/wordwrapjs/node_modules/typical": {
"version": "5.2.0",
"resolved": "https://registry.npmjs.org/typical/-/typical-5.2.0.tgz",
"integrity": "sha512-dvdQgNDNJo+8B2uBQoqdb11eUCE1JQXhvjC/CZtgvZseVd5TYMXnq0+vuUemXbd/Se29cTaUuPX3YIc2xgbvIg==",
"engines": {
"node": ">=8"
} }
}, },
"node_modules/workerpool": { "node_modules/workerpool": {
@@ -4690,32 +4645,48 @@
} }
}, },
"dependencies": { "dependencies": {
"@75lb/deep-merge": {
"version": "1.1.1",
"resolved": "https://registry.npmjs.org/@75lb/deep-merge/-/deep-merge-1.1.1.tgz",
"integrity": "sha512-xvgv6pkMGBA6GwdyJbNAnDmfAIR/DfWhrj9jgWh3TY7gRm3KO46x/GPjRg6wJ0nOepwqrNxFfojebh0Df4h4Tw==",
"requires": {
"lodash.assignwith": "^4.2.0",
"typical": "^7.1.1"
},
"dependencies": {
"typical": {
"version": "7.1.1",
"resolved": "https://registry.npmjs.org/typical/-/typical-7.1.1.tgz",
"integrity": "sha512-T+tKVNs6Wu7IWiAce5BgMd7OZfNYUndHwc5MknN+UHOudi7sGZzuHdCadllRuqJ3fPtgFtIH9+lt9qRv6lmpfA=="
}
}
},
"@apache-arrow/ts": { "@apache-arrow/ts": {
"version": "12.0.0", "version": "14.0.2",
"resolved": "https://registry.npmjs.org/@apache-arrow/ts/-/ts-12.0.0.tgz", "resolved": "https://registry.npmjs.org/@apache-arrow/ts/-/ts-14.0.2.tgz",
"integrity": "sha512-ArJ3Fw5W9RAeNWuyCU2CdjL/nEAZSVDG1p3jz/ZtLo/q3NTz2w7HUCOJeszejH/5alGX+QirYrJ5c6BW++/P7g==", "integrity": "sha512-CtwAvLkK0CZv7xsYeCo91ml6PvlfzAmAJZkRYuz2GNBwfYufj5SVi0iuSMwIMkcU/szVwvLdzORSLa5PlF/2ug==",
"requires": { "requires": {
"@types/command-line-args": "5.2.0", "@types/command-line-args": "5.2.0",
"@types/command-line-usage": "5.0.2", "@types/command-line-usage": "5.0.2",
"@types/node": "18.14.5", "@types/node": "20.3.0",
"@types/pad-left": "2.1.1", "@types/pad-left": "2.1.1",
"command-line-args": "5.2.1", "command-line-args": "5.2.1",
"command-line-usage": "6.1.3", "command-line-usage": "7.0.1",
"flatbuffers": "23.3.3", "flatbuffers": "23.5.26",
"json-bignum": "^0.0.3", "json-bignum": "^0.0.3",
"pad-left": "^2.1.0", "pad-left": "^2.1.0",
"tslib": "^2.5.0" "tslib": "^2.5.3"
}, },
"dependencies": { "dependencies": {
"@types/node": { "@types/node": {
"version": "18.14.5", "version": "20.3.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.14.5.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-20.3.0.tgz",
"integrity": "sha512-CRT4tMK/DHYhw1fcCEBwME9CSaZNclxfzVMe7GsO6ULSwsttbj70wSiX6rZdIjGblu93sTJxLdhNIT85KKI7Qw==" "integrity": "sha512-cumHmIAf6On83X7yP+LrsEyUOf/YlociZelmpRYaGFydoaPdxdt80MAbu6vWerQT2COCp2nPvHdsbD7tHn/YlQ=="
}, },
"tslib": { "tslib": {
"version": "2.5.0", "version": "2.6.2",
"resolved": "https://registry.npmjs.org/tslib/-/tslib-2.5.0.tgz", "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.2.tgz",
"integrity": "sha512-336iVw3rtn2BUK7ORdIAHTyxHGRIHVReokCR3XjbckJMK7ms8FysBfhLR8IXnAgy7T0PTPNBWKiH514FOW/WSg==" "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q=="
} }
} }
}, },
@@ -4869,33 +4840,33 @@
} }
}, },
"@lancedb/vectordb-darwin-arm64": { "@lancedb/vectordb-darwin-arm64": {
"version": "0.4.0", "version": "0.4.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.4.0.tgz", "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.4.3.tgz",
"integrity": "sha512-cP6zGtBWXEcJHCI4uLNIP5ILtRvexvwmL8Uri1dnHG8dT8g12Ykug3BHO6Wt6wp/xASd2jJRIF/VAJsN9IeP1A==", "integrity": "sha512-47CvvSaV1EdUsFEpXUJApTk+hMzAhCxVizipCFUlXCgcmzpCDL86wNgJij/X9a+j6zADhIX//Lsu0qd/an/Bpw==",
"optional": true "optional": true
}, },
"@lancedb/vectordb-darwin-x64": { "@lancedb/vectordb-darwin-x64": {
"version": "0.4.0", "version": "0.4.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.4.0.tgz", "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.4.3.tgz",
"integrity": "sha512-ig0gV5ol1sFe2lb1HOatK0rizyj9I91WbnH79i7OdUl3nAQIcWm70CnxrPLtx0DS2NTGh2kFJbYCWcaUlu6YfA==", "integrity": "sha512-UlZZv8CmJIuRJNJG+Y1VmFsGyPR8W/72Q5EwgMMsSES6zpMQ9pNdBDWhL3UGX6nMRgnbprkwYiWJ3xHhJvtqtw==",
"optional": true "optional": true
}, },
"@lancedb/vectordb-linux-arm64-gnu": { "@lancedb/vectordb-linux-arm64-gnu": {
"version": "0.4.0", "version": "0.4.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.4.0.tgz", "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.4.3.tgz",
"integrity": "sha512-gMXIDT2kriAPDwWIRKXdaTCNdOeFGEok1S9Y30AOruHXddW1vCIo4JNJIYbBqHnwAeI4wI3ae6GRCFaf1UxO3g==", "integrity": "sha512-L6NVJr/lKEd8+904FzZNpT8BGQMs2cHNYbGJMIaVvGnMiIJgKAFKtOyGtdDjoe1xRZoEw21yjRGksGbnRO5wHQ==",
"optional": true "optional": true
}, },
"@lancedb/vectordb-linux-x64-gnu": { "@lancedb/vectordb-linux-x64-gnu": {
"version": "0.4.0", "version": "0.4.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.4.0.tgz", "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.4.3.tgz",
"integrity": "sha512-ZQ3lDrDSz1IKdx/mS9Lz08agFO+OD5oSFrrcFNCoT1+H93eS1mCLdmCoEARu3jKbx0tMs38l5J9yXZ2QmJye3w==", "integrity": "sha512-OBx3WF3pK0xNfFJeErmuD9R2QWLa3XdeZspyTsIrQmBDeKj3HKh8y7Scpx4NH5Y09+9JNqRRKRZN7OqWTYhITg==",
"optional": true "optional": true
}, },
"@lancedb/vectordb-win32-x64-msvc": { "@lancedb/vectordb-win32-x64-msvc": {
"version": "0.4.0", "version": "0.4.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.4.0.tgz", "resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.4.3.tgz",
"integrity": "sha512-toNcNwBRE1sdsSf5hr7W8QiqZ33csc/knVEek4CyvYkZHJGh4Z6WI+DJUIASo5wzUez4TX7qUPpRPL9HuaPMCg==", "integrity": "sha512-n9IvR81NXZKnSN91mrgeXbEyCiGM+YLJpOgbdHoEtMP04VDnS+iSU4jGOtQBKErvWeCJQaGFQ9qzdcVchpRGyw==",
"optional": true "optional": true
}, },
"@neon-rs/cli": { "@neon-rs/cli": {
@@ -5268,7 +5239,6 @@
"version": "4.3.0", "version": "4.3.0",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz", "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz",
"integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==", "integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==",
"dev": true,
"requires": { "requires": {
"color-convert": "^2.0.1" "color-convert": "^2.0.1"
} }
@@ -5284,31 +5254,31 @@
} }
}, },
"apache-arrow": { "apache-arrow": {
"version": "12.0.0", "version": "14.0.2",
"resolved": "https://registry.npmjs.org/apache-arrow/-/apache-arrow-12.0.0.tgz", "resolved": "https://registry.npmjs.org/apache-arrow/-/apache-arrow-14.0.2.tgz",
"integrity": "sha512-uI+hnZZsGfNJiR/wG8j5yPQuDjmOHx4hZpkA743G4x3TlFrCpA3MMX7KUkIOIw0e/CwZ8NYuaMzaQsblA47qVA==", "integrity": "sha512-EBO2xJN36/XoY81nhLcwCJgFwkboDZeyNQ+OPsG7bCoQjc2BT0aTyH/MR6SrL+LirSNz+cYqjGRlupMMlP1aEg==",
"requires": { "requires": {
"@types/command-line-args": "5.2.0", "@types/command-line-args": "5.2.0",
"@types/command-line-usage": "5.0.2", "@types/command-line-usage": "5.0.2",
"@types/node": "18.14.5", "@types/node": "20.3.0",
"@types/pad-left": "2.1.1", "@types/pad-left": "2.1.1",
"command-line-args": "5.2.1", "command-line-args": "5.2.1",
"command-line-usage": "6.1.3", "command-line-usage": "7.0.1",
"flatbuffers": "23.3.3", "flatbuffers": "23.5.26",
"json-bignum": "^0.0.3", "json-bignum": "^0.0.3",
"pad-left": "^2.1.0", "pad-left": "^2.1.0",
"tslib": "^2.5.0" "tslib": "^2.5.3"
}, },
"dependencies": { "dependencies": {
"@types/node": { "@types/node": {
"version": "18.14.5", "version": "20.3.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.14.5.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-20.3.0.tgz",
"integrity": "sha512-CRT4tMK/DHYhw1fcCEBwME9CSaZNclxfzVMe7GsO6ULSwsttbj70wSiX6rZdIjGblu93sTJxLdhNIT85KKI7Qw==" "integrity": "sha512-cumHmIAf6On83X7yP+LrsEyUOf/YlociZelmpRYaGFydoaPdxdt80MAbu6vWerQT2COCp2nPvHdsbD7tHn/YlQ=="
}, },
"tslib": { "tslib": {
"version": "2.5.0", "version": "2.6.2",
"resolved": "https://registry.npmjs.org/tslib/-/tslib-2.5.0.tgz", "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.2.tgz",
"integrity": "sha512-336iVw3rtn2BUK7ORdIAHTyxHGRIHVReokCR3XjbckJMK7ms8FysBfhLR8IXnAgy7T0PTPNBWKiH514FOW/WSg==" "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q=="
} }
} }
}, },
@@ -5505,7 +5475,6 @@
"version": "4.1.2", "version": "4.1.2",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-4.1.2.tgz", "resolved": "https://registry.npmjs.org/chalk/-/chalk-4.1.2.tgz",
"integrity": "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA==", "integrity": "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA==",
"dev": true,
"requires": { "requires": {
"ansi-styles": "^4.1.0", "ansi-styles": "^4.1.0",
"supports-color": "^7.1.0" "supports-color": "^7.1.0"
@@ -5515,13 +5484,20 @@
"version": "7.2.0", "version": "7.2.0",
"resolved": "https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz", "resolved": "https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz",
"integrity": "sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==", "integrity": "sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==",
"dev": true,
"requires": { "requires": {
"has-flag": "^4.0.0" "has-flag": "^4.0.0"
} }
} }
} }
}, },
"chalk-template": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/chalk-template/-/chalk-template-0.4.0.tgz",
"integrity": "sha512-/ghrgmhfY8RaSdeo43hNXxpoHAtxdbskUHjPpfqUWGttFgycUhYPGx3YZBCnUCvOa7Doivn1IZec3DEGFoMgLg==",
"requires": {
"chalk": "^4.1.2"
}
},
"check-error": { "check-error": {
"version": "1.0.2", "version": "1.0.2",
"resolved": "https://registry.npmjs.org/check-error/-/check-error-1.0.2.tgz", "resolved": "https://registry.npmjs.org/check-error/-/check-error-1.0.2.tgz",
@@ -5559,7 +5535,6 @@
"version": "2.0.1", "version": "2.0.1",
"resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz", "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz",
"integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==", "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==",
"dev": true,
"requires": { "requires": {
"color-name": "~1.1.4" "color-name": "~1.1.4"
} }
@@ -5567,8 +5542,7 @@
"color-name": { "color-name": {
"version": "1.1.4", "version": "1.1.4",
"resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz", "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz",
"integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==", "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA=="
"dev": true
}, },
"combined-stream": { "combined-stream": {
"version": "1.0.8", "version": "1.0.8",
@@ -5590,74 +5564,25 @@
} }
}, },
"command-line-usage": { "command-line-usage": {
"version": "6.1.3", "version": "7.0.1",
"resolved": "https://registry.npmjs.org/command-line-usage/-/command-line-usage-6.1.3.tgz", "resolved": "https://registry.npmjs.org/command-line-usage/-/command-line-usage-7.0.1.tgz",
"integrity": "sha512-sH5ZSPr+7UStsloltmDh7Ce5fb8XPlHyoPzTpyyMuYCtervL65+ubVZ6Q61cFtFl62UyJlc8/JwERRbAFPUqgw==", "integrity": "sha512-NCyznE//MuTjwi3y84QVUGEOT+P5oto1e1Pk/jFPVdPPfsG03qpTIl3yw6etR+v73d0lXsoojRpvbru2sqePxQ==",
"requires": { "requires": {
"array-back": "^4.0.2", "array-back": "^6.2.2",
"chalk": "^2.4.2", "chalk-template": "^0.4.0",
"table-layout": "^1.0.2", "table-layout": "^3.0.0",
"typical": "^5.2.0" "typical": "^7.1.1"
}, },
"dependencies": { "dependencies": {
"ansi-styles": {
"version": "3.2.1",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-3.2.1.tgz",
"integrity": "sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==",
"requires": {
"color-convert": "^1.9.0"
}
},
"array-back": { "array-back": {
"version": "4.0.2", "version": "6.2.2",
"resolved": "https://registry.npmjs.org/array-back/-/array-back-4.0.2.tgz", "resolved": "https://registry.npmjs.org/array-back/-/array-back-6.2.2.tgz",
"integrity": "sha512-NbdMezxqf94cnNfWLL7V/im0Ub+Anbb0IoZhvzie8+4HJ4nMQuzHuy49FkGYCJK2yAloZ3meiB6AVMClbrI1vg==" "integrity": "sha512-gUAZ7HPyb4SJczXAMUXMGAvI976JoK3qEx9v1FTmeYuJj0IBiaKttG1ydtGKdkfqWkIkouke7nG8ufGy77+Cvw=="
},
"chalk": {
"version": "2.4.2",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz",
"integrity": "sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
"requires": {
"ansi-styles": "^3.2.1",
"escape-string-regexp": "^1.0.5",
"supports-color": "^5.3.0"
}
},
"color-convert": {
"version": "1.9.3",
"resolved": "https://registry.npmjs.org/color-convert/-/color-convert-1.9.3.tgz",
"integrity": "sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==",
"requires": {
"color-name": "1.1.3"
}
},
"color-name": {
"version": "1.1.3",
"resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.3.tgz",
"integrity": "sha512-72fSenhMw2HZMTVHeCA9KCmpEIbzWiQsjN+BHcBbS9vr1mtt+vJjPdksIBNUmKAW8TFUDPJK5SUU3QhE9NEXDw=="
},
"escape-string-regexp": {
"version": "1.0.5",
"resolved": "https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-1.0.5.tgz",
"integrity": "sha512-vbRorB5FUQWvla16U8R/qgaFIya2qGzwDrNmCZuYKrbdSUMG6I1ZCGQRefkRVhuOkIGVne7BQ35DSfo1qvJqFg=="
},
"has-flag": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/has-flag/-/has-flag-3.0.0.tgz",
"integrity": "sha512-sKJf1+ceQBr4SMkvQnBDNDtf4TXpVhVGateu0t918bl30FnbE2m4vNLX+VWe/dpjlb+HugGYzW7uQXH98HPEYw=="
},
"supports-color": {
"version": "5.5.0",
"resolved": "https://registry.npmjs.org/supports-color/-/supports-color-5.5.0.tgz",
"integrity": "sha512-QjVjwdXIt408MIiAqCX4oUKsgU2EqAGzs2Ppkm4aQYbjm+ZEWEcW4SfFNTr4uMNZma0ey4f5lgLrkB0aX0QMow==",
"requires": {
"has-flag": "^3.0.0"
}
}, },
"typical": { "typical": {
"version": "5.2.0", "version": "7.1.1",
"resolved": "https://registry.npmjs.org/typical/-/typical-5.2.0.tgz", "resolved": "https://registry.npmjs.org/typical/-/typical-7.1.1.tgz",
"integrity": "sha512-dvdQgNDNJo+8B2uBQoqdb11eUCE1JQXhvjC/CZtgvZseVd5TYMXnq0+vuUemXbd/Se29cTaUuPX3YIc2xgbvIg==" "integrity": "sha512-T+tKVNs6Wu7IWiAce5BgMd7OZfNYUndHwc5MknN+UHOudi7sGZzuHdCadllRuqJ3fPtgFtIH9+lt9qRv6lmpfA=="
} }
} }
}, },
@@ -5716,11 +5641,6 @@
"type-detect": "^4.0.0" "type-detect": "^4.0.0"
} }
}, },
"deep-extend": {
"version": "0.6.0",
"resolved": "https://registry.npmjs.org/deep-extend/-/deep-extend-0.6.0.tgz",
"integrity": "sha512-LOHxIOaPYdHlJRtCQfDIVZtfw/ufM8+rVj649RIHzcm/vGwQRXFt6OPqIFWsm2XEMrNIEtWR64sY1LEKD2vAOA=="
},
"deep-is": { "deep-is": {
"version": "0.1.4", "version": "0.1.4",
"resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz", "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz",
@@ -6297,9 +6217,9 @@
} }
}, },
"flatbuffers": { "flatbuffers": {
"version": "23.3.3", "version": "23.5.26",
"resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-23.3.3.tgz", "resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-23.5.26.tgz",
"integrity": "sha512-jmreOaAT1t55keaf+Z259Tvh8tR/Srry9K8dgCgvizhKSEr6gLGgaOJI2WFL5fkOpGOGRZwxUrlFn0GCmXUy6g==" "integrity": "sha512-vE+SI9vrJDwi1oETtTIFldC/o9GsVKRM+s6EL0nQgxXlYV1Vc4Tk30hj4xGICftInKQKj1F3up2n8UbIVobISQ=="
}, },
"flatted": { "flatted": {
"version": "3.2.7", "version": "3.2.7",
@@ -6502,8 +6422,7 @@
"has-flag": { "has-flag": {
"version": "4.0.0", "version": "4.0.0",
"resolved": "https://registry.npmjs.org/has-flag/-/has-flag-4.0.0.tgz", "resolved": "https://registry.npmjs.org/has-flag/-/has-flag-4.0.0.tgz",
"integrity": "sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ==", "integrity": "sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ=="
"dev": true
}, },
"has-property-descriptors": { "has-property-descriptors": {
"version": "1.0.0", "version": "1.0.0",
@@ -6856,6 +6775,11 @@
"p-locate": "^5.0.0" "p-locate": "^5.0.0"
} }
}, },
"lodash.assignwith": {
"version": "4.2.0",
"resolved": "https://registry.npmjs.org/lodash.assignwith/-/lodash.assignwith-4.2.0.tgz",
"integrity": "sha512-ZznplvbvtjK2gMvnQ1BR/zqPFZmS6jbK4p+6Up4xcRYA7yMIwxHCfbTcrYxXKzzqLsQ05eJPVznEW3tuwV7k1g=="
},
"lodash.camelcase": { "lodash.camelcase": {
"version": "4.3.0", "version": "4.3.0",
"resolved": "https://registry.npmjs.org/lodash.camelcase/-/lodash.camelcase-4.3.0.tgz", "resolved": "https://registry.npmjs.org/lodash.camelcase/-/lodash.camelcase-4.3.0.tgz",
@@ -7323,11 +7247,6 @@
"picomatch": "^2.2.1" "picomatch": "^2.2.1"
} }
}, },
"reduce-flatten": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/reduce-flatten/-/reduce-flatten-2.0.0.tgz",
"integrity": "sha512-EJ4UNY/U1t2P/2k6oqotuX2Cc3T6nxJwsM0N0asT7dhrtH1ltUxDn4NalSYmPE2rCkVpcf/X6R0wDwcFpzhd4w=="
},
"regexp.prototype.flags": { "regexp.prototype.flags": {
"version": "1.5.0", "version": "1.5.0",
"resolved": "https://registry.npmjs.org/regexp.prototype.flags/-/regexp.prototype.flags-1.5.0.tgz", "resolved": "https://registry.npmjs.org/regexp.prototype.flags/-/regexp.prototype.flags-1.5.0.tgz",
@@ -7523,6 +7442,11 @@
"source-map": "^0.6.0" "source-map": "^0.6.0"
} }
}, },
"stream-read-all": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/stream-read-all/-/stream-read-all-3.0.1.tgz",
"integrity": "sha512-EWZT9XOceBPlVJRrYcykW8jyRSZYbkb/0ZK36uLEmoWVO5gxBOnntNTseNzfREsqxqdfEGQrD8SXQ3QWbBmq8A=="
},
"string-width": { "string-width": {
"version": "4.2.3", "version": "4.2.3",
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz", "resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
@@ -7604,25 +7528,28 @@
"dev": true "dev": true
}, },
"table-layout": { "table-layout": {
"version": "1.0.2", "version": "3.0.2",
"resolved": "https://registry.npmjs.org/table-layout/-/table-layout-1.0.2.tgz", "resolved": "https://registry.npmjs.org/table-layout/-/table-layout-3.0.2.tgz",
"integrity": "sha512-qd/R7n5rQTRFi+Zf2sk5XVVd9UQl6ZkduPFC3S7WEGJAmetDTjY3qPN50eSKzwuzEyQKy5TN2TiZdkIjos2L6A==", "integrity": "sha512-rpyNZYRw+/C+dYkcQ3Pr+rLxW4CfHpXjPDnG7lYhdRoUcZTUt+KEsX+94RGp/aVp/MQU35JCITv2T/beY4m+hw==",
"requires": { "requires": {
"array-back": "^4.0.1", "@75lb/deep-merge": "^1.1.1",
"deep-extend": "~0.6.0", "array-back": "^6.2.2",
"typical": "^5.2.0", "command-line-args": "^5.2.1",
"wordwrapjs": "^4.0.0" "command-line-usage": "^7.0.0",
"stream-read-all": "^3.0.1",
"typical": "^7.1.1",
"wordwrapjs": "^5.1.0"
}, },
"dependencies": { "dependencies": {
"array-back": { "array-back": {
"version": "4.0.2", "version": "6.2.2",
"resolved": "https://registry.npmjs.org/array-back/-/array-back-4.0.2.tgz", "resolved": "https://registry.npmjs.org/array-back/-/array-back-6.2.2.tgz",
"integrity": "sha512-NbdMezxqf94cnNfWLL7V/im0Ub+Anbb0IoZhvzie8+4HJ4nMQuzHuy49FkGYCJK2yAloZ3meiB6AVMClbrI1vg==" "integrity": "sha512-gUAZ7HPyb4SJczXAMUXMGAvI976JoK3qEx9v1FTmeYuJj0IBiaKttG1ydtGKdkfqWkIkouke7nG8ufGy77+Cvw=="
}, },
"typical": { "typical": {
"version": "5.2.0", "version": "7.1.1",
"resolved": "https://registry.npmjs.org/typical/-/typical-5.2.0.tgz", "resolved": "https://registry.npmjs.org/typical/-/typical-7.1.1.tgz",
"integrity": "sha512-dvdQgNDNJo+8B2uBQoqdb11eUCE1JQXhvjC/CZtgvZseVd5TYMXnq0+vuUemXbd/Se29cTaUuPX3YIc2xgbvIg==" "integrity": "sha512-T+tKVNs6Wu7IWiAce5BgMd7OZfNYUndHwc5MknN+UHOudi7sGZzuHdCadllRuqJ3fPtgFtIH9+lt9qRv6lmpfA=="
} }
} }
}, },
@@ -7940,20 +7867,9 @@
"dev": true "dev": true
}, },
"wordwrapjs": { "wordwrapjs": {
"version": "4.0.1", "version": "5.1.0",
"resolved": "https://registry.npmjs.org/wordwrapjs/-/wordwrapjs-4.0.1.tgz", "resolved": "https://registry.npmjs.org/wordwrapjs/-/wordwrapjs-5.1.0.tgz",
"integrity": "sha512-kKlNACbvHrkpIw6oPeYDSmdCTu2hdMHoyXLTcUKala++lx5Y+wjJ/e474Jqv5abnVmwxw08DiTuHmw69lJGksA==", "integrity": "sha512-JNjcULU2e4KJwUNv6CHgI46UvDGitb6dGryHajXTDiLgg1/RiGoPSDw4kZfYnwGtEXf2ZMeIewDQgFGzkCB2Sg=="
"requires": {
"reduce-flatten": "^2.0.0",
"typical": "^5.2.0"
},
"dependencies": {
"typical": {
"version": "5.2.0",
"resolved": "https://registry.npmjs.org/typical/-/typical-5.2.0.tgz",
"integrity": "sha512-dvdQgNDNJo+8B2uBQoqdb11eUCE1JQXhvjC/CZtgvZseVd5TYMXnq0+vuUemXbd/Se29cTaUuPX3YIc2xgbvIg=="
}
}
}, },
"workerpool": { "workerpool": {
"version": "6.2.1", "version": "6.2.1",


@@ -1,6 +1,6 @@
{ {
"name": "vectordb", "name": "vectordb",
"version": "0.4.1", "version": "0.4.3",
"description": " Serverless, low-latency vector database for AI applications", "description": " Serverless, low-latency vector database for AI applications",
"main": "dist/index.js", "main": "dist/index.js",
"types": "dist/index.d.ts", "types": "dist/index.d.ts",
@@ -57,9 +57,9 @@
"uuid": "^9.0.0" "uuid": "^9.0.0"
}, },
"dependencies": { "dependencies": {
"@apache-arrow/ts": "^12.0.0", "@apache-arrow/ts": "^14.0.2",
"@neon-rs/load": "^0.0.74", "@neon-rs/load": "^0.0.74",
"apache-arrow": "^12.0.0", "apache-arrow": "^14.0.2",
"axios": "^1.4.0" "axios": "^1.4.0"
}, },
"os": [ "os": [
@@ -81,10 +81,10 @@
} }
}, },
"optionalDependencies": { "optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.4.1", "@lancedb/vectordb-darwin-arm64": "0.4.3",
"@lancedb/vectordb-darwin-x64": "0.4.1", "@lancedb/vectordb-darwin-x64": "0.4.3",
"@lancedb/vectordb-linux-arm64-gnu": "0.4.1", "@lancedb/vectordb-linux-arm64-gnu": "0.4.3",
"@lancedb/vectordb-linux-x64-gnu": "0.4.1", "@lancedb/vectordb-linux-x64-gnu": "0.4.3",
"@lancedb/vectordb-win32-x64-msvc": "0.4.1" "@lancedb/vectordb-win32-x64-msvc": "0.4.3"
} }
} }


@@ -17,10 +17,9 @@ import {
Float32, Float32,
makeBuilder, makeBuilder,
RecordBatchFileWriter, RecordBatchFileWriter,
Utf8, Utf8, type Vector,
type Vector,
FixedSizeList, FixedSizeList,
vectorFromArray, type Schema, Table as ArrowTable, RecordBatchStreamWriter vectorFromArray, type Schema, Table as ArrowTable, RecordBatchStreamWriter, List, Float64, RecordBatch, makeData, Struct
} from 'apache-arrow' } from 'apache-arrow'
import { type EmbeddingFunction } from './index' import { type EmbeddingFunction } from './index'
@@ -59,7 +58,26 @@ export async function convertToTable<T> (data: Array<Record<string, unknown>>, e
if (typeof values[0] === 'string') { if (typeof values[0] === 'string') {
// `vectorFromArray` converts strings into dictionary vectors, forcing it back to a string column // `vectorFromArray` converts strings into dictionary vectors, forcing it back to a string column
records[columnsKey] = vectorFromArray(values, new Utf8()) records[columnsKey] = vectorFromArray(values, new Utf8())
} else if (Array.isArray(values[0])) {
const elementType = getElementType(values[0])
let innerType
if (elementType === 'string') {
innerType = new Utf8()
} else if (elementType === 'number') {
innerType = new Float64()
} else { } else {
// TODO: pass in schema if it exists, else keep going to the next element
throw new Error(`Unsupported array element type ${elementType}`)
}
const listBuilder = makeBuilder({
type: new List(new Field('item', innerType, true))
})
for (const value of values) {
listBuilder.append(value)
}
records[columnsKey] = listBuilder.finish().toVector()
} else {
// TODO if this is a struct field then recursively align the subfields
records[columnsKey] = vectorFromArray(values) records[columnsKey] = vectorFromArray(values)
} }
} }
@@ -68,6 +86,14 @@ export async function convertToTable<T> (data: Array<Record<string, unknown>>, e
return new ArrowTable(records) return new ArrowTable(records)
} }
function getElementType (arr: any[]): string {
if (arr.length === 0) {
return 'undefined'
}
return typeof arr[0]
}
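Note: the `Array.isArray` branch above infers the list element type from the first element and then drives an Arrow `ListBuilder`. For reference, here is a minimal standalone sketch of that same apache-arrow pattern (the sample rows are made up for illustration):

```ts
import { makeBuilder, List, Field, Float64 } from 'apache-arrow'

// Build a List<Float64> column from plain JS number arrays, the same
// way the new branch in convertToTable does for numeric elements.
const listBuilder = makeBuilder({
  type: new List(new Field('item', new Float64(), true))
})
for (const row of [[1, 2, 3], [4.5, 6]]) {
  listBuilder.append(row)
}
const column = listBuilder.finish().toVector()
console.log(column.get(0)?.toArray()) // Float64Array [ 1, 2, 3 ]
```

For string elements the only change is `new Utf8()` as the inner type; any other element type currently throws, per the TODO above.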
// Creates a new Arrow ListBuilder that stores a Vector column // Creates a new Arrow ListBuilder that stores a Vector column
function newVectorBuilder (dim: number): FixedSizeListBuilder<Float32> { function newVectorBuilder (dim: number): FixedSizeListBuilder<Float32> {
return makeBuilder({ return makeBuilder({
@@ -84,21 +110,27 @@ function newVectorType (dim: number): FixedSizeList<Float32> {
} }
// Converts an Array of records into Arrow IPC format // Converts an Array of records into Arrow IPC format
export async function fromRecordsToBuffer<T> (data: Array<Record<string, unknown>>, embeddings?: EmbeddingFunction<T>): Promise<Buffer> { export async function fromRecordsToBuffer<T> (data: Array<Record<string, unknown>>, embeddings?: EmbeddingFunction<T>, schema?: Schema): Promise<Buffer> {
const table = await convertToTable(data, embeddings) let table = await convertToTable(data, embeddings)
if (schema !== undefined) {
table = alignTable(table, schema)
}
const writer = RecordBatchFileWriter.writeAll(table) const writer = RecordBatchFileWriter.writeAll(table)
return Buffer.from(await writer.toUint8Array()) return Buffer.from(await writer.toUint8Array())
} }
// Converts an Array of records into Arrow IPC stream format // Converts an Array of records into Arrow IPC stream format
export async function fromRecordsToStreamBuffer<T> (data: Array<Record<string, unknown>>, embeddings?: EmbeddingFunction<T>): Promise<Buffer> { export async function fromRecordsToStreamBuffer<T> (data: Array<Record<string, unknown>>, embeddings?: EmbeddingFunction<T>, schema?: Schema): Promise<Buffer> {
const table = await convertToTable(data, embeddings) let table = await convertToTable(data, embeddings)
if (schema !== undefined) {
table = alignTable(table, schema)
}
const writer = RecordBatchStreamWriter.writeAll(table) const writer = RecordBatchStreamWriter.writeAll(table)
return Buffer.from(await writer.toUint8Array()) return Buffer.from(await writer.toUint8Array())
} }
// Converts an Arrow Table into Arrow IPC format // Converts an Arrow Table into Arrow IPC format
export async function fromTableToBuffer<T> (table: ArrowTable, embeddings?: EmbeddingFunction<T>): Promise<Buffer> { export async function fromTableToBuffer<T> (table: ArrowTable, embeddings?: EmbeddingFunction<T>, schema?: Schema): Promise<Buffer> {
if (embeddings !== undefined) { if (embeddings !== undefined) {
const source = table.getChild(embeddings.sourceColumn) const source = table.getChild(embeddings.sourceColumn)
@@ -110,12 +142,15 @@ export async function fromTableToBuffer<T> (table: ArrowTable, embeddings?: Embe
const column = vectorFromArray(vectors, newVectorType(vectors[0].length)) const column = vectorFromArray(vectors, newVectorType(vectors[0].length))
table = table.assign(new ArrowTable({ vector: column })) table = table.assign(new ArrowTable({ vector: column }))
} }
if (schema !== undefined) {
table = alignTable(table, schema)
}
const writer = RecordBatchFileWriter.writeAll(table) const writer = RecordBatchFileWriter.writeAll(table)
return Buffer.from(await writer.toUint8Array()) return Buffer.from(await writer.toUint8Array())
} }
// Converts an Arrow Table into Arrow IPC stream format // Converts an Arrow Table into Arrow IPC stream format
export async function fromTableToStreamBuffer<T> (table: ArrowTable, embeddings?: EmbeddingFunction<T>): Promise<Buffer> { export async function fromTableToStreamBuffer<T> (table: ArrowTable, embeddings?: EmbeddingFunction<T>, schema?: Schema): Promise<Buffer> {
if (embeddings !== undefined) { if (embeddings !== undefined) {
const source = table.getChild(embeddings.sourceColumn) const source = table.getChild(embeddings.sourceColumn)
@@ -127,10 +162,36 @@ export async function fromTableToStreamBuffer<T> (table: ArrowTable, embeddings?
const column = vectorFromArray(vectors, newVectorType(vectors[0].length)) const column = vectorFromArray(vectors, newVectorType(vectors[0].length))
table = table.assign(new ArrowTable({ vector: column })) table = table.assign(new ArrowTable({ vector: column }))
} }
if (schema !== undefined) {
table = alignTable(table, schema)
}
const writer = RecordBatchStreamWriter.writeAll(table) const writer = RecordBatchStreamWriter.writeAll(table)
return Buffer.from(await writer.toUint8Array()) return Buffer.from(await writer.toUint8Array())
} }
function alignBatch (batch: RecordBatch, schema: Schema): RecordBatch {
const alignedChildren = []
for (const field of schema.fields) {
const indexInBatch = batch.schema.fields?.findIndex((f) => f.name === field.name)
if (indexInBatch < 0) {
throw new Error(`The column ${field.name} was not found in the Arrow Table`)
}
alignedChildren.push(batch.data.children[indexInBatch])
}
const newData = makeData({
type: new Struct(schema.fields),
length: batch.numRows,
nullCount: batch.nullCount,
children: alignedChildren
})
return new RecordBatch(schema, newData)
}
function alignTable (table: ArrowTable, schema: Schema): ArrowTable {
const alignedBatches = table.batches.map(batch => alignBatch(batch, schema))
return new ArrowTable(schema, alignedBatches)
}
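Taken together, `alignBatch` and `alignTable` are what let callers supply record keys in any order: each batch's child arrays are re-ordered to match the target schema's field order before serialization. A hedged sketch of the effect (it assumes module scope inside arrow.ts, since `alignTable` is not exported; the column values are made up):

```ts
import { Schema, Field, Int32, Utf8, Table as ArrowTable, vectorFromArray } from 'apache-arrow'

// Columns built in the "wrong" order...
const table = new ArrowTable({
  name: vectorFromArray(['a', 'b'], new Utf8()),
  id: vectorFromArray([1, 2], new Int32())
})
// ...re-aligned to the target schema's field order.
const target = new Schema([
  new Field('id', new Int32(), true),
  new Field('name', new Utf8(), true)
])
const aligned = alignTable(table, target)
console.log(aligned.schema.fields.map(f => f.name)) // [ 'id', 'name' ]
```

Note that `alignBatch` throws if the schema names a column the batch does not have, so a missing field fails fast instead of silently writing nulls.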
// Creates an empty Arrow Table // Creates an empty Arrow Table
export function createEmptyTable (schema: Schema): ArrowTable { export function createEmptyTable (schema: Schema): ArrowTable {
return new ArrowTable(schema) return new ArrowTable(schema)


@@ -14,7 +14,8 @@
import { import {
type Schema, type Schema,
Table as ArrowTable Table as ArrowTable,
tableFromIPC
} from 'apache-arrow' } from 'apache-arrow'
import { createEmptyTable, fromRecordsToBuffer, fromTableToBuffer } from './arrow' import { createEmptyTable, fromRecordsToBuffer, fromTableToBuffer } from './arrow'
import type { EmbeddingFunction } from './embedding/embedding_function' import type { EmbeddingFunction } from './embedding/embedding_function'
@@ -24,7 +25,7 @@ import { isEmbeddingFunction } from './embedding/embedding_function'
import { type Literal, toSQL } from './util' import { type Literal, toSQL } from './util'
// eslint-disable-next-line @typescript-eslint/no-var-requires // eslint-disable-next-line @typescript-eslint/no-var-requires
const { databaseNew, databaseTableNames, databaseOpenTable, databaseDropTable, tableCreate, tableAdd, tableCreateScalarIndex, tableCreateVectorIndex, tableCountRows, tableDelete, tableUpdate, tableCleanupOldVersions, tableCompactFiles, tableListIndices, tableIndexStats } = require('../native.js') const { databaseNew, databaseTableNames, databaseOpenTable, databaseDropTable, tableCreate, tableAdd, tableCreateScalarIndex, tableCreateVectorIndex, tableCountRows, tableDelete, tableUpdate, tableCleanupOldVersions, tableCompactFiles, tableListIndices, tableIndexStats, tableSchema } = require('../native.js')
export { Query } export { Query }
export type { EmbeddingFunction } export type { EmbeddingFunction }
@@ -354,6 +355,8 @@ export interface Table<T = number[]> {
* Get statistics about an index. * Get statistics about an index.
*/ */
indexStats: (indexUuid: string) => Promise<IndexStats> indexStats: (indexUuid: string) => Promise<IndexStats>
schema: Promise<Schema>
} }
export interface UpdateArgs { export interface UpdateArgs {
@@ -482,10 +485,10 @@ export class LocalConnection implements Connection {
} }
buffer = await fromTableToBuffer(createEmptyTable(schema)) buffer = await fromTableToBuffer(createEmptyTable(schema))
} else if (data instanceof ArrowTable) { } else if (data instanceof ArrowTable) {
buffer = await fromTableToBuffer(data, embeddingFunction) buffer = await fromTableToBuffer(data, embeddingFunction, schema)
} else { } else {
// data is Array<Record<...>> // data is Array<Record<...>>
buffer = await fromRecordsToBuffer(data, embeddingFunction) buffer = await fromRecordsToBuffer(data, embeddingFunction, schema)
} }
const tbl = await tableCreate.call(this._db, name, buffer, writeOptions?.writeMode?.toString(), ...getAwsArgs(this._options())) const tbl = await tableCreate.call(this._db, name, buffer, writeOptions?.writeMode?.toString(), ...getAwsArgs(this._options()))
@@ -508,6 +511,7 @@ export class LocalConnection implements Connection {
export class LocalTable<T = number[]> implements Table<T> { export class LocalTable<T = number[]> implements Table<T> {
private _tbl: any private _tbl: any
private readonly _name: string private readonly _name: string
private readonly _isElectron: boolean
private readonly _embeddings?: EmbeddingFunction<T> private readonly _embeddings?: EmbeddingFunction<T>
private readonly _options: () => ConnectionOptions private readonly _options: () => ConnectionOptions
@@ -524,6 +528,7 @@ export class LocalTable<T = number[]> implements Table<T> {
this._name = name this._name = name
this._embeddings = embeddings this._embeddings = embeddings
this._options = () => options this._options = () => options
this._isElectron = this.checkElectron()
} }
get name (): string { get name (): string {
@@ -555,9 +560,10 @@ export class LocalTable<T = number[]> implements Table<T> {
* @return The number of rows added to the table * @return The number of rows added to the table
*/ */
async add (data: Array<Record<string, unknown>>): Promise<number> { async add (data: Array<Record<string, unknown>>): Promise<number> {
const schema = await this.schema
return tableAdd.call( return tableAdd.call(
this._tbl, this._tbl,
await fromRecordsToBuffer(data, this._embeddings), await fromRecordsToBuffer(data, this._embeddings, schema),
WriteMode.Append.toString(), WriteMode.Append.toString(),
...getAwsArgs(this._options()) ...getAwsArgs(this._options())
).then((newTable: any) => { this._tbl = newTable }) ).then((newTable: any) => { this._tbl = newTable })
@@ -682,6 +688,27 @@ export class LocalTable<T = number[]> implements Table<T> {
async indexStats (indexUuid: string): Promise<IndexStats> { async indexStats (indexUuid: string): Promise<IndexStats> {
return tableIndexStats.call(this._tbl, indexUuid) return tableIndexStats.call(this._tbl, indexUuid)
} }
get schema (): Promise<Schema> {
// lazily fetch the schema from the native table
return this.getSchema()
}
private async getSchema (): Promise<Schema> {
const buffer = await tableSchema.call(this._tbl, this._isElectron)
const table = tableFromIPC(buffer)
return table.schema
}
// See https://github.com/electron/electron/issues/2288
private checkElectron (): boolean {
try {
// eslint-disable-next-line no-prototype-builtins
return (process?.versions?.hasOwnProperty('electron') || navigator?.userAgent?.toLowerCase()?.includes(' electron'))
} catch (e) {
return false
}
}
} }
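The new `schema` getter round-trips the schema through Arrow IPC: the native `tableSchema` call returns an IPC buffer (the `_isElectron` flag is passed down to adjust buffer handling, per the linked Electron issue) and `tableFromIPC` parses it back into an Arrow `Schema`. A minimal usage sketch — the path and table name here are assumptions, not from the diff:

```ts
import * as lancedb from 'vectordb'

async function main (): Promise<void> {
  const db = await lancedb.connect('/tmp/lancedb') // hypothetical path
  const table = await db.openTable('vectors')      // hypothetical table
  const schema = await table.schema
  console.log(schema.fields.map(f => `${f.name}: ${String(f.type)}`))
}

main().catch(console.error)
```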
export interface CleanupStats { export interface CleanupStats {


@@ -267,7 +267,7 @@ export class RemoteTable<T = number[]> implements Table<T> {
const column = indexParams.column ?? 'vector' const column = indexParams.column ?? 'vector'
const indexType = 'vector' // only vector index is supported for remote connections const indexType = 'vector' // only vector index is supported for remote connections
const metricType = indexParams.metric_type ?? 'L2' const metricType = indexParams.metric_type ?? 'L2'
const indexCacheSize = indexParams ?? null const indexCacheSize = indexParams.index_cache_size ?? null
const data = { const data = {
column, column,


@@ -176,6 +176,26 @@ describe('LanceDB client', function () {
assert.deepEqual(await con.tableNames(), ['vectors']) assert.deepEqual(await con.tableNames(), ['vectors'])
}) })
it('create a table with a schema and records', async function () {
const dir = await track().mkdir('lancejs')
const con = await lancedb.connect(dir)
const schema = new Schema(
[new Field('id', new Int32()),
new Field('name', new Utf8()),
new Field('vector', new FixedSizeList(2, new Field('item', new Float32(), true)), false)
]
)
const data = [
{ vector: [0.5, 0.2], name: 'foo', id: 0 },
{ vector: [0.3, 0.1], name: 'bar', id: 1 }
]
// even though the keys in data are out of order it should still work
const table = await con.createTable({ name: 'vectors', data, schema })
assert.equal(table.name, 'vectors')
assert.deepEqual(await con.tableNames(), ['vectors'])
})
it('create a table with an empty data array', async function () { it('create a table with an empty data array', async function () {
const dir = await track().mkdir('lancejs') const dir = await track().mkdir('lancejs')
const con = await lancedb.connect(dir) const con = await lancedb.connect(dir)
@@ -218,6 +238,25 @@ describe('LanceDB client', function () {
     assert.equal(await table.countRows(), 2)
   })

+  it('creates a new table from javascript objects with variable sized list', async function () {
+    const dir = await track().mkdir('lancejs')
+    const con = await lancedb.connect(dir)
+    const data = [
+      { id: 1, vector: [0.1, 0.2], list_of_str: ['a', 'b', 'c'], list_of_num: [1, 2, 3] },
+      { id: 2, vector: [1.1, 1.2], list_of_str: ['x', 'y'], list_of_num: [4, 5, 6] }
+    ]
+    const tableName = 'with_variable_sized_list'
+    const table = await con.createTable(tableName, data) as LocalTable
+    assert.equal(table.name, tableName)
+    assert.equal(await table.countRows(), 2)
+    const rs = await table.filter('id>1').execute()
+    assert.equal(rs.length, 1)
+    assert.deepEqual(rs[0].list_of_str, ['x', 'y'])
+    assert.isTrue(rs[0].list_of_num instanceof Float64Array)
+  })
+
   it('fails to create a new table when the vector column is missing', async function () {
     const dir = await track().mkdir('lancejs')
     const con = await lancedb.connect(dir)
@@ -275,6 +314,25 @@ describe('LanceDB client', function () {
     assert.equal(await table.countRows(), 4)
   })

+  it('appends records with fields in a different order', async function () {
+    const dir = await track().mkdir('lancejs')
+    const con = await lancedb.connect(dir)
+    const data = [
+      { id: 1, vector: [0.1, 0.2], price: 10, name: 'a' },
+      { id: 2, vector: [1.1, 1.2], price: 50, name: 'b' }
+    ]
+    const table = await con.createTable('vectors', data)
+    const dataAdd = [
+      { id: 3, vector: [2.1, 2.2], name: 'c', price: 10 },
+      { id: 4, vector: [3.1, 3.2], name: 'd', price: 50 }
+    ]
+    await table.add(dataAdd)
+    assert.equal(await table.countRows(), 4)
+  })
+
   it('overwrite all records in a table', async function () {
     const uri = await createTestDB()
     const con = await lancedb.connect(uri)
@@ -479,6 +537,27 @@ describe('LanceDB client', function () {
       assert.equal(results.length, 2)
     })
   })

+  describe('when inspecting the schema', function () {
+    it('should return the schema', async function () {
+      const uri = await createTestDB()
+      const db = await lancedb.connect(uri)
+      // the fsl inner field must be named 'item' and be nullable
+      const expectedSchema = new Schema(
+        [
+          new Field('id', new Int32()),
+          new Field('vector', new FixedSizeList(128, new Field('item', new Float32(), true))),
+          new Field('s', new Utf8())
+        ]
+      )
+      const table = await db.createTable({
+        name: 'some_table',
+        schema: expectedSchema
+      })
+      const schema = await table.schema
+      assert.deepEqual(expectedSchema, schema)
+    })
+  })
 })

 describe('Remote LanceDB client', function () {

View File

@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.4.1
+current_version = 0.5.0
 commit = True
 message = [python] Bump version: {current_version} → {new_version}
 tag = True

View File

@@ -45,8 +45,8 @@ pytest
 To run linter and automatically fix all errors:

 ```bash
-black .
-isort .
+ruff format python
+ruff --fix python
 ```

 If any packages are missing, install them with:

View File

@@ -56,6 +56,7 @@ class DBConnection(EnforceOverrides):
         data: Optional[DATA] = None,
         schema: Optional[Union[pa.Schema, LanceModel]] = None,
         mode: str = "create",
+        exist_ok: bool = False,
         on_bad_vectors: str = "error",
         fill_value: float = 0.0,
         embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
@@ -86,6 +87,11 @@ class DBConnection(EnforceOverrides):
             Can be either "create" or "overwrite".
             By default, if the table already exists, an exception is raised.
             If you want to overwrite the table, use mode="overwrite".
+        exist_ok: bool, default False
+            If a table by the same name already exists, then raise an exception
+            if exist_ok=False. If exist_ok=True, then open the existing table;
+            it will not add the provided data but will validate against any
+            schema that's specified.
         on_bad_vectors: str, default "error"
             What to do if any of the vectors are not the same size or contains NaNs.
             One of "error", "drop", "fill".
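To make the new flag concrete, here is a minimal sketch of the intended call pattern (mirroring the `test_create_exist_ok` test added later in this diff; the path and table name are illustrative):

```python
import lancedb

db = lancedb.connect("/tmp/exist-ok-demo")  # hypothetical path
data = [{"vector": [3.1, 4.1], "item": "foo"}]

tbl = db.create_table("my_table", data=data)
# a second create with the same name raises by default:
#   db.create_table("my_table", data=data)  # OSError
# with exist_ok=True the existing table is opened; `data` is NOT re-added
tbl2 = db.create_table("my_table", data=data, exist_ok=True)
assert len(tbl2) == 1
```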
@@ -319,6 +325,7 @@ class LanceDBConnection(DBConnection):
         data: Optional[DATA] = None,
         schema: Optional[Union[pa.Schema, LanceModel]] = None,
         mode: str = "create",
+        exist_ok: bool = False,
         on_bad_vectors: str = "error",
         fill_value: float = 0.0,
         embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
@@ -338,6 +345,7 @@ class LanceDBConnection(DBConnection):
             data,
             schema,
             mode=mode,
+            exist_ok=exist_ok,
             on_bad_vectors=on_bad_vectors,
             fill_value=fill_value,
             embedding_functions=embedding_functions,

View File

@@ -19,4 +19,5 @@ from .open_clip import OpenClipEmbeddings
 from .openai import OpenAIEmbeddings
 from .registry import EmbeddingFunctionRegistry, get_registry
 from .sentence_transformers import SentenceTransformerEmbeddings
+from .gemini_text import GeminiText
 from .utils import with_embeddings

View File

@@ -0,0 +1,131 @@
# Copyright (c) 2023. LanceDB Developers
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from functools import cached_property
from typing import List, Union, Any

import numpy as np

from .base import TextEmbeddingFunction
from .registry import register
from .utils import api_key_not_found_help, TEXT
from lancedb.pydantic import PYDANTIC_VERSION


@register("gemini-text")
class GeminiText(TextEmbeddingFunction):
    """
    An embedding function that uses Google's Gemini API. Requires GOOGLE_API_KEY to be set.
    https://ai.google.dev/docs/embeddings_guide

    Supports various task types:

    | Task Type               | Description                                                                                                                |
    |-------------------------|----------------------------------------------------------------------------------------------------------------------------|
    | "`retrieval_query`"     | Specifies the given text is a query in a search/retrieval setting.                                                          |
    | "`retrieval_document`"  | Specifies the given text is a document in a search/retrieval setting. This task type requires a title, which is automatically provided by the Embeddings API. |
    | "`semantic_similarity`" | Specifies the given text will be used for Semantic Textual Similarity (STS).                                                |
    | "`classification`"      | Specifies that the embeddings will be used for classification.                                                              |
    | "`clustering`"          | Specifies that the embeddings will be used for clustering.                                                                  |

    Note: The supported task types might change in the Gemini API, but as long as a
    supported task type and its argument set are provided, they are delegated to the
    API calls.

    Parameters
    ----------
    name: str, default "models/embedding-001"
        The name of the model to use. See the Gemini documentation for a list of available models.
    query_task_type: str, default "retrieval_query"
        Sets the task type for queries.
    source_task_type: str, default "retrieval_document"
        Sets the task type for ingestion.

    Examples
    --------
    import lancedb
    import pandas as pd
    from lancedb.pydantic import LanceModel, Vector
    from lancedb.embeddings import get_registry

    model = get_registry().get("gemini-text").create()

    class TextModel(LanceModel):
        text: str = model.SourceField()
        vector: Vector(model.ndims()) = model.VectorField()

    df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
    db = lancedb.connect("~/.lancedb")
    tbl = db.create_table("test", schema=TextModel, mode="overwrite")
    tbl.add(df)
    rs = tbl.search("hello").limit(1).to_pandas()
    """

    name: str = "models/embedding-001"
    query_task_type: str = "retrieval_query"
    source_task_type: str = "retrieval_document"

    if PYDANTIC_VERSION < (2, 0):  # Pydantic 1.x compat

        class Config:
            keep_untouched = (cached_property,)

    def ndims(self):
        # TODO: fix hardcoding
        return 768

    def compute_query_embeddings(self, query: str, *args, **kwargs) -> List[np.array]:
        return self.compute_source_embeddings(query, task_type=self.query_task_type)

    def compute_source_embeddings(self, texts: TEXT, *args, **kwargs) -> List[np.array]:
        texts = self.sanitize_input(texts)
        # assume the source task type if not passed by `compute_query_embeddings`
        task_type = kwargs.get("task_type") or self.source_task_type
        return self.generate_embeddings(texts, task_type=task_type)

    def generate_embeddings(
        self, texts: Union[List[str], np.ndarray], *args, **kwargs
    ) -> List[np.array]:
        """
        Get the embeddings for the given texts

        Parameters
        ----------
        texts: list[str] or np.ndarray (of str)
            The texts to embed
        """
        if kwargs.get("task_type") == "retrieval_document":
            # Provide a title to use the existing API design
            kwargs["title"] = "Embedding of a document"
        return [
            self.client.embed_content(model=self.name, content=text, **kwargs)[
                "embedding"
            ]
            for text in texts
        ]

    @cached_property
    def client(self):
        genai = self.safe_import("google.generativeai", "google.generativeai")
        if not os.environ.get("GOOGLE_API_KEY"):
            api_key_not_found_help("google")
        return genai

View File

@@ -10,12 +10,15 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import os
+from functools import cached_property
 from typing import List, Union

 import numpy as np

 from .base import TextEmbeddingFunction
 from .registry import register
+from .utils import api_key_not_found_help


 @register("openai")
@@ -44,6 +47,13 @@ class OpenAIEmbeddings(TextEmbeddingFunction):
             The texts to embed
         """
         # TODO retry, rate limit, token limit
-        openai = self.safe_import("openai")
-        rs = openai.Embedding.create(input=texts, model=self.name)["data"]
-        return [v["embedding"] for v in rs]
+        rs = self._openai_client.embeddings.create(input=texts, model=self.name)
+        return [v.embedding for v in rs.data]
+
+    @cached_property
+    def _openai_client(self):
+        openai = self.safe_import("openai")
+        if not os.environ.get("OPENAI_API_KEY"):
+            api_key_not_found_help("openai")
+        return openai.OpenAI()

View File

@@ -216,7 +216,6 @@ def retry_with_exponential_backoff(
     exponential_base: float = 2,
     jitter: bool = True,
     max_retries: int = 7,
-    # errors: tuple = (),
 ):
     """Retry a function with exponential backoff.
@@ -226,7 +225,6 @@ def retry_with_exponential_backoff(
         exponential_base (float): The base for exponential backoff (default is 2).
         jitter (bool): Whether to add jitter to the delay (default is True).
         max_retries (int): Maximum number of retries (default is 10).
-        errors (tuple): Tuple of specific exceptions to retry on (default is (openai.error.RateLimitError,)).

     Returns:
         function: The decorated function.
@@ -249,7 +247,7 @@ def retry_with_exponential_backoff(
                 if num_retries > max_retries:
                     raise Exception(
-                        f"Maximum number of retries ({max_retries}) exceeded."
+                        f"Maximum number of retries ({max_retries}) exceeded.", e
                     )

                 delay *= exponential_base * (1 + jitter * random.random())

View File

@@ -13,7 +13,7 @@
"""Full text search index using tantivy-py""" """Full text search index using tantivy-py"""
import os import os
from typing import List, Tuple from typing import List, Optional, Tuple
import pyarrow as pa import pyarrow as pa
@@ -56,7 +56,12 @@ def create_index(index_path: str, text_fields: List[str]) -> tantivy.Index:
     return index


-def populate_index(index: tantivy.Index, table: LanceTable, fields: List[str]) -> int:
+def populate_index(
+    index: tantivy.Index,
+    table: LanceTable,
+    fields: List[str],
+    writer_heap_size: int = 1024 * 1024 * 1024,
+) -> int:
     """
     Populate an index with data from a LanceTable
@@ -68,6 +73,8 @@ def populate_index(index: tantivy.Index, table: LanceTable, fields: List[str]) -
         The table to index
     fields : List[str]
         List of fields to index
+    writer_heap_size : int
+        The writer heap size in bytes, defaults to 1GB

     Returns
     -------
@@ -87,7 +94,7 @@ def populate_index(index: tantivy.Index, table: LanceTable, fields: List[str]) -
             raise TypeError(f"Field {name} is not a string type")

     # create a tantivy writer
-    writer = index.writer()
+    writer = index.writer(heap_size=writer_heap_size)

     # write data into index
     dataset = table.to_lance()
     row_id = 0
@@ -103,9 +110,12 @@ def populate_index(index: tantivy.Index, table: LanceTable, fields: List[str]) -
             b = b.flatten()
         for i in range(b.num_rows):
             doc = tantivy.Document()
-            doc.add_integer("doc_id", row_id)
             for name in fields:
-                doc.add_text(name, b[name][i].as_py())
+                value = b[name][i].as_py()
+                if value is not None:
+                    doc.add_text(name, value)
+            if not doc.is_empty:
+                doc.add_integer("doc_id", row_id)
             writer.add_document(doc)
             row_id += 1

     # commit changes

View File

@@ -26,6 +26,7 @@ import numpy as np
 import pyarrow as pa
 import pydantic
 import semver
+from pydantic.fields import FieldInfo

 from .embeddings import EmbeddingFunctionRegistry
@@ -142,8 +143,8 @@ def Vector(
     return FixedSizeList


-def _py_type_to_arrow_type(py_type: Type[Any]) -> pa.DataType:
-    """Convert Python Type to Arrow DataType.
+def _py_type_to_arrow_type(py_type: Type[Any], field: FieldInfo) -> pa.DataType:
+    """Convert a field with native Python type to Arrow data type.

     Raises
     ------
@@ -163,9 +164,13 @@ def _py_type_to_arrow_type(py_type: Type[Any]) -> pa.DataType:
     elif py_type == date:
         return pa.date32()
     elif py_type == datetime:
-        return pa.timestamp("us")
+        tz = get_extras(field, "tz")
+        return pa.timestamp("us", tz=tz)
+    elif getattr(py_type, "__origin__", None) in (list, tuple):
+        child = py_type.__args__[0]
+        return pa.list_(_py_type_to_arrow_type(child, field))
     raise TypeError(
-        f"Converting Pydantic type to Arrow Type: unsupported type {py_type}"
+        f"Converting Pydantic type to Arrow Type: unsupported type {py_type}."
     )
@@ -187,6 +192,7 @@ else:
 def _pydantic_to_arrow_type(field: pydantic.fields.FieldInfo) -> pa.DataType:
     """Convert a Pydantic FieldInfo to Arrow DataType"""
     if isinstance(field.annotation, _GenericAlias) or (
         sys.version_info > (3, 9) and isinstance(field.annotation, types.GenericAlias)
     ):
@@ -194,10 +200,17 @@ def _pydantic_to_arrow_type(field: pydantic.fields.FieldInfo) -> pa.DataType:
         args = field.annotation.__args__
         if origin == list:
             child = args[0]
-            return pa.list_(_py_type_to_arrow_type(child))
+            return pa.list_(_py_type_to_arrow_type(child, field))
         elif origin == Union:
             if len(args) == 2 and args[1] == type(None):
-                return _py_type_to_arrow_type(args[0])
+                return _py_type_to_arrow_type(args[0], field)
+    elif sys.version_info >= (3, 10) and isinstance(field.annotation, types.UnionType):
+        args = field.annotation.__args__
+        if len(args) == 2:
+            for typ in args:
+                if typ == type(None):
+                    continue
+                return _py_type_to_arrow_type(typ, field)
     elif inspect.isclass(field.annotation):
         if issubclass(field.annotation, pydantic.BaseModel):
             # Struct
@@ -205,7 +218,7 @@ def _pydantic_to_arrow_type(field: pydantic.fields.FieldInfo) -> pa.DataType:
             return pa.struct(fields)
         elif issubclass(field.annotation, FixedSizeListMixin):
             return pa.list_(field.annotation.value_arrow_type(), field.annotation.dim())
-    return _py_type_to_arrow_type(field.annotation)
+    return _py_type_to_arrow_type(field.annotation, field)


 def is_nullable(field: pydantic.fields.FieldInfo) -> bool:
@@ -216,6 +229,11 @@ def is_nullable(field: pydantic.fields.FieldInfo) -> bool:
         if origin == Union:
             if len(args) == 2 and args[1] == type(None):
                 return True
+    elif sys.version_info >= (3, 10) and isinstance(field.annotation, types.UnionType):
+        args = field.annotation.__args__
+        for typ in args:
+            if typ == type(None):
+                return True
     return False
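Taken together, the pydantic changes above let a model declare a timezone-aware timestamp (via the new `tz` field extra) and a PEP 604 `|`-style optional. A sketch mirroring the tests added later in this diff (the model and field names are illustrative):

```python
from datetime import datetime

import pydantic
from pydantic import Field

from lancedb.pydantic import pydantic_to_schema


class Event(pydantic.BaseModel):
    name: str | None  # nullable via a PEP 604 union (Python 3.10+)
    ts: datetime = Field(json_schema_extra={"tz": "Asia/Shanghai"})


schema = pydantic_to_schema(Event)
# name -> string (nullable); ts -> timestamp[us, tz=Asia/Shanghai] (not null)
```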

View File

@@ -14,6 +14,7 @@
 from __future__ import annotations

 from abc import ABC, abstractmethod
+from pathlib import Path
 from typing import TYPE_CHECKING, List, Literal, Optional, Type, Union

 import deprecation
@@ -70,7 +71,7 @@ class Query(pydantic.BaseModel):
     vector_column: str = VECTOR_COLUMN_NAME

     # vector to search for
-    vector: List[float]
+    vector: Union[List[float], List[List[float]]]

     # sql filter to refine the query with
     filter: Optional[str] = None
@@ -259,19 +260,40 @@ class LanceQueryBuilder(ABC):
             for row in self.to_arrow().to_pylist()
         ]

-    def limit(self, limit: int) -> LanceQueryBuilder:
+    def to_polars(self) -> "pl.DataFrame":
+        """
+        Execute the query and return the results as a Polars DataFrame.
+
+        In addition to the selected columns, LanceDB also returns the vector
+        and the "_distance" column, which is the distance between the query
+        vector and the returned vector.
+        """
+        import polars as pl
+
+        return pl.from_arrow(self.to_arrow())
+
+    def limit(self, limit: Union[int, None]) -> LanceQueryBuilder:
         """Set the maximum number of results to return.

         Parameters
         ----------
         limit: int
             The maximum number of results to return.
+            By default the query is limited to the first 10 results.
+            Call this method and pass 0, a negative value,
+            or None to remove the limit.
+            *WARNING* if you have a large dataset, removing
+            the limit can potentially result in reading a
+            large amount of data into memory and cause
+            out-of-memory issues.

         Returns
         -------
         LanceQueryBuilder
             The LanceQueryBuilder object.
         """
-        self._limit = limit
+        if limit is None or limit <= 0:
+            self._limit = None
+        else:
+            self._limit = limit
         return self
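A small sketch of the new `limit` semantics (grounded in the `test_empty_query` additions further down; the path and data are illustrative):

```python
import lancedb

db = lancedb.connect("/tmp/limit-demo")  # hypothetical path
tbl = db.create_table(
    "items", data=[{"id": i, "vector": [float(i), 0.0]} for i in range(100)]
)

assert len(tbl.search().to_pandas()) == 10               # default limit of 10
assert len(tbl.search().limit(None).to_pandas()) == 100  # None removes the limit
assert len(tbl.search().limit(-1).to_pandas()) == 100    # as does any non-positive value
```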
@@ -421,6 +443,8 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
         vector and the returned vectors.
         """
         vector = self._query if isinstance(self._query, list) else self._query.tolist()
+        if isinstance(vector[0], np.ndarray):
+            vector = [v.tolist() for v in vector]
         query = Query(
             vector=vector,
             filter=self._where,
@@ -465,6 +489,24 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
     def __init__(self, table: "lancedb.table.Table", query: str):
         super().__init__(table)
         self._query = query
+        self._phrase_query = False
+
+    def phrase_query(self, phrase_query: bool = True) -> LanceFtsQueryBuilder:
+        """Set whether to use phrase query.
+
+        Parameters
+        ----------
+        phrase_query: bool, default True
+            If True, then the query will be wrapped in quotes and
+            double quotes replaced by single quotes.
+
+        Returns
+        -------
+        LanceFtsQueryBuilder
+            The LanceFtsQueryBuilder object.
+        """
+        self._phrase_query = phrase_query
+        return self

     def to_arrow(self) -> pa.Table:
         try:
@@ -478,16 +520,47 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
         # get the index path
         index_path = self._table._get_fts_index_path()
+        # check if the index exists
+        if not Path(index_path).exists():
+            raise FileNotFoundError(
+                "Fts index does not exist. "
+                "Please first call table.create_fts_index(['<field_names>']) "
+                "to create the fts index."
+            )
         # open the index
         index = tantivy.Index.open(index_path)
         # get the scores and doc ids
-        row_ids, scores = search_index(index, self._query, self._limit)
+        query = self._query
+        if self._phrase_query:
+            query = query.replace('"', "'")
+            query = f'"{query}"'
+        row_ids, scores = search_index(index, query, self._limit)
         if len(row_ids) == 0:
             empty_schema = pa.schema([pa.field("score", pa.float32())])
             return pa.Table.from_pylist([], schema=empty_schema)
         scores = pa.array(scores)
         output_tbl = self._table.to_lance().take(row_ids, columns=self._columns)
         output_tbl = output_tbl.append_column("score", scores)
+        if self._where is not None:
+            try:
+                # TODO: would be great to have Substrait generate pyarrow compute
+                # expressions, or conversely have pyarrow support SQL expressions
+                # using Substrait
+                import duckdb
+
+                output_tbl = (
+                    duckdb.sql("SELECT * FROM output_tbl")
+                    .filter(self._where)
+                    .to_arrow_table()
+                )
+            except ImportError:
+                import tempfile
+
+                import lance
+
+                # TODO: use "memory://" instead once that's supported
+                with tempfile.TemporaryDirectory() as tmp:
+                    ds = lance.write_dataset(output_tbl, tmp)
+                    output_tbl = ds.to_table(filter=self._where)
         return output_tbl
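A quick sketch of how the new phrase-query path is meant to be used (the table and column names are illustrative; grounded in the `test_syntax` test further down):

```python
import lancedb

db = lancedb.connect("/tmp/fts-demo")  # hypothetical path
tbl = db.open_table("docs")            # hypothetical table with a "text" column
tbl.create_fts_index("text")

# A raw query containing OR would be parsed as a boolean query by tantivy;
# phrase_query() wraps it in quotes so it is matched verbatim instead.
hits = (
    tbl.search("they could have been dogs OR cats")
    .phrase_query()
    .limit(10)
    .to_list()
)
```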

View File

@@ -13,9 +13,10 @@
 import functools
-from typing import Any, Callable, Dict, Iterable, Optional, Union
+from typing import Any, Callable, Dict, Iterable, List, Optional, Union
+from urllib.parse import urljoin

-import aiohttp
+import requests
 import attrs
 import pyarrow as pa
 from pydantic import BaseModel
@@ -37,8 +38,8 @@ def _check_not_closed(f):
     return wrapped


-async def _read_ipc(resp: aiohttp.ClientResponse) -> pa.Table:
-    resp_body = await resp.read()
+def _read_ipc(resp: requests.Response) -> pa.Table:
+    resp_body = resp.content
     with pa.ipc.open_file(pa.BufferReader(resp_body)) as reader:
         return reader.read_all()
@@ -53,15 +54,18 @@ class RestfulLanceDBClient:
     closed: bool = attrs.field(default=False, init=False)

     @functools.cached_property
-    def session(self) -> aiohttp.ClientSession:
-        url = (
+    def session(self) -> requests.Session:
+        return requests.Session()
+
+    @property
+    def url(self) -> str:
+        return (
             self.host_override
             or f"https://{self.db_name}.{self.region}.api.lancedb.com"
         )
-        return aiohttp.ClientSession(url)

-    async def close(self):
-        await self.session.close()
+    def close(self):
+        self.session.close()
         self.closed = True

     @functools.cached_property
@@ -76,38 +80,38 @@ class RestfulLanceDBClient:
         return headers

     @staticmethod
-    async def _check_status(resp: aiohttp.ClientResponse):
-        if resp.status == 404:
-            raise LanceDBClientError(f"Not found: {await resp.text()}")
-        elif 400 <= resp.status < 500:
-            raise LanceDBClientError(
-                f"Bad Request: {resp.status}, error: {await resp.text()}"
-            )
-        elif 500 <= resp.status < 600:
-            raise LanceDBClientError(
-                f"Internal Server Error: {resp.status}, error: {await resp.text()}"
-            )
-        elif resp.status != 200:
-            raise LanceDBClientError(
-                f"Unknown Error: {resp.status}, error: {await resp.text()}"
-            )
+    def _check_status(resp: requests.Response):
+        if resp.status_code == 404:
+            raise LanceDBClientError(f"Not found: {resp.text}")
+        elif 400 <= resp.status_code < 500:
+            raise LanceDBClientError(
+                f"Bad Request: {resp.status_code}, error: {resp.text}"
+            )
+        elif 500 <= resp.status_code < 600:
+            raise LanceDBClientError(
+                f"Internal Server Error: {resp.status_code}, error: {resp.text}"
+            )
+        elif resp.status_code != 200:
+            raise LanceDBClientError(
+                f"Unknown Error: {resp.status_code}, error: {resp.text}"
+            )

     @_check_not_closed
-    async def get(self, uri: str, params: Union[Dict[str, Any], BaseModel] = None):
+    def get(self, uri: str, params: Union[Dict[str, Any], BaseModel] = None):
         """Send a GET request and returns the deserialized response payload."""
         if isinstance(params, BaseModel):
             params: Dict[str, Any] = params.dict(exclude_none=True)
-        async with self.session.get(
-            uri,
+        with self.session.get(
+            urljoin(self.url, uri),
             params=params,
             headers=self.headers,
-            timeout=aiohttp.ClientTimeout(total=30),
+            timeout=(5.0, 30.0),
         ) as resp:
-            await self._check_status(resp)
-            return await resp.json()
+            self._check_status(resp)
+            return resp.json()

     @_check_not_closed
-    async def post(
+    def post(
         self,
         uri: str,
         data: Optional[Union[Dict[str, Any], BaseModel, bytes]] = None,
@@ -139,31 +143,26 @@ class RestfulLanceDBClient:
headers["content-type"] = content_type headers["content-type"] = content_type
if request_id is not None: if request_id is not None:
headers["x-request-id"] = request_id headers["x-request-id"] = request_id
async with self.session.post( with self.session.post(
uri, urljoin(self.url, uri),
headers=headers, headers=headers,
params=params, params=params,
timeout=aiohttp.ClientTimeout(total=30), timeout=(5.0, 30.0),
**req_kwargs, **req_kwargs,
) as resp: ) as resp:
resp: aiohttp.ClientResponse = resp self._check_status(resp)
await self._check_status(resp) return deserialize(resp)
return await deserialize(resp)
@_check_not_closed @_check_not_closed
async def list_tables( def list_tables(self, limit: int, page_token: Optional[str] = None) -> List[str]:
self, limit: int, page_token: Optional[str] = None
) -> Iterable[str]:
"""List all tables in the database.""" """List all tables in the database."""
if page_token is None: if page_token is None:
page_token = "" page_token = ""
json = await self.get("/v1/table/", {"limit": limit, "page_token": page_token}) json = self.get("/v1/table/", {"limit": limit, "page_token": page_token})
return json["tables"] return json["tables"]
@_check_not_closed @_check_not_closed
async def query(self, table_name: str, query: VectorQuery) -> VectorQueryResult: def query(self, table_name: str, query: VectorQuery) -> VectorQueryResult:
"""Query a table.""" """Query a table."""
tbl = await self.post( tbl = self.post(f"/v1/table/{table_name}/query/", query, deserialize=_read_ipc)
f"/v1/table/{table_name}/query/", query, deserialize=_read_ipc
)
return VectorQueryResult(tbl) return VectorQueryResult(tbl)

View File

@@ -50,10 +50,6 @@ class RemoteDBConnection(DBConnection):
         self._client = RestfulLanceDBClient(
             self.db_name, region, api_key, host_override
         )
-        try:
-            self._loop = asyncio.get_running_loop()
-        except RuntimeError:
-            self._loop = asyncio.get_event_loop()

     def __repr__(self) -> str:
         return f"RemoteConnect(name={self.db_name})"
@@ -76,9 +72,8 @@ class RemoteDBConnection(DBConnection):
             An iterator of table names.
         """
         while True:
-            result = self._loop.run_until_complete(
-                self._client.list_tables(limit, page_token)
-            )
+            result = self._client.list_tables(limit, page_token)
             if len(result) > 0:
                 page_token = result[len(result) - 1]
             else:
@@ -103,9 +98,7 @@ class RemoteDBConnection(DBConnection):
         # check if table exists
         try:
-            self._loop.run_until_complete(
-                self._client.post(f"/v1/table/{name}/describe/")
-            )
+            self._client.post(f"/v1/table/{name}/describe/")
         except LanceDBClientError as err:
             if str(err).startswith("Not found"):
                 logging.error(
@@ -248,14 +241,13 @@ class RemoteDBConnection(DBConnection):
         data = to_ipc_binary(data)
         request_id = uuid.uuid4().hex

-        self._loop.run_until_complete(
-            self._client.post(
-                f"/v1/table/{name}/create/",
-                data=data,
-                request_id=request_id,
-                content_type=ARROW_STREAM_CONTENT_TYPE,
-            )
-        )
+        self._client.post(
+            f"/v1/table/{name}/create/",
+            data=data,
+            request_id=request_id,
+            content_type=ARROW_STREAM_CONTENT_TYPE,
+        )
         return RemoteTable(self, name)

     @override
@@ -267,13 +259,11 @@ class RemoteDBConnection(DBConnection):
         name: str
             The name of the table.
         """
-        self._loop.run_until_complete(
-            self._client.post(
-                f"/v1/table/{name}/drop/",
-            )
-        )
+        self._client.post(
+            f"/v1/table/{name}/drop/",
+        )

     async def close(self):
         """Close the connection to the database."""
-        self._loop.close()
-        await self._client.close()
+        self._client.close()

View File

@@ -11,6 +11,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+import asyncio
 import uuid
 from functools import cached_property
 from typing import Dict, Optional, Union
@@ -42,18 +43,14 @@ class RemoteTable(Table):
         of this Table
         """
-        resp = self._conn._loop.run_until_complete(
-            self._conn._client.post(f"/v1/table/{self._name}/describe/")
-        )
+        resp = self._conn._client.post(f"/v1/table/{self._name}/describe/")
         schema = json_to_schema(resp["schema"])
         return schema

     @property
     def version(self) -> int:
         """Get the current version of the table"""
-        resp = self._conn._loop.run_until_complete(
-            self._conn._client.post(f"/v1/table/{self._name}/describe/")
-        )
+        resp = self._conn._client.post(f"/v1/table/{self._name}/describe/")
         return resp["version"]

     def to_arrow(self) -> pa.Table:
@@ -115,9 +112,10 @@ class RemoteTable(Table):
"metric_type": metric, "metric_type": metric,
"index_cache_size": index_cache_size, "index_cache_size": index_cache_size,
} }
resp = self._conn._loop.run_until_complete( resp = self._conn._client.post(
self._conn._client.post(f"/v1/table/{self._name}/create_index/", data=data) f"/v1/table/{self._name}/create_index/", data=data
) )
return resp return resp
def add( def add(
@@ -160,14 +158,12 @@ class RemoteTable(Table):
         request_id = uuid.uuid4().hex

-        self._conn._loop.run_until_complete(
-            self._conn._client.post(
-                f"/v1/table/{self._name}/insert/",
-                data=payload,
-                params={"request_id": request_id, "mode": mode},
-                content_type=ARROW_STREAM_CONTENT_TYPE,
-            )
-        )
+        self._conn._client.post(
+            f"/v1/table/{self._name}/insert/",
+            data=payload,
+            params={"request_id": request_id, "mode": mode},
+            content_type=ARROW_STREAM_CONTENT_TYPE,
+        )

     def search(
         self, query: Union[VEC, str], vector_column_name: str = VECTOR_COLUMN_NAME
@@ -227,8 +223,24 @@ class RemoteTable(Table):
         return LanceVectorQueryBuilder(self, query, vector_column_name)

     def _execute_query(self, query: Query) -> pa.Table:
-        result = self._conn._client.query(self._name, query)
-        return self._conn._loop.run_until_complete(result).to_arrow()
+        if (
+            query.vector is not None
+            and len(query.vector) > 0
+            and not isinstance(query.vector[0], float)
+        ):
+            results = []
+            for v in query.vector:
+                v = list(v)
+                q = query.copy()
+                q.vector = v
+                results.append(self._conn._client.query(self._name, q))
+            return pa.concat_tables(
+                [add_index(r.to_arrow(), i) for i, r in enumerate(results)]
+            )
+        else:
+            result = self._conn._client.query(self._name, query)
+            return result.to_arrow()

     def delete(self, predicate: str):
         """Delete rows from the table.
@@ -277,9 +289,7 @@ class RemoteTable(Table):
         0  2  [3.0, 4.0]   85.0  # doctest: +SKIP
         """
         payload = {"predicate": predicate}
-        self._conn._loop.run_until_complete(
-            self._conn._client.post(f"/v1/table/{self._name}/delete/", data=payload)
-        )
+        self._conn._client.post(f"/v1/table/{self._name}/delete/", data=payload)

     def update(
         self,
@@ -339,6 +349,12 @@ class RemoteTable(Table):
         updates = [[k, v] for k, v in values_sql.items()]

         payload = {"predicate": where, "updates": updates}
-        self._conn._loop.run_until_complete(
-            self._conn._client.post(f"/v1/table/{self._name}/update/", data=payload)
-        )
+        self._conn._client.post(f"/v1/table/{self._name}/update/", data=payload)
+
+
+def add_index(tbl: pa.Table, i: int) -> pa.Table:
+    return tbl.add_column(
+        0,
+        pa.field("query_index", pa.uint32()),
+        pa.array([i] * len(tbl), pa.uint32()),
+    )
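A sketch of what the batched-vector path above enables on a remote table, assuming `search()` is passed a list of query vectors (the connection parameters and table name are hypothetical):

```python
import lancedb

db = lancedb.connect("db://my-project", api_key="...", region="us-east-1")  # hypothetical
tbl = db.open_table("docs")

# Two query vectors in one call; each result row carries a "query_index"
# column identifying which query vector produced it.
df = tbl.search([[0.1, 0.2], [0.9, 0.8]]).limit(5).to_pandas()
```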

View File

@@ -31,7 +31,13 @@ from .common import DATA, VEC, VECTOR_COLUMN_NAME
 from .embeddings import EmbeddingFunctionConfig, EmbeddingFunctionRegistry
 from .pydantic import LanceModel, model_to_dict
 from .query import LanceQueryBuilder, Query
-from .util import fs_from_uri, safe_import_pandas, value_to_sql, join_uri
+from .util import (
+    fs_from_uri,
+    safe_import_pandas,
+    safe_import_polars,
+    value_to_sql,
+    join_uri,
+)
 from .utils.events import register_event

 if TYPE_CHECKING:
@@ -41,6 +47,7 @@ if TYPE_CHECKING:
 pd = safe_import_pandas()
+pl = safe_import_polars()


 def _sanitize_data(
@@ -66,6 +73,8 @@ def _sanitize_data(
         meta = data.schema.metadata if data.schema.metadata is not None else {}
         meta = {k: v for k, v in meta.items() if k != b"pandas"}
         data = data.replace_schema_metadata(meta)
+    elif pl is not None and isinstance(data, pl.DataFrame):
+        data = data.to_arrow()

     if isinstance(data, pa.Table):
         if metadata:
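In practice the new branch lets you hand a Polars DataFrame straight to table creation and get Polars results back (a minimal sketch grounded in the `test_polars` test further down; the path is hypothetical):

```python
import lancedb
import polars as pl

db = lancedb.connect("/tmp/polars-demo")  # hypothetical path
df = pl.DataFrame({
    "vector": [[3.1, 4.1], [5.9, 26.5]],
    "item": ["foo", "bar"],
})
tbl = db.create_table("test", data=df)              # ingested via df.to_arrow()
out = tbl.search([3.1, 4.1]).limit(1).to_polars()   # results back as a Polars DataFrame
```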
@@ -647,8 +656,19 @@ class LanceTable(Table):
         self._dataset.restore()
         self._reset_dataset()

+    def count_rows(self, filter: Optional[str] = None) -> int:
+        """
+        Count the number of rows in the table.
+
+        Parameters
+        ----------
+        filter: str, optional
+            A SQL where clause to filter the rows to count.
+        """
+        return self._dataset.count_rows(filter)
+
     def __len__(self):
-        return self._dataset.count_rows()
+        return self.count_rows()

     def __repr__(self) -> str:
         return f"LanceTable({self.name})"
@@ -677,6 +697,30 @@ class LanceTable(Table):
         pa.Table"""
         return self._dataset.to_table()

+    def to_polars(self, batch_size=None) -> "pl.LazyFrame":
+        """Return the table as a polars LazyFrame.
+
+        Parameters
+        ----------
+        batch_size: int, optional
+            Passed to polars. This is the maximum row count for
+            scanned pyarrow record batches.
+
+        Note
+        ----
+        1. This requires polars to be installed separately
+        2. Currently we've disabled push-down of the filters from polars,
+           because polars pushdown into pyarrow uses pyarrow compute
+           expressions rather than SQL strings (which LanceDB supports)
+
+        Returns
+        -------
+        pl.LazyFrame
+        """
+        return pl.scan_pyarrow_dataset(
+            self.to_lance(), allow_pyarrow_filter=False, batch_size=batch_size
+        )
+
     @property
     def _dataset_uri(self) -> str:
         return join_uri(self._conn.uri, f"{self.name}.lance")
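A sketch of scanning a whole table lazily with the new method (the path and table are hypothetical):

```python
import lancedb
import polars as pl

db = lancedb.connect("/tmp/polars-demo")  # hypothetical path
tbl = db.open_table("test")

lf = tbl.to_polars()                                # pl.LazyFrame, nothing read yet
df = lf.filter(pl.col("item") == "foo").collect()   # filter applied by polars at collect time
```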
@@ -709,7 +753,11 @@ class LanceTable(Table):
         self._dataset.create_scalar_index(column, index_type="BTREE", replace=replace)

     def create_fts_index(
-        self, field_names: Union[str, List[str]], *, replace: bool = False
+        self,
+        field_names: Union[str, List[str]],
+        *,
+        replace: bool = False,
+        writer_heap_size: Optional[int] = 1024 * 1024 * 1024,
     ):
         """Create a full-text search index on the table.
@@ -724,6 +772,7 @@ class LanceTable(Table):
             If True, replace the existing index if it exists. Note that this is
             not yet an atomic operation; the index will be temporarily
             unavailable while the new index is being created.
+        writer_heap_size: int, default 1GB
+            The heap size, in bytes, for the tantivy index writer.
         """
         from .fts import create_index, populate_index
@@ -740,7 +789,7 @@ class LanceTable(Table):
             fs.delete_dir(path)

         index = create_index(self._get_fts_index_path(), field_names)
-        populate_index(index, self, field_names)
+        populate_index(index, self, field_names, writer_heap_size=writer_heap_size)
         register_event("create_fts_index")

     def _get_fts_index_path(self):
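For large tables, the new knob can be tuned at index-creation time (a sketch; `tbl` is assumed to be an existing LanceTable with a "text" column, and the 2 GB value is illustrative):

```python
tbl.create_fts_index(
    "text",
    replace=True,
    writer_heap_size=2 * 1024 * 1024 * 1024,  # give tantivy's writer 2 GB of heap
)
```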
@@ -947,6 +996,7 @@ class LanceTable(Table):
         data=None,
         schema=None,
         mode="create",
+        exist_ok=False,
         on_bad_vectors: str = "error",
         fill_value: float = 0.0,
         embedding_functions: List[EmbeddingFunctionConfig] = None,
@@ -986,6 +1036,10 @@ class LanceTable(Table):
         mode: str, default "create"
             The mode to use when writing the data. Valid values are
             "create", "overwrite", and "append".
+        exist_ok: bool, default False
+            If the table already exists then raise an error if False;
+            otherwise just open the table. It will not add the provided
+            data but will validate against any schema that's specified.
         on_bad_vectors: str, default "error"
             What to do if any of the vectors are not the same size or contains NaNs.
             One of "error", "drop", "fill".
@@ -1036,14 +1090,24 @@ class LanceTable(Table):
         schema = schema.with_metadata(metadata)
         empty = pa.Table.from_pylist([], schema=schema)
-        lance.write_dataset(empty, tbl._dataset_uri, schema=schema, mode=mode)
-        table = LanceTable(db, name)
+        try:
+            lance.write_dataset(empty, tbl._dataset_uri, schema=schema, mode=mode)
+        except OSError as err:
+            if "Dataset already exists" in str(err) and exist_ok:
+                if tbl.schema != schema:
+                    raise ValueError(
+                        f"Table {name} already exists with a different schema"
+                    )
+                return tbl
+            raise
+
+        new_table = LanceTable(db, name)
         if data is not None:
-            table.add(data)
+            new_table.add(data)
         register_event("create_table")
-        return table
+        return new_table

     @classmethod
     def open(cls, db, name):
@@ -1260,7 +1324,8 @@ def _sanitize_vector_column(
""" """
# ChunkedArray is annoying to work with, so we combine chunks here # ChunkedArray is annoying to work with, so we combine chunks here
vec_arr = data[vector_column_name].combine_chunks() vec_arr = data[vector_column_name].combine_chunks()
if pa.types.is_list(data[vector_column_name].type): typ = data[vector_column_name].type
if pa.types.is_list(typ) or pa.types.is_large_list(typ):
# if it's a variable size list array, # if it's a variable size list array,
# we make sure the dimensions are all the same # we make sure the dimensions are all the same
has_jagged_ndims = len(vec_arr.values) % len(data) != 0 has_jagged_ndims = len(vec_arr.values) % len(data) != 0

View File

@@ -123,6 +123,15 @@ def safe_import_pandas():
         return None


+def safe_import_polars():
+    try:
+        import polars as pl
+
+        return pl
+    except ImportError:
+        return None
+
+
 @singledispatch
 def value_to_sql(value):
     raise NotImplementedError("SQL conversion is not implemented for this type")

View File

@@ -1,13 +1,12 @@
 [project]
 name = "lancedb"
-version = "0.4.1"
+version = "0.5.0"
 dependencies = [
     "deprecation",
-    "pylance==0.9.1",
+    "pylance==0.9.6",
     "ratelimiter~=1.0",
     "retry>=0.9.2",
     "tqdm>=4.27.0",
-    "aiohttp",
     "pydantic>=1.10",
     "attrs>=21.3.0",
     "semver>=3.0",
@@ -49,11 +48,11 @@ classifiers = [
 repository = "https://github.com/lancedb/lancedb"

 [project.optional-dependencies]
-tests = ["pandas>=1.4", "pytest", "pytest-mock", "pytest-asyncio", "requests"]
-dev = ["ruff", "pre-commit", "black"]
+tests = ["aiohttp", "pandas>=1.4", "pytest", "pytest-mock", "pytest-asyncio", "duckdb", "pytz", "polars"]
+dev = ["ruff", "pre-commit"]
 docs = ["mkdocs", "mkdocs-jupyter", "mkdocs-material", "mkdocstrings[python]"]
 clip = ["torch", "pillow", "open-clip"]
-embeddings = ["openai", "sentence-transformers", "torch", "pillow", "open-clip-torch", "cohere", "InstructorEmbedding"]
+embeddings = ["openai>=1.6.1", "sentence-transformers", "torch", "pillow", "open-clip-torch", "cohere", "InstructorEmbedding"]

 [project.scripts]
 lancedb = "lancedb.cli.cli:cli"
@@ -62,9 +61,6 @@ lancedb = "lancedb.cli.cli:cli"
 requires = ["setuptools", "wheel"]
 build-backend = "setuptools.build_meta"

-[tool.isort]
-profile = "black"

 [tool.ruff]
 select = ["F", "E", "W", "I", "G", "TCH", "PERF"]

View File

@@ -190,6 +190,48 @@ def test_create_mode(tmp_path):
     assert tbl.to_pandas().item.tolist() == ["fizz", "buzz"]


+def test_create_exist_ok(tmp_path):
+    db = lancedb.connect(tmp_path)
+    data = pd.DataFrame(
+        {
+            "vector": [[3.1, 4.1], [5.9, 26.5]],
+            "item": ["foo", "bar"],
+            "price": [10.0, 20.0],
+        }
+    )
+    tbl = db.create_table("test", data=data)
+    with pytest.raises(OSError):
+        db.create_table("test", data=data)
+    # open the table but don't add more rows
+    tbl2 = db.create_table("test", data=data, exist_ok=True)
+    assert tbl.name == tbl2.name
+    assert tbl.schema == tbl2.schema
+    assert len(tbl) == len(tbl2)
+
+    schema = pa.schema(
+        [
+            pa.field("vector", pa.list_(pa.float32(), list_size=2)),
+            pa.field("item", pa.utf8()),
+            pa.field("price", pa.float64()),
+        ]
+    )
+    tbl3 = db.create_table("test", schema=schema, exist_ok=True)
+    assert tbl3.schema == schema
+
+    bad_schema = pa.schema(
+        [
+            pa.field("vector", pa.list_(pa.float32(), list_size=2)),
+            pa.field("item", pa.utf8()),
+            pa.field("price", pa.float64()),
+            pa.field("extra", pa.float32()),
+        ]
+    )
+    with pytest.raises(ValueError):
+        db.create_table("test", schema=bad_schema, exist_ok=True)
+
+
 def test_delete_table(tmp_path):
     db = lancedb.connect(tmp_path)
     data = pd.DataFrame(

View File

@@ -29,7 +29,7 @@ from lancedb.pydantic import LanceModel, Vector
 @pytest.mark.slow
 @pytest.mark.parametrize("alias", ["sentence-transformers", "openai"])
-def test_sentence_transformer(alias, tmp_path):
+def test_basic_text_embeddings(alias, tmp_path):
     db = lancedb.connect(tmp_path)
     registry = get_registry()
     func = registry.get(alias).create(max_retries=0)
@@ -89,7 +89,7 @@ def test_openclip(tmp_path):
     db = lancedb.connect(tmp_path)
     registry = get_registry()
-    func = registry.get("open-clip").create()
+    func = registry.get("open-clip").create(max_retries=0)

     class Images(LanceModel):
         label: str
@@ -170,7 +170,7 @@ def test_cohere_embedding_function():
 @pytest.mark.slow
 def test_instructor_embedding(tmp_path):
-    model = get_registry().get("instructor").create()
+    model = get_registry().get("instructor").create(max_retries=0)

     class TextModel(LanceModel):
         text: str = model.SourceField()
@@ -182,3 +182,23 @@ def test_instructor_embedding(tmp_path):
     tbl.add(df)
     assert len(tbl.to_pandas()["vector"][0]) == model.ndims()


+@pytest.mark.slow
+@pytest.mark.skipif(
+    os.environ.get("GOOGLE_API_KEY") is None, reason="GOOGLE_API_KEY not set"
+)
+def test_gemini_embedding(tmp_path):
+    model = get_registry().get("gemini-text").create(max_retries=0)
+
+    class TextModel(LanceModel):
+        text: str = model.SourceField()
+        vector: Vector(model.ndims()) = model.VectorField()
+
+    df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
+    db = lancedb.connect(tmp_path)
+    tbl = db.create_table("test", schema=TextModel, mode="overwrite")
+    tbl.add(df)
+    assert len(tbl.to_pandas()["vector"][0]) == model.ndims()
+    assert tbl.search("hello").limit(1).to_pandas()["text"][0] == "hello world"

View File

@@ -12,6 +12,7 @@
 # limitations under the License.
 import os
 import random
+from unittest import mock

 import numpy as np
 import pandas as pd
@@ -47,6 +48,7 @@ def table(tmp_path) -> ldb.table.LanceTable:
         data=pd.DataFrame(
             {
                 "vector": vectors,
+                "id": [i % 2 for i in range(100)],
                 "text": text,
                 "text2": text,
                 "nested": [{"text": t} for t in text],
@@ -80,7 +82,7 @@ def test_search_index(tmp_path, table):
 def test_create_index_from_table(tmp_path, table):
     table.create_fts_index("text")
     df = table.search("puppy").limit(10).select(["text"]).to_pandas()
-    assert len(df) == 10
+    assert len(df) <= 10
     assert "text" in df.columns

     # Check whether it can be updated
@@ -88,6 +90,7 @@ def test_create_index_from_table(tmp_path, table):
         [
             {
                 "vector": np.random.randn(128),
+                "id": 101,
                 "text": "gorilla",
                 "text2": "gorilla",
                 "nested": {"text": "gorilla"},
@@ -121,3 +124,61 @@ def test_nested_schema(tmp_path, table):
table.create_fts_index("nested.text") table.create_fts_index("nested.text")
rs = table.search("puppy").limit(10).to_list() rs = table.search("puppy").limit(10).to_list()
assert len(rs) == 10 assert len(rs) == 10
def test_search_index_with_filter(table):
table.create_fts_index("text")
orig_import = __import__
def import_mock(name, *args):
if name == "duckdb":
raise ImportError
return orig_import(name, *args)
# no duckdb
with mock.patch("builtins.__import__", side_effect=import_mock):
rs = table.search("puppy").where("id=1").limit(10).to_list()
for r in rs:
assert r["id"] == 1
# yes duckdb
rs2 = table.search("puppy").where("id=1").limit(10).to_list()
for r in rs2:
assert r["id"] == 1
assert rs == rs2
def test_null_input(table):
table.add(
[
{
"vector": np.random.randn(128),
"id": 101,
"text": None,
"text2": None,
"nested": {"text": None},
}
]
)
table.create_fts_index("text")
def test_syntax(table):
# https://github.com/lancedb/lancedb/issues/769
table.create_fts_index("text")
with pytest.raises(ValueError, match="Syntax Error"):
table.search("they could have been dogs OR cats").limit(10).to_list()
table.search("they could have been dogs OR cats").phrase_query().limit(10).to_list()
# this should work
table.search('"they could have been dogs OR cats"').limit(10).to_list()
# this should work too
table.search('''"the cats OR dogs were not really 'pets' at all"''').limit(
10
).to_list()
table.search('the cats OR dogs were not really "pets" at all').phrase_query().limit(
10
).to_list()
table.search('the cats OR dogs were not really "pets" at all').phrase_query().limit(
10
).to_list()

View File

@@ -13,9 +13,10 @@
 import json
+import pytz
 import sys
 from datetime import date, datetime
-from typing import List, Optional
+from typing import List, Optional, Tuple

 import pyarrow as pa
 import pydantic
@@ -38,11 +39,14 @@ def test_pydantic_to_arrow():
         id: int
         s: str
         vec: list[float]
-        li: List[int]
+        li: list[int]
+        lili: list[list[float]]
+        litu: list[tuple[float, float]]
         opt: Optional[str] = None
         st: StructModel
         dt: date
         dtt: datetime
+        dt_with_tz: datetime = Field(json_schema_extra={"tz": "Asia/Shanghai"})
         # d: dict

     m = TestModel(
@@ -50,9 +54,12 @@ def test_pydantic_to_arrow():
s="hello", s="hello",
vec=[1.0, 2.0, 3.0], vec=[1.0, 2.0, 3.0],
li=[2, 3, 4], li=[2, 3, 4],
lili=[[2.5, 1.5], [3.5, 4.5], [5.5, 6.5]],
litu=[(2.5, 1.5), (3.5, 4.5), (5.5, 6.5)],
st=StructModel(a="a", b=1.0), st=StructModel(a="a", b=1.0),
dt=date.today(), dt=date.today(),
dtt=datetime.now(), dtt=datetime.now(),
dt_with_tz=datetime.now(pytz.timezone("Asia/Shanghai")),
) )
schema = pydantic_to_schema(TestModel) schema = pydantic_to_schema(TestModel)
@@ -63,6 +70,8 @@ def test_pydantic_to_arrow():
pa.field("s", pa.utf8(), False), pa.field("s", pa.utf8(), False),
pa.field("vec", pa.list_(pa.float64()), False), pa.field("vec", pa.list_(pa.float64()), False),
pa.field("li", pa.list_(pa.int64()), False), pa.field("li", pa.list_(pa.int64()), False),
pa.field("lili", pa.list_(pa.list_(pa.float64())), False),
pa.field("litu", pa.list_(pa.list_(pa.float64())), False),
pa.field("opt", pa.utf8(), True), pa.field("opt", pa.utf8(), True),
pa.field( pa.field(
"st", "st",
@@ -73,11 +82,38 @@ def test_pydantic_to_arrow():
             ),
             pa.field("dt", pa.date32(), False),
             pa.field("dtt", pa.timestamp("us"), False),
+            pa.field("dt_with_tz", pa.timestamp("us", tz="Asia/Shanghai"), False),
         ]
     )
     assert schema == expect_schema


+@pytest.mark.skipif(
+    sys.version_info < (3, 10),
+    reason="using | type syntax requires python3.10 or higher",
+)
+def test_optional_types_py310():
+    class TestModel(pydantic.BaseModel):
+        a: str | None
+        b: None | str
+        c: Optional[str]
+
+    schema = pydantic_to_schema(TestModel)
+    expect_schema = pa.schema(
+        [
+            pa.field("a", pa.utf8(), True),
+            pa.field("b", pa.utf8(), True),
+            pa.field("c", pa.utf8(), True),
+        ]
+    )
+    assert schema == expect_schema
+
+
+@pytest.mark.skipif(
+    sys.version_info > (3, 8),
+    reason="using native type alias requires python3.9 or higher",
+)
 def test_pydantic_to_arrow_py38():
     class StructModel(pydantic.BaseModel):
         a: str
@@ -88,10 +124,13 @@ def test_pydantic_to_arrow_py38():
s: str s: str
vec: List[float] vec: List[float]
li: List[int] li: List[int]
lili: List[List[float]]
litu: List[Tuple[float, float]]
opt: Optional[str] = None opt: Optional[str] = None
st: StructModel st: StructModel
dt: date dt: date
dtt: datetime dtt: datetime
dt_with_tz: datetime = Field(json_schema_extra={"tz": "Asia/Shanghai"})
# d: dict # d: dict
m = TestModel( m = TestModel(
@@ -99,9 +138,12 @@ def test_pydantic_to_arrow_py38():
s="hello", s="hello",
vec=[1.0, 2.0, 3.0], vec=[1.0, 2.0, 3.0],
li=[2, 3, 4], li=[2, 3, 4],
lili=[[2.5, 1.5], [3.5, 4.5], [5.5, 6.5]],
litu=[(2.5, 1.5), (3.5, 4.5), (5.5, 6.5)],
st=StructModel(a="a", b=1.0), st=StructModel(a="a", b=1.0),
dt=date.today(), dt=date.today(),
dtt=datetime.now(), dtt=datetime.now(),
dt_with_tz=datetime.now(pytz.timezone("Asia/Shanghai")),
) )
schema = pydantic_to_schema(TestModel) schema = pydantic_to_schema(TestModel)
@@ -112,6 +154,8 @@ def test_pydantic_to_arrow_py38():
pa.field("s", pa.utf8(), False), pa.field("s", pa.utf8(), False),
pa.field("vec", pa.list_(pa.float64()), False), pa.field("vec", pa.list_(pa.float64()), False),
pa.field("li", pa.list_(pa.int64()), False), pa.field("li", pa.list_(pa.int64()), False),
pa.field("lili", pa.list_(pa.list_(pa.float64())), False),
pa.field("litu", pa.list_(pa.list_(pa.float64())), False),
pa.field("opt", pa.utf8(), True), pa.field("opt", pa.utf8(), True),
pa.field( pa.field(
"st", "st",
@@ -122,6 +166,7 @@ def test_pydantic_to_arrow_py38():
), ),
pa.field("dt", pa.date32(), False), pa.field("dt", pa.date32(), False),
pa.field("dtt", pa.timestamp("us"), False), pa.field("dtt", pa.timestamp("us"), False),
pa.field("dt_with_tz", pa.timestamp("us", tz="Asia/Shanghai"), False),
] ]
) )
assert schema == expect_schema assert schema == expect_schema
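Taken together, these tests pin down the mapping rules: `Optional[x]`, `x | None`, and `None | x` all become nullable Arrow fields, and both `List[List[float]]` and `List[Tuple[float, float]]` flatten to `pa.list_(pa.list_(pa.float64()))`. A minimal sketch of the conversion as exercised here, assuming `pydantic_to_schema` is imported from `lancedb.pydantic` (the import path is not shown in this diff):

    from typing import Optional

    import pydantic
    from lancedb.pydantic import pydantic_to_schema

    class Item(pydantic.BaseModel):
        name: str            # -> pa.field("name", pa.utf8(), nullable=False)
        note: Optional[str]  # -> pa.field("note", pa.utf8(), nullable=True)

    print(pydantic_to_schema(Item))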


@@ -18,15 +18,15 @@ from lancedb.remote.client import VectorQuery, VectorQueryResult

 class FakeLanceDBClient:
-    async def close(self):
+    def close(self):
         pass

-    async def query(self, table_name: str, query: VectorQuery) -> VectorQueryResult:
+    def query(self, table_name: str, query: VectorQuery) -> VectorQueryResult:
         assert table_name == "test"
         t = pa.schema([]).empty_table()
         return VectorQueryResult(t)

-    async def post(self, path: str):
+    def post(self, path: str):
         pass


@@ -20,6 +20,7 @@ from unittest.mock import PropertyMock, patch
 import lance
 import numpy as np
 import pandas as pd
+import polars as pl
 import pyarrow as pa
 import pytest
 from pydantic import BaseModel
@@ -182,6 +183,46 @@ def test_add_pydantic_model(db):
     assert len(really_flattened.columns) == 7

+
+def test_polars(db):
+    data = {
+        "vector": [[3.1, 4.1], [5.9, 26.5]],
+        "item": ["foo", "bar"],
+        "price": [10.0, 20.0],
+    }
+    # Ingest polars dataframe
+    table = LanceTable.create(db, "test", data=pl.DataFrame(data))
+    assert len(table) == 2
+
+    result = table.to_pandas()
+    assert np.allclose(result["vector"].tolist(), data["vector"])
+    assert result["item"].tolist() == data["item"]
+    assert np.allclose(result["price"].tolist(), data["price"])
+
+    schema = pa.schema(
+        [
+            pa.field("vector", pa.list_(pa.float32(), 2)),
+            pa.field("item", pa.large_string()),
+            pa.field("price", pa.float64()),
+        ]
+    )
+    assert table.schema == schema
+
+    # search results to polars dataframe
+    q = [3.1, 4.1]
+    result = table.search(q).limit(1).to_polars()
+    assert np.allclose(result["vector"][0], q)
+    assert result["item"][0] == "foo"
+    assert np.allclose(result["price"][0], 10.0)
+
+    # entire table to polars dataframe
+    result = table.to_polars()
+    assert np.allclose(result.collect()["vector"].to_list(), data["vector"])
+
+    # make sure filtering isn't broken
+    filtered_result = result.filter(pl.col("item").is_in(["foo", "bar"])).collect()
+    assert len(filtered_result) == 2
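This new test documents the polars round trip end to end: a `pl.DataFrame` can be passed directly to table creation, `.to_polars()` on a search returns an eager polars frame, and `.to_polars()` on the table itself returns a lazy frame that must be `.collect()`ed. A hedged sketch of the same flow outside the test harness (the connection path is illustrative, not from this diff):

    import lancedb
    import polars as pl

    db = lancedb.connect("/tmp/lancedb")  # illustrative path
    df = pl.DataFrame(
        {
            "vector": [[3.1, 4.1], [5.9, 26.5]],
            "item": ["foo", "bar"],
            "price": [10.0, 20.0],
        }
    )
    table = db.create_table("demo", data=df)

    hits = table.search([3.1, 4.1]).limit(1).to_polars()  # eager pl.DataFrame
    lazy = table.to_polars()                              # lazy frame
    print(lazy.filter(pl.col("item") == "foo").collect())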
 def _add(table, schema):
     # table = LanceTable(db, "test")
     assert len(table) == 2
@@ -569,6 +610,14 @@ def test_empty_query(db):
     val = df.id.iloc[0]
     assert val == 1

+    table = LanceTable.create(db, "my_table2", data=[{"id": i} for i in range(100)])
+    df = table.search().select(["id"]).to_pandas()
+    assert len(df) == 10
+    df = table.search().select(["id"]).limit(None).to_pandas()
+    assert len(df) == 100
+    df = table.search().select(["id"]).limit(-1).to_pandas()
+    assert len(df) == 100
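These added assertions spell out the limit semantics for empty queries: the default limit is 10, while `limit(None)` and non-positive limits return the whole table. Restated as a small sketch against the 100-row table created above:

    def ids(query):
        return query.select(["id"]).to_pandas()

    assert len(ids(table.search())) == 10               # default limit is 10
    assert len(ids(table.search().limit(None))) == 100  # None disables the limit
    assert len(ids(table.search().limit(-1))) == 100    # non-positive values do too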
 def test_compact_cleanup(db):
     table = LanceTable.create(
@@ -597,3 +646,14 @@ def test_compact_cleanup(db):
     with pytest.raises(Exception, match="Version 3 no longer exists"):
         table.checkout(3)
+
+
+def test_count_rows(db):
+    table = LanceTable.create(
+        db,
+        "my_table",
+        data=[{"text": "foo", "id": 0}, {"text": "bar", "id": 1}],
+    )
+
+    assert len(table) == 2
+    assert table.count_rows() == 2
+    assert table.count_rows(filter="text='bar'") == 1
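`count_rows` now takes an optional SQL-style `filter`, so counting matching rows no longer requires materializing results. A short usage sketch mirroring the test above (the connection path and table name are illustrative):

    import lancedb

    db = lancedb.connect("/tmp/lancedb")  # illustrative path
    table = db.create_table(
        "songs", data=[{"text": "foo", "id": 0}, {"text": "bar", "id": 1}]
    )
    assert table.count_rows() == 2
    assert table.count_rows(filter="text='bar'") == 1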


@@ -1,6 +1,6 @@
 [package]
 name = "vectordb-node"
-version = "0.4.1"
+version = "0.4.3"
 description = "Serverless, low-latency vector database for AI applications"
 license = "Apache-2.0"
 edition = "2018"


@@ -36,7 +36,7 @@ fn validate_vector_column(record_batch: &RecordBatch) -> Result<()> {
 pub(crate) fn arrow_buffer_to_record_batch(slice: &[u8]) -> Result<(Vec<RecordBatch>, SchemaRef)> {
     let mut batches: Vec<RecordBatch> = Vec::new();
     let file_reader = FileReader::try_new(Cursor::new(slice), None)?;
-    let schema = file_reader.schema().clone();
+    let schema = file_reader.schema();
     for b in file_reader {
         let record_batch = b?;
         validate_vector_column(&record_batch)?;
@@ -50,7 +50,7 @@ pub(crate) fn record_batch_to_buffer(batches: Vec<RecordBatch>) -> Result<Vec<u8
         return Ok(Vec::new());
     }
-    let schema = batches.get(0).unwrap().schema();
+    let schema = batches.first().unwrap().schema();
     let mut fr = FileWriter::try_new(Vec::new(), schema.deref())?;
     for batch in batches.iter() {
         fr.write(batch)?


@@ -13,6 +13,9 @@
 // limitations under the License.

 use neon::prelude::*;
+use neon::types::buffer::TypedArray;
+
+use crate::error::ResultExt;

 pub(crate) fn vec_str_to_array<'a, C: Context<'a>>(
     vec: &Vec<String>,
@@ -34,3 +37,20 @@ pub(crate) fn js_array_to_vec(array: &JsArray, cx: &mut FunctionContext) -> Vec<
     }
     query_vec
 }
+
+// Creates a new JsBuffer from a rust buffer with a special logic for electron
+pub(crate) fn new_js_buffer<'a>(
+    buffer: Vec<u8>,
+    cx: &mut TaskContext<'a>,
+    is_electron: bool,
+) -> NeonResult<Handle<'a, JsBuffer>> {
+    if is_electron {
+        // Electron does not support `external`: https://github.com/neon-bindings/neon/pull/937
+        let mut js_buffer = JsBuffer::new(cx, buffer.len()).or_throw(cx)?;
+        let buffer_data = js_buffer.as_mut_slice(cx);
+        buffer_data.copy_from_slice(buffer.as_slice());
+        Ok(js_buffer)
+    } else {
+        Ok(JsBuffer::external(cx, buffer))
+    }
+}


@@ -250,5 +250,6 @@ fn main(mut cx: ModuleContext) -> NeonResult<()> {
         "tableCreateVectorIndex",
         index::vector::table_create_vector_index,
     )?;
+    cx.export_function("tableSchema", JsTable::js_schema)?;
     Ok(())
 }


@@ -7,7 +7,6 @@ use lance_linalg::distance::MetricType;
 use neon::context::FunctionContext;
 use neon::handle::Handle;
 use neon::prelude::*;
-use neon::types::buffer::TypedArray;

 use crate::arrow::record_batch_to_buffer;
 use crate::error::ResultExt;
@@ -96,26 +95,9 @@ impl JsQuery {
             deferred.settle_with(&channel, move |mut cx| {
                 let results = results.or_throw(&mut cx)?;
                 let buffer = record_batch_to_buffer(results).or_throw(&mut cx)?;
-                Self::new_js_buffer(buffer, &mut cx, is_electron)
+                convert::new_js_buffer(buffer, &mut cx, is_electron)
             });
         });

         Ok(promise)
     }
-
-    // Creates a new JsBuffer from a rust buffer with a special logic for electron
-    fn new_js_buffer<'a>(
-        buffer: Vec<u8>,
-        cx: &mut TaskContext<'a>,
-        is_electron: bool,
-    ) -> NeonResult<Handle<'a, JsBuffer>> {
-        if is_electron {
-            // Electron does not support `external`: https://github.com/neon-bindings/neon/pull/937
-            let mut js_buffer = JsBuffer::new(cx, buffer.len()).or_throw(cx)?;
-            let buffer_data = js_buffer.as_mut_slice(cx);
-            buffer_data.copy_from_slice(buffer.as_slice());
-            Ok(js_buffer)
-        } else {
-            Ok(JsBuffer::external(cx, buffer))
-        }
-    }
 }


@@ -12,18 +12,18 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

-use arrow_array::RecordBatchIterator;
+use arrow_array::{RecordBatch, RecordBatchIterator};
 use lance::dataset::optimize::CompactionOptions;
 use lance::dataset::{WriteMode, WriteParams};
 use lance::io::object_store::ObjectStoreParams;

-use crate::arrow::arrow_buffer_to_record_batch;
+use crate::arrow::{arrow_buffer_to_record_batch, record_batch_to_buffer};
 use neon::prelude::*;
 use neon::types::buffer::TypedArray;
 use vectordb::Table;

 use crate::error::ResultExt;
-use crate::{get_aws_creds, get_aws_region, runtime, JsDatabase};
+use crate::{convert, get_aws_creds, get_aws_region, runtime, JsDatabase};

 pub(crate) struct JsTable {
     pub table: Table,
@@ -426,4 +426,27 @@ impl JsTable {
         Ok(promise)
     }
+
+    pub(crate) fn js_schema(mut cx: FunctionContext) -> JsResult<JsPromise> {
+        let js_table = cx.this().downcast_or_throw::<JsBox<JsTable>, _>(&mut cx)?;
+        let rt = runtime(&mut cx)?;
+        let (deferred, promise) = cx.promise();
+        let channel = cx.channel();
+        let table = js_table.table.clone();
+        let is_electron = cx
+            .argument::<JsBoolean>(0)
+            .or_throw(&mut cx)?
+            .value(&mut cx);
+
+        rt.spawn(async move {
+            deferred.settle_with(&channel, move |mut cx| {
+                let schema = table.schema();
+                let batches = vec![RecordBatch::new_empty(schema)];
+                let buffer = record_batch_to_buffer(batches).or_throw(&mut cx)?;
+                convert::new_js_buffer(buffer, &mut cx, is_electron)
+            })
+        });
+
+        Ok(promise)
+    }
 }


@@ -1,6 +1,6 @@
 [package]
 name = "vectordb"
-version = "0.4.1"
+version = "0.4.3"
 edition = "2021"
 description = "LanceDB: A serverless, low-latency vector database for AI applications"
 license = "Apache-2.0"