Compare commits

58 Commits

Author SHA1 Message Date
Lance Release
5517e102c3 Bump version: 0.14.1-beta.0 → 0.14.1-beta.1 2024-10-23 00:33:40 +00:00
Will Jones
82197c54e4 perf: eliminate iop in refresh (#1760)
Closes #1741

If we checkout a version, we need to make a `HEAD` request to get the
size of the manifest. The new `checkout_latest()` code path can skip
this IOP. This makes the refresh slightly faster.
2024-10-18 13:40:24 -07:00
Will Jones
48f46d4751 docs(node): update indexStats signature and regenerate docs (#1742)
`indexStats` still referenced UUID even though in
https://github.com/lancedb/lancedb/pull/1702 we changed it to take name
instead.
2024-10-18 10:53:28 -07:00
Lance Release
437316cbbc Updating package-lock.json 2024-10-17 18:59:18 +00:00
Lance Release
d406eab2c8 Bump version: 0.11.0 → 0.11.1-beta.0 2024-10-17 18:59:01 +00:00
Lance Release
1f41101897 Bump version: 0.14.0 → 0.14.1-beta.0 2024-10-17 18:58:45 +00:00
Will Jones
99e4db0d6a feat(rust): allow add_embedding on create_empty_table (#1754)
Fixes https://github.com/lancedb/lancedb/issues/1750
2024-10-17 11:58:15 -07:00
Will Jones
46486d4d22 fix: list_indices can handle fts indexes (#1753)
Fixes #1752
2024-10-16 10:39:40 -07:00
Weston Pace
f43cb8bba1 feat: upgrade lance to 0.18.3 (#1748) 2024-10-16 00:48:31 -07:00
James Wu
38eb05f297 fix(python): remove dependency on retry package (#1749)
## user story

fixes https://github.com/lancedb/lancedb/issues/1480

https://github.com/invl/retry has not had an update in 8 years, and one of
its sub-dependencies via requirements.txt
(https://github.com/pytest-dev/py) is no longer maintained and has a
high severity vulnerability (CVE-2022-42969).

`retry` is only used by a single function in the Python codebase, the
deprecated helper `with_embeddings`, which was created for an
older tutorial (https://github.com/lancedb/lancedb/pull/12) [but is now
deprecated](https://lancedb.github.io/lancedb/embeddings/legacy/).

## changes

I backported a limited subset of the `@retry()` decorator's
functionality directly into lancedb so that we no longer depend
on the `retry` package.
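
For illustration only, a backport of this kind can be quite small. The sketch below is not the code merged in this PR; it is a hypothetical minimal `retry` decorator, and the parameter names `tries`, `delay`, `backoff`, and `exceptions` are assumptions chosen for the example.

```python
import functools
import time


def retry(tries=3, delay=0.0, backoff=1.0, exceptions=(Exception,)):
    """Hypothetical minimal retry decorator (illustration, not the PR's code)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Re-raise once the final attempt has failed.
                    if attempt == tries:
                        raise
                    time.sleep(wait)
                    wait *= backoff  # grow the delay before the next attempt

        return wrapper

    return decorator


calls = []


@retry(tries=3, delay=0.0)
def flaky():
    # Fails on the first two calls, then succeeds on the third.
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return "ok"


result = flaky()
```

Keeping only the retry count, delay, and backoff factor is usually enough for a single call site, which is presumably why a small in-tree helper can replace the external package here.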

## tests

```
/Users/james/src/lancedb/python $ ruff check .
All checks passed!
/Users/james/src/lancedb/python $ pytest python/tests/test_embeddings.py
python/tests/test_embeddings.py .......s....                                                                                                                        [100%]
================================================================ 11 passed, 1 skipped, 2 warnings in 7.08s ================================================================
```
2024-10-15 15:13:57 -07:00
Ryan Green
679a70231e feat: allow fast_search on python remote table (#1747)
Add `fast_search` parameter to query builder and remote table to support
skipping flat search in remote search
2024-10-14 14:39:54 -06:00
Dominik Weckmüller
e7b56b7b2a docs: add permanent link chain icon to headings without impacting SEO (#1746)
I noted that there are no permanent links in the docs. Adapted the
current best solution from
https://github.com/squidfunk/mkdocs-material/discussions/3535. It adds a
GitHub-like chain icon to the left of each heading (right on mobile) and
does not impact SEO, unlike the default solution with the pilcrow character `¶`,
which might show up in Google search results.

<img alt="image"
src="https://user-images.githubusercontent.com/182589/153004627-6df3f8e9-c747-4f43-bd62-a8dabaa96c3f.gif">
2024-10-14 11:58:23 -07:00
Olzhas Alexandrov
5ccd0edec2 docs: clarify infrastructure requirements for S3 Express One Zone (#1745) 2024-10-11 14:06:28 -06:00
Will Jones
9c74c435e0 ci: update package lock (#1740) 2024-10-09 15:14:08 -06:00
Lance Release
6de53ce393 Updating package-lock.json 2024-10-09 18:54:29 +00:00
Lance Release
9f42fbba96 Bump version: 0.11.0-beta.2 → 0.11.0 2024-10-09 18:54:09 +00:00
Lance Release
d892f7a622 Bump version: 0.11.0-beta.1 → 0.11.0-beta.2 2024-10-09 18:54:04 +00:00
Lance Release
515ab5f417 Bump version: 0.14.0-beta.1 → 0.14.0 2024-10-09 18:53:35 +00:00
Lance Release
8d0055fe6b Bump version: 0.14.0-beta.0 → 0.14.0-beta.1 2024-10-09 18:53:34 +00:00
Will Jones
5f9d8509b3 feat: upgrade Lance to v0.18.2 (#1737)
Includes changes from v0.18.1 and v0.18.2:

* [v0.18.1 change
log](https://github.com/lancedb/lance/releases/tag/v0.18.1)
* [v0.18.2 change
log](https://github.com/lancedb/lance/releases/tag/v0.18.2)

Closes #1656
Closes #1615
Closes #1661
2024-10-09 11:46:46 -06:00
Will Jones
f3b6a1f55b feat(node): bind remote SDK to rust implementation (#1730)
Closes [#2509](https://github.com/lancedb/sophon/issues/2509)

This is the Node.js analogue of #1700
2024-10-09 11:46:27 -06:00
Will Jones
aff25e3bf9 fix(node): add native packages to bump version (#1738)
We weren't bumping the version, so when users downloaded our package
from npm, they were getting the old binaries.
2024-10-08 23:03:53 -06:00
Will Jones
8509f73221 feat: better errors for remote SDK (#1722)
* Adds nicer errors to remote SDK, that expose useful properties like
`request_id` and `status_code`.
* Makes sure the Python tracebacks print nicely by mapping the `source`
field from a Rust error to the `__cause__` field.
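
As a rough illustration of the second point: Python prints a chained traceback whenever an exception's `__cause__` is set, which is what `raise ... from ...` does. The sketch below is pure Python and does not use the LanceDB bindings; `HttpError` and the values given for `request_id` and `status_code` are hypothetical stand-ins for the errors described above.

```python
import traceback


class HttpError(Exception):
    """Hypothetical error type carrying request metadata (illustration only)."""

    def __init__(self, message, request_id, status_code):
        super().__init__(message)
        self.request_id = request_id
        self.status_code = status_code


def fetch():
    try:
        # Stand-in for the underlying "source" error.
        raise ConnectionError("connection reset by peer")
    except ConnectionError as source:
        # `from source` sets HttpError.__cause__ to the original error.
        raise HttpError("request failed", request_id="abc-123", status_code=503) from source


try:
    fetch()
except HttpError as err:
    caught = err
    # The formatted traceback includes both exceptions, linked by the cause.
    formatted = "".join(traceback.format_exception(type(err), err, err.__traceback__))
```

With `__cause__` populated, the printed traceback shows the underlying error first, followed by the higher-level one, so the root cause is not hidden.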
2024-10-08 22:21:13 -06:00
Will Jones
607476788e feat(rust): list_indices in remote SDK (#1726)
Implements `list_indices`.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-10-08 21:45:21 -06:00
Gagan Bhullar
4d458d5829 feat(python): drop support for dictionary in Table.add (#1725)
PR closes #1706
2024-10-08 20:41:08 -06:00
Will Jones
e61ba7f4e2 fix(rust): remote SDK bugs (#1723)
A few bugs uncovered by integration tests:

* We didn't prepend `/v1` to the Table endpoint URLs
* `/create_index` takes `metric_type` not `distance_type`. (This is also
an error in the OpenAPI docs.)
* `/create_index` expects the `metric_type` parameter to always be
lowercase.
* We were writing an IPC file message when we were supposed to send an
IPC stream message.
2024-10-04 08:43:07 -07:00
Prashant Dixit
408bc96a44 fix: broken notebook link fix (#1721) 2024-10-03 16:15:27 +05:30
Rithik Kumar
6ceaf8b06e docs: add langchainjs writing assistant (#1719) 2024-10-03 00:55:00 +05:30
Prashant Dixit
e2ca8daee1 docs: saleforce's sfr rag (#1717)
This PR adds Salesforce's newly released SFR RAG
2024-10-02 21:15:24 +05:30
Will Jones
f305f34d9b feat(python): bind python async remote client to rust client (#1700)
Closes [#1638](https://github.com/lancedb/lancedb/issues/1638)

This just binds the Python Async client to the Rust remote client.
2024-10-01 15:46:59 -07:00
Will Jones
a416925ca1 feat(rust): client configuration for remote client (#1696)
This PR ports over advanced client configuration present in the Python
`RestfulLanceDBClient` to the Rust one. The goal is to have feature
parity so we can replace the implementation.

* [x] Request timeout
* [x] Retries with backoff
* [x] Request id generation
* [x] User agent (with default tied to library version)
* [x] Table existence cache
* [ ] Deferred: ~Request id customization (should this just pick up OTEL
trace ids?)~

Fixes #1684
2024-10-01 10:22:53 -07:00
Will Jones
2c4b07eb17 feat(python): merge_insert in async Python (#1707)
Fixes #1401
2024-10-01 10:06:52 -07:00
Will Jones
33b402c861 fix: list_indices returns correct index type (#1715)
Fixes https://github.com/lancedb/lancedb/issues/1711

Doesn't address this https://github.com/lancedb/lance/issues/2039

Instead we load the index statistics, which seems to contain the index
type. However, this involves more IO than previously. I'm not sure
whether we care that much. If we do, we can fix that upstream Lance
issue.
2024-10-01 09:16:18 -07:00
Rithik Kumar
7b2cdd2269 docs: revamp Voxel51 v1 (#1714)
Revamp Voxel51

![image](https://github.com/user-attachments/assets/7ac34457-74ec-4654-b1d1-556e3d7357f5)
2024-10-01 11:59:03 +05:30
Akash Saravanan
d6b5054778 feat(python): add support for trust_remote_code in hf embeddings (#1712)
Resolves #1709. Adds `trust_remote_code` as a parameter to the
`TransformersEmbeddingFunction` class with a default of False. Updated
relevant documentation with the same.
2024-10-01 01:06:28 +05:30
Lei Xu
f0e7f5f665 ci: change to use github runner (#1708)
Use github runner
2024-09-27 17:53:05 -07:00
Will Jones
f958f4d2e8 feat: remote index stats (#1702)
BREAKING CHANGE: the return value of `index_stats` method has changed
and all `index_stats` APIs now take index name instead of UUID. Also
several deprecated index statistics methods were removed.

* Removes deprecated methods for individual index statistics
* Aligns public `IndexStatistics` struct with API response from LanceDB
Cloud.
* Implements `index_stats` for remote Rust SDK and Python async API.
2024-09-27 12:10:00 -07:00
Will Jones
c1d9d6f70b feat(rust): remote rename table (#1703)
Adds rename to remote table. Pre-requisite for
https://github.com/lancedb/lancedb/pull/1701
2024-09-27 09:37:54 -07:00
Will Jones
1778219ea9 feat(rust): remote client query and create_index endpoints (#1663)
Support for `query` and `create_index`.

Closes [#2519](https://github.com/lancedb/sophon/issues/2519)
2024-09-27 09:00:22 -07:00
Rob Meng
ee6c18f207 feat: expose underlying dataset uri of the table (#1704) 2024-09-27 10:20:02 -04:00
rjrobben
e606a455df fix(EmbeddingFunction): modify safe_model_dump to explicitly exclude class fields with underscore (#1688)
Resolve issue #1681

---------

Co-authored-by: rjrobben <rjrobben123@gmail.com>
2024-09-25 11:53:49 -07:00
Gagan Bhullar
8f0eb34109 fix: hnsw default partitions (#1667)
PR fixes #1662

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-09-25 09:16:03 -07:00
Ayush Chaurasia
2f2721e242 feat(python): allow explicit hybrid search query pattern in SaaS (feat parity) (#1698)
-  fixes https://github.com/lancedb/lancedb/issues/1697.
- unifies vector column inference logic for remote and local table to
prevent future disparities.
- Updates docstring in RemoteTable to specify empty queries are not
supported
2024-09-25 21:04:00 +05:30
QianZhu
f00b21c98c fix: metric type for python/node search api (#1689) 2024-09-24 16:10:29 -07:00
Lance Release
962b3afd17 Updating package-lock.json 2024-09-24 16:51:37 +00:00
Lance Release
b72ac073ab Bump version: 0.11.0-beta.0 → 0.11.0-beta.1 2024-09-24 16:51:16 +00:00
Bert
3152ccd13c fix: re-add hostOverride arg to ConnectionOptions (#1694)
Fixes an issue where hostOverride was no longer passed through to
RemoteConnection
2024-09-24 13:29:26 -03:00
Bert
d5021356b4 feat: add fast_search to vectordb (#1693) 2024-09-24 13:28:54 -03:00
Will Jones
e82f63b40a fix(node): pass no const enum (#1690)
Apparently this is a no-no for libraries.
https://ncjamieson.com/dont-export-const-enums/

Fixes [#1664](https://github.com/lancedb/lancedb/issues/1664)
2024-09-24 07:41:42 -07:00
Ayush Chaurasia
f81ce68e41 fix(python): force deduce vector column name if running explicit hybrid query (#1692)
Right now, when passing the vector and query explicitly for hybrid search
(https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/#hybrid-search-in-lancedb),
vector_column_name is not deduced, because the vector and query can both
be None when initialising the QueryBuilder in this case. This PR forces
deduction of the vector column name if the query type is set to "hybrid".
2024-09-24 19:02:56 +05:30
Will Jones
f5c25b6fff ci: run clippy on tests (#1659) 2024-09-23 07:33:47 -07:00
Ayush Chaurasia
86978e7588 feat!: enforce all rerankers always return relevance score & deprecate linear combination fixes (#1687)
- Enforce all rerankers always return _relevance_score. This was already
loosely done in tests before, but based on user feedback it's better to
always have _relevance_score present in all reranked results
- Deprecate LinearCombinationReranker in docs, and also fix a case where
it would not return _relevance_score if one result set was missing
2024-09-23 12:12:02 +05:30
Lei Xu
7c314d61cc chore: add error handling for openai embedding generation (#1680) 2024-09-23 12:10:56 +05:30
Lei Xu
7a8d2f37c4 feat(rust): add with_row_id to rust SDK (#1683) 2024-09-21 21:26:19 -07:00
Rithik Kumar
11072b9edc docs: phidata integration page (#1678)
Added a new integration page for phidata:

![image](https://github.com/user-attachments/assets/8cd9b420-f249-4eac-ac13-ae53983822be)
2024-09-21 00:40:47 +05:30
Lei Xu
915d828cee feat!: set embeddings to Null if embedding function return invalid results (#1674) 2024-09-19 23:16:20 -07:00
Lance Release
d9a72adc58 Updating package-lock.json 2024-09-19 17:53:19 +00:00
Lance Release
d6cf2dafc6 Bump version: 0.10.0 → 0.11.0-beta.0 2024-09-19 17:53:00 +00:00
124 changed files with 6162 additions and 2610 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.10.0"
current_version = "0.11.1-beta.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.
@@ -66,6 +66,32 @@ glob = "nodejs/npm/*/package.json"
replace = "\"version\": \"{new_version}\","
search = "\"version\": \"{current_version}\","
# vectodb node binary packages
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-darwin-arm64\": \"{new_version}\""
search = "\"@lancedb/vectordb-darwin-arm64\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-darwin-x64\": \"{new_version}\""
search = "\"@lancedb/vectordb-darwin-x64\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-arm64-gnu\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-arm64-gnu\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-x64-gnu\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-x64-gnu\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-win32-x64-msvc\": \"{new_version}\""
search = "\"@lancedb/vectordb-win32-x64-msvc\": \"{current_version}\""
# Cargo files
# ------------
[[tool.bumpversion.files]]
@@ -77,3 +103,8 @@ search = "\nversion = \"{current_version}\""
filename = "rust/lancedb/Cargo.toml"
replace = "\nversion = \"{new_version}\""
search = "\nversion = \"{current_version}\""
[[tool.bumpversion.files]]
filename = "nodejs/Cargo.toml"
replace = "\nversion = \"{new_version}\""
search = "\nversion = \"{current_version}\""

View File

@@ -24,7 +24,7 @@ env:
jobs:
test-python:
name: Test doc python code
runs-on: "warp-ubuntu-latest-x64-4x"
runs-on: ubuntu-24.04
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -60,7 +60,7 @@ jobs:
for d in *; do cd "$d"; echo "$d".py; python "$d".py; cd ..; done
test-node:
name: Test doc nodejs code
runs-on: "warp-ubuntu-latest-x64-4x"
runs-on: ubuntu-24.04
timeout-minutes: 60
strategy:
fail-fast: false

View File

@@ -26,15 +26,14 @@ env:
jobs:
lint:
timeout-minutes: 30
runs-on: ubuntu-22.04
runs-on: ubuntu-24.04
defaults:
run:
shell: bash
working-directory: rust
env:
# Need up-to-date compilers for kernels
CC: gcc-12
CXX: g++-12
CC: clang-18
CXX: clang++-18
steps:
- uses: actions/checkout@v4
with:
@@ -50,21 +49,21 @@ jobs:
- name: Run format
run: cargo fmt --all -- --check
- name: Run clippy
run: cargo clippy --all --all-features -- -D warnings
run: cargo clippy --workspace --tests --all-features -- -D warnings
linux:
timeout-minutes: 30
# To build all features, we need more disk space than is available
# on the GitHub-provided runner. This is mostly due to the the
# on the free OSS github runner. This is mostly due to the the
# sentence-transformers feature.
runs-on: warp-ubuntu-latest-x64-4x
runs-on: ubuntu-2404-4x-x64
defaults:
run:
shell: bash
working-directory: rust
env:
# Need up-to-date compilers for kernels
CC: gcc-12
CXX: g++-12
CC: clang-18
CXX: clang++-18
steps:
- uses: actions/checkout@v4
with:
@@ -77,6 +76,12 @@ jobs:
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Make Swap
run: |
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
- name: Start S3 integration test environment
working-directory: .
run: docker compose up --detach --wait

View File

@@ -20,13 +20,15 @@ keywords = ["lancedb", "lance", "database", "vector", "search"]
categories = ["database-implementations"]
[workspace.dependencies]
lance = { "version" = "=0.18.0", "features" = ["dynamodb"] }
lance-index = { "version" = "=0.18.0" }
lance-linalg = { "version" = "=0.18.0" }
lance-table = { "version" = "=0.18.0" }
lance-testing = { "version" = "=0.18.0" }
lance-datafusion = { "version" = "=0.18.0" }
lance-encoding = { "version" = "=0.18.0" }
lance = { "version" = "=0.18.3", "features" = [
"dynamodb",
], git = "https://github.com/lancedb/lance.git", tag = "v0.18.3-beta.2" }
lance-index = { "version" = "=0.18.3", git = "https://github.com/lancedb/lance.git", tag = "v0.18.3-beta.2" }
lance-linalg = { "version" = "=0.18.3", git = "https://github.com/lancedb/lance.git", tag = "v0.18.3-beta.2" }
lance-table = { "version" = "=0.18.3", git = "https://github.com/lancedb/lance.git", tag = "v0.18.3-beta.2" }
lance-testing = { "version" = "=0.18.3", git = "https://github.com/lancedb/lance.git", tag = "v0.18.3-beta.2" }
lance-datafusion = { "version" = "=0.18.3", git = "https://github.com/lancedb/lance.git", tag = "v0.18.3-beta.2" }
lance-encoding = { "version" = "=0.18.3", git = "https://github.com/lancedb/lance.git", tag = "v0.18.3-beta.2" }
# Note that this one does not include pyarrow
arrow = { version = "52.2", optional = false }
arrow-array = "52.2"
@@ -38,16 +40,19 @@ arrow-arith = "52.2"
arrow-cast = "52.2"
async-trait = "0"
chrono = "0.4.35"
datafusion-physical-plan = "40.0"
datafusion-common = "41.0"
datafusion-physical-plan = "41.0"
half = { "version" = "=2.4.1", default-features = false, features = [
"num-traits",
] }
futures = "0"
log = "0.4"
moka = { version = "0.11", features = ["future"] }
object_store = "0.10.2"
pin-project = "1.0.7"
snafu = "0.7.4"
url = "2"
num-traits = "0.2"
rand = "0.8"
regex = "1.10"
lazy_static = "1"

View File

@@ -82,4 +82,4 @@ result = table.search([100, 100]).limit(2).to_pandas()
## Blogs, Tutorials & Videos
* 📈 <a href="https://blog.lancedb.com/benchmarking-random-access-in-lance/">2000x better performance with Lance over Parquet</a>
* 🤖 <a href="https://github.com/lancedb/lancedb/blob/main/docs/src/notebooks/youtube_transcript_search.ipynb">Build a question and answer bot with LanceDB</a>
* 🤖 <a href="https://github.com/lancedb/vectordb-recipes/tree/main/examples/Youtube-Search-QA-Bot">Build a question and answer bot with LanceDB</a>

View File

@@ -34,6 +34,7 @@ theme:
- navigation.footer
- navigation.tracking
- navigation.instant
- content.footnote.tooltips
icon:
repo: fontawesome/brands/github
annotation: material/arrow-right-circle
@@ -65,6 +66,11 @@ plugins:
markdown_extensions:
- admonition
- footnotes
- pymdownx.critic
- pymdownx.caret
- pymdownx.keys
- pymdownx.mark
- pymdownx.tilde
- pymdownx.details
- pymdownx.highlight:
anchor_linenums: true
@@ -84,6 +90,9 @@ markdown_extensions:
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- markdown.extensions.toc:
baselevel: 1
permalink: ""
nav:
- Home:
@@ -114,6 +123,7 @@ nav:
- Graph RAG: rag/graph_rag.md
- Self RAG: rag/self_rag.md
- Adaptive RAG: rag/adaptive_rag.md
- SFR RAG: rag/sfr_rag.md
- Advanced Techniques:
- HyDE: rag/advanced_techniques/hyde.md
- FLARE: rag/advanced_techniques/flare.md
@@ -177,6 +187,7 @@ nav:
- Voxel51: integrations/voxel51.md
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- 🎯 Examples:
- Overview: examples/index.md
- 🐍 Python:
@@ -240,6 +251,7 @@ nav:
- Graph RAG: rag/graph_rag.md
- Self RAG: rag/self_rag.md
- Adaptive RAG: rag/adaptive_rag.md
- SFR RAG: rag/sfr_rag.md
- Advanced Techniques:
- HyDE: rag/advanced_techniques/hyde.md
- FLARE: rag/advanced_techniques/flare.md
@@ -299,6 +311,7 @@ nav:
- Voxel51: integrations/voxel51.md
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- Examples:
- examples/index.md
- 🐍 Python:
@@ -354,4 +367,5 @@ extra:
- icon: fontawesome/brands/x-twitter
link: https://twitter.com/lancedb
- icon: fontawesome/brands/linkedin
link: https://www.linkedin.com/company/lancedb
link: https://www.linkedin.com/company/lancedb

View File

@@ -1,5 +1,5 @@
# Huggingface embedding models
We offer support for all huggingface models (which can be loaded via [transformers](https://huggingface.co/docs/transformers/en/index) library). The default model is `colbert-ir/colbertv2.0` which also has its own special callout - `registry.get("colbert")`
We offer support for all Hugging Face models (which can be loaded via [transformers](https://huggingface.co/docs/transformers/en/index) library). The default model is `colbert-ir/colbertv2.0` which also has its own special callout - `registry.get("colbert")`. Some Hugging Face models might require custom models defined on the HuggingFace Hub in their own modeling files. You may enable this by setting `trust_remote_code=True`. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
Example usage -
```python

View File

@@ -8,9 +8,15 @@ LanceDB provides language APIs, allowing you to embed a database in your languag
* 👾 [JavaScript](examples_js.md) examples
* 🦀 Rust examples (coming soon)
## Applications powered by LanceDB
## Python Applications powered by LanceDB
| Project Name | Description |
| --- | --- |
| **Ultralytics Explorer 🚀**<br>[![Ultralytics](https://img.shields.io/badge/Ultralytics-Docs-green?labelColor=0f3bc4&style=flat-square&logo=https://cdn.prod.website-files.com/646dd1f1a3703e451ba81ecc/64994922cf2a6385a4bf4489_UltralyticsYOLO_mark_blue.svg&link=https://docs.ultralytics.com/datasets/explorer/)](https://docs.ultralytics.com/datasets/explorer/)<br>[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/docs/en/datasets/explorer/explorer.ipynb) | - 🔍 **Explore CV Datasets**: Semantic search, SQL queries, vector similarity, natural language.<br>- 🖥️ **GUI & Python API**: Seamless dataset interaction.<br>- ⚡ **Efficient & Scalable**: Leverages LanceDB for large datasets.<br>- 📊 **Detailed Analysis**: Easily analyze data patterns.<br>- 🌐 **Browser GUI Demo**: Create embeddings, search images, run queries. |
| **Website Chatbot🤖**<br>[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/lancedb/lancedb-vercel-chatbot)<br>[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Flancedb%2Flancedb-vercel-chatbot&amp;env=OPENAI_API_KEY&amp;envDescription=OpenAI%20API%20Key%20for%20chat%20completion.&amp;project-name=lancedb-vercel-chatbot&amp;repository-name=lancedb-vercel-chatbot&amp;demo-title=LanceDB%20Chatbot%20Demo&amp;demo-description=Demo%20website%20chatbot%20with%20LanceDB.&amp;demo-url=https%3A%2F%2Flancedb.vercel.app&amp;demo-image=https%3A%2F%2Fi.imgur.com%2FazVJtvr.png) | - 🌐 **Chatbot from Sitemap/Docs**: Create a chatbot using site or document context.<br>- 🚀 **Embed LanceDB in Next.js**: Lightweight, on-prem storage.<br>- 🧠 **AI-Powered Context Retrieval**: Efficiently access relevant data.<br>- 🔧 **Serverless & Native JS**: Seamless integration with Next.js.<br>- ⚡ **One-Click Deploy on Vercel**: Quick and easy setup.. |
## Nodejs Applications powered by LanceDB
| Project Name | Description |
| --- | --- |
| **Langchain Writing Assistant✍ **<br>[![Github](../assets/github.svg)](https://github.com/lancedb/vectordb-recipes/tree/main/applications/node/lanchain_writing_assistant) | - **📂 Data Source Integration**: Use your own data by specifying data source file, and the app instantly processes it to provide insights. <br>- **🧠 Intelligent Suggestions**: Powered by LangChain.js and LanceDB, it improves writing productivity and accuracy. <br>- **💡 Enhanced Writing Experience**: It delivers real-time contextual insights and factual suggestions while the user writes. |

View File

@@ -498,7 +498,7 @@ This can also be done with the ``AWS_ENDPOINT`` and ``AWS_DEFAULT_REGION`` envir
#### S3 Express
LanceDB supports [S3 Express One Zone](https://aws.amazon.com/s3/storage-classes/express-one-zone/) endpoints, but requires additional configuration. Also, S3 Express endpoints only support connecting from an EC2 instance within the same region.
LanceDB supports [S3 Express One Zone](https://aws.amazon.com/s3/storage-classes/express-one-zone/) endpoints, but requires additional infrastructure configuration for the compute service, such as EC2 or Lambda. Please refer to [Networking requirements for S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-networking.html).
To configure LanceDB to use an S3 Express endpoint, you must set the storage option `s3_express`. The bucket name in your table URI should **include the suffix**.

View File

@@ -0,0 +1,383 @@
**phidata** is a framework for building **AI Assistants** with long-term memory, contextual knowledge, and the ability to take actions using function calling. It helps turn general-purpose LLMs into specialized assistants tailored to your use case by extending their capabilities using **memory**, **knowledge**, and **tools**.
- **Memory**: Stores chat history in a **database** and enables LLMs to have long-term conversations.
- **Knowledge**: Stores information in a **vector database** and provides LLMs with business context. (Here we will use LanceDB)
- **Tools**: Enable LLMs to take actions like pulling data from an **API**, **sending emails** or **querying a database**, etc.
![example](https://raw.githubusercontent.com/lancedb/assets/refs/heads/main/docs/assets/integration/phidata_assistant.png)
Memory & knowledge make LLMs smarter while tools make them autonomous.
LanceDB is a vector database and its integration into phidata makes it easy for us to provide a **knowledge base** to LLMs. It enables us to store information as [embeddings](../embeddings/understanding_embeddings.md) and search for **results** similar to a **query**.
??? Question "What is a Knowledge Base?"
    A Knowledge Base is a database of information that the Assistant can search to improve its responses. This information is stored in a vector database and provides LLMs with business context, which makes them respond in a context-aware manner.

    While any type of storage can act as a knowledge base, vector databases offer the best solution for retrieving relevant results from dense information quickly.
Let's see how using LanceDB inside phidata makes LLMs more useful:
## Prerequisites: install and import necessary dependencies
**Create a virtual environment**
1. Install the virtualenv package.
```shell
pip install virtualenv
```
2. Create a directory for your project, go to that directory, and create a virtual environment inside it.
```shell
mkdir phi
```
```shell
cd phi
```
```shell
python -m venv phidata_
```
**Activating virtual environment**
1. From inside the project directory, run the following command to activate the virtual environment. This is the Windows path; on Linux or macOS, run `source phidata_/bin/activate` instead.
```shell
phidata_/Scripts/activate
```
**Install the following packages in the virtual environment**
```shell
pip install lancedb phidata youtube_transcript_api openai ollama pandas numpy
```
**Create python files and import necessary libraries**
You need to create two files - `transcript.py` and `ollama_assistant.py` or `openai_assistant.py`
=== "openai_assistant.py"
```python
import os, openai
from rich.prompt import Prompt
from phi.assistant import Assistant
from phi.knowledge.text import TextKnowledgeBase
from phi.vectordb.lancedb import LanceDb
from phi.llm.openai import OpenAIChat
from phi.embedder.openai import OpenAIEmbedder
from transcript import extract_transcript
if "OPENAI_API_KEY" not in os.environ:
    # OR set the key here as a variable
    openai.api_key = "sk-..."
# The code below creates a file "transcript.txt" in the directory, the txt file will be used below
youtube_url = "https://www.youtube.com/watch?v=Xs33-Gzl8Mo"
segment_duration = 20
transcript_text,dict_transcript = extract_transcript(youtube_url,segment_duration)
```
=== "ollama_assistant.py"
```python
from rich.prompt import Prompt
from phi.assistant import Assistant
from phi.knowledge.text import TextKnowledgeBase
from phi.vectordb.lancedb import LanceDb
from phi.llm.ollama import Ollama
from phi.embedder.ollama import OllamaEmbedder
from transcript import extract_transcript
# The code below creates a file "transcript.txt" in the directory, the txt file will be used below
youtube_url = "https://www.youtube.com/watch?v=Xs33-Gzl8Mo"
segment_duration = 20
transcript_text,dict_transcript = extract_transcript(youtube_url,segment_duration)
```
=== "transcript.py"
``` python
from youtube_transcript_api import YouTubeTranscriptApi
import re


def smodify(seconds):
    # Convert a number of seconds into an HH:MM:SS string
    hours, remainder = divmod(seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{int(hours):02}:{int(minutes):02}:{int(seconds):02}"


def extract_transcript(youtube_url, segment_duration):
    # Extract video ID from the URL
    video_id = re.search(r'(?<=v=)[\w-]+', youtube_url)
    if not video_id:
        video_id = re.search(r'(?<=be/)[\w-]+', youtube_url)
    if not video_id:
        return None
    video_id = video_id.group(0)
    # Attempt to fetch the transcript
    try:
        # Try to get the official transcript
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
    except Exception:
        # If no official transcript is found, try to get auto-generated transcript
        try:
            transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
            # Keeps the last listed transcript, translated to English
            for transcript in transcript_list:
                transcript = transcript.translate('en').fetch()
        except Exception:
            return None
    # Format the transcript into chunks of roughly segment_duration seconds
    transcript_text, dict_transcript = format_transcript(transcript, segment_duration)
    # Open the file in write mode, which creates it if it doesn't exist
    with open("transcript.txt", "w", encoding="utf-8") as file:
        file.write(transcript_text)
    return transcript_text, dict_transcript


def format_transcript(transcript, segment_duration):
    chunked_transcript = []
    chunk_dict = []
    current_chunk = []
    current_time = 0
    start_time_chunk = 0  # To track the start time of the current chunk
    for segment in transcript:
        start_time = segment['start']
        end_time_x = start_time + segment['duration']
        text = segment['text']
        # Add text to the current chunk
        current_chunk.append(text)
        # Time elapsed between the start of the chunk and the start of the
        # current segment, i.e. segment['start'] - start_time_chunk
        if current_chunk:
            current_time = start_time - start_time_chunk
        # If that elapsed time reaches or exceeds segment_duration seconds, save the chunk
        if current_time >= segment_duration:
            # Use the start time of the first segment in the current chunk as the timestamp
            chunked_transcript.append(f"[{smodify(start_time_chunk)} to {smodify(end_time_x)}] " + " ".join(current_chunk))
            current_chunk = re.sub(r'[\xa0\n]', lambda x: '' if x.group() == '\xa0' else ' ', "\n".join(current_chunk))
            chunk_dict.append({"timestamp": f"[{smodify(start_time_chunk)} to {smodify(end_time_x)}]", "text": "".join(current_chunk)})
            current_chunk = []  # Reset the chunk
            start_time_chunk = start_time + segment['duration']  # Update the start time for the next chunk
            current_time = 0  # Reset current time
    # Add any remaining text in the last chunk
    if current_chunk:
        chunked_transcript.append(f"[{smodify(start_time_chunk)} to {smodify(end_time_x)}] " + " ".join(current_chunk))
        current_chunk = re.sub(r'[\xa0\n]', lambda x: '' if x.group() == '\xa0' else ' ', "\n".join(current_chunk))
        chunk_dict.append({"timestamp": f"[{smodify(start_time_chunk)} to {smodify(end_time_x)}]", "text": "".join(current_chunk)})
    return "\n\n".join(chunked_transcript), chunk_dict
```
```
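To see the time-window chunking in isolation, here is a minimal, self-contained sketch. It assumes each transcript segment is a dict with `start`, `duration`, and `text` keys, as in the code above. `fmt_time` is a hypothetical stand-in for the `smodify` helper, and `chunk_segments` is an illustrative name rather than part of any library.

```python
def fmt_time(seconds):
    """Hypothetical stand-in for `smodify`: format seconds as HH:MM:SS."""
    seconds = int(seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chunk_segments(segments, window):
    """Group consecutive segments into chunks spanning at least `window` seconds."""
    chunks = []
    texts = []
    chunk_start = 0  # start time of the chunk being built
    end = 0          # end time of the most recent segment
    for seg in segments:
        end = seg["start"] + seg["duration"]
        texts.append(seg["text"])
        # Close the chunk once the latest segment starts `window` seconds
        # (or more) after the chunk began.
        if seg["start"] - chunk_start >= window:
            chunks.append({
                "timestamp": f"[{fmt_time(chunk_start)} to {fmt_time(end)}]",
                "text": " ".join(texts),
            })
            texts = []
            chunk_start = end
    if texts:  # flush the trailing partial chunk
        chunks.append({
            "timestamp": f"[{fmt_time(chunk_start)} to {fmt_time(end)}]",
            "text": " ".join(texts),
        })
    return chunks


segments = [
    {"start": 0, "duration": 50, "text": "intro"},
    {"start": 50, "duration": 50, "text": "setup"},
    {"start": 100, "duration": 40, "text": "demo"},
    {"start": 140, "duration": 30, "text": "wrap-up"},
]
chunks = chunk_segments(segments, window=100)
for chunk in chunks:
    print(chunk["timestamp"], chunk["text"])
```

Each chunk keeps its own timestamp range, which is what later lets the assistant cite a time window when it answers from the transcript.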
!!! warning
If you are creating an Ollama assistant, download and install Ollama [from here](https://ollama.com/), then run the Ollama instance in the background. Also download the required models using `ollama pull <model-name>`. Check out the available models [here](https://ollama.com/library).
**Run the following command to deactivate the virtual environment if needed**
```bash
deactivate
```
## **Step 1** - Create a Knowledge Base for AI Assistant using LanceDB
=== "openai_assistant.py"
```python
# Create knowledge Base with OpenAIEmbedder in LanceDB
knowledge_base = TextKnowledgeBase(
path="transcript.txt",
vector_db=LanceDb(
embedder=OpenAIEmbedder(api_key = openai.api_key),
table_name="transcript_documents",
uri="./t3mp/.lancedb",
),
num_documents = 10
)
```
=== "ollama_assistant.py"
```python
# Create knowledge Base with OllamaEmbedder in LanceDB
knowledge_base = TextKnowledgeBase(
path="transcript.txt",
vector_db=LanceDb(
embedder=OllamaEmbedder(model="nomic-embed-text",dimensions=768),
table_name="transcript_documents",
uri="./t2mp/.lancedb",
),
num_documents = 10
)
```
Check out the list of **embedders** supported by **phidata** and their usage [here](https://docs.phidata.com/embedder/introduction).
Here we have used `TextKnowledgeBase`, which loads text/docx files to the knowledge base.
Let's see all the parameters that `TextKnowledgeBase` takes -
| Name| Type | Purpose | Default |
|:----|:-----|:--------|:--------|
|`path`|`Union[str, Path]`| Path to text file(s). It can point to a single text file or a directory of text files.| provided by user |
|`formats`|`List[str]`| File formats accepted by this knowledge base. |`[".txt"]`|
|`vector_db`|`VectorDb`| Vector Database for the Knowledge Base. phidata provides a wrapper around many vector DBs, you can import it like this - `from phi.vectordb.lancedb import LanceDb` | provided by user |
|`num_documents`|`int`| Number of results (documents/vectors) that vector search should return. |`5`|
|`reader`|`TextReader`| phidata provides many types of reader objects which read data, clean it and create chunks of data, encapsulate each chunk inside an object of the `Document` class, and return **`List[Document]`**. | `TextReader()` |
|`optimize_on`|`int`| The number of documents at which the vector database should be optimized. The optimization is meant to create an index. |`1000`|
??? Tip "Wonder! What is `Document` class?"
We know that, before storing the data in vectorDB, we need to split the data into smaller chunks upon which embeddings will be created and these embeddings along with the chunks will be stored in vectorDB. When the user queries over the vectorDB, some of these embeddings will be returned as the result based on the semantic similarity with the query.
When the user queries over vectorDB, the queries are converted into embeddings, and a nearest neighbor search is performed over these query embeddings which returns the embeddings that correspond to most semantically similar chunks(parts of our data) present in vectorDB.
Here, a “Document” is a class in phidata. Since there is an option to let phidata create and manage embeddings, it splits our data into smaller chunks(as expected). It does not directly create embeddings on it. Instead, it takes each chunk and encapsulates it inside the object of the `Document` class along with various other metadata related to the chunk. Then embeddings are created on these `Document` objects and stored in vectorDB.
```python
class Document(BaseModel):
    """Model for managing a document"""

    content: str # <--- here data of chunk is stored
    id: Optional[str] = None
    name: Optional[str] = None
    meta_data: Dict[str, Any] = {}
    embedder: Optional[Embedder] = None
    embedding: Optional[List[float]] = None
    usage: Optional[Dict[str, Any]] = None
```
However, using phidata you can load many other types of data in the knowledge base(other than text). Check out [phidata Knowledge Base](https://docs.phidata.com/knowledge/introduction) for more information.
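The idea of wrapping each chunk in a `Document` can be sketched with the standard library alone. The `SimpleDocument` dataclass and the `to_documents` helper below are hypothetical illustrations that mirror a few fields of the class shown above. They are not phidata's implementation, and the fixed-size character chunking is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SimpleDocument:
    """Illustrative stand-in for a chunk container (not phidata's class)."""
    content: str
    name: Optional[str] = None
    meta_data: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None  # filled in later by an embedder


def to_documents(text: str, name: str, chunk_size: int = 20) -> List[SimpleDocument]:
    """Split `text` into fixed-size chunks and wrap each one with metadata."""
    docs = []
    for i, start in enumerate(range(0, len(text), chunk_size)):
        docs.append(
            SimpleDocument(
                content=text[start:start + chunk_size],
                name=name,
                meta_data={"chunk": i},
            )
        )
    return docs


source_text = "LanceDB stores vectors and metadata side by side."
docs = to_documents(source_text, name="transcript.txt")
for doc in docs:
    print(doc.meta_data["chunk"], repr(doc.content))
```

Keeping the chunk text and its metadata in one object means the embedding can be attached later without losing track of where the chunk came from.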
Let's dig deeper into the `vector_db` parameter and see what parameters `LanceDb` takes -
| Name| Type | Purpose | Default |
|:----|:-----|:--------|:--------|
|`embedder`|`Embedder`| phidata provides many Embedders that abstract the interaction with embedding APIs and utilize it to generate embeddings. Check out other embedders [here](https://docs.phidata.com/embedder/introduction) | `OpenAIEmbedder` |
|`distance`|`List[str]`| The choice of distance metric used to calculate the similarity between vectors, which directly impacts search results and performance in vector databases. |`Distance.cosine`|
|`connection`|`lancedb.db.LanceTable`| LanceTable can be accessed through `.connection`. You can connect to an existing table of LanceDB, created outside of phidata, and utilize it. If not provided, it creates a new table using `table_name` parameter and adds it to `connection`. |`None`|
|`uri`|`str`| It specifies the directory location of **LanceDB database** and establishes a connection that can be used to interact with the database. | `"/tmp/lancedb"` |
|`table_name`|`str`| If `connection` is not provided, it initializes and connects to a new **LanceDB table** with a specified(or default) name in the database present at `uri`. |`"phi"`|
|`nprobes`|`int`| It refers to the number of partitions that the search algorithm examines to find the nearest neighbors of a given query vector. Higher values will yield better recall (more likely to find vectors if they exist) at the expense of latency. |`20`|
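To make the `distance` metric and the "number of results" idea concrete, here is a brute-force sketch in plain Python. It is not how LanceDB executes a search: an indexed search inspects only `nprobes` partitions rather than every vector. The function names and toy vectors are assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """Exhaustively rank every row by distance to `query` and keep the k closest."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row["vector"]))
    return ranked[:k]


rows = [
    {"id": "a", "vector": [1.0, 0.0]},
    {"id": "b", "vector": [0.0, 1.0]},
    {"id": "c", "vector": [0.9, 0.1]},
]
nearest = top_k([1.0, 0.0], rows, k=2)
print([row["id"] for row in nearest])
```

An exhaustive scan like this always finds the true nearest neighbors. Partition-based search trades some of that recall for lower latency, which is the trade-off `nprobes` controls.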
!!! note
Since we have only initialized the Knowledge Base, the vector DB table that corresponds to it is not yet populated with our data. It will be populated in **Step 3**, once we perform the `load` operation.
You can check the state of the LanceDB table using - `knowledge_base.vector_db.connection.to_pandas()`
Now that the Knowledge Base is initialized, we can go to **Step 2**.
## **Step 2** - Create an assistant with our choice of LLM and reference to the knowledge base.
=== "openai_assistant.py"
```python
# define an assistant with gpt-4o-mini llm and reference to the knowledge base created above
assistant = Assistant(
llm=OpenAIChat(model="gpt-4o-mini", max_tokens=1000, temperature=0.3,api_key = openai.api_key),
description="""You are an Expert in explaining youtube video transcripts. You are a bot that takes transcript of a video and answer the question based on it.
This is transcript for the above timestamp: {relevant_document}
The user input is: {user_input}
generate highlights only when asked.
When asked to generate highlights from the video, understand the context for each timestamp and create key highlight points, answer in following way -
[timestamp] - highlight 1
[timestamp] - highlight 2
... so on
Your task is to understand the user question, and provide an answer using the provided contexts. Your answers are correct, high-quality, and written by an domain expert. If the provided context does not contain the answer, simply state,'The provided context does not have the answer.'""",
knowledge_base=knowledge_base,
add_references_to_prompt=True,
)
```
=== "ollama_assistant.py"
```python
# define an assistant with llama3.1 llm and reference to the knowledge base created above
assistant = Assistant(
llm=Ollama(model="llama3.1"),
description="""You are an Expert in explaining youtube video transcripts. You are a bot that takes transcript of a video and answer the question based on it.
This is transcript for the above timestamp: {relevant_document}
The user input is: {user_input}
generate highlights only when asked.
When asked to generate highlights from the video, understand the context for each timestamp and create key highlight points, answer in following way -
[timestamp] - highlight 1
[timestamp] - highlight 2
... so on
Your task is to understand the user question, and provide an answer using the provided contexts. Your answers are correct, high-quality, and written by an domain expert. If the provided context does not contain the answer, simply state,'The provided context does not have the answer.'""",
knowledge_base=knowledge_base,
add_references_to_prompt=True,
)
```
Assistants add **memory**, **knowledge**, and **tools** to LLMs. In this example we add only **knowledge**.
Whenever we send a query to the LLM, the assistant retrieves relevant information from our **Knowledge Base** (a table in LanceDB) and passes it to the LLM along with the user query in a structured way.
- The `add_references_to_prompt=True` always adds information from the knowledge base to the prompt, regardless of whether it is relevant to the question.
To learn more about creating an assistant in phidata, check out the [phidata docs](https://docs.phidata.com/assistants/introduction).
## **Step 3** - Load data to Knowledge Base.
```python
# load our data into the knowledge_base (populating the LanceTable)
assistant.knowledge_base.load(recreate=False)
```
The above code loads the data to the Knowledge Base(LanceDB Table) and now it is ready to be used by the assistant.
| Name| Type | Purpose | Default |
|:----|:-----|:--------|:--------|
|`recreate`|`bool`| If True, it drops the existing table and recreates the table in the vectorDB. |`False`|
|`upsert`|`bool`| If True and the vectorDB supports upsert, it will upsert documents to the vector db. | `False` |
|`skip_existing`|`bool`| If True, skips documents that already exist in the vectorDB when inserting. |`True`|
??? tip "What is upsert?"
Upsert is a database operation that combines "update" and "insert". It updates existing records if a document with the same identifier does exist, or inserts new records if no matching record exists. This is useful for maintaining the most current information without manually checking for existence.
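The upsert semantics described above can be sketched with a plain dictionary keyed by record identifier. This is a conceptual illustration only: the `upsert` function and the record layout are assumptions, not the phidata or LanceDB API.

```python
def upsert(table, records):
    """Insert new records and overwrite existing ones, keyed by `id`.

    Returns a tuple of (number inserted, number updated).
    """
    inserted, updated = 0, 0
    for record in records:
        if record["id"] in table:
            updated += 1
        else:
            inserted += 1
        table[record["id"]] = record  # same statement handles both cases
    return inserted, updated


table = {"doc-1": {"id": "doc-1", "text": "old text"}}
result = upsert(
    table,
    [
        {"id": "doc-1", "text": "new text"},       # existing id: updated
        {"id": "doc-2", "text": "another chunk"},  # new id: inserted
    ],
)
print(result, sorted(table))
```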
During the Load operation, phidata directly interacts with the LanceDB library and performs the loading of the table with our data in the following steps -
1. **Creates** and **initializes** the table if it does not exist.
2. Then it **splits** our data into smaller **chunks**.
??? question "How do they create chunks?"
**phidata** provides many types of **Knowledge Bases** based on the type of data. Most of them :material-information-outline:{ title="except LlamaIndexKnowledgeBase and LangChainKnowledgeBase"} have a property method called `document_lists` of type `Iterator[List[Document]]`. During the load operation, this property method is invoked. It traverses the data we provide (in this case, one or more text files) using `reader`. It then **reads** the data, **creates chunks**, **encapsulates** each chunk inside a `Document` object, and yields **lists of `Document` objects** that contain our data.
3. Then **embeddings** are created on these chunks and **inserted** into the LanceDB table.
??? question "How do they insert your data as different rows in LanceDB Table?"
The chunks of your data are in the form of **lists of `Document` objects**, which were yielded in the step above.
For each `Document` in `List[Document]`, it does the following operations:
- Creates embedding on `Document`.
- Cleans the **content attribute** (where the chunk of our data is stored) of `Document`.
- Prepares data by creating `id` and loading `payload` with the metadata related to this chunk. (1)
{ .annotate }
1. Three columns will be added to the table - `"id"`, `"vector"`, and `"payload"` (payload contains various metadata including **`content`**)
- Then add this data to LanceTable.
4. Now the internal state of `knowledge_base` is changed (embeddings are created and loaded in the table) and it is **ready to be used by the assistant**.
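The row-preparation step above can be sketched as follows. This is a simplified illustration under stated assumptions: `toy_embed` is a hypothetical embedder (a real one would call an embedding model), the MD5-based `id` is an assumed scheme, and the exact payload fields phidata stores may differ. Only the three column names `id`, `vector`, and `payload` come from the description above.

```python
import hashlib
import json


def toy_embed(text):
    """Hypothetical embedder: returns [character count, vowel count]."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


def prepare_rows(chunks, source_name):
    """Turn text chunks into rows with `id`, `vector`, and `payload` columns."""
    rows = []
    for chunk in chunks:
        cleaned = chunk.replace("\x00", "").strip()  # clean the content
        rows.append({
            "id": hashlib.md5(cleaned.encode("utf-8")).hexdigest(),
            "vector": toy_embed(cleaned),
            # Metadata, including the chunk text itself, travels in the payload.
            "payload": json.dumps({"name": source_name, "content": cleaned}),
        })
    return rows


rows = prepare_rows(["  first chunk ", "second chunk"], source_name="transcript.txt")
for row in rows:
    print(row["id"][:8], row["vector"], row["payload"])
```

Deriving the `id` from the cleaned content means identical chunks map to the same identifier, which is one way an existence check or upsert can recognize documents that are already stored.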
## **Step 4** - Start a cli chatbot with access to the Knowledge base
```python
# start cli chatbot with knowledge base
assistant.print_response("Ask me about something from the knowledge base")
while True:
    message = Prompt.ask(f"[bold] :sunglasses: User [/bold]")
    if message in ("exit", "bye"):
        break
    assistant.print_response(message, markdown=True)
```
For more information and amazing cookbooks of phidata, read the [phidata documentation](https://docs.phidata.com/introduction) and also visit the [LanceDB x phidata documentation](https://docs.phidata.com/vectordb/lancedb).


@@ -1,13 +1,73 @@
# FiftyOne
FiftyOne is an open source toolkit for building high-quality datasets and computer vision models. It provides an API to create LanceDB tables and run similarity queries, both programmatically in Python and via point-and-click in the App.
FiftyOne is an open source toolkit that enables users to curate better data and build better models. It includes tools for data exploration, visualization, and management, as well as features for collaboration and sharing.
Any developers, data scientists, and researchers who work with computer vision and machine learning can use FiftyOne to improve the quality of their datasets and deliver insights about their models.
![example](../assets/voxel.gif)
## Basic recipe
**FiftyOne** provides an API to create LanceDB tables and run similarity queries, both **programmatically in Python** and via **point-and-click in the App**.
The basic workflow shown below uses LanceDB to create a similarity index on your FiftyOne
datasets:
Let's get started and see how to use **LanceDB** to create a **similarity index** on your FiftyOne datasets.
## Overview
**[Embeddings](../embeddings/understanding_embeddings.md)** are foundational to all of the **vector search** features. In FiftyOne, embeddings are managed by the [**FiftyOne Brain**](https://docs.voxel51.com/user_guide/brain.html) that provides powerful machine learning techniques designed to transform how you curate your data from an art into a measurable science.
!!!question "Have you ever wanted to find the images most similar to an image in your dataset?"
The **FiftyOne Brain** makes computing **visual similarity** really easy. You can compute the similarity of samples in your dataset using an embedding model and store the results in the **brain key**.
You can then sort your samples by similarity or use this information to find potential duplicate images.
Here we will do the following:
1. **Create Index** - In order to run similarity queries against our media, we need to **index** the data. We can do this via the `compute_similarity()` function.
- In the function, specify the **model** you want to use to generate the embedding vectors, and what **vector search engine** you want to use on the **backend** (here LanceDB).
!!!tip
You can also give the similarity index a name(`brain_key`), which is useful if you want to run vector searches against multiple indexes.
2. **Query** - Once you have generated your similarity index, you can query your dataset with `sort_by_similarity()`. The query can be any of the following:
- An ID (sample or patch)
- A query vector of same dimension as the index
- A list of IDs (samples or patches)
- A text prompt (search semantically)
## Prerequisites: install necessary dependencies
1. **Create and activate a virtual environment**
Run the following command in your project directory. It uses Python's built-in `venv` module to create the virtual environment.
```bash
python -m venv fiftyone_
```
From inside the project directory run the following to activate the virtual environment.
=== "Windows"
```bash
fiftyone_/Scripts/activate
```
=== "macOS/Linux"
```bash
source fiftyone_/bin/activate
```
2. **Install the following packages in the virtual environment**
To install FiftyOne, ensure you have activated any virtual environment that you are using, then run
```bash
pip install fiftyone
```
## Understand basic workflow
The basic workflow shown below uses LanceDB to create a similarity index on your FiftyOne datasets:
1. Load a dataset into FiftyOne.
@@ -19,14 +79,10 @@ datasets:
5. If desired, delete the table.
The example below demonstrates this workflow.
## Quick Example
!!! Note
Let's jump on a quick example that demonstrates this workflow.
Install the LanceDB Python client to run the code shown below.
```
pip install lancedb
```
```python
@@ -36,7 +92,10 @@ import fiftyone.zoo as foz
# Step 1: Load your data into FiftyOne
dataset = foz.load_zoo_dataset("quickstart")
```
Make sure you install torch ([guide here](https://pytorch.org/get-started/locally/)) before proceeding.
```python
# Steps 2 and 3: Compute embeddings and create a similarity index
lancedb_index = fob.compute_similarity(
dataset,
@@ -45,8 +104,11 @@ lancedb_index = fob.compute_similarity(
backend="lancedb",
)
```
Once the similarity index has been generated, we can query our data in FiftyOne
by specifying the `brain_key`:
!!! note
Running the code above will download the CLIP model (2.6 GB).
Once the similarity index has been generated, we can query our data in FiftyOne by specifying the `brain_key`:
```python
# Step 4: Query your data
@@ -56,7 +118,22 @@ view = dataset.sort_by_similarity(
brain_key="lancedb_index",
k=10, # limit to 10 most similar samples
)
```
The returned result is of type `DatasetView`.
!!! note
`DatasetView` does not hold its contents in-memory. Views simply store the rule(s) that are applied to extract the content of interest from the underlying Dataset when the view is iterated/aggregated on.
This means, for example, that the contents of a `DatasetView` may change as the underlying Dataset is modified.
??? question "Can you query a view instead of dataset?"
Yes, you can also query a view.
Performing a similarity search on a `DatasetView` will only return results from the view; if the view contains samples that were not included in the index, they will never be included in the result.
This means that you can index an entire Dataset once and then perform searches on subsets of the dataset by constructing views that contain the images of interest.
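The "index once, search within a view" behavior can be illustrated with a toy in-memory index. This is a conceptual sketch, not FiftyOne's implementation: `sort_by_distance`, the `allowed_ids` parameter, and the sample IDs are all assumptions made for the example.

```python
def sort_by_distance(index, query, allowed_ids=None, k=3):
    """Rank indexed IDs by squared Euclidean distance to `query`.

    `allowed_ids` plays the role of a view: IDs outside it are never
    returned, and IDs that were never indexed are skipped.
    """
    if allowed_ids is None:
        candidates = index
    else:
        candidates = {i: index[i] for i in allowed_ids if i in index}
    ranked = sorted(
        candidates,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(candidates[i], query)),
    )
    return ranked[:k]


# The index is built once over the whole "dataset".
index = {
    "img1": [0.0, 0.0],
    "img2": [1.0, 1.0],
    "img3": [0.1, 0.0],
    "img4": [5.0, 5.0],
}
view_ids = ["img2", "img4", "img9"]  # "img9" was never indexed

full = sort_by_distance(index, [0.0, 0.0], k=2)
in_view = sort_by_distance(index, [0.0, 0.0], allowed_ids=view_ids, k=2)
print(full, in_view)
```

The same index serves both queries. Only the candidate set changes, so a sample that is in the view but missing from the index never appears in the results.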
```python
# Step 5 (optional): Cleanup
# Delete the LanceDB table
@@ -66,4 +143,90 @@ lancedb_index.cleanup()
dataset.delete_brain_run("lancedb_index")
```
## Using LanceDB backend
By default, calling `compute_similarity()` or `sort_by_similarity()` will use an sklearn backend.
To use the LanceDB backend, simply set the optional `backend` parameter of `compute_similarity()` to `"lancedb"`:
```python
import fiftyone.brain as fob
#... rest of the code
fob.compute_similarity(..., backend="lancedb", ...)
```
Alternatively, you can configure FiftyOne to use the LanceDB backend by setting the following environment variable.
In your terminal, set the environment variable using:
=== "Windows"
```powershell
# PowerShell
$Env:FIFTYONE_BRAIN_DEFAULT_SIMILARITY_BACKEND="lancedb"

# Command Prompt (cmd.exe) equivalent:
#   set FIFTYONE_BRAIN_DEFAULT_SIMILARITY_BACKEND=lancedb
```
=== "macOS/Linux"
```bash
export FIFTYONE_BRAIN_DEFAULT_SIMILARITY_BACKEND=lancedb
```
!!! note
This setting only lasts for the current terminal session. Once the terminal is closed, the environment variable is gone.
Alternatively, you can **permanently** configure FiftyOne to use the LanceDB backend by creating a `brain_config.json` at `~/.fiftyone/brain_config.json`. The JSON file may contain any desired subset of config fields that you wish to customize.
```json
{
"default_similarity_backend": "lancedb"
}
```
This will override the default `brain_config` and will set it according to your customization. You can check the configuration by running the following code :
```python
import fiftyone.brain as fob
# Print your current brain config
print(fob.brain_config)
```
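If you prefer to create or update the JSON config from Python instead of editing it by hand, the following sketch merges new settings into an existing file. The `write_brain_config` helper is hypothetical, and a temporary directory stands in for `~/.fiftyone` so that running the example has no side effects. Point it at the real path only if you intend to change your configuration.

```python
import json
import tempfile
from pathlib import Path


def write_brain_config(config_path, updates):
    """Merge `updates` into the JSON file at `config_path`, creating it if needed."""
    config_path = Path(config_path)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.update(updates)  # keep existing fields, override the given ones
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


# Stand-in for Path.home() / ".fiftyone" / "brain_config.json"
demo_path = Path(tempfile.mkdtemp()) / "brain_config.json"
saved = write_brain_config(demo_path, {"default_similarity_backend": "lancedb"})
print(demo_path.read_text(encoding="utf-8"))
```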
## LanceDB config parameters
The LanceDB backend supports query parameters that can be used to customize your similarity queries. These parameters include:
| Name| Purpose | Default |
|:----|:--------|:--------|
|**table_name**|The name of the LanceDB table to use. If none is provided, a new table will be created|`None`|
|**metric**|The embedding distance metric to use when creating a new table. The supported values are ("cosine", "euclidean")|`"cosine"`|
|**uri**| The database URI to use. In this Database URI, tables will be created. |`"/tmp/lancedb"`|
There are two ways to specify/customize the parameters:
1. **Using `brain_config.json` file**
```json
{
"similarity_backends": {
"lancedb": {
"table_name": "your-table",
"metric": "euclidean",
"uri": "/tmp/lancedb"
}
}
}
```
2. **Directly passing to `compute_similarity()` to configure a specific new index** :
```python
lancedb_index = fob.compute_similarity(
...
backend="lancedb",
brain_key="lancedb_index",
table_name="your-table",
metric="euclidean",
uri="/tmp/lancedb",
)
```
For a much more in-depth walkthrough of the integration, visit the LanceDB x Voxel51 [docs page](https://docs.voxel51.com/integrations/lancedb.html).


@@ -41,7 +41,6 @@ To build everything fresh:
```bash
npm install
npm run tsc
npm run build
```
@@ -51,18 +50,6 @@ Then you should be able to run the tests with:
npm test
```
### Rebuilding Rust library
```bash
npm run build
```
### Rebuilding Typescript
```bash
npm run tsc
```
### Fix lints
To run the linter and have it automatically fix all errors


@@ -38,4 +38,4 @@ A [WriteMode](../enums/WriteMode.md) to use on this operation
#### Defined in
[index.ts:1019](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1019)
[index.ts:1359](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1359)


@@ -30,6 +30,7 @@ A connection to a LanceDB database.
- [dropTable](LocalConnection.md#droptable)
- [openTable](LocalConnection.md#opentable)
- [tableNames](LocalConnection.md#tablenames)
- [withMiddleware](LocalConnection.md#withmiddleware)
## Constructors
@@ -46,7 +47,7 @@ A connection to a LanceDB database.
#### Defined in
[index.ts:489](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L489)
[index.ts:739](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L739)
## Properties
@@ -56,7 +57,7 @@ A connection to a LanceDB database.
#### Defined in
[index.ts:487](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L487)
[index.ts:737](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L737)
___
@@ -74,7 +75,7 @@ ___
#### Defined in
[index.ts:486](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L486)
[index.ts:736](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L736)
## Accessors
@@ -92,7 +93,7 @@ ___
#### Defined in
[index.ts:494](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L494)
[index.ts:744](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L744)
## Methods
@@ -113,7 +114,7 @@ Creates a new Table, optionally initializing it with new data.
| Name | Type |
| :------ | :------ |
| `name` | `string` \| [`CreateTableOptions`](../interfaces/CreateTableOptions.md)\<`T`\> |
| `data?` | `Record`\<`string`, `unknown`\>[] |
| `data?` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] |
| `optsOrEmbedding?` | [`WriteOptions`](../interfaces/WriteOptions.md) \| [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)\<`T`\> |
| `opt?` | [`WriteOptions`](../interfaces/WriteOptions.md) |
@@ -127,7 +128,7 @@ Creates a new Table, optionally initializing it with new data.
#### Defined in
[index.ts:542](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L542)
[index.ts:788](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L788)
___
@@ -158,7 +159,7 @@ ___
#### Defined in
[index.ts:576](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L576)
[index.ts:822](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L822)
___
@@ -184,7 +185,7 @@ Drop an existing table.
#### Defined in
[index.ts:630](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L630)
[index.ts:876](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L876)
___
@@ -210,7 +211,7 @@ Open a table in the database.
#### Defined in
[index.ts:510](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L510)
[index.ts:760](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L760)
**openTable**\<`T`\>(`name`, `embeddings`): `Promise`\<[`Table`](../interfaces/Table.md)\<`T`\>\>
@@ -239,7 +240,7 @@ Connection.openTable
#### Defined in
[index.ts:518](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L518)
[index.ts:768](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L768)
**openTable**\<`T`\>(`name`, `embeddings?`): `Promise`\<[`Table`](../interfaces/Table.md)\<`T`\>\>
@@ -266,7 +267,7 @@ Connection.openTable
#### Defined in
[index.ts:522](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L522)
[index.ts:772](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L772)
___
@@ -286,4 +287,36 @@ Get the names of all tables in the database.
#### Defined in
[index.ts:501](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L501)
[index.ts:751](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L751)
___
### withMiddleware
**withMiddleware**(`middleware`): [`Connection`](../interfaces/Connection.md)
Instrument the behavior of this Connection with middleware.
The middleware will be called in the order they are added.
Currently this functionality is only supported for remote Connections.
#### Parameters
| Name | Type |
| :------ | :------ |
| `middleware` | `HttpMiddleware` |
#### Returns
[`Connection`](../interfaces/Connection.md)
- this Connection instrumented by the passed middleware
#### Implementation of
[Connection](../interfaces/Connection.md).[withMiddleware](../interfaces/Connection.md#withmiddleware)
#### Defined in
[index.ts:880](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L880)


@@ -37,6 +37,8 @@ A LanceDB Table is the collection of Records. Each Record has one or more vector
### Methods
- [add](LocalTable.md#add)
- [addColumns](LocalTable.md#addcolumns)
- [alterColumns](LocalTable.md#altercolumns)
- [checkElectron](LocalTable.md#checkelectron)
- [cleanupOldVersions](LocalTable.md#cleanupoldversions)
- [compactFiles](LocalTable.md#compactfiles)
@@ -44,13 +46,16 @@ A LanceDB Table is the collection of Records. Each Record has one or more vector
- [createIndex](LocalTable.md#createindex)
- [createScalarIndex](LocalTable.md#createscalarindex)
- [delete](LocalTable.md#delete)
- [dropColumns](LocalTable.md#dropcolumns)
- [filter](LocalTable.md#filter)
- [getSchema](LocalTable.md#getschema)
- [indexStats](LocalTable.md#indexstats)
- [listIndices](LocalTable.md#listindices)
- [mergeInsert](LocalTable.md#mergeinsert)
- [overwrite](LocalTable.md#overwrite)
- [search](LocalTable.md#search)
- [update](LocalTable.md#update)
- [withMiddleware](LocalTable.md#withmiddleware)
## Constructors
@@ -74,7 +79,7 @@ A LanceDB Table is the collection of Records. Each Record has one or more vector
#### Defined in
[index.ts:642](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L642)
[index.ts:892](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L892)
**new LocalTable**\<`T`\>(`tbl`, `name`, `options`, `embeddings`)
@@ -95,7 +100,7 @@ A LanceDB Table is the collection of Records. Each Record has one or more vector
#### Defined in
[index.ts:649](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L649)
[index.ts:899](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L899)
## Properties
@@ -105,7 +110,7 @@ A LanceDB Table is the collection of Records. Each Record has one or more vector
#### Defined in
[index.ts:639](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L639)
[index.ts:889](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L889)
___
@@ -115,7 +120,7 @@ ___
#### Defined in
[index.ts:638](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L638)
[index.ts:888](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L888)
___
@@ -125,7 +130,7 @@ ___
#### Defined in
[index.ts:637](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L637)
[index.ts:887](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L887)
___
@@ -143,7 +148,7 @@ ___
#### Defined in
[index.ts:640](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L640)
[index.ts:890](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L890)
___
@@ -153,7 +158,7 @@ ___
#### Defined in
[index.ts:636](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L636)
[index.ts:886](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L886)
___
@@ -179,7 +184,7 @@ Creates a filter query to find all rows matching the specified criteria
#### Defined in
[index.ts:688](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L688)
[index.ts:938](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L938)
## Accessors
@@ -197,7 +202,7 @@ Creates a filter query to find all rows matching the specified criteria
#### Defined in
[index.ts:668](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L668)
[index.ts:918](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L918)
___
@@ -215,7 +220,7 @@ ___
#### Defined in
[index.ts:849](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L849)
[index.ts:1171](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1171)
## Methods
@@ -229,7 +234,7 @@ Insert records into this Table.
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
@@ -243,7 +248,59 @@ The number of rows added to the table
#### Defined in
[index.ts:696](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L696)
[index.ts:946](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L946)
___
### addColumns
**addColumns**(`newColumnTransforms`): `Promise`\<`void`\>
Add new columns with defined values.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `newColumnTransforms` | \{ `name`: `string` ; `valueSql`: `string` }[] | pairs of column names and the SQL expression to use to calculate the value of the new column. These expressions will be evaluated for each row in the table, and can reference existing columns in the table. |
#### Returns
`Promise`\<`void`\>
#### Implementation of
[Table](../interfaces/Table.md).[addColumns](../interfaces/Table.md#addcolumns)
#### Defined in
[index.ts:1195](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1195)
___
### alterColumns
**alterColumns**(`columnAlterations`): `Promise`\<`void`\>
Alter the name or nullability of columns.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `columnAlterations` | [`ColumnAlteration`](../interfaces/ColumnAlteration.md)[] | One or more alterations to apply to columns. |
#### Returns
`Promise`\<`void`\>
#### Implementation of
[Table](../interfaces/Table.md).[alterColumns](../interfaces/Table.md#altercolumns)
#### Defined in
[index.ts:1201](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1201)
___
@@ -257,7 +314,7 @@ ___
#### Defined in
[index.ts:861](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L861)
[index.ts:1183](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1183)
___
@@ -280,7 +337,7 @@ Clean up old versions of the table, freeing disk space.
#### Defined in
[index.ts:808](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L808)
[index.ts:1130](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1130)
___
@@ -307,16 +364,22 @@ Metrics about the compaction operation.
#### Defined in
[index.ts:831](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L831)
[index.ts:1153](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1153)
___
### countRows
**countRows**(): `Promise`\<`number`\>
**countRows**(`filter?`): `Promise`\<`number`\>
Returns the number of rows in this table.
#### Parameters
| Name | Type |
| :------ | :------ |
| `filter?` | `string` |
#### Returns
`Promise`\<`number`\>
@@ -327,7 +390,7 @@ Returns the number of rows in this table.
#### Defined in
[index.ts:749](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L749)
[index.ts:1021](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1021)
___
@@ -357,13 +420,13 @@ VectorIndexParams.
#### Defined in
[index.ts:734](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L734)
[index.ts:1003](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1003)
___
### createScalarIndex
**createScalarIndex**(`column`, `replace`): `Promise`\<`void`\>
**createScalarIndex**(`column`, `replace?`): `Promise`\<`void`\>
Create a scalar index on this Table for the given column
@@ -372,7 +435,7 @@ Create a scalar index on this Table for the given column
| Name | Type | Description |
| :------ | :------ | :------ |
| `column` | `string` | The column to index |
| `replace` | `boolean` | If false, fail if an index already exists on the column Scalar indices, like vector indices, can be used to speed up scans. A scalar index can speed up scans that contain filter expressions on the indexed column. For example, the following scan will be faster if the column `my_col` has a scalar index: ```ts const con = await lancedb.connect('./.lancedb'); const table = await con.openTable('images'); const results = await table.where('my_col = 7').execute(); ``` Scalar indices can also speed up scans containing a vector search and a prefilter: ```ts const con = await lancedb.connect('././lancedb'); const table = await con.openTable('images'); const results = await table.search([1.0, 2.0]).where('my_col != 7').prefilter(true); ``` Scalar indices can only speed up scans for basic filters using equality, comparison, range (e.g. `my_col BETWEEN 0 AND 100`), and set membership (e.g. `my_col IN (0, 1, 2)`) Scalar indices can be used if the filter contains multiple indexed columns and the filter criteria are AND'd or OR'd together (e.g. `my_col < 0 AND other_col> 100`) Scalar indices may be used if the filter contains non-indexed columns but, depending on the structure of the filter, they may not be usable. For example, if the column `not_indexed` does not have a scalar index then the filter `my_col = 0 OR not_indexed = 1` will not be able to use any scalar index on `my_col`. |
| `replace?` | `boolean` | If false, fail if an index already exists on the column. It is always set to true for remote connections. Scalar indices, like vector indices, can be used to speed up scans. A scalar index can speed up scans that contain filter expressions on the indexed column. For example, the following scan will be faster if the column `my_col` has a scalar index: ```ts const con = await lancedb.connect('./.lancedb'); const table = await con.openTable('images'); const results = await table.where('my_col = 7').execute(); ``` Scalar indices can also speed up scans containing a vector search and a prefilter: ```ts const con = await lancedb.connect('./.lancedb'); const table = await con.openTable('images'); const results = await table.search([1.0, 2.0]).where('my_col != 7').prefilter(true); ``` Scalar indices can only speed up scans for basic filters using equality, comparison, range (e.g. `my_col BETWEEN 0 AND 100`), and set membership (e.g. `my_col IN (0, 1, 2)`). Scalar indices can be used if the filter contains multiple indexed columns and the filter criteria are AND'd or OR'd together (e.g. `my_col < 0 AND other_col > 100`). Scalar indices may be used if the filter contains non-indexed columns but, depending on the structure of the filter, they may not be usable. For example, if the column `not_indexed` does not have a scalar index then the filter `my_col = 0 OR not_indexed = 1` will not be able to use any scalar index on `my_col`. |
#### Returns
@@ -392,7 +455,7 @@ await table.createScalarIndex('my_col')
#### Defined in
[index.ts:742](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L742)
[index.ts:1011](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1011)
___
@@ -418,7 +481,38 @@ Delete rows from this table.
#### Defined in
[index.ts:758](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L758)
[index.ts:1030](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1030)
___
### dropColumns
▸ **dropColumns**(`columnNames`): `Promise`\<`void`\>
Drop one or more columns from the dataset.
This is a metadata-only operation and does not remove the data from the
underlying storage. In order to remove the data, you must subsequently
call `compactFiles` to rewrite the data without the removed columns and
then call `cleanupOldVersions` to remove the old files.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `columnNames` | `string`[] | The names of the columns to drop. These can be nested column references (e.g. "a.b.c") or top-level column names (e.g. "a"). |
#### Returns
`Promise`\<`void`\>
#### Implementation of
[Table](../interfaces/Table.md).[dropColumns](../interfaces/Table.md#dropcolumns)
#### Defined in
[index.ts:1205](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1205)
___
@@ -438,9 +532,13 @@ Creates a filter query to find all rows matching the specified criteria
[`Query`](Query.md)\<`T`\>
#### Implementation of
[Table](../interfaces/Table.md).[filter](../interfaces/Table.md#filter)
#### Defined in
[index.ts:684](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L684)
[index.ts:934](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L934)
___
@@ -454,13 +552,13 @@ ___
#### Defined in
[index.ts:854](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L854)
[index.ts:1176](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1176)
___
### indexStats
▸ **indexStats**(`indexUuid`): `Promise`\<[`IndexStats`](../interfaces/IndexStats.md)\>
▸ **indexStats**(`indexName`): `Promise`\<[`IndexStats`](../interfaces/IndexStats.md)\>
Get statistics about an index.
@@ -468,7 +566,7 @@ Get statistics about an index.
| Name | Type |
| :------ | :------ |
| `indexUuid` | `string` |
| `indexName` | `string` |
#### Returns
@@ -480,7 +578,7 @@ Get statistics about an index.
#### Defined in
[index.ts:845](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L845)
[index.ts:1167](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1167)
___
@@ -500,7 +598,57 @@ List the indicies on this table.
#### Defined in
[index.ts:841](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L841)
[index.ts:1163](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1163)
___
### mergeInsert
▸ **mergeInsert**(`on`, `data`, `args`): `Promise`\<`void`\>
Runs a "merge insert" operation on the table
This operation can add rows, update rows, and remove rows all in a single
transaction. It is a very generic tool that can be used to create
behaviors like "insert if not exists", "update or insert (i.e. upsert)",
or even replace a portion of existing data with new data (e.g. replace
all data where month="january").
The merge insert operation works by combining new data from a
**source table** with existing data in a **target table** by using a
join. There are three categories of records.
"Matched" records are records that exist in both the source table and
the target table. "Not matched" records exist only in the source table
(i.e. these are new data). "Not matched by source" records exist only
in the target table (i.e. this is old data).
The MergeInsertArgs can be used to customize what should happen for
each category of data.
Please note that the data may appear to be reordered as part of this
operation. This is because updated rows will be deleted from the
dataset and then reinserted at the end with the new values.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `on` | `string` | a column to join on. This is how records from the source table and target table are matched. |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | the new data to insert |
| `args` | [`MergeInsertArgs`](../interfaces/MergeInsertArgs.md) | parameters controlling how the operation should behave |
#### Returns
`Promise`\<`void`\>
#### Implementation of
[Table](../interfaces/Table.md).[mergeInsert](../interfaces/Table.md#mergeinsert)
#### Defined in
[index.ts:1065](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1065)
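The three record categories described for `mergeInsert` can be modelled with plain arrays. The following is a minimal sketch of the semantics only, not the library's implementation: `mergeInsertSketch`, `SketchMergeArgs`, and its option names are hypothetical stand-ins (see `MergeInsertArgs` for the real options), and arrays stand in for the source and target tables.

```ts
// Illustrative model only: plain arrays stand in for the source and target tables.
type Row = Record<string, unknown>;

// Hypothetical option names for this sketch; they are not the MergeInsertArgs fields.
interface SketchMergeArgs {
  updateMatched?: boolean; // replace target rows that also appear in the source
  insertNotMatched?: boolean; // add source rows that have no counterpart in the target
  deleteNotMatchedBySource?: boolean; // drop target rows that have no counterpart in the source
}

function mergeInsertSketch(
  targetRows: Row[],
  sourceRows: Row[],
  on: string,
  args: SketchMergeArgs
): Row[] {
  const sourceByKey = new Map<unknown, Row>();
  for (const row of sourceRows) sourceByKey.set(row[on], row);
  const targetKeys = new Set<unknown>(targetRows.map((row) => row[on]));

  const kept: Row[] = [];
  const appended: Row[] = [];

  for (const row of targetRows) {
    const incoming = sourceByKey.get(row[on]);
    if (incoming !== undefined) {
      // "Matched": the key exists in both tables.
      if (args.updateMatched) appended.push(incoming); // deleted, then reinserted at the end
      else kept.push(row);
    } else if (!args.deleteNotMatchedBySource) {
      // "Not matched by source": the key exists only in the target.
      kept.push(row);
    }
  }
  for (const row of sourceRows) {
    // "Not matched": the key exists only in the source.
    if (!targetKeys.has(row[on]) && args.insertNotMatched) appended.push(row);
  }
  return [...kept, ...appended];
}

const existingRows: Row[] = [
  { id: 1, v: "a" },
  { id: 2, v: "b" },
  { id: 3, v: "c" },
];
const newRows: Row[] = [
  { id: 2, v: "B" },
  { id: 4, v: "d" },
];

// Upsert: update matched rows and insert the new ones.
const upserted = mergeInsertSketch(existingRows, newRows, "id", {
  updateMatched: true,
  insertNotMatched: true,
});
```

In this model the updated row with `id` 2 moves behind the untouched rows, which mirrors the reordering note above: updated rows are removed and then reinserted at the end.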
___
@@ -514,7 +662,7 @@ Insert records into this Table, replacing its contents.
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
@@ -528,7 +676,7 @@ The number of rows added to the table
#### Defined in
[index.ts:716](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L716)
[index.ts:977](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L977)
___
@@ -554,7 +702,7 @@ Creates a search query to find the nearest neighbors of the given search term
#### Defined in
[index.ts:676](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L676)
[index.ts:926](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L926)
___
@@ -580,4 +728,36 @@ Update rows in this table.
#### Defined in
[index.ts:771](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L771)
[index.ts:1043](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1043)
___
### withMiddleware
▸ **withMiddleware**(`middleware`): [`Table`](../interfaces/Table.md)\<`T`\>
Instrument the behavior of this Table with middleware.
Middleware is called in the order in which it was added.
Currently this functionality is only supported for remote tables.
#### Parameters
| Name | Type |
| :------ | :------ |
| `middleware` | `HttpMiddleware` |
#### Returns
[`Table`](../interfaces/Table.md)\<`T`\>
- this Table instrumented by the passed middleware
#### Implementation of
[Table](../interfaces/Table.md).[withMiddleware](../interfaces/Table.md#withmiddleware)
#### Defined in
[index.ts:1209](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1209)


@@ -0,0 +1,82 @@
[vectordb](../README.md) / [Exports](../modules.md) / MakeArrowTableOptions
# Class: MakeArrowTableOptions
Options to control the makeArrowTable call.
## Table of contents
### Constructors
- [constructor](MakeArrowTableOptions.md#constructor)
### Properties
- [dictionaryEncodeStrings](MakeArrowTableOptions.md#dictionaryencodestrings)
- [embeddings](MakeArrowTableOptions.md#embeddings)
- [schema](MakeArrowTableOptions.md#schema)
- [vectorColumns](MakeArrowTableOptions.md#vectorcolumns)
## Constructors
### constructor
**new MakeArrowTableOptions**(`values?`)
#### Parameters
| Name | Type |
| :------ | :------ |
| `values?` | `Partial`\<[`MakeArrowTableOptions`](MakeArrowTableOptions.md)\> |
#### Defined in
[arrow.ts:98](https://github.com/lancedb/lancedb/blob/92179835/node/src/arrow.ts#L98)
## Properties
### dictionaryEncodeStrings
**dictionaryEncodeStrings**: `boolean` = `false`
If true then string columns will be encoded with dictionary encoding.
Set this to true if your string columns tend to repeat the same values
often. For more precise control use the `schema` property to specify the
data type for individual columns.
If `schema` is provided then this property is ignored.
#### Defined in
[arrow.ts:96](https://github.com/lancedb/lancedb/blob/92179835/node/src/arrow.ts#L96)
___
### embeddings
`Optional` **embeddings**: [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)\<`any`\>
#### Defined in
[arrow.ts:85](https://github.com/lancedb/lancedb/blob/92179835/node/src/arrow.ts#L85)
___
### schema
`Optional` **schema**: `Schema`\<`any`\>
#### Defined in
[arrow.ts:63](https://github.com/lancedb/lancedb/blob/92179835/node/src/arrow.ts#L63)
___
### vectorColumns
**vectorColumns**: `Record`\<`string`, `VectorColumnOptions`\>
#### Defined in
[arrow.ts:81](https://github.com/lancedb/lancedb/blob/92179835/node/src/arrow.ts#L81)
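The constructor takes a `Partial` of the options class, so callers override only the fields they care about and the rest keep their defaults. A minimal sketch of that pattern follows. `OptionsSketch` and `stringsAreDictionaryEncoded` are hypothetical names, and a list of column names stands in for the Arrow `Schema`; this is not the library's code.

```ts
// Stand-in for the options class; only two of the documented fields are modelled.
class OptionsSketch {
  // Placeholder for an Arrow schema; a list of column names stands in for it here.
  schema?: string[];
  // The documented default is false.
  dictionaryEncodeStrings: boolean = false;

  constructor(values?: Partial<OptionsSketch>) {
    // Copy over only the fields the caller supplied; defaults stay in place otherwise.
    Object.assign(this, values);
  }
}

// Models the documented rule that `dictionaryEncodeStrings` is ignored when a schema is given.
function stringsAreDictionaryEncoded(options: OptionsSketch): boolean {
  return options.schema === undefined && options.dictionaryEncodeStrings;
}

const defaults = new OptionsSketch();
const encoded = new OptionsSketch({ dictionaryEncodeStrings: true });
const withSchema = new OptionsSketch({ dictionaryEncodeStrings: true, schema: ["text"] });
```

Here `defaults` keeps `dictionaryEncodeStrings` at `false`, `encoded` turns it on, and `withSchema` sets the flag but the helper reports it as ignored because a schema is present.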


@@ -40,7 +40,7 @@ An embedding function that automatically creates vector representation for a giv
#### Defined in
[embedding/openai.ts:21](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/embedding/openai.ts#L21)
[embedding/openai.ts:22](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/openai.ts#L22)
## Properties
@@ -50,17 +50,17 @@ An embedding function that automatically creates vector representation for a giv
#### Defined in
[embedding/openai.ts:19](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/embedding/openai.ts#L19)
[embedding/openai.ts:20](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/openai.ts#L20)
___
### \_openai
`Private` `Readonly` **\_openai**: `any`
`Private` `Readonly` **\_openai**: `OpenAI`
#### Defined in
[embedding/openai.ts:18](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/embedding/openai.ts#L18)
[embedding/openai.ts:19](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/openai.ts#L19)
___
@@ -76,7 +76,7 @@ The name of the column that will be used as input for the Embedding Function.
#### Defined in
[embedding/openai.ts:50](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/embedding/openai.ts#L50)
[embedding/openai.ts:56](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/openai.ts#L56)
## Methods
@@ -102,4 +102,4 @@ Creates a vector representation for the given values.
#### Defined in
[embedding/openai.ts:38](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/embedding/openai.ts#L38)
[embedding/openai.ts:43](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/openai.ts#L43)


@@ -19,6 +19,7 @@ A builder for nearest neighbor queries for LanceDB.
### Properties
- [\_embeddings](Query.md#_embeddings)
- [\_fastSearch](Query.md#_fastsearch)
- [\_filter](Query.md#_filter)
- [\_limit](Query.md#_limit)
- [\_metricType](Query.md#_metrictype)
@@ -34,6 +35,7 @@ A builder for nearest neighbor queries for LanceDB.
### Methods
- [execute](Query.md#execute)
- [fastSearch](Query.md#fastsearch)
- [filter](Query.md#filter)
- [isElectron](Query.md#iselectron)
- [limit](Query.md#limit)
@@ -65,7 +67,7 @@ A builder for nearest neighbor queries for LanceDB.
#### Defined in
[query.ts:38](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L38)
[query.ts:39](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L39)
## Properties
@@ -75,7 +77,17 @@ A builder for nearest neighbor queries for LanceDB.
#### Defined in
[query.ts:36](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L36)
[query.ts:37](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L37)
___
### \_fastSearch
`Private` **\_fastSearch**: `boolean`
#### Defined in
[query.ts:36](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L36)
___
@@ -85,7 +97,7 @@ ___
#### Defined in
[query.ts:33](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L33)
[query.ts:33](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L33)
___
@@ -95,7 +107,7 @@ ___
#### Defined in
[query.ts:29](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L29)
[query.ts:29](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L29)
___
@@ -105,7 +117,7 @@ ___
#### Defined in
[query.ts:34](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L34)
[query.ts:34](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L34)
___
@@ -115,7 +127,7 @@ ___
#### Defined in
[query.ts:31](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L31)
[query.ts:31](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L31)
___
@@ -125,7 +137,7 @@ ___
#### Defined in
[query.ts:35](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L35)
[query.ts:35](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L35)
___
@@ -135,7 +147,7 @@ ___
#### Defined in
[query.ts:26](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L26)
[query.ts:26](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L26)
___
@@ -145,7 +157,7 @@ ___
#### Defined in
[query.ts:28](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L28)
[query.ts:28](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L28)
___
@@ -155,7 +167,7 @@ ___
#### Defined in
[query.ts:30](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L30)
[query.ts:30](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L30)
___
@@ -165,7 +177,7 @@ ___
#### Defined in
[query.ts:32](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L32)
[query.ts:32](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L32)
___
@@ -175,7 +187,7 @@ ___
#### Defined in
[query.ts:27](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L27)
[query.ts:27](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L27)
___
@@ -201,7 +213,7 @@ A filter statement to be applied to this query.
#### Defined in
[query.ts:87](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L87)
[query.ts:90](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L90)
## Methods
@@ -223,7 +235,30 @@ Execute the query and return the results as an Array of Objects
#### Defined in
[query.ts:115](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L115)
[query.ts:127](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L127)
___
### fastSearch
**fastSearch**(`value`): [`Query`](Query.md)\<`T`\>
Skip searching un-indexed data. This can make search faster, but will miss
any data that is not yet indexed.
#### Parameters
| Name | Type |
| :------ | :------ |
| `value` | `boolean` |
#### Returns
[`Query`](Query.md)\<`T`\>
#### Defined in
[query.ts:119](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L119)
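The trade-off behind `fastSearch` can be illustrated with a brute-force model. This is a sketch under stated assumptions, not how LanceDB searches: `RowSketch`, its `indexed` flag, and `searchSketch` are hypothetical, and L2 distance over an in-memory array stands in for the real index.

```ts
// Each row records whether it has been picked up by the (imaginary) index yet.
interface RowSketch {
  id: number;
  vector: number[];
  indexed: boolean;
}

// Euclidean (L2) distance between two vectors of equal length.
function l2(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Returns the ids of the `limit` nearest rows; with fastSearch, un-indexed rows are skipped.
function searchSketch(
  rows: RowSketch[],
  query: number[],
  fastSearch: boolean,
  limit: number = 10
): number[] {
  const candidates = fastSearch ? rows.filter((r) => r.indexed) : rows;
  return [...candidates]
    .sort((a, b) => l2(a.vector, query) - l2(b.vector, query))
    .slice(0, limit)
    .map((r) => r.id);
}

const sampleRows: RowSketch[] = [
  { id: 1, vector: [0, 0], indexed: true },
  { id: 2, vector: [1, 1], indexed: true },
  { id: 3, vector: [0.1, 0.1], indexed: false }, // newly added, not yet indexed
];

const fullResult = searchSketch(sampleRows, [0.1, 0.1], false, 1);
const fastResult = searchSketch(sampleRows, [0.1, 0.1], true, 1);
```

The full search finds row 3, the exact match, while the fast search returns row 1 because row 3 has not been indexed yet. That is the "will miss any data that is not yet indexed" caveat in miniature.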
___
@@ -245,7 +280,7 @@ A filter statement to be applied to this query.
#### Defined in
[query.ts:82](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L82)
[query.ts:85](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L85)
___
@@ -259,7 +294,7 @@ ___
#### Defined in
[query.ts:142](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L142)
[query.ts:155](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L155)
___
@@ -268,6 +303,7 @@ ___
**limit**(`value`): [`Query`](Query.md)\<`T`\>
Sets the number of results that will be returned.
The default value is 10.
#### Parameters
@@ -281,7 +317,7 @@ Sets the number of results that will be returned
#### Defined in
[query.ts:55](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L55)
[query.ts:58](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L58)
___
@@ -307,7 +343,7 @@ MetricType for the different options
#### Defined in
[query.ts:102](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L102)
[query.ts:105](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L105)
___
@@ -329,7 +365,7 @@ The number of probes used. A higher number makes search more accurate but also s
#### Defined in
[query.ts:73](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L73)
[query.ts:76](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L76)
___
@@ -349,7 +385,7 @@ ___
#### Defined in
[query.ts:107](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L107)
[query.ts:110](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L110)
___
@@ -371,7 +407,7 @@ Refine the results by reading extra elements and re-ranking them in memory.
#### Defined in
[query.ts:64](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L64)
[query.ts:67](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L67)
___
@@ -393,4 +429,4 @@ Return only the specified columns.
#### Defined in
[query.ts:93](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/query.ts#L93)
[query.ts:96](https://github.com/lancedb/lancedb/blob/92179835/node/src/query.ts#L96)


@@ -0,0 +1,52 @@
[vectordb](../README.md) / [Exports](../modules.md) / IndexStatus
# Enumeration: IndexStatus
## Table of contents
### Enumeration Members
- [Done](IndexStatus.md#done)
- [Failed](IndexStatus.md#failed)
- [Indexing](IndexStatus.md#indexing)
- [Pending](IndexStatus.md#pending)
## Enumeration Members
### Done
**Done** = ``"done"``
#### Defined in
[index.ts:713](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L713)
___
### Failed
• **Failed** = ``"failed"``
#### Defined in
[index.ts:714](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L714)
___
### Indexing
• **Indexing** = ``"indexing"``
#### Defined in
[index.ts:712](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L712)
___
### Pending
• **Pending** = ``"pending"``
#### Defined in
[index.ts:711](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L711)


@@ -22,7 +22,7 @@ Cosine distance
#### Defined in
[index.ts:1041](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1041)
[index.ts:1381](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1381)
___
@@ -34,7 +34,7 @@ Dot product
#### Defined in
[index.ts:1046](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1046)
[index.ts:1386](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1386)
___
@@ -46,4 +46,4 @@ Euclidean distance
#### Defined in
[index.ts:1036](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1036)
[index.ts:1376](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1376)


@@ -22,7 +22,7 @@ Append new data to the table.
#### Defined in
[index.ts:1007](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1007)
[index.ts:1347](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1347)
___
@@ -34,7 +34,7 @@ Create a new [Table](../interfaces/Table.md).
#### Defined in
[index.ts:1003](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1003)
[index.ts:1343](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1343)
___
@@ -46,4 +46,4 @@ Overwrite the existing [Table](../interfaces/Table.md) if presented.
#### Defined in
[index.ts:1005](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1005)
[index.ts:1345](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1345)


@@ -18,7 +18,7 @@
#### Defined in
[index.ts:54](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L54)
[index.ts:68](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L68)
___
@@ -28,7 +28,7 @@ ___
#### Defined in
[index.ts:56](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L56)
[index.ts:70](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L70)
___
@@ -38,4 +38,4 @@ ___
#### Defined in
[index.ts:58](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L58)
[index.ts:72](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L72)


@@ -19,7 +19,7 @@ The number of bytes removed from disk.
#### Defined in
[index.ts:878](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L878)
[index.ts:1218](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1218)
___
@@ -31,4 +31,4 @@ The number of old table versions removed.
#### Defined in
[index.ts:882](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L882)
[index.ts:1222](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1222)


@@ -0,0 +1,53 @@
[vectordb](../README.md) / [Exports](../modules.md) / ColumnAlteration
# Interface: ColumnAlteration
A definition of a column alteration. The alteration changes the column at
`path` to have the new name `rename` and to be nullable if `nullable` is
true. At least one of `rename` or `nullable` must be provided.
## Table of contents
### Properties
- [nullable](ColumnAlteration.md#nullable)
- [path](ColumnAlteration.md#path)
- [rename](ColumnAlteration.md#rename)
## Properties
### nullable
`Optional` **nullable**: `boolean`
Set the new nullability. Note that a nullable column cannot be made non-nullable.
#### Defined in
[index.ts:638](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L638)
___
### path
**path**: `string`
The path to the column to alter. This is a dot-separated path to the column.
If it is a top-level column then it is just the name of the column. If it is
a nested column then it is the path to the column, e.g. "a.b.c" for a column
`c` nested inside a column `b` nested inside a column `a`.
#### Defined in
[index.ts:633](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L633)
___
### rename
`Optional` **rename**: `string`
#### Defined in
[index.ts:634](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L634)
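The constraints stated for `ColumnAlteration` can be checked before calling `alterColumns`. The sketch below is illustrative only: `ColumnAlterationSketch` is a local mirror of the documented fields, and `checkAlteration` with its `currentlyNullable` parameter is a hypothetical helper, not part of the library.

```ts
// Local mirror of the documented shape, so the sketch does not depend on the library.
interface ColumnAlterationSketch {
  path: string; // dot-separated path, e.g. "a.b.c"
  rename?: string; // new column name
  nullable?: boolean; // new nullability
}

// Returns a list of problems; an empty list means the alteration looks well-formed.
function checkAlteration(alt: ColumnAlterationSketch, currentlyNullable: boolean): string[] {
  const problems: string[] = [];
  if (alt.rename === undefined && alt.nullable === undefined) {
    problems.push("at least one of `rename` or `nullable` must be provided");
  }
  if (alt.path.split(".").some((segment) => segment.length === 0)) {
    problems.push("`path` must be a dot-separated list of non-empty column names");
  }
  if (currentlyNullable && alt.nullable === false) {
    problems.push("a nullable column cannot be made non-nullable");
  }
  return problems;
}

// Rename the column `c` nested inside `b` nested inside `a`.
const renameNested = checkAlteration({ path: "a.b.c", rename: "d" }, false);
// Neither `rename` nor `nullable` is set, so this is rejected.
const noChange = checkAlteration({ path: "a" }, false);
// Attempts to tighten nullability on a column that is already nullable.
const tighten = checkAlteration({ path: "a", nullable: false }, true);
```

Only `renameNested` comes back with no problems; the other two each violate one of the documented rules.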


@@ -22,7 +22,7 @@ fragments added.
#### Defined in
[index.ts:933](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L933)
[index.ts:1273](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1273)
___
@@ -35,7 +35,7 @@ file.
#### Defined in
[index.ts:928](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L928)
[index.ts:1268](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1268)
___
@@ -47,7 +47,7 @@ The number of new fragments that were created.
#### Defined in
[index.ts:923](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L923)
[index.ts:1263](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1263)
___
@@ -59,4 +59,4 @@ The number of fragments that were removed.
#### Defined in
[index.ts:919](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L919)
[index.ts:1259](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1259)


@@ -24,7 +24,7 @@ Default is true.
#### Defined in
[index.ts:901](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L901)
[index.ts:1241](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1241)
___
@@ -38,7 +38,7 @@ the deleted rows. Default is 10%.
#### Defined in
[index.ts:907](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L907)
[index.ts:1247](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1247)
___
@@ -46,11 +46,11 @@ ___
`Optional` **maxRowsPerGroup**: `number`
The maximum number of rows per group. Defaults to 1024.
The maximum number of rows per group. Defaults to 1024.
#### Defined in
[index.ts:895](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L895)
[index.ts:1235](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1235)
___
@@ -63,7 +63,7 @@ the number of cores on the machine.
#### Defined in
[index.ts:912](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L912)
[index.ts:1252](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1252)
___
@@ -77,4 +77,4 @@ Defaults to 1024 * 1024.
#### Defined in
[index.ts:891](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L891)
[index.ts:1231](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1231)


@@ -22,6 +22,7 @@ Connection could be local against filesystem or remote against a server.
- [dropTable](Connection.md#droptable)
- [openTable](Connection.md#opentable)
- [tableNames](Connection.md#tablenames)
- [withMiddleware](Connection.md#withmiddleware)
## Properties
@@ -31,7 +32,7 @@ Connection could be local against filesystem or remote against a server.
#### Defined in
[index.ts:183](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L183)
[index.ts:261](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L261)
## Methods
@@ -59,7 +60,7 @@ Creates a new Table, optionally initializing it with new data.
#### Defined in
[index.ts:207](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L207)
[index.ts:285](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L285)
**createTable**(`name`, `data`): `Promise`\<[`Table`](Table.md)\<`number`[]\>\>
@@ -70,7 +71,7 @@ Creates a new Table and initialize it with new data.
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
#### Returns
@@ -78,7 +79,7 @@ Creates a new Table and initialize it with new data.
#### Defined in
[index.ts:221](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L221)
[index.ts:299](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L299)
**createTable**(`name`, `data`, `options`): `Promise`\<[`Table`](Table.md)\<`number`[]\>\>
@@ -89,7 +90,7 @@ Creates a new Table and initialize it with new data.
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `options` | [`WriteOptions`](WriteOptions.md) | The write options to use when creating the table. |
#### Returns
@@ -98,7 +99,7 @@ Creates a new Table and initialize it with new data.
#### Defined in
[index.ts:233](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L233)
[index.ts:311](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L311)
**createTable**\<`T`\>(`name`, `data`, `embeddings`): `Promise`\<[`Table`](Table.md)\<`T`\>\>
@@ -115,7 +116,7 @@ Creates a new Table and initialize it with new data.
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `embeddings` | [`EmbeddingFunction`](EmbeddingFunction.md)\<`T`\> | An embedding function to use on this table |
#### Returns
@@ -124,7 +125,7 @@ Creates a new Table and initialize it with new data.
#### Defined in
[index.ts:246](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L246)
[index.ts:324](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L324)
**createTable**\<`T`\>(`name`, `data`, `embeddings`, `options`): `Promise`\<[`Table`](Table.md)\<`T`\>\>
@@ -141,7 +142,7 @@ Creates a new Table and initialize it with new data.
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `embeddings` | [`EmbeddingFunction`](EmbeddingFunction.md)\<`T`\> | An embedding function to use on this table |
| `options` | [`WriteOptions`](WriteOptions.md) | The write options to use when creating the table. |
@@ -151,7 +152,7 @@ Creates a new Table and initialize it with new data.
#### Defined in
[index.ts:259](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L259)
[index.ts:337](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L337)
___
@@ -173,7 +174,7 @@ Drop an existing table.
#### Defined in
[index.ts:270](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L270)
[index.ts:348](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L348)
___
@@ -202,7 +203,7 @@ Open a table in the database.
#### Defined in
[index.ts:193](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L193)
[index.ts:271](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L271)
___
@@ -216,4 +217,32 @@ ___
#### Defined in
[index.ts:185](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L185)
[index.ts:263](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L263)
___
### withMiddleware
**withMiddleware**(`middleware`): [`Connection`](Connection.md)
Instrument the behavior of this Connection with middleware.
The middleware will be called in the order they are added.
Currently this functionality is only supported for remote Connections.
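
The ordering rule above can be illustrated with a minimal, self-contained sketch. The `Handler` and `Middleware` shapes and the `compose` helper are hypothetical stand-ins invented for this example; they are not the library's `HttpMiddleware` interface, and the handlers are synchronous only to keep the sketch short.

```typescript
// Hypothetical shapes for illustration only; not the vectordb HttpMiddleware API.
type Handler = (request: string) => string;

interface Middleware {
  handle: (request: string, next: Handler) => string;
}

// Wrap `send` so that the first middleware in the array runs first.
function compose(middleware: Middleware[], send: Handler): Handler {
  // Build from the last middleware inward: each step wraps the handler after it.
  return middleware.reduceRight<Handler>(
    (next, mw) => (request) => mw.handle(request, next),
    send
  );
}

// Usage: each middleware records its name and tags the request.
const log: string[] = [];
const tag = (name: string): Middleware => ({
  handle: (request, next) => {
    log.push(name);
    return next(`${request}>${name}`);
  },
});

const run = compose([tag("a"), tag("b")], (request) => `${request}>send`);
const result = run("req"); // "req>a>b>send", and log is ["a", "b"]
```

Folding from the right keeps the call order identical to the order in which the middleware was added, which is the behavior the description states.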
#### Parameters
| Name | Type |
| :------ | :------ |
| `middleware` | `HttpMiddleware` |
#### Returns
[`Connection`](Connection.md)
- this Connection instrumented by the passed middleware
#### Defined in
[index.ts:360](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L360)


@@ -10,7 +10,10 @@
- [awsCredentials](ConnectionOptions.md#awscredentials)
- [awsRegion](ConnectionOptions.md#awsregion)
- [hostOverride](ConnectionOptions.md#hostoverride)
- [readConsistencyInterval](ConnectionOptions.md#readconsistencyinterval)
- [region](ConnectionOptions.md#region)
- [storageOptions](ConnectionOptions.md#storageoptions)
- [timeout](ConnectionOptions.md#timeout)
- [uri](ConnectionOptions.md#uri)
## Properties
@@ -19,9 +22,13 @@
`Optional` **apiKey**: `string`
API key for the remote connections
Can also be passed by setting environment variable `LANCEDB_API_KEY`
#### Defined in
[index.ts:81](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L81)
[index.ts:112](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L112)
___
@@ -33,9 +40,14 @@ User provided AWS credentials.
If not provided, LanceDB will use the default credentials provider chain.
**`Deprecated`**
Pass `aws_access_key_id`, `aws_secret_access_key`, and `aws_session_token`
through `storageOptions` instead.
#### Defined in
[index.ts:75](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L75)
[index.ts:92](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L92)
___
@@ -43,11 +55,15 @@ ___
`Optional` **awsRegion**: `string`
AWS region to connect to. Default is defaultAwsRegion.
AWS region to connect to. Default is defaultAwsRegion
**`Deprecated`**
Pass `region` through `storageOptions` instead.
#### Defined in
[index.ts:78](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L78)
[index.ts:98](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L98)
___
@@ -55,13 +71,33 @@ ___
`Optional` **hostOverride**: `string`
Override the host URL for the remote connections.
Override the host URL for the remote connection.
This is useful for local testing.
#### Defined in
[index.ts:91](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L91)
[index.ts:122](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L122)
___
### readConsistencyInterval
`Optional` **readConsistencyInterval**: `number`
(For LanceDB OSS only): The interval, in seconds, at which to check for
updates to the table from other processes. If unset, then consistency is not
checked. For performance reasons, this is the default. For strong
consistency, set this to zero seconds. Then every read will check for
updates from other processes. As a compromise, you can set this to a
non-zero value for eventual consistency. If more than that interval
has passed since the last check, then the table will be checked for updates.
Note: this consistency only applies to read operations. Write operations are
always consistent.
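
The three consistency modes described above can be sketched as a small decision function. This is a hypothetical helper written for illustration, not code from the library; the parameter names `lastCheckMs` and `nowMs` are assumptions.

```typescript
// Hypothetical helper, not part of vectordb: decides whether a read should
// first check for updates written by other processes.
function shouldCheckForUpdates(
  readConsistencyInterval: number | undefined, // seconds, as in ConnectionOptions
  lastCheckMs: number, // time of the previous check, in milliseconds
  nowMs: number // current time, in milliseconds
): boolean {
  // Unset (the default): never check, which favors performance.
  if (readConsistencyInterval === undefined) return false;
  // Zero: strong consistency, every read checks.
  if (readConsistencyInterval === 0) return true;
  // Non-zero: eventual consistency, check only once the interval has elapsed.
  return nowMs - lastCheckMs > readConsistencyInterval * 1000;
}
```

For example, with an interval of 5 seconds a read 4 seconds after the last check would skip the check, while a read 6 seconds after it would trigger one.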
#### Defined in
[index.ts:140](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L140)
___
@@ -69,11 +105,37 @@ ___
`Optional` **region**: `string`
Region to connect
Region to connect. Default is 'us-east-1'
#### Defined in
[index.ts:84](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L84)
[index.ts:115](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L115)
___
### storageOptions
`Optional` **storageOptions**: `Record`\<`string`, `string`\>
User provided options for object storage. For example, S3 credentials or request timeouts.
The various options are described at https://lancedb.github.io/lancedb/guides/storage/
#### Defined in
[index.ts:105](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L105)
___
### timeout
`Optional` **timeout**: `number`
Duration in milliseconds for request timeout. Default = 10,000 (10 seconds)
#### Defined in
[index.ts:127](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L127)
___
@@ -85,8 +147,8 @@ LanceDB database URI.
- `/path/to/database` - local database
- `s3://bucket/path/to/database` or `gs://bucket/path/to/database` - database on cloud storage
- `db://host:port` - remote database (SaaS)
- `db://host:port` - remote database (LanceDB cloud)
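
As a rough illustration of how the three URI formats differ, the sketch below classifies a URI by its scheme. The `classifyUri` function and the `DatabaseKind` type are hypothetical names for this example; the library's own parsing may accept further schemes or apply different rules.

```typescript
// Hypothetical classifier for the URI formats listed above.
type DatabaseKind = "local" | "cloud-storage" | "remote";

function classifyUri(uri: string): DatabaseKind {
  // `db://host:port` denotes a remote database.
  if (uri.startsWith("db://")) return "remote";
  // `s3://...` and `gs://...` denote a database on cloud storage.
  if (uri.startsWith("s3://") || uri.startsWith("gs://")) return "cloud-storage";
  // Anything else is treated here as a local filesystem path.
  return "local";
}
```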
#### Defined in
[index.ts:69](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L69)
[index.ts:83](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L83)


@@ -26,7 +26,7 @@
#### Defined in
[index.ts:116](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L116)
[index.ts:163](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L163)
___
@@ -36,7 +36,7 @@ ___
#### Defined in
[index.ts:122](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L122)
[index.ts:169](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L169)
___
@@ -46,7 +46,7 @@ ___
#### Defined in
[index.ts:113](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L113)
[index.ts:160](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L160)
___
@@ -56,7 +56,7 @@ ___
#### Defined in
[index.ts:119](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L119)
[index.ts:166](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L166)
___
@@ -66,4 +66,4 @@ ___
#### Defined in
[index.ts:125](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L125)
[index.ts:172](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L172)


@@ -18,11 +18,29 @@ An embedding function that automatically creates vector representation for a giv
### Properties
- [destColumn](EmbeddingFunction.md#destcolumn)
- [embed](EmbeddingFunction.md#embed)
- [embeddingDataType](EmbeddingFunction.md#embeddingdatatype)
- [embeddingDimension](EmbeddingFunction.md#embeddingdimension)
- [excludeSource](EmbeddingFunction.md#excludesource)
- [sourceColumn](EmbeddingFunction.md#sourcecolumn)
## Properties
### destColumn
`Optional` **destColumn**: `string`
The name of the column that will contain the embedding
By default this is "vector"
#### Defined in
[embedding/embedding_function.ts:49](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/embedding_function.ts#L49)
___
### embed
**embed**: (`data`: `T`[]) => `Promise`\<`number`[][]\>
@@ -45,7 +63,54 @@ Creates a vector representation for the given values.
#### Defined in
[embedding/embedding_function.ts:27](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/embedding/embedding_function.ts#L27)
[embedding/embedding_function.ts:62](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/embedding_function.ts#L62)
___
### embeddingDataType
`Optional` **embeddingDataType**: `Float`\<`Floats`\>
The data type of the embedding
The embedding function should return `number`. This will be converted into
an Arrow float array. By default this will be Float32 but this property can
be used to control the conversion.
#### Defined in
[embedding/embedding_function.ts:33](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/embedding_function.ts#L33)
___
### embeddingDimension
`Optional` **embeddingDimension**: `number`
The dimension of the embedding
This is optional; normally it can be determined by looking at the results of
`embed`. If this is not specified, and there is an attempt to apply the embedding
to an empty table, then that process will fail.
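
The fallback described above can be sketched as follows. `resolveEmbeddingDimension` is a hypothetical helper written for this example and is not part of the library; it only mirrors the documented rule that an explicit dimension wins, that the dimension is otherwise read from the embedded vectors, and that an empty input without an explicit dimension fails.

```typescript
// Hypothetical helper illustrating the documented fallback; not library code.
function resolveEmbeddingDimension(
  embedded: number[][], // vectors produced by `embed`
  embeddingDimension?: number // the optional property, if the caller set it
): number {
  // An explicitly configured dimension always wins.
  if (embeddingDimension !== undefined) return embeddingDimension;
  // Otherwise infer it from the first embedded vector.
  const first = embedded[0];
  if (first === undefined) {
    // Nothing to inspect: this is the empty-table failure case.
    throw new Error("cannot infer embedding dimension from empty data; set embeddingDimension");
  }
  return first.length;
}
```

Setting `embeddingDimension` up front is therefore the way to apply an embedding function to a table that starts out empty.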
#### Defined in
[embedding/embedding_function.ts:42](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/embedding_function.ts#L42)
___
### excludeSource
`Optional` **excludeSource**: `boolean`
Should the source column be excluded from the resulting table
By default the source column is included. Set this to true and
only the embedding will be stored.
#### Defined in
[embedding/embedding_function.ts:57](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/embedding_function.ts#L57)
___
@@ -57,4 +122,4 @@ The name of the column that will be used as input for the Embedding Function.
#### Defined in
[embedding/embedding_function.ts:22](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/embedding/embedding_function.ts#L22)
[embedding/embedding_function.ts:24](https://github.com/lancedb/lancedb/blob/92179835/node/src/embedding/embedding_function.ts#L24)


@@ -6,18 +6,51 @@
### Properties
- [distanceType](IndexStats.md#distancetype)
- [indexType](IndexStats.md#indextype)
- [numIndexedRows](IndexStats.md#numindexedrows)
- [numIndices](IndexStats.md#numindices)
- [numUnindexedRows](IndexStats.md#numunindexedrows)
## Properties
### distanceType
`Optional` **distanceType**: `string`
#### Defined in
[index.ts:728](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L728)
___
### indexType
**indexType**: `string`
#### Defined in
[index.ts:727](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L727)
___
### numIndexedRows
**numIndexedRows**: ``null`` \| `number`
#### Defined in
[index.ts:478](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L478)
[index.ts:725](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L725)
___
### numIndices
• `Optional` **numIndices**: `number`
#### Defined in
[index.ts:729](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L729)
___
@@ -27,4 +60,4 @@ ___
#### Defined in
[index.ts:479](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L479)
[index.ts:726](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L726)


@@ -29,7 +29,7 @@ The column to be indexed
#### Defined in
[index.ts:942](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L942)
[index.ts:1282](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1282)
___
@@ -41,7 +41,7 @@ Cache size of the index
#### Defined in
[index.ts:991](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L991)
[index.ts:1331](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1331)
___
@@ -53,7 +53,7 @@ A unique name for the index
#### Defined in
[index.ts:947](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L947)
[index.ts:1287](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1287)
___
@@ -65,7 +65,7 @@ The max number of iterations for kmeans training.
#### Defined in
[index.ts:962](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L962)
[index.ts:1302](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1302)
___
@@ -77,7 +77,7 @@ Max number of iterations to train OPQ, if `use_opq` is true.
#### Defined in
[index.ts:981](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L981)
[index.ts:1321](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1321)
___
@@ -89,7 +89,7 @@ Metric type, L2 or Cosine
#### Defined in
[index.ts:952](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L952)
[index.ts:1292](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1292)
___
@@ -101,7 +101,7 @@ The number of bits to represent one PQ centroid.
#### Defined in
[index.ts:976](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L976)
[index.ts:1316](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1316)
___
@@ -113,7 +113,7 @@ The number of partitions this index
#### Defined in
[index.ts:957](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L957)
[index.ts:1297](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1297)
___
@@ -125,7 +125,7 @@ Number of subvectors to build PQ code
#### Defined in
[index.ts:972](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L972)
[index.ts:1312](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1312)
___
@@ -137,7 +137,7 @@ Replace an existing index with the same name if it exists.
#### Defined in
[index.ts:986](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L986)
[index.ts:1326](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1326)
___
@@ -147,7 +147,7 @@ ___
#### Defined in
[index.ts:993](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L993)
[index.ts:1333](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1333)
___
@@ -159,4 +159,4 @@ Train as optimized product quantization.
#### Defined in
[index.ts:967](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L967)
[index.ts:1307](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1307)


@@ -0,0 +1,73 @@
[vectordb](../README.md) / [Exports](../modules.md) / MergeInsertArgs
# Interface: MergeInsertArgs
## Table of contents
### Properties
- [whenMatchedUpdateAll](MergeInsertArgs.md#whenmatchedupdateall)
- [whenNotMatchedBySourceDelete](MergeInsertArgs.md#whennotmatchedbysourcedelete)
- [whenNotMatchedInsertAll](MergeInsertArgs.md#whennotmatchedinsertall)
## Properties
### whenMatchedUpdateAll
`Optional` **whenMatchedUpdateAll**: `string` \| `boolean`
If true then rows that exist in both the source table (new data) and
the target table (old data) will be updated, replacing the old row
with the corresponding matching row.
If there are multiple matches then the behavior is undefined.
Currently this causes multiple copies of the row to be created
but that behavior is subject to change.
Optionally, a filter can be specified. This should be an SQL
filter where fields with the prefix "target." refer to fields
in the target table (old data) and fields with the prefix
"source." refer to fields in the source table (new data). For
example, the filter "target.lastUpdated < source.lastUpdated" will
only update matched rows when the incoming `lastUpdated` value is
newer.
Rows that do not match the filter will not be updated. Rows that
do not match the filter do become "not matched" rows.
#### Defined in
[index.ts:690](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L690)
___
### whenNotMatchedBySourceDelete
`Optional` **whenNotMatchedBySourceDelete**: `string` \| `boolean`
If true then rows that exist only in the target table (old data)
will be deleted.
If this is a string then it will be treated as an SQL filter and
only rows that both do not match any row in the source table and
match the given filter will be deleted.
This can be used to replace a selection of existing data with
new data.
#### Defined in
[index.ts:707](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L707)
___
### whenNotMatchedInsertAll
`Optional` **whenNotMatchedInsertAll**: `boolean`
If true then rows that exist only in the source table (new data)
will be inserted into the target table.
#### Defined in
[index.ts:695](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L695)


@@ -25,17 +25,26 @@ A LanceDB Table is the collection of Records. Each Record has one or more vector
- [delete](Table.md#delete)
- [indexStats](Table.md#indexstats)
- [listIndices](Table.md#listindices)
- [mergeInsert](Table.md#mergeinsert)
- [name](Table.md#name)
- [overwrite](Table.md#overwrite)
- [schema](Table.md#schema)
- [search](Table.md#search)
- [update](Table.md#update)
### Methods
- [addColumns](Table.md#addcolumns)
- [alterColumns](Table.md#altercolumns)
- [dropColumns](Table.md#dropcolumns)
- [filter](Table.md#filter)
- [withMiddleware](Table.md#withmiddleware)
## Properties
### add
**add**: (`data`: `Record`\<`string`, `unknown`\>[]) => `Promise`\<`number`\>
**add**: (`data`: `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[]) => `Promise`\<`number`\>
#### Type declaration
@@ -47,7 +56,7 @@ Insert records into this Table.
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
##### Returns
@@ -57,27 +66,33 @@ The number of rows added to the table
#### Defined in
[index.ts:291](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L291)
[index.ts:381](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L381)
___
### countRows
**countRows**: () => `Promise`\<`number`\>
**countRows**: (`filter?`: `string`) => `Promise`\<`number`\>
#### Type declaration
▸ (): `Promise`\<`number`\>
▸ (`filter?`): `Promise`\<`number`\>
Returns the number of rows in this table.
##### Parameters
| Name | Type |
| :------ | :------ |
| `filter?` | `string` |
##### Returns
`Promise`\<`number`\>
#### Defined in
[index.ts:361](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L361)
[index.ts:454](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L454)
___
@@ -107,17 +122,17 @@ VectorIndexParams.
#### Defined in
[index.ts:306](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L306)
[index.ts:398](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L398)
___
### createScalarIndex
**createScalarIndex**: (`column`: `string`, `replace`: `boolean`) => `Promise`\<`void`\>
**createScalarIndex**: (`column`: `string`, `replace?`: `boolean`) => `Promise`\<`void`\>
#### Type declaration
▸ (`column`, `replace`): `Promise`\<`void`\>
▸ (`column`, `replace?`): `Promise`\<`void`\>
Create a scalar index on this Table for the given column
@@ -126,7 +141,7 @@ Create a scalar index on this Table for the given column
| Name | Type | Description |
| :------ | :------ | :------ |
| `column` | `string` | The column to index |
| `replace` | `boolean` | If false, fail if an index already exists on the column Scalar indices, like vector indices, can be used to speed up scans. A scalar index can speed up scans that contain filter expressions on the indexed column. For example, the following scan will be faster if the column `my_col` has a scalar index: ```ts const con = await lancedb.connect('./.lancedb'); const table = await con.openTable('images'); const results = await table.where('my_col = 7').execute(); ``` Scalar indices can also speed up scans containing a vector search and a prefilter: ```ts const con = await lancedb.connect('././lancedb'); const table = await con.openTable('images'); const results = await table.search([1.0, 2.0]).where('my_col != 7').prefilter(true); ``` Scalar indices can only speed up scans for basic filters using equality, comparison, range (e.g. `my_col BETWEEN 0 AND 100`), and set membership (e.g. `my_col IN (0, 1, 2)`) Scalar indices can be used if the filter contains multiple indexed columns and the filter criteria are AND'd or OR'd together (e.g. `my_col < 0 AND other_col> 100`) Scalar indices may be used if the filter contains non-indexed columns but, depending on the structure of the filter, they may not be usable. For example, if the column `not_indexed` does not have a scalar index then the filter `my_col = 0 OR not_indexed = 1` will not be able to use any scalar index on `my_col`. |
| `replace?` | `boolean` | If false, fail if an index already exists on the column. It is always set to true for remote connections. Scalar indices, like vector indices, can be used to speed up scans. A scalar index can speed up scans that contain filter expressions on the indexed column. For example, the following scan will be faster if the column `my_col` has a scalar index: ```ts const con = await lancedb.connect('./.lancedb'); const table = await con.openTable('images'); const results = await table.where('my_col = 7').execute(); ``` Scalar indices can also speed up scans containing a vector search and a prefilter: ```ts const con = await lancedb.connect('././lancedb'); const table = await con.openTable('images'); const results = await table.search([1.0, 2.0]).where('my_col != 7').prefilter(true); ``` Scalar indices can only speed up scans for basic filters using equality, comparison, range (e.g. `my_col BETWEEN 0 AND 100`), and set membership (e.g. `my_col IN (0, 1, 2)`). Scalar indices can be used if the filter contains multiple indexed columns and the filter criteria are AND'd or OR'd together (e.g. `my_col < 0 AND other_col > 100`). Scalar indices may be used if the filter contains non-indexed columns but, depending on the structure of the filter, they may not be usable. For example, if the column `not_indexed` does not have a scalar index then the filter `my_col = 0 OR not_indexed = 1` will not be able to use any scalar index on `my_col`. |
##### Returns
@@ -142,7 +157,7 @@ await table.createScalarIndex('my_col')
#### Defined in
[index.ts:356](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L356)
[index.ts:449](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L449)
___
@@ -194,17 +209,17 @@ await tbl.countRows() // Returns 1
#### Defined in
[index.ts:395](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L395)
[index.ts:488](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L488)
___
### indexStats
• **indexStats**: (`indexUuid`: `string`) => `Promise`\<[`IndexStats`](IndexStats.md)\>
• **indexStats**: (`indexName`: `string`) => `Promise`\<[`IndexStats`](IndexStats.md)\>
#### Type declaration
▸ (`indexUuid`): `Promise`\<[`IndexStats`](IndexStats.md)\>
▸ (`indexName`): `Promise`\<[`IndexStats`](IndexStats.md)\>
Get statistics about an index.
@@ -212,7 +227,7 @@ Get statistics about an index.
| Name | Type |
| :------ | :------ |
| `indexUuid` | `string` |
| `indexName` | `string` |
##### Returns
@@ -220,7 +235,7 @@ Get statistics about an index.
#### Defined in
[index.ts:438](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L438)
[index.ts:567](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L567)
___
@@ -240,7 +255,57 @@ List the indices on this table.
#### Defined in
[index.ts:433](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L433)
[index.ts:562](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L562)
___
### mergeInsert
• **mergeInsert**: (`on`: `string`, `data`: `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[], `args`: [`MergeInsertArgs`](MergeInsertArgs.md)) => `Promise`\<`void`\>
#### Type declaration
▸ (`on`, `data`, `args`): `Promise`\<`void`\>
Runs a "merge insert" operation on the table
This operation can add rows, update rows, and remove rows all in a single
transaction. It is a very generic tool that can be used to create
behaviors like "insert if not exists", "update or insert (i.e. upsert)",
or even replace a portion of existing data with new data (e.g. replace
all data where month="january").
The merge insert operation works by combining new data from a
**source table** with existing data in a **target table** by using a
join. There are three categories of records.
"Matched" records are records that exist in both the source table and
the target table. "Not matched" records exist only in the source table
(i.e. new data). "Not matched by source" records exist only
in the target table (i.e. old data).
The MergeInsertArgs can be used to customize what should happen for
each category of data.
Please note that the data may appear to be reordered as part of this
operation. This is because updated rows will be deleted from the
dataset and then reinserted at the end with the new values.
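
The three record categories can be made concrete with a small in-memory simulation. This is a sketch under stated assumptions, not the library's implementation: `simulateMergeInsert` and `MergeArgsSketch` are hypothetical names, only the boolean forms of the options are modeled (SQL filter strings are not), and join keys are assumed to be unique on both sides.

```typescript
type Row = Record<string, unknown>;

// Hypothetical, boolean-only subset of MergeInsertArgs for this sketch.
interface MergeArgsSketch {
  whenMatchedUpdateAll?: boolean;
  whenNotMatchedInsertAll?: boolean;
  whenNotMatchedBySourceDelete?: boolean;
}

// In-memory model of a merge insert; assumes unique values in the `on` column.
function simulateMergeInsert(
  on: string,
  target: Row[], // existing data
  source: Row[], // new data
  args: MergeArgsSketch
): Row[] {
  const sourceByKey = new Map<unknown, Row>();
  for (const row of source) sourceByKey.set(row[on], row);
  const targetKeys = new Set<unknown>(target.map((row) => row[on]));

  const kept: Row[] = []; // rows left in place
  const appended: Row[] = []; // rows written at the end

  for (const row of target) {
    const match = sourceByKey.get(row[on]);
    if (match !== undefined) {
      // "Matched": an updated row is removed and reinserted at the end.
      if (args.whenMatchedUpdateAll === true) appended.push(match);
      else kept.push(row);
    } else if (args.whenNotMatchedBySourceDelete !== true) {
      // "Not matched by source": kept unless deletion was requested.
      kept.push(row);
    }
  }

  if (args.whenNotMatchedInsertAll === true) {
    // "Not matched": rows that exist only in the source are inserted.
    for (const row of source) {
      if (!targetKeys.has(row[on])) appended.push(row);
    }
  }
  return [...kept, ...appended];
}

// Upsert: update matched rows and insert new ones.
const existing: Row[] = [{ id: 1, v: "a" }, { id: 2, v: "b" }];
const incoming: Row[] = [{ id: 2, v: "B" }, { id: 3, v: "c" }];
const upserted = simulateMergeInsert("id", existing, incoming, {
  whenMatchedUpdateAll: true,
  whenNotMatchedInsertAll: true,
});
```

Passing only `whenNotMatchedInsertAll` gives "insert if not exists", and adding `whenNotMatchedBySourceDelete` removes old rows that have no counterpart in the new data.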
##### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `on` | `string` | a column to join on. This is how records from the source table and target table are matched. |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | the new data to insert |
| `args` | [`MergeInsertArgs`](MergeInsertArgs.md) | parameters controlling how the operation should behave |
##### Returns
`Promise`\<`void`\>
#### Defined in
[index.ts:553](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L553)
___
@@ -250,13 +315,13 @@ ___
#### Defined in
[index.ts:277](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L277)
[index.ts:367](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L367)
___
### overwrite
• **overwrite**: (`data`: `Record`\<`string`, `unknown`\>[]) => `Promise`\<`number`\>
• **overwrite**: (`data`: `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[]) => `Promise`\<`number`\>
#### Type declaration
@@ -268,7 +333,7 @@ Insert records into this Table, replacing its contents.
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
| `data` | `Table`\<`any`\> \| `Record`\<`string`, `unknown`\>[] | Records to be inserted into the Table |
##### Returns
@@ -278,7 +343,7 @@ The number of rows added to the table
#### Defined in
[index.ts:299](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L299)
[index.ts:389](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L389)
___
@@ -288,7 +353,7 @@ ___
#### Defined in
[index.ts:440](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L440)
[index.ts:571](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L571)
___
@@ -314,7 +379,7 @@ Creates a search query to find the nearest neighbors of the given search term
#### Defined in
[index.ts:283](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L283)
[index.ts:373](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L373)
___
@@ -365,4 +430,123 @@ let results = await tbl.search([1, 1]).execute();
#### Defined in
[index.ts:428](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L428)
[index.ts:521](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L521)
## Methods
### addColumns
▸ **addColumns**(`newColumnTransforms`): `Promise`\<`void`\>
Add new columns with defined values.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `newColumnTransforms` | \{ `name`: `string` ; `valueSql`: `string` }[] | pairs of column names and the SQL expression to use to calculate the value of the new column. These expressions will be evaluated for each row in the table, and can reference existing columns in the table. |
#### Returns
`Promise`\<`void`\>
#### Defined in
[index.ts:582](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L582)
___
### alterColumns
▸ **alterColumns**(`columnAlterations`): `Promise`\<`void`\>
Alter the name or nullability of columns.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `columnAlterations` | [`ColumnAlteration`](ColumnAlteration.md)[] | One or more alterations to apply to columns. |
#### Returns
`Promise`\<`void`\>
#### Defined in
[index.ts:591](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L591)
___
### dropColumns
▸ **dropColumns**(`columnNames`): `Promise`\<`void`\>
Drop one or more columns from the dataset
This is a metadata-only operation and does not remove the data from the
underlying storage. In order to remove the data, you must subsequently
call ``compact_files`` to rewrite the data without the removed columns and
then call ``cleanup_files`` to remove the old files.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `columnNames` | `string`[] | The names of the columns to drop. These can be nested column references (e.g. "a.b.c") or top-level column names (e.g. "a"). |
#### Returns
`Promise`\<`void`\>
#### Defined in
[index.ts:605](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L605)
___
### filter
▸ **filter**(`value`): [`Query`](../classes/Query.md)\<`T`\>
#### Parameters
| Name | Type |
| :------ | :------ |
| `value` | `string` |
#### Returns
[`Query`](../classes/Query.md)\<`T`\>
#### Defined in
[index.ts:569](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L569)
___
### withMiddleware
▸ **withMiddleware**(`middleware`): [`Table`](Table.md)\<`T`\>
Instrument the behavior of this Table with middleware.
The middleware will be called in the order they are added.
Currently this functionality is only supported for remote tables.
#### Parameters
| Name | Type |
| :------ | :------ |
| `middleware` | `HttpMiddleware` |
#### Returns
[`Table`](Table.md)\<`T`\>
- this Table instrumented by the passed middleware
#### Defined in
[index.ts:617](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L617)


@@ -20,7 +20,7 @@ new values to set
#### Defined in
[index.ts:454](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L454)
[index.ts:652](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L652)
___
@@ -33,4 +33,4 @@ in which case all rows will be updated.
#### Defined in
[index.ts:448](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L448)
[index.ts:646](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L646)


@@ -20,7 +20,7 @@ new values to set as SQL expressions.
#### Defined in
[index.ts:468](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L468)
[index.ts:666](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L666)
___
@@ -33,4 +33,4 @@ in which case all rows will be updated.
#### Defined in
[index.ts:462](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L462)
[index.ts:660](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L660)


@@ -8,6 +8,7 @@
- [columns](VectorIndex.md#columns)
- [name](VectorIndex.md#name)
- [status](VectorIndex.md#status)
- [uuid](VectorIndex.md#uuid)
## Properties
@@ -18,7 +19,7 @@
#### Defined in
[index.ts:472](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L472)
[index.ts:718](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L718)
___
@@ -28,7 +29,17 @@ ___
#### Defined in
[index.ts:473](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L473)
[index.ts:719](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L719)
___
### status
**status**: [`IndexStatus`](../enums/IndexStatus.md)
#### Defined in
[index.ts:721](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L721)
___
@@ -38,4 +49,4 @@ ___
#### Defined in
[index.ts:474](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L474)
[index.ts:720](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L720)


@@ -24,4 +24,4 @@ A [WriteMode](../enums/WriteMode.md) to use on this operation
#### Defined in
[index.ts:1015](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1015)
[index.ts:1355](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1355)


@@ -6,6 +6,7 @@
### Enumerations
- [IndexStatus](enums/IndexStatus.md)
- [MetricType](enums/MetricType.md)
- [WriteMode](enums/WriteMode.md)
@@ -14,6 +15,7 @@
- [DefaultWriteOptions](classes/DefaultWriteOptions.md)
- [LocalConnection](classes/LocalConnection.md)
- [LocalTable](classes/LocalTable.md)
- [MakeArrowTableOptions](classes/MakeArrowTableOptions.md)
- [OpenAIEmbeddingFunction](classes/OpenAIEmbeddingFunction.md)
- [Query](classes/Query.md)
@@ -21,6 +23,7 @@
- [AwsCredentials](interfaces/AwsCredentials.md)
- [CleanupStats](interfaces/CleanupStats.md)
- [ColumnAlteration](interfaces/ColumnAlteration.md)
- [CompactionMetrics](interfaces/CompactionMetrics.md)
- [CompactionOptions](interfaces/CompactionOptions.md)
- [Connection](interfaces/Connection.md)
@@ -29,6 +32,7 @@
- [EmbeddingFunction](interfaces/EmbeddingFunction.md)
- [IndexStats](interfaces/IndexStats.md)
- [IvfPQIndexConfig](interfaces/IvfPQIndexConfig.md)
- [MergeInsertArgs](interfaces/MergeInsertArgs.md)
- [Table](interfaces/Table.md)
- [UpdateArgs](interfaces/UpdateArgs.md)
- [UpdateSqlArgs](interfaces/UpdateSqlArgs.md)
@@ -42,7 +46,9 @@
### Functions
- [connect](modules.md#connect)
- [convertToTable](modules.md#converttotable)
- [isWriteOptions](modules.md#iswriteoptions)
- [makeArrowTable](modules.md#makearrowtable)
## Type Aliases
@@ -52,7 +58,7 @@
#### Defined in
[index.ts:996](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L996)
[index.ts:1336](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1336)
## Functions
@@ -62,11 +68,11 @@
Connect to a LanceDB instance at the given URI.
Accpeted formats:
Accepted formats:
- `/path/to/database` - local database
- `s3://bucket/path/to/database` or `gs://bucket/path/to/database` - database on cloud storage
- `db://host:port` - remote database (SaaS)
- `db://host:port` - remote database (LanceDB cloud)
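As a rough illustration of the accepted formats listed above, the sketch below classifies a URI by its scheme. The helper `classifyUri` and the `ConnectionKind` type are hypothetical names introduced here and are not part of the `vectordb` API; only the `db://` prefix check mirrors what `connect` itself does in `index.ts`.

```typescript
// Hypothetical helper, not part of the vectordb package.
type ConnectionKind = "local" | "cloud-storage" | "remote";

function classifyUri(uri: string): ConnectionKind {
  // connect() treats any URI starting with db:// as a remote connection.
  if (uri.startsWith("db://")) return "remote";
  // s3:// and gs:// are the cloud storage schemes named in the docs above.
  if (uri.startsWith("s3://") || uri.startsWith("gs://")) return "cloud-storage";
  // Assumption: everything else is treated as a local filesystem path.
  return "local";
}
```

A plain path such as `/path/to/database` falls through to the local case under this assumption.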
#### Parameters
@@ -84,7 +90,7 @@ Accpeted formats:
#### Defined in
[index.ts:141](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L141)
[index.ts:188](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L188)
**connect**(`opts`): `Promise`\<[`Connection`](interfaces/Connection.md)\>
@@ -102,7 +108,35 @@ Connect to a LanceDB instance with connection options.
#### Defined in
[index.ts:147](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L147)
[index.ts:194](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L194)
___
### convertToTable
**convertToTable**\<`T`\>(`data`, `embeddings?`, `makeTableOptions?`): `Promise`\<`ArrowTable`\>
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type |
| :------ | :------ |
| `data` | `Record`\<`string`, `unknown`\>[] |
| `embeddings?` | [`EmbeddingFunction`](interfaces/EmbeddingFunction.md)\<`T`\> |
| `makeTableOptions?` | `Partial`\<[`MakeArrowTableOptions`](classes/MakeArrowTableOptions.md)\> |
#### Returns
`Promise`\<`ArrowTable`\>
#### Defined in
[arrow.ts:465](https://github.com/lancedb/lancedb/blob/92179835/node/src/arrow.ts#L465)
___
@@ -122,4 +156,116 @@ value is WriteOptions
#### Defined in
[index.ts:1022](https://github.com/lancedb/lancedb/blob/c89d5e6/node/src/index.ts#L1022)
[index.ts:1362](https://github.com/lancedb/lancedb/blob/92179835/node/src/index.ts#L1362)
___
### makeArrowTable
**makeArrowTable**(`data`, `options?`): `ArrowTable`
An enhanced version of the makeTable function from Apache Arrow
that supports nested fields and embeddings columns.
This function converts an array of Record<String, any> (row-major JS objects)
to an Arrow Table (a columnar structure)
Note that it currently does not support nulls.
If a schema is provided then it will be used to determine the resulting array
types. Fields will also be reordered to fit the order defined by the schema.
If a schema is not provided then the types will be inferred and the field order
will be controlled by the order of properties in the first record.
If the input is empty then a schema must be provided to create an empty table.
When a schema is not specified then data types will be inferred. The inference
rules are as follows:
- boolean => Bool
- number => Float64
- String => Utf8
- Buffer => Binary
- Record<String, any> => Struct
- Array<any> => List
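The inference rules above can be sketched as a small lookup on the runtime type of a value. This is only an illustration of the listed mapping, not the code in `arrow.ts`; the function name `inferArrowTypeName` and the string-valued result type are assumptions made for the example, and `Uint8Array` stands in for `Buffer` (Node's `Buffer` is a `Uint8Array` subclass).

```typescript
// Hypothetical illustration of the documented inference rules.
type InferredArrowType = "Bool" | "Float64" | "Utf8" | "Binary" | "Struct" | "List";

function inferArrowTypeName(value: unknown): InferredArrowType {
  if (typeof value === "boolean") return "Bool";
  if (typeof value === "number") return "Float64";
  if (typeof value === "string") return "Utf8";
  // Binary must be checked before the generic object case.
  if (value instanceof Uint8Array) return "Binary";
  if (Array.isArray(value)) return "List";
  if (typeof value === "object" && value !== null) return "Struct";
  // Nulls are not supported, per the note above, so they are rejected here.
  throw new Error(`cannot infer an Arrow type for ${String(value)}`);
}
```

Note that the sketch ignores the special handling of vector columns described further down, which produce fixed size lists rather than plain lists.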
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`\<`string`, `any`\>[] | input data |
| `options?` | `Partial`\<[`MakeArrowTableOptions`](classes/MakeArrowTableOptions.md)\> | options to control the makeArrowTable call. |
#### Returns
`ArrowTable`
**`Example`**
```ts
import { fromTableToBuffer, makeArrowTable } from "../arrow";
import { Field, FixedSizeList, Float16, Float32, Int32, Schema } from "apache-arrow";
const schema = new Schema([
new Field("a", new Int32()),
new Field("b", new Float32()),
new Field("c", new FixedSizeList(3, new Field("item", new Float16()))),
]);
const table = makeArrowTable([
{ a: 1, b: 2, c: [1, 2, 3] },
{ a: 4, b: 5, c: [4, 5, 6] },
{ a: 7, b: 8, c: [7, 8, 9] },
], { schema });
```
By default it assumes that the column named `vector` is a vector column
and it will be converted into a fixed size list array of type float32.
The `vectorColumns` option can be used to support other vector column
names and data types.
```ts
const schema = new Schema([
new Field("a", new Float64()),
new Field("b", new Float64()),
new Field(
"vector",
new FixedSizeList(3, new Field("item", new Float32()))
),
]);
const table = makeArrowTable([
{ a: 1, b: 2, vector: [1, 2, 3] },
{ a: 4, b: 5, vector: [4, 5, 6] },
{ a: 7, b: 8, vector: [7, 8, 9] },
]);
assert.deepEqual(table.schema, schema);
```
You can specify the vector column types and names using the options as well
```typescript
const schema = new Schema([
new Field('a', new Float64()),
new Field('b', new Float64()),
new Field('vec1', new FixedSizeList(3, new Field('item', new Float16()))),
new Field('vec2', new FixedSizeList(3, new Field('item', new Float16())))
]);
const table = makeArrowTable([
{ a: 1, b: 2, vec1: [1, 2, 3], vec2: [2, 4, 6] },
{ a: 4, b: 5, vec1: [4, 5, 6], vec2: [8, 10, 12] },
{ a: 7, b: 8, vec1: [7, 8, 9], vec2: [14, 16, 18] }
], {
vectorColumns: {
vec1: { type: new Float16() },
vec2: { type: new Float16() }
}
})
assert.deepEqual(table.schema, schema)
```
#### Defined in
[arrow.ts:198](https://github.com/lancedb/lancedb/blob/92179835/node/src/arrow.ts#L198)

17
docs/src/rag/sfr_rag.md Normal file

@@ -0,0 +1,17 @@
**SFR RAG 📑**
====================================================================
Salesforce AI Research introduces SFR-RAG, a 9-billion-parameter language model trained with a significant emphasis on reliable, precise, and faithful contextual generation abilities specific to real-world RAG use cases and relevant agentic tasks. These abilities include extracting precise factual knowledge, distinguishing relevant from distracting contexts, citing appropriate sources alongside answers, performing complex multi-hop reasoning over multiple contexts, following formats consistently, and refraining from hallucination on unanswerable queries.
**[Official Implementation](https://github.com/SalesforceAIResearch/SFR-RAG)**
<figure markdown="span">
![agent-based-rag](https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/rag/salesforce_contextbench.png)
<figcaption>Average Scores in ContextualBench: <a href="https://blog.salesforceairesearch.com/sfr-rag/">Source</a>
</figcaption>
</figure>
To reliably evaluate LLMs in contextual question-answering for RAG, Salesforce introduced [ContextualBench](https://huggingface.co/datasets/Salesforce/ContextualBench?ref=blog.salesforceairesearch.com), featuring 7 benchmarks like [HotpotQA](https://arxiv.org/abs/1809.09600?ref=blog.salesforceairesearch.com) and [2WikiHopQA](https://www.aclweb.org/anthology/2020.coling-main.580/?ref=blog.salesforceairesearch.com) with consistent setups.
SFR-RAG outperforms GPT-4o, achieving state-of-the-art results in 3 out of 7 benchmarks, and significantly surpasses Command-R+ while using 10 times fewer parameters. It also excels at handling context, even when facts are altered or conflicting.
[Salesforce AI Research Blog](https://blog.salesforceairesearch.com/sfr-rag/)


@@ -1,6 +1,9 @@
# Linear Combination Reranker
This is the default re-ranker used by LanceDB hybrid search. It combines the results of semantic and full-text search using a linear combination of the scores. The weights for the linear combination can be specified. It defaults to 0.7, i.e, 70% weight for semantic search and 30% weight for full-text search.
!!! note
This is deprecated. It is recommended to use the `RRFReranker` instead if you want to use a score-based reranker.
It combines the results of semantic and full-text search using a linear combination of the scores. The weights for the linear combination can be specified. It defaults to 0.7, i.e., 70% weight for semantic search and 30% weight for full-text search.
!!! note
Supported Query Types: Hybrid
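
The weighting described above can be sketched as follows. This is a minimal illustration and not LanceDB's implementation: the `ScoredHit` type and `linearCombination` function are hypothetical, and the sketch assumes both score lists are already normalized so that higher is better and that a document missing from one list contributes 0 for that list.

```typescript
// Hypothetical types and names, for illustration only.
interface ScoredHit {
  id: string;
  score: number; // assumed normalized, higher is better
}

function linearCombination(
  vectorHits: ScoredHit[],
  ftsHits: ScoredHit[],
  weight = 0.7, // share given to semantic search; full-text gets 1 - weight
): ScoredHit[] {
  const combined = new Map<string, number>();
  for (const hit of vectorHits) {
    combined.set(hit.id, weight * hit.score);
  }
  for (const hit of ftsHits) {
    combined.set(hit.id, (combined.get(hit.id) ?? 0) + (1 - weight) * hit.score);
  }
  // Highest combined score first.
  return Array.from(combined, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

With the default weight, a document that appears only in the semantic results can still outrank one that appears in both lists if its semantic score is high enough.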


@@ -1,6 +1,6 @@
# Reciprocal Rank Fusion Reranker
Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores by leveraging the positions/rank of the documents. The implementation follows this [paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).
This is the default re-ranker used by LanceDB hybrid search. Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores by leveraging the positions/rank of the documents. The implementation follows this [paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).
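
For readers unfamiliar with the algorithm, here is a minimal sketch of RRF as described in the linked paper: each document scores the sum of `1 / (k + rank)` over every result list it appears in, with ranks starting at 1. The function name `reciprocalRankFusion` is hypothetical, and `k = 60` is the constant suggested in the paper; whether LanceDB's `RRFReranker` uses the same default is an assumption here.

```typescript
// Hypothetical helper illustrating the RRF formula; not LanceDB's code.
function reciprocalRankFusion(
  rankings: string[][], // each inner array lists document ids, best first
  k = 60,
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores).sort((a, b) => b[1] - a[1]);
}
```

Because only ranks are used, the raw scores of the vector and full-text searches never need to be on the same scale.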
!!! note


@@ -39,4 +39,46 @@
height: 1.2rem;
margin-top: -.1rem;
}
}
}
/* remove pilcrow as permanent link and add chain icon similar to github https://github.com/squidfunk/mkdocs-material/discussions/3535 */
.headerlink {
--permalink-size: 16px; /* for font-relative sizes, 0.6em is a good choice */
--permalink-spacing: 4px;
width: calc(var(--permalink-size) + var(--permalink-spacing));
height: var(--permalink-size);
vertical-align: middle;
background-color: var(--md-default-fg-color--lighter);
background-size: var(--permalink-size);
mask-size: var(--permalink-size);
-webkit-mask-size: var(--permalink-size);
mask-repeat: no-repeat;
-webkit-mask-repeat: no-repeat;
visibility: visible;
mask-image: url('data:image/svg+xml;utf8,<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg>');
-webkit-mask-image: url('data:image/svg+xml;utf8,<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg>');
}
[id]:target .headerlink {
background-color: var(--md-typeset-a-color);
}
.headerlink:hover {
background-color: var(--md-accent-fg-color) !important;
}
@media screen and (min-width: 76.25em) {
h1, h2, h3, h4, h5, h6 {
display: flex;
align-items: center;
flex-direction: row;
column-gap: 0.2em; /* fixes spaces in titles */
}
.headerlink {
order: -1;
margin-left: calc(var(--permalink-size) * -1 - var(--permalink-spacing)) !important;
}
}


@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.10.0</version>
<version>0.11.1-beta.0</version>
<relativePath>../pom.xml</relativePath>
</parent>


@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.10.0</version>
<version>0.11.1-beta.0</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>

1470
node/package-lock.json generated

File diff suppressed because it is too large


@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.10.0",
"version": "0.11.1-beta.0",
"description": " Serverless, low-latency vector database for AI applications",
"main": "dist/index.js",
"types": "dist/index.d.ts",
@@ -58,7 +58,7 @@
"ts-node-dev": "^2.0.0",
"typedoc": "^0.24.7",
"typedoc-plugin-markdown": "^3.15.3",
"typescript": "*",
"typescript": "^5.1.0",
"uuid": "^9.0.0"
},
"dependencies": {
@@ -88,10 +88,10 @@
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.4.20",
"@lancedb/vectordb-darwin-x64": "0.4.20",
"@lancedb/vectordb-linux-arm64-gnu": "0.4.20",
"@lancedb/vectordb-linux-x64-gnu": "0.4.20",
"@lancedb/vectordb-win32-x64-msvc": "0.4.20"
"@lancedb/vectordb-darwin-arm64": "0.11.1-beta.0",
"@lancedb/vectordb-darwin-x64": "0.11.1-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.11.1-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.11.1-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.11.1-beta.0"
}
}


@@ -220,7 +220,8 @@ export async function connect(
region: partOpts.region ?? defaultRegion,
timeout: partOpts.timeout ?? defaultRequestTimeout,
readConsistencyInterval: partOpts.readConsistencyInterval ?? undefined,
storageOptions: partOpts.storageOptions ?? undefined
storageOptions: partOpts.storageOptions ?? undefined,
hostOverride: partOpts.hostOverride ?? undefined
}
if (opts.uri.startsWith("db://")) {
// Remote connection
@@ -563,7 +564,7 @@ export interface Table<T = number[]> {
/**
* Get statistics about an index.
*/
indexStats: (indexUuid: string) => Promise<IndexStats>
indexStats: (indexName: string) => Promise<IndexStats>
filter(value: string): Query<T>
@@ -723,9 +724,9 @@ export interface VectorIndex {
export interface IndexStats {
numIndexedRows: number | null
numUnindexedRows: number | null
indexType: string | null
distanceType: string | null
completedAt: string | null
indexType: string
distanceType?: string
numIndices?: number
}
/**
@@ -1163,8 +1164,8 @@ export class LocalTable<T = number[]> implements Table<T> {
return tableListIndices.call(this._tbl);
}
async indexStats(indexUuid: string): Promise<IndexStats> {
return tableIndexStats.call(this._tbl, indexUuid);
async indexStats(indexName: string): Promise<IndexStats> {
return tableIndexStats.call(this._tbl, indexName);
}
get schema(): Promise<Schema> {


@@ -14,6 +14,7 @@
import { describe } from 'mocha'
import * as chai from 'chai'
import { assert } from 'chai'
import * as chaiAsPromised from 'chai-as-promised'
import { v4 as uuidv4 } from 'uuid'
@@ -22,7 +23,6 @@ import { tmpdir } from 'os'
import * as fs from 'fs'
import * as path from 'path'
const assert = chai.assert
chai.use(chaiAsPromised)
describe('LanceDB AWS Integration test', function () {


@@ -33,6 +33,7 @@ export class Query<T = number[]> {
private _filter?: string
private _metricType?: MetricType
private _prefilter: boolean
private _fastSearch: boolean
protected readonly _embeddings?: EmbeddingFunction<T>
constructor (query?: T, tbl?: any, embeddings?: EmbeddingFunction<T>) {
@@ -46,6 +47,7 @@ export class Query<T = number[]> {
this._metricType = undefined
this._embeddings = embeddings
this._prefilter = false
this._fastSearch = false
}
/***
@@ -110,6 +112,15 @@ export class Query<T = number[]> {
return this
}
/**
* Skip searching un-indexed data. This can make search faster, but will miss
* any data that is not yet indexed.
*/
fastSearch (value: boolean): Query<T> {
this._fastSearch = value
return this
}
/**
* Execute the query and return the results as an Array of Objects
*/
@@ -131,9 +142,9 @@ export class Query<T = number[]> {
Object.keys(entry).forEach((key: string) => {
if (entry[key] instanceof Vector) {
// toJSON() returns f16 array correctly
newObject[key] = (entry[key] as Vector).toJSON()
newObject[key] = (entry[key] as any).toJSON()
} else {
newObject[key] = entry[key]
newObject[key] = entry[key] as any
}
})
return newObject as unknown as T


@@ -17,6 +17,7 @@ import axios, { type AxiosResponse, type ResponseType } from 'axios'
import { tableFromIPC, type Table as ArrowTable } from 'apache-arrow'
import { type RemoteResponse, type RemoteRequest, Method } from '../middleware'
import type { MetricType } from '..'
interface HttpLancedbClientMiddleware {
onRemoteRequest(
@@ -151,7 +152,9 @@ export class HttpLancedbClient {
prefilter: boolean,
refineFactor?: number,
columns?: string[],
filter?: string
filter?: string,
metricType?: MetricType,
fastSearch?: boolean
): Promise<ArrowTable<any>> {
const result = await this.post(
`/v1/table/${tableName}/query/`,
@@ -159,10 +162,12 @@ export class HttpLancedbClient {
vector,
k,
nprobes,
refineFactor,
refine_factor: refineFactor,
columns,
filter,
prefilter
prefilter,
metric: metricType,
fast_search: fastSearch
},
undefined,
undefined,


@@ -238,16 +238,18 @@ export class RemoteQuery<T = number[]> extends Query<T> {
(this as any)._prefilter,
(this as any)._refineFactor,
(this as any)._select,
(this as any)._filter
(this as any)._filter,
(this as any)._metricType,
(this as any)._fastSearch
)
return data.toArray().map((entry: Record<string, unknown>) => {
const newObject: Record<string, unknown> = {}
Object.keys(entry).forEach((key: string) => {
if (entry[key] instanceof Vector) {
newObject[key] = (entry[key] as Vector).toArray()
newObject[key] = (entry[key] as any).toArray()
} else {
newObject[key] = entry[key]
newObject[key] = entry[key] as any
}
})
return newObject as unknown as T
@@ -515,17 +517,16 @@ export class RemoteTable<T = number[]> implements Table<T> {
}))
}
async indexStats (indexUuid: string): Promise<IndexStats> {
async indexStats (indexName: string): Promise<IndexStats> {
const results = await this._client.post(
`/v1/table/${encodeURIComponent(this._name)}/index/${indexUuid}/stats/`
`/v1/table/${encodeURIComponent(this._name)}/index/${indexName}/stats/`
)
const body = await results.body()
return {
numIndexedRows: body?.num_indexed_rows,
numUnindexedRows: body?.num_unindexed_rows,
indexType: body?.index_type,
distanceType: body?.distance_type,
completedAt: body?.completed_at
distanceType: body?.distance_type
}
}


@@ -14,6 +14,7 @@
import { describe } from "mocha";
import { track } from "temp";
import { assert, expect } from 'chai'
import * as chai from "chai";
import * as chaiAsPromised from "chai-as-promised";
@@ -44,8 +45,6 @@ import {
} from "apache-arrow";
import type { RemoteRequest, RemoteResponse } from "../middleware";
const expect = chai.expect;
const assert = chai.assert;
chai.use(chaiAsPromised);
describe("LanceDB client", function () {
@@ -169,7 +168,7 @@ describe("LanceDB client", function () {
// Should reject a bad filter
await expect(table.filter("id % 2 = 0 AND").execute()).to.be.rejectedWith(
/.*sql parser error: Expected an expression:, found: EOF.*/
/.*sql parser error: .*/
);
});
@@ -888,9 +887,12 @@ describe("LanceDB client", function () {
expect(indices[0].columns).to.have.lengthOf(1);
expect(indices[0].columns[0]).to.equal("vector");
const stats = await table.indexStats(indices[0].uuid);
const stats = await table.indexStats(indices[0].name);
expect(stats.numIndexedRows).to.equal(300);
expect(stats.numUnindexedRows).to.equal(0);
expect(stats.indexType).to.equal("IVF_PQ");
expect(stats.distanceType).to.equal("l2");
expect(stats.numIndices).to.equal(1);
}).timeout(50_000);
});


@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.0.0"
version = "0.11.1-beta.0"
license.workspace = true
description.workspace = true
repository.workspace = true
@@ -14,7 +14,7 @@ crate-type = ["cdylib"]
[dependencies]
arrow-ipc.workspace = true
futures.workspace = true
lancedb = { path = "../rust/lancedb" }
lancedb = { path = "../rust/lancedb", features = ["remote"] }
napi = { version = "2.16.8", default-features = false, features = [
"napi9",
"async",


@@ -0,0 +1,93 @@
// Copyright 2024 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
import * as http from "http";
import { RequestListener } from "http";
import { Connection, ConnectionOptions, connect } from "../lancedb";
async function withMockDatabase(
listener: RequestListener,
callback: (db: Connection) => void,
connectionOptions?: ConnectionOptions,
) {
const server = http.createServer(listener);
server.listen(8000);
const db = await connect(
"db://dev",
Object.assign(
{
apiKey: "fake",
hostOverride: "http://localhost:8000",
},
connectionOptions,
),
);
try {
await callback(db);
} finally {
server.close();
}
}
describe("remote connection", () => {
it("should accept partial connection options", async () => {
await connect("db://test", {
apiKey: "fake",
clientConfig: {
timeoutConfig: { readTimeout: 5 },
retryConfig: { retries: 2 },
},
});
});
it("should pass down apiKey and userAgent", async () => {
await withMockDatabase(
(req, res) => {
expect(req.headers["x-api-key"]).toEqual("fake");
expect(req.headers["user-agent"]).toEqual(
`LanceDB-Node-Client/${process.env.npm_package_version}`,
);
const body = JSON.stringify({ tables: [] });
res.writeHead(200, { "Content-Type": "application/json" }).end(body);
},
async (db) => {
const tableNames = await db.tableNames();
expect(tableNames).toEqual([]);
},
);
});
it("allows customizing user agent", async () => {
await withMockDatabase(
(req, res) => {
expect(req.headers["user-agent"]).toEqual("MyApp/1.0");
const body = JSON.stringify({ tables: [] });
res.writeHead(200, { "Content-Type": "application/json" }).end(body);
},
async (db) => {
const tableNames = await db.tableNames();
expect(tableNames).toEqual([]);
},
{
clientConfig: {
userAgent: "MyApp/1.0",
},
},
);
});
});


@@ -479,6 +479,9 @@ describe("When creating an index", () => {
expect(stats).toBeDefined();
expect(stats?.numIndexedRows).toEqual(300);
expect(stats?.numUnindexedRows).toEqual(0);
expect(stats?.distanceType).toBeUndefined();
expect(stats?.indexType).toEqual("BTREE");
expect(stats?.numIndices).toEqual(1);
});
test("when getting stats on non-existent index", async () => {

View File

@@ -23,8 +23,6 @@ import {
Connection as LanceDbConnection,
} from "./native.js";
import { RemoteConnection, RemoteConnectionOptions } from "./remote";
export {
WriteOptions,
WriteMode,
@@ -32,8 +30,10 @@ export {
ColumnAlteration,
ConnectionOptions,
IndexStatistics,
IndexMetadata,
IndexConfig,
ClientConfig,
TimeoutConfig,
RetryConfig,
} from "./native.js";
export {
@@ -88,7 +88,7 @@ export * as embedding from "./embedding";
*/
export async function connect(
uri: string,
opts?: Partial<ConnectionOptions | RemoteConnectionOptions>,
opts?: Partial<ConnectionOptions>,
): Promise<Connection>;
/**
* Connect to a LanceDB instance at the given URI.
@@ -109,13 +109,11 @@ export async function connect(
* ```
*/
export async function connect(
opts: Partial<RemoteConnectionOptions | ConnectionOptions> & { uri: string },
opts: Partial<ConnectionOptions> & { uri: string },
): Promise<Connection>;
export async function connect(
uriOrOptions:
| string
| (Partial<RemoteConnectionOptions | ConnectionOptions> & { uri: string }),
opts: Partial<ConnectionOptions | RemoteConnectionOptions> = {},
uriOrOptions: string | (Partial<ConnectionOptions> & { uri: string }),
opts: Partial<ConnectionOptions> = {},
): Promise<Connection> {
let uri: string | undefined;
if (typeof uriOrOptions !== "string") {
@@ -130,9 +128,6 @@ export async function connect(
throw new Error("uri is required");
}
if (uri?.startsWith("db://")) {
return new RemoteConnection(uri, opts as RemoteConnectionOptions);
}
opts = (opts as ConnectionOptions) ?? {};
(<ConnectionOptions>opts).storageOptions = cleanseStorageOptions(
(<ConnectionOptions>opts).storageOptions,


@@ -1,218 +0,0 @@
// Copyright 2023 LanceDB Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
import axios, {
AxiosError,
type AxiosResponse,
type ResponseType,
} from "axios";
import { Table as ArrowTable } from "../arrow";
import { tableFromIPC } from "../arrow";
import { VectorQuery } from "../query";
export class RestfulLanceDBClient {
#dbName: string;
#region: string;
#apiKey: string;
#hostOverride?: string;
#closed: boolean = false;
#timeout: number = 12 * 1000; // 12 seconds;
#session?: import("axios").AxiosInstance;
constructor(
dbName: string,
apiKey: string,
region: string,
hostOverride?: string,
timeout?: number,
) {
this.#dbName = dbName;
this.#apiKey = apiKey;
this.#region = region;
this.#hostOverride = hostOverride ?? this.#hostOverride;
this.#timeout = timeout ?? this.#timeout;
}
// todo: cache the session.
get session(): import("axios").AxiosInstance {
if (this.#session !== undefined) {
return this.#session;
} else {
return axios.create({
baseURL: this.url,
headers: {
// biome-ignore lint: external API
Authorization: `Bearer ${this.#apiKey}`,
},
transformResponse: decodeErrorData,
timeout: this.#timeout,
});
}
}
get url(): string {
return (
this.#hostOverride ??
`https://${this.#dbName}.${this.#region}.api.lancedb.com`
);
}
get headers(): { [key: string]: string } {
const headers: { [key: string]: string } = {
"x-api-key": this.#apiKey,
"x-request-id": "na",
};
if (this.#region == "local") {
headers["Host"] = `${this.#dbName}.${this.#region}.api.lancedb.com`;
}
if (this.#hostOverride) {
headers["x-lancedb-database"] = this.#dbName;
}
return headers;
}
isOpen(): boolean {
return !this.#closed;
}
private checkNotClosed(): void {
if (this.#closed) {
throw new Error("Connection is closed");
}
}
close(): void {
this.#session = undefined;
this.#closed = true;
}
// biome-ignore lint/suspicious/noExplicitAny: <explanation>
async get(uri: string, params?: Record<string, any>): Promise<any> {
this.checkNotClosed();
uri = new URL(uri, this.url).toString();
let response;
try {
response = await this.session.get(uri, {
headers: this.headers,
params,
});
} catch (e) {
if (e instanceof AxiosError && e.response) {
response = e.response;
} else {
throw e;
}
}
RestfulLanceDBClient.checkStatus(response!);
return response!.data;
}
// biome-ignore lint/suspicious/noExplicitAny: api response
async post(uri: string, body?: any): Promise<any>;
async post(
uri: string,
// biome-ignore lint/suspicious/noExplicitAny: api request
body: any,
additional: {
config?: { responseType: "arraybuffer" };
headers?: Record<string, string>;
params?: Record<string, string>;
},
): Promise<Buffer>;
async post(
uri: string,
// biome-ignore lint/suspicious/noExplicitAny: api request
body?: any,
additional?: {
config?: { responseType: ResponseType };
headers?: Record<string, string>;
params?: Record<string, string>;
},
// biome-ignore lint/suspicious/noExplicitAny: api response
): Promise<any> {
this.checkNotClosed();
uri = new URL(uri, this.url).toString();
additional = Object.assign(
{ config: { responseType: "json" } },
additional,
);
const headers = { ...this.headers, ...additional.headers };
if (!headers["Content-Type"]) {
headers["Content-Type"] = "application/json";
}
let response;
try {
response = await this.session.post(uri, body, {
headers,
responseType: additional!.config!.responseType,
params: new Map(Object.entries(additional.params ?? {})),
});
} catch (e) {
if (e instanceof AxiosError && e.response) {
response = e.response;
} else {
throw e;
}
}
RestfulLanceDBClient.checkStatus(response!);
if (additional!.config!.responseType === "arraybuffer") {
return response!.data;
} else {
return JSON.parse(response!.data);
}
}
async listTables(limit = 10, pageToken = ""): Promise<string[]> {
const json = await this.get("/v1/table", { limit, pageToken });
return json.tables;
}
async query(tableName: string, query: VectorQuery): Promise<ArrowTable> {
const tbl = await this.post(`/v1/table/${tableName}/query`, query, {
config: {
responseType: "arraybuffer",
},
});
return tableFromIPC(tbl);
}
static checkStatus(response: AxiosResponse): void {
if (response.status === 404) {
throw new Error(`Not found: ${response.data}`);
} else if (response.status >= 400 && response.status < 500) {
throw new Error(
`Bad Request: ${response.status}, error: ${response.data}`,
);
} else if (response.status >= 500 && response.status < 600) {
throw new Error(
`Internal Server Error: ${response.status}, error: ${response.data}`,
);
} else if (response.status !== 200) {
throw new Error(
`Unknown Error: ${response.status}, error: ${response.data}`,
);
}
}
}
function decodeErrorData(data: unknown) {
if (Buffer.isBuffer(data)) {
const decoded = data.toString("utf-8");
return decoded;
}
return data;
}


@@ -1,193 +0,0 @@
import { Schema } from "apache-arrow";
import {
Data,
SchemaLike,
fromTableToStreamBuffer,
makeEmptyTable,
} from "../arrow";
import {
Connection,
CreateTableOptions,
OpenTableOptions,
TableNamesOptions,
} from "../connection";
import { Table } from "../table";
import { TTLCache } from "../util";
import { RestfulLanceDBClient } from "./client";
import { RemoteTable } from "./table";
export interface RemoteConnectionOptions {
apiKey?: string;
region?: string;
hostOverride?: string;
timeout?: number;
}
export class RemoteConnection extends Connection {
#dbName: string;
#apiKey: string;
#region: string;
#client: RestfulLanceDBClient;
#tableCache = new TTLCache(300_000);
constructor(
url: string,
{ apiKey, region, hostOverride, timeout }: RemoteConnectionOptions,
) {
super();
apiKey = apiKey ?? process.env.LANCEDB_API_KEY;
region = region ?? process.env.LANCEDB_REGION;
if (!apiKey) {
throw new Error("apiKey is required when connecting to LanceDB Cloud");
}
if (!region) {
throw new Error("region is required when connecting to LanceDB Cloud");
}
const parsed = new URL(url);
if (parsed.protocol !== "db:") {
throw new Error(
`invalid protocol: ${parsed.protocol}, only accepts db://`,
);
}
this.#dbName = parsed.hostname;
this.#apiKey = apiKey;
this.#region = region;
this.#client = new RestfulLanceDBClient(
this.#dbName,
this.#apiKey,
this.#region,
hostOverride,
timeout,
);
}
isOpen(): boolean {
return this.#client.isOpen();
}
close(): void {
return this.#client.close();
}
display(): string {
return `RemoteConnection(${this.#dbName})`;
}
async tableNames(options?: Partial<TableNamesOptions>): Promise<string[]> {
const response = await this.#client.get("/v1/table/", {
limit: options?.limit ?? 10,
// biome-ignore lint/style/useNamingConvention: <explanation>
page_token: options?.startAfter ?? "",
});
const body = await response.body();
for (const table of body.tables) {
this.#tableCache.set(table, true);
}
return body.tables;
}
async openTable(
name: string,
_options?: Partial<OpenTableOptions> | undefined,
): Promise<Table> {
if (this.#tableCache.get(name) === undefined) {
await this.#client.post(
`/v1/table/${encodeURIComponent(name)}/describe/`,
);
this.#tableCache.set(name, true);
}
return new RemoteTable(this.#client, name, this.#dbName);
}
async createTable(
nameOrOptions:
| string
| ({ name: string; data: Data } & Partial<CreateTableOptions>),
data?: Data,
options?: Partial<CreateTableOptions> | undefined,
): Promise<Table> {
if (typeof nameOrOptions !== "string" && "name" in nameOrOptions) {
const { name, data, ...options } = nameOrOptions;
return this.createTable(name, data, options);
}
if (data === undefined) {
throw new Error("data is required");
}
if (options?.mode) {
console.warn(
"option 'mode' is not supported in LanceDB Cloud",
"LanceDB Cloud only supports the default 'create' mode.",
"If the table already exists, an error will be thrown.",
);
}
if (options?.embeddingFunction) {
console.warn(
"embeddingFunction is not yet supported on LanceDB Cloud.",
"Please vote https://github.com/lancedb/lancedb/issues/626 ",
"for this feature.",
);
}
const { buf } = await Table.parseTableData(
data,
options,
true /** streaming */,
);
await this.#client.post(
`/v1/table/${encodeURIComponent(nameOrOptions)}/create/`,
buf,
{
config: {
responseType: "arraybuffer",
},
headers: { "Content-Type": "application/vnd.apache.arrow.stream" },
},
);
this.#tableCache.set(nameOrOptions, true);
return new RemoteTable(this.#client, nameOrOptions, this.#dbName);
}
async createEmptyTable(
name: string,
schema: SchemaLike,
options?: Partial<CreateTableOptions> | undefined,
): Promise<Table> {
if (options?.mode) {
console.warn("mode is not supported on LanceDB Cloud");
}
if (options?.embeddingFunction) {
console.warn(
"embeddingFunction is not yet supported on LanceDB Cloud.",
"Please vote https://github.com/lancedb/lancedb/issues/626 ",
"for this feature.",
);
}
const emptyTable = makeEmptyTable(schema);
const buf = await fromTableToStreamBuffer(emptyTable);
await this.#client.post(
`/v1/table/${encodeURIComponent(name)}/create/`,
buf,
{
config: {
responseType: "arraybuffer",
},
headers: { "Content-Type": "application/vnd.apache.arrow.stream" },
},
);
this.#tableCache.set(name, true);
return new RemoteTable(this.#client, name, this.#dbName);
}
async dropTable(name: string): Promise<void> {
await this.#client.post(`/v1/table/${encodeURIComponent(name)}/drop/`);
this.#tableCache.delete(name);
}
}
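
The constructor above resolves its options in a fixed order: explicit arguments first, then environment variables, then a hard failure, and only afterwards validates the `db://` scheme. A minimal sketch of that order as a pure function follows. `resolveRemoteOptions`, `ResolvedRemoteOptions`, and the `env` parameter are hypothetical names introduced for illustration; the original reads `process.env` directly.

```typescript
// Hypothetical helper, not part of LanceDB: mirrors the checks made in the
// RemoteConnection constructor so they can be exercised in isolation.
interface ResolvedRemoteOptions {
  dbName: string;
  apiKey: string;
  region: string;
}

function resolveRemoteOptions(
  url: string,
  opts: { apiKey?: string; region?: string },
  // Assumption: the environment is passed in instead of reading process.env.
  env: Record<string, string | undefined>,
): ResolvedRemoteOptions {
  // Explicit options win; environment variables are the fallback.
  const apiKey = opts.apiKey ?? env["LANCEDB_API_KEY"];
  const region = opts.region ?? env["LANCEDB_REGION"];
  if (!apiKey) {
    throw new Error("apiKey is required when connecting to LanceDB Cloud");
  }
  if (!region) {
    throw new Error("region is required when connecting to LanceDB Cloud");
  }
  // The database name is the host part of a db:// URL.
  const parsed = new URL(url);
  if (parsed.protocol !== "db:") {
    throw new Error(`invalid protocol: ${parsed.protocol}, only accepts db://`);
  }
  return { dbName: parsed.hostname, apiKey, region };
}
```

Passing the environment as an argument is a design choice of this sketch only; it keeps the function deterministic under test.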


@@ -1,3 +0,0 @@
export { RestfulLanceDBClient } from "./client";
export { type RemoteConnectionOptions, RemoteConnection } from "./connection";
export { RemoteTable } from "./table";
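
`RemoteConnection`, re-exported here, records the tables it has already seen in a `TTLCache(300_000)`, so `openTable` can skip the `describe` request while an entry is fresh. The cache implementation is not part of this diff, so the sketch below is an assumption about how such a cache might behave (lazy eviction on read). `SketchTtlCache` and its injectable clock are hypothetical.

```typescript
// Illustrative only: the real TTLCache is not shown in this diff, so the
// expiry policy below (entries are evicted lazily on read) is an assumption.
class SketchTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: drop the entry and report a miss.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}
```

With a 300,000 ms TTL, a table name stored at time 0 is returned until the clock reaches 300,000 ms and is reported as missing from then on.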


@@ -1,226 +0,0 @@
// Copyright 2023 LanceDB Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
import { Table as ArrowTable } from "apache-arrow";
import { Data, IntoVector } from "../arrow";
import { IndexStatistics } from "..";
import { CreateTableOptions } from "../connection";
import { IndexOptions } from "../indices";
import { MergeInsertBuilder } from "../merge";
import { VectorQuery } from "../query";
import { AddDataOptions, Table, UpdateOptions } from "../table";
import { IntoSql, toSQL } from "../util";
import { RestfulLanceDBClient } from "./client";
export class RemoteTable extends Table {
#client: RestfulLanceDBClient;
#name: string;
// Used in the display() method
#dbName: string;
get #tablePrefix() {
// No trailing slash: callers append paths such as `/describe/` themselves.
return `/v1/table/${encodeURIComponent(this.#name)}`;
}
get name(): string {
return this.#name;
}
public constructor(
client: RestfulLanceDBClient,
tableName: string,
dbName: string,
) {
super();
this.#client = client;
this.#name = tableName;
this.#dbName = dbName;
}
isOpen(): boolean {
return this.#client.isOpen();
}
close(): void {
this.#client.close();
}
display(): string {
return `RemoteTable(${this.#dbName}; ${this.#name})`;
}
async schema(): Promise<import("apache-arrow").Schema> {
const resp = await this.#client.post(`${this.#tablePrefix}/describe/`);
// TODO: parse this into a valid arrow schema
return resp.schema;
}
async add(data: Data, options?: Partial<AddDataOptions>): Promise<void> {
const { buf, mode } = await Table.parseTableData(
data,
options as CreateTableOptions,
true,
);
await this.#client.post(`${this.#tablePrefix}/insert/`, buf, {
params: {
mode,
},
headers: {
"Content-Type": "application/vnd.apache.arrow.stream",
},
});
}
async update(
optsOrUpdates:
| (Map<string, string> | Record<string, string>)
| ({
values: Map<string, IntoSql> | Record<string, IntoSql>;
} & Partial<UpdateOptions>)
| ({
valuesSql: Map<string, string> | Record<string, string>;
} & Partial<UpdateOptions>),
options?: Partial<UpdateOptions>,
): Promise<void> {
const isValues =
"values" in optsOrUpdates && typeof optsOrUpdates.values !== "string";
const isValuesSql =
"valuesSql" in optsOrUpdates &&
typeof optsOrUpdates.valuesSql !== "string";
const isMap = (obj: unknown): obj is Map<string, string> => {
return obj instanceof Map;
};
let predicate;
let columns: [string, string][];
switch (true) {
case isMap(optsOrUpdates):
columns = Array.from(optsOrUpdates.entries());
predicate = options?.where;
break;
case isValues && isMap(optsOrUpdates.values):
columns = Array.from(optsOrUpdates.values.entries()).map(([k, v]) => [
k,
toSQL(v),
]);
predicate = optsOrUpdates.where;
break;
case isValues && !isMap(optsOrUpdates.values):
columns = Object.entries(optsOrUpdates.values).map(([k, v]) => [
k,
toSQL(v),
]);
predicate = optsOrUpdates.where;
break;
case isValuesSql && isMap(optsOrUpdates.valuesSql):
columns = Array.from(optsOrUpdates.valuesSql.entries());
predicate = optsOrUpdates.where;
break;
case isValuesSql && !isMap(optsOrUpdates.valuesSql):
columns = Object.entries(optsOrUpdates.valuesSql).map(([k, v]) => [
k,
v,
]);
predicate = optsOrUpdates.where;
break;
default:
columns = Object.entries(optsOrUpdates as Record<string, string>);
predicate = options?.where;
}
await this.#client.post(`${this.#tablePrefix}/update/`, {
predicate: predicate ?? null,
updates: columns,
});
}
async countRows(filter?: unknown): Promise<number> {
const payload = { predicate: filter };
return await this.#client.post(`${this.#tablePrefix}/count_rows/`, payload);
}
async delete(predicate: unknown): Promise<void> {
const payload = { predicate };
await this.#client.post(`${this.#tablePrefix}/delete/`, payload);
}
async createIndex(
column: string,
options?: Partial<IndexOptions>,
): Promise<void> {
if (options !== undefined) {
console.warn("options are not yet supported on the LanceDB cloud");
}
const indexType = "vector";
const metric = "L2";
const data = {
column,
// biome-ignore lint/style/useNamingConvention: external API
index_type: indexType,
// biome-ignore lint/style/useNamingConvention: external API
metric_type: metric,
};
await this.#client.post(`${this.#tablePrefix}/create_index`, data);
}
query(): import("..").Query {
throw new Error("query() is not yet supported on the LanceDB cloud");
}
search(_query: string | IntoVector): VectorQuery {
throw new Error("search() is not yet supported on the LanceDB cloud");
}
vectorSearch(_vector: unknown): import("..").VectorQuery {
throw new Error("vectorSearch() is not yet supported on the LanceDB cloud");
}
addColumns(_newColumnTransforms: unknown): Promise<void> {
throw new Error("addColumns() is not yet supported on the LanceDB cloud");
}
alterColumns(_columnAlterations: unknown): Promise<void> {
throw new Error("alterColumns() is not yet supported on the LanceDB cloud");
}
dropColumns(_columnNames: unknown): Promise<void> {
throw new Error("dropColumns() is not yet supported on the LanceDB cloud");
}
async version(): Promise<number> {
const resp = await this.#client.post(`${this.#tablePrefix}/describe/`);
return resp.version;
}
checkout(_version: unknown): Promise<void> {
throw new Error("checkout() is not yet supported on the LanceDB cloud");
}
checkoutLatest(): Promise<void> {
throw new Error(
"checkoutLatest() is not yet supported on the LanceDB cloud",
);
}
restore(): Promise<void> {
throw new Error("restore() is not yet supported on the LanceDB cloud");
}
optimize(_options?: unknown): Promise<import("../native").OptimizeStats> {
throw new Error("optimize() is not yet supported on the LanceDB cloud");
}
async listIndices(): Promise<import("../native").IndexConfig[]> {
return await this.#client.post(`${this.#tablePrefix}/index/list/`);
}
toArrow(): Promise<ArrowTable> {
throw new Error("toArrow() is not yet supported on the LanceDB cloud");
}
mergeInsert(_on: string | string[]): MergeInsertBuilder {
throw new Error("mergeInsert() is not yet supported on the LanceDB cloud");
}
async indexStats(_name: string): Promise<IndexStatistics | undefined> {
throw new Error("indexStats() is not yet supported on the LanceDB cloud");
}
}
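
The `update` method above accepts several argument shapes and reduces them all to a `predicate` plus a list of `[column, sqlExpression]` pairs. A simplified sketch of that reduction for the two tagged forms (`values` and `valuesSql`) follows. `toSqlLiteral` is a hypothetical stand-in for the imported `toSQL` helper, whose real behaviour is not shown in this diff, and `normalizeUpdate` is likewise an illustrative name.

```typescript
// Hypothetical, simplified stand-in for the `toSQL` helper imported above;
// the real helper may support more types and escape values differently.
type SqlValue = string | number | boolean | null;

function toSqlLiteral(value: SqlValue): string {
  if (value === null) return "NULL";
  if (typeof value === "string") return `'${value.replace(/'/g, "''")}'`;
  if (typeof value === "boolean") return value ? "TRUE" : "FALSE";
  return String(value);
}

type Updates<T> = Map<string, T> | Record<string, T>;

// Both Map and plain-object inputs become an array of [key, value] pairs.
function entriesOf<T>(updates: Updates<T>): [string, T][] {
  return updates instanceof Map
    ? Array.from(updates.entries())
    : Object.entries(updates);
}

type UpdateArgs =
  | { values: Updates<SqlValue>; where?: string }
  | { valuesSql: Updates<string>; where?: string };

function normalizeUpdate(args: UpdateArgs): {
  predicate: string | null;
  updates: [string, string][];
} {
  const updates: [string, string][] =
    "values" in args
      ? // Literal values are converted to SQL text.
        entriesOf(args.values).map(([k, v]): [string, string] => [k, toSqlLiteral(v)])
      : // SQL expressions are passed through unchanged.
        entriesOf(args.valuesSql);
  return { predicate: args.where ?? null, updates };
}
```

The sketch deliberately omits the untagged forms (a bare `Map` or record of SQL strings with a separate `options.where`), which the original handles in the first and default `switch` cases.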

nodejs/native.d.ts vendored

@@ -1,208 +0,0 @@
/* tslint:disable */
/* eslint-disable */
/* auto-generated by NAPI-RS */
/** A description of an index currently configured on a column */
export interface IndexConfig {
/** The name of the index */
name: string
/** The type of the index */
indexType: string
/**
* The columns in the index
*
* Currently this is always an array of size 1. In the future there may
* be more columns to represent composite indices.
*/
columns: Array<string>
}
/** Statistics about a compaction operation. */
export interface CompactionStats {
/** The number of fragments removed */
fragmentsRemoved: number
/** The number of new, compacted fragments added */
fragmentsAdded: number
/** The number of data files removed */
filesRemoved: number
/** The number of new, compacted data files added */
filesAdded: number
}
/** Statistics about a cleanup operation */
export interface RemovalStats {
/** The number of bytes removed */
bytesRemoved: number
/** The number of old versions removed */
oldVersionsRemoved: number
}
/** Statistics about an optimize operation */
export interface OptimizeStats {
/** Statistics about the compaction operation */
compaction: CompactionStats
/** Statistics about the removal operation */
prune: RemovalStats
}
/**
* A definition of a column alteration. The alteration renames the column at
* `path` to `rename` if `rename` is provided, and makes the column nullable
* if `nullable` is true. At least one of `rename` or `nullable` must be
* provided.
*/
export interface ColumnAlteration {
/**
* The path to the column to alter. This is a dot-separated path to the column.
* If it is a top-level column then it is just the name of the column. If it is
* a nested column then it is the path to the column, e.g. "a.b.c" for a column
* `c` nested inside a column `b` nested inside a column `a`.
*/
path: string
/**
* The new name of the column. If not provided then the name will not be changed.
* This must be distinct from the names of all other columns in the table.
*/
rename?: string
/** Set the new nullability. Note that a nullable column cannot be made non-nullable. */
nullable?: boolean
}
/** A definition of a new column to add to a table. */
export interface AddColumnsSql {
/** The name of the new column. */
name: string
/**
* The values to populate the new column with, as a SQL expression.
* The expression can reference other columns in the table.
*/
valueSql: string
}
export interface IndexStatistics {
/** The number of rows indexed by the index */
numIndexedRows: number
/** The number of rows not indexed */
numUnindexedRows: number
/** The type of the index */
indexType?: string
/** The metadata for each index */
indices: Array<IndexMetadata>
}
export interface IndexMetadata {
metricType?: string
indexType?: string
}
export interface ConnectionOptions {
/**
* (For LanceDB OSS only): The interval, in seconds, at which to check for
* updates to the table from other processes. If None, then consistency is not
* checked. For performance reasons, this is the default. For strong
* consistency, set this to zero seconds. Then every read will check for
* updates from other processes. As a compromise, you can set this to a
* non-zero value for eventual consistency. If more than that interval
* has passed since the last check, then the table will be checked for updates.
* Note: this consistency only applies to read operations. Write operations are
* always consistent.
*/
readConsistencyInterval?: number
/**
* (For LanceDB OSS only): configuration for object storage.
*
* The available options are described at https://lancedb.github.io/lancedb/guides/storage/
*/
storageOptions?: Record<string, string>
}
/** Write mode for writing a table. */
export const enum WriteMode {
Create = 'Create',
Append = 'Append',
Overwrite = 'Overwrite'
}
/** Write options when creating a Table. */
export interface WriteOptions {
/** Write mode for writing to a table. */
mode?: WriteMode
}
export interface OpenTableOptions {
storageOptions?: Record<string, string>
}
export class Connection {
/** Create a new Connection instance from the given URI. */
static new(uri: string, options: ConnectionOptions): Promise<Connection>
display(): string
isOpen(): boolean
close(): void
/** List all tables in the dataset. */
tableNames(startAfter?: string | undefined | null, limit?: number | undefined | null): Promise<Array<string>>
/**
* Create table from an Apache Arrow IPC (file) buffer.
*
* Parameters:
* - name: The name of the table.
* - buf: The buffer containing the IPC file.
*
*/
createTable(name: string, buf: Buffer, mode: string, storageOptions?: Record<string, string> | undefined | null, useLegacyFormat?: boolean | undefined | null): Promise<Table>
createEmptyTable(name: string, schemaBuf: Buffer, mode: string, storageOptions?: Record<string, string> | undefined | null, useLegacyFormat?: boolean | undefined | null): Promise<Table>
openTable(name: string, storageOptions?: Record<string, string> | undefined | null, indexCacheSize?: number | undefined | null): Promise<Table>
/** Drop the table with the given name, or raise an error if the table does not exist. */
dropTable(name: string): Promise<void>
}
export class Index {
static ivfPq(distanceType?: string | undefined | null, numPartitions?: number | undefined | null, numSubVectors?: number | undefined | null, maxIterations?: number | undefined | null, sampleRate?: number | undefined | null): Index
static btree(): Index
}
/** TypeScript-style Async Iterator over RecordBatches */
export class RecordBatchIterator {
next(): Promise<Buffer | null>
}
/** A builder used to create and run a merge insert operation */
export class NativeMergeInsertBuilder {
whenMatchedUpdateAll(condition?: string | undefined | null): NativeMergeInsertBuilder
whenNotMatchedInsertAll(): NativeMergeInsertBuilder
whenNotMatchedBySourceDelete(filter?: string | undefined | null): NativeMergeInsertBuilder
execute(buf: Buffer): Promise<void>
}
export class Query {
onlyIf(predicate: string): void
select(columns: Array<[string, string]>): void
limit(limit: number): void
nearestTo(vector: Float32Array): VectorQuery
execute(maxBatchLength?: number | undefined | null): Promise<RecordBatchIterator>
explainPlan(verbose: boolean): Promise<string>
}
export class VectorQuery {
column(column: string): void
distanceType(distanceType: string): void
postfilter(): void
refineFactor(refineFactor: number): void
nprobes(nprobe: number): void
bypassVectorIndex(): void
onlyIf(predicate: string): void
select(columns: Array<[string, string]>): void
limit(limit: number): void
execute(maxBatchLength?: number | undefined | null): Promise<RecordBatchIterator>
explainPlan(verbose: boolean): Promise<string>
}
export class Table {
name: string
display(): string
isOpen(): boolean
close(): void
/** Return Schema as empty Arrow IPC file. */
schema(): Promise<Buffer>
add(buf: Buffer, mode: string): Promise<void>
countRows(filter?: string | undefined | null): Promise<number>
delete(predicate: string): Promise<void>
createIndex(index: Index | undefined | null, column: string, replace?: boolean | undefined | null): Promise<void>
update(onlyIf: string | undefined | null, columns: Array<[string, string]>): Promise<void>
query(): Query
vectorSearch(vector: Float32Array): VectorQuery
addColumns(transforms: Array<AddColumnsSql>): Promise<void>
alterColumns(alterations: Array<ColumnAlteration>): Promise<void>
dropColumns(columns: Array<string>): Promise<void>
version(): Promise<number>
checkout(version: number): Promise<void>
checkoutLatest(): Promise<void>
restore(): Promise<void>
optimize(olderThanMs?: number | undefined | null): Promise<OptimizeStats>
listIndices(): Promise<Array<IndexConfig>>
indexStats(indexName: string): Promise<IndexStatistics | null>
mergeInsert(on: Array<string>): NativeMergeInsertBuilder
}
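
`RecordBatchIterator` in the declarations above exposes a single `next()` that resolves to a buffer, or to `null` when the stream is exhausted. One way to consume such an object with `for await` is to wrap it in an async generator, sketched below. `NextSource`, `iterate`, `fromArray`, and `collect` are hypothetical names; the interface is generic so the sketch does not depend on Node's `Buffer` type.

```typescript
// Shape taken from the declaration above, made generic for illustration.
interface NextSource<T> {
  next(): Promise<T | null>;
}

// Adapts a "call next() until it returns null" source to the standard
// async iteration protocol.
async function* iterate<T>(source: NextSource<T>): AsyncGenerator<T, void, undefined> {
  for (;;) {
    const item = await source.next();
    if (item === null) return;
    yield item;
  }
}

// Hypothetical in-memory source, used only to exercise the adapter.
function fromArray<T>(items: T[]): NextSource<T> {
  let index = 0;
  return {
    next: async () => (index < items.length ? items[index++] : null),
  };
}

// Drains a source into an array.
async function collect<T>(source: NextSource<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of iterate(source)) {
    out.push(item);
  }
  return out;
}
```

Under these assumptions, `collect(fromArray([1, 2, 3]))` resolves to the three items in order and an empty source resolves to an empty array.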


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.10.0",
"version": "0.11.1-beta.0",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.10.0",
"version": "0.11.1-beta.0",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.10.0",
"version": "0.11.1-beta.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.10.0",
"version": "0.11.1-beta.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.10.0",
"version": "0.11.1-beta.0",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",


@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.10.0-beta.1",
"version": "0.11.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.10.0-beta.1",
"version": "0.11.0",
"cpu": [
"x64",
"arm64"
@@ -18,7 +18,6 @@
"win32"
],
"dependencies": {
"axios": "^1.7.2",
"reflect-metadata": "^0.2.2"
},
"devDependencies": {
@@ -30,6 +29,7 @@
"@napi-rs/cli": "^2.18.3",
"@types/axios": "^0.14.0",
"@types/jest": "^29.1.2",
"@types/node": "^22.7.4",
"@types/tmp": "^0.2.6",
"apache-arrow-13": "npm:apache-arrow@13.0.0",
"apache-arrow-14": "npm:apache-arrow@14.0.0",
@@ -4648,11 +4648,12 @@
"optional": true
},
"node_modules/@types/node": {
"version": "20.14.11",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.14.11.tgz",
"integrity": "sha512-kprQpL8MMeszbz6ojB5/tU8PLN4kesnN8Gjzw349rDlNgsSzg90lAVj3llK99Dh7JON+t9AuscPPFW6mPbTnSA==",
"version": "22.7.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-22.7.4.tgz",
"integrity": "sha512-y+NPi1rFzDs1NdQHHToqeiX2TIS79SWEAw9GYhkkx8bD0ChpfqC+n2j5OXOCpzfojBEBt6DnEnnG9MY0zk1XLg==",
"devOptional": true,
"dependencies": {
"undici-types": "~5.26.4"
"undici-types": "~6.19.2"
}
},
"node_modules/@types/node-fetch": {
@@ -4665,6 +4666,12 @@
"form-data": "^4.0.0"
}
},
"node_modules/@types/node/node_modules/undici-types": {
"version": "6.19.8",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.19.8.tgz",
"integrity": "sha512-ve2KP6f/JnbPBFyobGHuerC9g1FYGn/F8n1LWTwNxCEzd6IfqTwUQcNXgEtmmQ6DlRrC1hrSrBnCZPokRrDHjw==",
"devOptional": true
},
"node_modules/@types/pad-left": {
"version": "2.1.1",
"resolved": "https://registry.npmjs.org/@types/pad-left/-/pad-left-2.1.1.tgz",
@@ -4963,6 +4970,21 @@
"arrow2csv": "bin/arrow2csv.cjs"
}
},
"node_modules/apache-arrow-15/node_modules/@types/node": {
"version": "20.16.10",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.16.10.tgz",
"integrity": "sha512-vQUKgWTjEIRFCvK6CyriPH3MZYiYlNy0fKiEYHWbcoWLEgs4opurGGKlebrTLqdSMIbXImH6XExNiIyNUv3WpA==",
"dev": true,
"dependencies": {
"undici-types": "~6.19.2"
}
},
"node_modules/apache-arrow-15/node_modules/undici-types": {
"version": "6.19.8",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.19.8.tgz",
"integrity": "sha512-ve2KP6f/JnbPBFyobGHuerC9g1FYGn/F8n1LWTwNxCEzd6IfqTwUQcNXgEtmmQ6DlRrC1hrSrBnCZPokRrDHjw==",
"dev": true
},
"node_modules/apache-arrow-16": {
"name": "apache-arrow",
"version": "16.0.0",
@@ -4984,6 +5006,21 @@
"arrow2csv": "bin/arrow2csv.cjs"
}
},
"node_modules/apache-arrow-16/node_modules/@types/node": {
"version": "20.16.10",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.16.10.tgz",
"integrity": "sha512-vQUKgWTjEIRFCvK6CyriPH3MZYiYlNy0fKiEYHWbcoWLEgs4opurGGKlebrTLqdSMIbXImH6XExNiIyNUv3WpA==",
"dev": true,
"dependencies": {
"undici-types": "~6.19.2"
}
},
"node_modules/apache-arrow-16/node_modules/undici-types": {
"version": "6.19.8",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.19.8.tgz",
"integrity": "sha512-ve2KP6f/JnbPBFyobGHuerC9g1FYGn/F8n1LWTwNxCEzd6IfqTwUQcNXgEtmmQ6DlRrC1hrSrBnCZPokRrDHjw==",
"dev": true
},
"node_modules/apache-arrow-17": {
"name": "apache-arrow",
"version": "17.0.0",
@@ -5011,12 +5048,42 @@
"integrity": "sha512-BwR5KP3Es/CSht0xqBcUXS3qCAUVXwpRKsV2+arxeb65atasuXG9LykC9Ab10Cw3s2raH92ZqOeILaQbsB2ACg==",
"dev": true
},
"node_modules/apache-arrow-17/node_modules/@types/node": {
"version": "20.16.10",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.16.10.tgz",
"integrity": "sha512-vQUKgWTjEIRFCvK6CyriPH3MZYiYlNy0fKiEYHWbcoWLEgs4opurGGKlebrTLqdSMIbXImH6XExNiIyNUv3WpA==",
"dev": true,
"dependencies": {
"undici-types": "~6.19.2"
}
},
"node_modules/apache-arrow-17/node_modules/flatbuffers": {
"version": "24.3.25",
"resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-24.3.25.tgz",
"integrity": "sha512-3HDgPbgiwWMI9zVB7VYBHaMrbOO7Gm0v+yD2FV/sCKj+9NDeVL7BOBYUuhWAQGKWOzBo8S9WdMvV0eixO233XQ==",
"dev": true
},
"node_modules/apache-arrow-17/node_modules/undici-types": {
"version": "6.19.8",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.19.8.tgz",
"integrity": "sha512-ve2KP6f/JnbPBFyobGHuerC9g1FYGn/F8n1LWTwNxCEzd6IfqTwUQcNXgEtmmQ6DlRrC1hrSrBnCZPokRrDHjw==",
"dev": true
},
"node_modules/apache-arrow/node_modules/@types/node": {
"version": "20.16.10",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.16.10.tgz",
"integrity": "sha512-vQUKgWTjEIRFCvK6CyriPH3MZYiYlNy0fKiEYHWbcoWLEgs4opurGGKlebrTLqdSMIbXImH6XExNiIyNUv3WpA==",
"peer": true,
"dependencies": {
"undici-types": "~6.19.2"
}
},
"node_modules/apache-arrow/node_modules/undici-types": {
"version": "6.19.8",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.19.8.tgz",
"integrity": "sha512-ve2KP6f/JnbPBFyobGHuerC9g1FYGn/F8n1LWTwNxCEzd6IfqTwUQcNXgEtmmQ6DlRrC1hrSrBnCZPokRrDHjw==",
"peer": true
},
"node_modules/argparse": {
"version": "1.0.10",
"resolved": "https://registry.npmjs.org/argparse/-/argparse-1.0.10.tgz",
@@ -5046,12 +5113,14 @@
"node_modules/asynckit": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz",
"integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q=="
"integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==",
"devOptional": true
},
"node_modules/axios": {
"version": "1.7.2",
"resolved": "https://registry.npmjs.org/axios/-/axios-1.7.2.tgz",
"integrity": "sha512-2A8QhOMrbomlDuiLeK9XibIBzuHeRcqqNOHp0Cyp5EoJ1IFDh+XZH3A6BkXtv0K4gFGCI0Y4BM7B1wOEi0Rmgw==",
"dev": true,
"dependencies": {
"follow-redirects": "^1.15.6",
"form-data": "^4.0.0",
@@ -5536,6 +5605,7 @@
"version": "1.0.8",
"resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz",
"integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==",
"devOptional": true,
"dependencies": {
"delayed-stream": "~1.0.0"
},
@@ -5723,6 +5793,7 @@
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz",
"integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==",
"devOptional": true,
"engines": {
"node": ">=0.4.0"
}
@@ -6248,6 +6319,7 @@
"version": "1.15.6",
"resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.6.tgz",
"integrity": "sha512-wWN62YITEaOpSK584EZXJafH1AGpO8RVgElfkuXbTOrPX4fIfOyEpW/CsiNd8JdYrAoOvafRTOEnvsO++qCqFA==",
"dev": true,
"funding": [
{
"type": "individual",
@@ -6267,6 +6339,7 @@
"version": "4.0.0",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.0.tgz",
"integrity": "sha512-ETEklSGi5t0QMZuiXoA/Q6vcnxcLQP5vdugSpuAyi6SVGi2clPPp+xgEhuMaHC+zGgn31Kd235W35f7Hykkaww==",
"devOptional": true,
"dependencies": {
"asynckit": "^0.4.0",
"combined-stream": "^1.0.8",
@@ -7773,6 +7846,7 @@
"version": "1.52.0",
"resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
"integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
"devOptional": true,
"engines": {
"node": ">= 0.6"
}
@@ -7781,6 +7855,7 @@
"version": "2.1.35",
"resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
"integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
"devOptional": true,
"dependencies": {
"mime-db": "1.52.0"
},
@@ -8393,7 +8468,8 @@
"node_modules/proxy-from-env": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/proxy-from-env/-/proxy-from-env-1.1.0.tgz",
"integrity": "sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg=="
"integrity": "sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg==",
"dev": true
},
"node_modules/pump": {
"version": "3.0.0",
@@ -9561,7 +9637,8 @@
"node_modules/undici-types": {
"version": "5.26.5",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-5.26.5.tgz",
"integrity": "sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA=="
"integrity": "sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA==",
"optional": true
},
"node_modules/update-browserslist-db": {
"version": "1.0.13",


@@ -10,7 +10,7 @@
"vector database",
"ann"
],
"version": "0.10.0",
"version": "0.11.1-beta.0",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",
@@ -40,6 +40,7 @@
"@napi-rs/cli": "^2.18.3",
"@types/axios": "^0.14.0",
"@types/jest": "^29.1.2",
"@types/node": "^22.7.4",
"@types/tmp": "^0.2.6",
"apache-arrow-13": "npm:apache-arrow@13.0.0",
"apache-arrow-14": "npm:apache-arrow@14.0.0",
@@ -66,8 +67,8 @@
"os": ["darwin", "linux", "win32"],
"scripts": {
"artifacts": "napi artifacts",
"build:debug": "napi build --platform --dts ../lancedb/native.d.ts --js ../lancedb/native.js lancedb",
"build:release": "napi build --platform --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js dist/",
"build:debug": "napi build --platform --no-const-enum --dts ../lancedb/native.d.ts --js ../lancedb/native.js lancedb",
"build:release": "napi build --platform --no-const-enum --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js dist/",
"build": "npm run build:debug && tsc -b && shx cp lancedb/native.d.ts dist/native.d.ts && shx cp lancedb/*.node dist/",
"build-release": "npm run build:release && tsc -b && shx cp lancedb/native.d.ts dist/native.d.ts",
"lint-ci": "biome ci .",
@@ -81,7 +82,6 @@
"version": "napi version"
},
"dependencies": {
"axios": "^1.7.2",
"reflect-metadata": "^0.2.2"
},
"optionalDependencies": {

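The `--no-const-enum` flag added to both build scripts appears to make the generated `native.d.ts` declare plain enums instead of `const enum`s such as the `WriteMode` declaration shown earlier in this diff. The practical difference is that a plain enum exists as an object at runtime, so its members can be listed and validated, whereas a `const enum` is inlined at compile time and leaves no object behind. The sketch below uses a locally declared copy of `WriteMode` (an assumption for illustration, not the generated output), and `parseWriteMode` is a hypothetical helper.

```typescript
// Local copy for illustration; a plain (non-const) enum is emitted as a real
// object, so Object.values() works on it at runtime.
enum WriteMode {
  Create = "Create",
  Append = "Append",
  Overwrite = "Overwrite",
}

// Hypothetical helper: validates an untrusted string against the enum.
function parseWriteMode(raw: string): WriteMode {
  const known = Object.values(WriteMode) as string[];
  if (!known.includes(raw)) {
    throw new Error(`unknown write mode: ${raw}`);
  }
  return raw as WriteMode;
}
```

This kind of runtime lookup is not possible with a `const enum`, which may be one reason to prefer plain enums in a published declaration file.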

@@ -68,6 +68,24 @@ impl Connection {
builder = builder.storage_option(key, value);
}
}
let client_config = options.client_config.unwrap_or_default();
builder = builder.client_config(client_config.into());
if let Some(api_key) = options.api_key {
builder = builder.api_key(&api_key);
}
if let Some(region) = options.region {
builder = builder.region(&region);
} else {
builder = builder.region("us-east-1");
}
if let Some(host_override) = options.host_override {
builder = builder.host_override(&host_override);
}
Ok(Self::inner_new(
builder
.execute()


@@ -22,6 +22,7 @@ mod index;
mod iterator;
pub mod merge;
mod query;
pub mod remote;
mod table;
mod util;
@@ -42,6 +43,19 @@ pub struct ConnectionOptions {
///
/// The available options are described at https://lancedb.github.io/lancedb/guides/storage/
pub storage_options: Option<HashMap<String, String>>,
/// (For LanceDB Cloud only): configuration for the remote HTTP client.
pub client_config: Option<remote::ClientConfig>,
/// (For LanceDB Cloud only): the API key to use with LanceDB Cloud.
///
/// Can also be set via the environment variable `LANCEDB_API_KEY`.
pub api_key: Option<String>,
/// (For LanceDB Cloud only): the region to use for LanceDB Cloud.
/// Defaults to 'us-east-1'.
pub region: Option<String>,
/// (For LanceDB Cloud only): the host to use for LanceDB Cloud. Used
/// for testing purposes.
pub host_override: Option<String>,
}
/// Write mode for writing a table.

nodejs/src/remote.rs Normal file

@@ -0,0 +1,120 @@
// Copyright 2024 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use napi_derive::*;
/// Timeout configuration for remote HTTP client.
#[napi(object)]
#[derive(Debug)]
pub struct TimeoutConfig {
/// The timeout for establishing a connection in seconds. Default is 120
/// seconds (2 minutes). This can also be set via the environment variable
/// `LANCE_CLIENT_CONNECT_TIMEOUT`, as an integer number of seconds.
pub connect_timeout: Option<f64>,
/// The timeout for reading data from the server in seconds. Default is 300
/// seconds (5 minutes). This can also be set via the environment variable
/// `LANCE_CLIENT_READ_TIMEOUT`, as an integer number of seconds.
pub read_timeout: Option<f64>,
/// The timeout for keeping idle connections in the connection pool in seconds.
/// Default is 300 seconds (5 minutes). This can also be set via the
/// environment variable `LANCE_CLIENT_CONNECTION_TIMEOUT`, as an integer
/// number of seconds.
pub pool_idle_timeout: Option<f64>,
}
/// Retry configuration for the remote HTTP client.
#[napi(object)]
#[derive(Debug)]
pub struct RetryConfig {
/// The maximum number of retries for a request. Default is 3. You can also
/// set this via the environment variable `LANCE_CLIENT_MAX_RETRIES`.
pub retries: Option<u8>,
/// The maximum number of retries for connection errors. Default is 3. You
/// can also set this via the environment variable `LANCE_CLIENT_CONNECT_RETRIES`.
pub connect_retries: Option<u8>,
/// The maximum number of retries for read errors. Default is 3. You can also
/// set this via the environment variable `LANCE_CLIENT_READ_RETRIES`.
pub read_retries: Option<u8>,
/// The backoff factor to apply between retries. Default is 0.25. Before each
/// retry the client waits
/// `{backoff factor} * (2 ** ({number of previous retries}))` seconds. So for the default
/// of 0.25, the first retry will wait 0.25 seconds, the second retry will wait 0.5
/// seconds, the third retry will wait 1 second, etc.
///
/// You can also set this via the environment variable
/// `LANCE_CLIENT_RETRY_BACKOFF_FACTOR`.
pub backoff_factor: Option<f64>,
/// The maximum jitter to add to each backoff delay, in seconds. Default is 0.25.
///
/// A random value between 0 and `backoff_jitter` seconds will be added to the
/// backoff delay. So for the default of 0.25 seconds, between 0 and 250
/// milliseconds will be added to the sleep between each retry.
///
/// You can also set this via the environment variable
/// `LANCE_CLIENT_RETRY_BACKOFF_JITTER`.
pub backoff_jitter: Option<f64>,
/// The HTTP status codes for which to retry the request. Default is
/// [429, 500, 502, 503].
///
/// You can also set this via the environment variable
/// `LANCE_CLIENT_RETRY_STATUSES`. Use a comma-separated list of integers.
pub statuses: Option<Vec<u16>>,
}
#[napi(object)]
#[derive(Debug, Default)]
pub struct ClientConfig {
pub user_agent: Option<String>,
pub retry_config: Option<RetryConfig>,
pub timeout_config: Option<TimeoutConfig>,
}
impl From<TimeoutConfig> for lancedb::remote::TimeoutConfig {
fn from(config: TimeoutConfig) -> Self {
Self {
connect_timeout: config
.connect_timeout
.map(std::time::Duration::from_secs_f64),
read_timeout: config.read_timeout.map(std::time::Duration::from_secs_f64),
pool_idle_timeout: config
.pool_idle_timeout
.map(std::time::Duration::from_secs_f64),
}
}
}
impl From<RetryConfig> for lancedb::remote::RetryConfig {
fn from(config: RetryConfig) -> Self {
Self {
retries: config.retries,
connect_retries: config.connect_retries,
read_retries: config.read_retries,
backoff_factor: config.backoff_factor.map(|v| v as f32),
backoff_jitter: config.backoff_jitter.map(|v| v as f32),
statuses: config.statuses,
}
}
}
impl From<ClientConfig> for lancedb::remote::ClientConfig {
fn from(config: ClientConfig) -> Self {
Self {
user_agent: config
.user_agent
.unwrap_or(concat!("LanceDB-Node-Client/", env!("CARGO_PKG_VERSION")).to_string()),
retry_config: config.retry_config.map(Into::into).unwrap_or_default(),
timeout_config: config.timeout_config.map(Into::into).unwrap_or_default(),
}
}
}
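
The backoff rule documented on `RetryConfig` above can be illustrated with a short Python sketch. It only reproduces the arithmetic stated in the doc comments (`backoff_factor * 2 ** previous_retries`, plus a random jitter between 0 and `backoff_jitter`); the function name `backoff_delays` and the `rng` parameter are hypothetical and are not part of the LanceDB API, and the real client may differ in details.

```python
import random


def backoff_delays(retries=3, backoff_factor=0.25, backoff_jitter=0.25, rng=None):
    """Return the sleep time (in seconds) before each retry.

    Hypothetical helper: it mirrors the formula in the RetryConfig docs,
    not the actual client implementation.
    """
    rng = rng or random.Random()
    delays = []
    for previous_retries in range(retries):
        # Exponential part: factor * 2 ** (number of previous retries).
        base = backoff_factor * (2 ** previous_retries)
        # Jitter part: a random value between 0 and backoff_jitter.
        jitter = rng.uniform(0.0, backoff_jitter)
        delays.append(base + jitter)
    return delays


# With jitter disabled, the documented defaults give 0.25 s, 0.5 s, 1 s.
no_jitter = backoff_delays(retries=3, backoff_factor=0.25, backoff_jitter=0.0)
```

With the default jitter of 0.25, each delay would fall somewhere between the base value and the base value plus 250 milliseconds.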

View File

@@ -337,7 +337,7 @@ impl Table {
#[napi(catch_unwind)]
pub async fn index_stats(&self, index_name: String) -> napi::Result<Option<IndexStatistics>> {
let tbl = self.inner_ref()?.as_native().unwrap();
let tbl = self.inner_ref()?;
let stats = tbl.index_stats(&index_name).await.default_error()?;
Ok(stats.map(IndexStatistics::from))
}
@@ -480,32 +480,22 @@ pub struct IndexStatistics {
/// The number of rows not indexed
pub num_unindexed_rows: f64,
/// The type of the index
pub index_type: Option<String>,
/// The metadata for each index
pub indices: Vec<IndexMetadata>,
pub index_type: String,
/// The type of the distance function used by the index. This is only
/// present for vector indices. Scalar and full text search indices do
/// not have a distance function.
pub distance_type: Option<String>,
/// The number of parts this index is split into.
pub num_indices: Option<u32>,
}
impl From<lancedb::index::IndexStatistics> for IndexStatistics {
fn from(value: lancedb::index::IndexStatistics) -> Self {
Self {
num_indexed_rows: value.num_indexed_rows as f64,
num_unindexed_rows: value.num_unindexed_rows as f64,
index_type: value.index_type.map(|t| format!("{:?}", t)),
indices: value.indices.into_iter().map(Into::into).collect(),
}
}
}
#[napi(object)]
pub struct IndexMetadata {
pub metric_type: Option<String>,
pub index_type: Option<String>,
}
impl From<lancedb::index::IndexMetadata> for IndexMetadata {
fn from(value: lancedb::index::IndexMetadata) -> Self {
Self {
metric_type: value.metric_type,
index_type: value.index_type,
index_type: value.index_type.to_string(),
distance_type: value.distance_type.map(|d| d.to_string()),
num_indices: value.num_indices,
}
}
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.14.0-beta.0"
current_version = "0.14.1-beta.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.14.0-beta.0"
version = "0.14.1-beta.1"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true
@@ -22,8 +22,6 @@ pyo3 = { version = "0.21", features = ["extension-module", "abi3-py38", "gil-ref
# pyo3-asyncio = { version = "0.20", features = ["attributes", "tokio-runtime"] }
pyo3-asyncio-0-21 = { version = "0.21.0", features = ["attributes", "tokio-runtime"] }
# Prevent dynamic linking of lzma, which comes from datafusion
lzma-sys = { version = "*", features = ["static"] }
pin-project = "1.1.5"
futures.workspace = true
tokio = { version = "1.36.0", features = ["sync"] }
@@ -35,4 +33,6 @@ pyo3-build-config = { version = "0.20.3", features = [
] }
[features]
default = ["remote"]
fp16kernels = ["lancedb/fp16kernels"]
remote = ["lancedb/remote"]

View File

@@ -3,9 +3,8 @@ name = "lancedb"
# version in Cargo.toml
dependencies = [
"deprecation",
"pylance==0.18.0",
"pylance==0.18.3-beta.2",
"requests>=2.31.0",
"retry>=0.9.2",
"tqdm>=4.27.0",
"pydantic>=1.10",
"attrs>=21.3.0",

View File

@@ -19,6 +19,8 @@ from typing import Dict, Optional, Union, Any
__version__ = importlib.metadata.version("lancedb")
from lancedb.remote import ClientConfig
from ._lancedb import connect as lancedb_connect
from .common import URI, sanitize_uri
from .db import AsyncConnection, DBConnection, LanceDBConnection
@@ -120,7 +122,7 @@ async def connect_async(
region: str = "us-east-1",
host_override: Optional[str] = None,
read_consistency_interval: Optional[timedelta] = None,
request_thread_pool: Optional[Union[int, ThreadPoolExecutor]] = None,
client_config: Optional[Union[ClientConfig, Dict[str, Any]]] = None,
storage_options: Optional[Dict[str, str]] = None,
) -> AsyncConnection:
"""Connect to a LanceDB database.
@@ -148,6 +150,10 @@ async def connect_async(
the last check, then the table will be checked for updates. Note: this
consistency only applies to read operations. Write operations are
always consistent.
client_config: ClientConfig or dict, optional
Configuration options for the LanceDB Cloud HTTP client. If a dict, then
the keys are the attributes of the ClientConfig class. If None, then the
default configuration is used.
storage_options: dict, optional
Additional options for the storage backend. See available options at
https://lancedb.github.io/lancedb/guides/storage/
@@ -160,7 +166,13 @@ async def connect_async(
... # For a local directory, provide a path to the database
... db = await lancedb.connect_async("~/.lancedb")
... # For object storage, use a URI prefix
... db = await lancedb.connect_async("s3://my-bucket/lancedb")
... db = await lancedb.connect_async("s3://my-bucket/lancedb",
... storage_options={
... "aws_access_key_id": "***"})
... # Connect to LanceDB cloud
... db = await lancedb.connect_async("db://my_database", api_key="ldb_...",
... client_config={
... "retry_config": {"retries": 5}})
Returns
-------
@@ -172,6 +184,9 @@ async def connect_async(
else:
read_consistency_interval_secs = None
if isinstance(client_config, dict):
client_config = ClientConfig(**client_config)
return AsyncConnection(
await lancedb_connect(
sanitize_uri(uri),
@@ -179,6 +194,7 @@ async def connect_async(
region,
host_override,
read_consistency_interval_secs,
client_config,
storage_options,
)
)

View File

@@ -20,7 +20,7 @@ from .util import safe_import_pandas
pd = safe_import_pandas()
DATA = Union[List[dict], dict, "pd.DataFrame", pa.Table, Iterable[pa.RecordBatch]]
DATA = Union[List[dict], "pd.DataFrame", pa.Table, Iterable[pa.RecordBatch]]
VEC = Union[list, np.ndarray, pa.Array, pa.ChunkedArray]
URI = Union[str, Path]
VECTOR_COLUMN_NAME = "vector"

View File

@@ -96,7 +96,7 @@ class DBConnection(EnforceOverrides):
User must provide at least one of `data` or `schema`.
Acceptable types are:
- dict or list-of-dict
- list-of-dict
- pandas.DataFrame
@@ -579,7 +579,7 @@ class AsyncConnection(object):
User must provide at least one of `data` or `schema`.
Acceptable types are:
- dict or list-of-dict
- list-of-dict
- pandas.DataFrame

View File

@@ -1,15 +1,6 @@
# Copyright (c) 2023. LanceDB Developers
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from abc import ABC, abstractmethod
from typing import List, Union
@@ -34,7 +25,7 @@ class EmbeddingFunction(BaseModel, ABC):
__slots__ = ("__weakref__",) # pydantic 1.x compatibility
max_retries: int = (
7 # Setitng 0 disables retires. Maybe this should not be enabled by default,
7 # Setting 0 disables retries. Maybe this should not be enabled by default,
)
_ndims: int = PrivateAttr()
@@ -46,22 +37,37 @@ class EmbeddingFunction(BaseModel, ABC):
return cls(**kwargs)
@abstractmethod
def compute_query_embeddings(self, *args, **kwargs) -> List[np.array]:
def compute_query_embeddings(self, *args, **kwargs) -> list[Union[np.array, None]]:
"""
Compute the embeddings for a given user query
Returns
-------
A list of embeddings for each input. The embedding of each input can be None
when the embedding is not valid.
"""
pass
@abstractmethod
def compute_source_embeddings(self, *args, **kwargs) -> List[np.array]:
"""
Compute the embeddings for the source column in the database
def compute_source_embeddings(self, *args, **kwargs) -> list[Union[np.array, None]]:
"""Compute the embeddings for the source column in the database
Returns
-------
A list of embeddings for each input. The embedding of each input can be None
when the embedding is not valid.
"""
pass
def compute_query_embeddings_with_retry(self, *args, **kwargs) -> List[np.array]:
"""
Compute the embeddings for a given user query with retries
def compute_query_embeddings_with_retry(
self, *args, **kwargs
) -> list[Union[np.array, None]]:
"""Compute the embeddings for a given user query with retries
Returns
-------
A list of embeddings for each input. The embedding of each input can be None
when the embedding is not valid.
"""
return retry_with_exponential_backoff(
self.compute_query_embeddings, max_retries=self.max_retries
@@ -70,9 +76,15 @@ class EmbeddingFunction(BaseModel, ABC):
**kwargs,
)
def compute_source_embeddings_with_retry(self, *args, **kwargs) -> List[np.array]:
"""
Compute the embeddings for the source column in the database with retries
def compute_source_embeddings_with_retry(
self, *args, **kwargs
) -> list[Union[np.array, None]]:
"""Compute the embeddings for the source column in the database with retries.
Returns
-------
A list of embeddings for each input. The embedding of each input can be None
when the embedding is not valid.
"""
return retry_with_exponential_backoff(
self.compute_source_embeddings, max_retries=self.max_retries
@@ -94,8 +106,14 @@ class EmbeddingFunction(BaseModel, ABC):
from ..pydantic import PYDANTIC_VERSION
if PYDANTIC_VERSION.major < 2:
return dict(self)
return self.model_dump()
return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}
return self.model_dump(
exclude={
field_name
for field_name in self.model_fields
if field_name.startswith("_")
}
)
@abstractmethod
def ndims(self):
@@ -144,18 +162,20 @@ class TextEmbeddingFunction(EmbeddingFunction):
A callable ABC for embedding functions that take text as input
"""
def compute_query_embeddings(self, query: str, *args, **kwargs) -> List[np.array]:
def compute_query_embeddings(
self, query: str, *args, **kwargs
) -> list[Union[np.array, None]]:
return self.compute_source_embeddings(query, *args, **kwargs)
def compute_source_embeddings(self, texts: TEXT, *args, **kwargs) -> List[np.array]:
def compute_source_embeddings(
self, texts: TEXT, *args, **kwargs
) -> list[Union[np.array, None]]:
texts = self.sanitize_input(texts)
return self.generate_embeddings(texts)
@abstractmethod
def generate_embeddings(
self, texts: Union[List[str], np.ndarray], *args, **kwargs
) -> List[np.array]:
"""
Generate the embeddings for the given texts
"""
) -> list[Union[np.array, None]]:
"""Generate the embeddings for the given texts"""
pass

View File

@@ -1,15 +1,6 @@
# Copyright (c) 2023. LanceDB Developers
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from functools import cached_property
from typing import TYPE_CHECKING, List, Optional, Union
@@ -19,6 +10,7 @@ from .registry import register
if TYPE_CHECKING:
import numpy as np
import ollama
@register("ollama")
@@ -39,17 +31,20 @@ class OllamaEmbeddings(TextEmbeddingFunction):
def ndims(self):
return len(self.generate_embeddings(["foo"])[0])
def _compute_embedding(self, text):
return self._ollama_client.embeddings(
model=self.name,
prompt=text,
options=self.options,
keep_alive=self.keep_alive,
)["embedding"]
def _compute_embedding(self, text) -> Union["np.array", None]:
return (
self._ollama_client.embeddings(
model=self.name,
prompt=text,
options=self.options,
keep_alive=self.keep_alive,
)["embedding"]
or None
)
def generate_embeddings(
self, texts: Union[List[str], "np.ndarray"]
) -> List["np.array"]:
) -> list[Union["np.array", None]]:
"""
Get the embeddings for the given texts
@@ -63,7 +58,7 @@ class OllamaEmbeddings(TextEmbeddingFunction):
return embeddings
@cached_property
def _ollama_client(self):
def _ollama_client(self) -> "ollama.Client":
ollama = attempt_import_or_raise("ollama")
# ToDo explore ollama.AsyncClient
return ollama.Client(host=self.host, **self.ollama_client_kwargs)

View File

@@ -1,17 +1,9 @@
# Copyright (c) 2023. LanceDB Developers
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from functools import cached_property
from typing import TYPE_CHECKING, List, Optional, Union
import logging
from ..util import attempt_import_or_raise
from .base import TextEmbeddingFunction
@@ -89,17 +81,26 @@ class OpenAIEmbeddings(TextEmbeddingFunction):
texts: list[str] or np.ndarray (of str)
The texts to embed
"""
openai = attempt_import_or_raise("openai")
# TODO retry, rate limit, token limit
if self.name == "text-embedding-ada-002":
rs = self._openai_client.embeddings.create(input=texts, model=self.name)
else:
kwargs = {
"input": texts,
"model": self.name,
}
if self.dim:
kwargs["dimensions"] = self.dim
rs = self._openai_client.embeddings.create(**kwargs)
try:
if self.name == "text-embedding-ada-002":
rs = self._openai_client.embeddings.create(input=texts, model=self.name)
else:
kwargs = {
"input": texts,
"model": self.name,
}
if self.dim:
kwargs["dimensions"] = self.dim
rs = self._openai_client.embeddings.create(**kwargs)
except openai.BadRequestError:
logging.exception("Bad request: %s", texts)
return [None] * len(texts)
except Exception:
logging.exception("OpenAI embeddings error")
raise
return [v.embedding for v in rs.data]
@cached_property
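
Since `generate_embeddings` may now return `None` for inputs that were rejected, callers may want to separate the usable rows from the failed ones. The following is a minimal sketch of one way to do that; `drop_failed_embeddings` is a hypothetical helper, not something defined in this change, and plain lists stand in for real embedding vectors.

```python
from typing import List, Optional, Sequence, Tuple


def drop_failed_embeddings(
    texts: Sequence[str], embeddings: Sequence[Optional[list]]
) -> Tuple[List[str], List[list]]:
    """Keep only the (text, embedding) pairs whose embedding is not None.

    Hypothetical helper for illustration only.
    """
    kept = [(t, e) for t, e in zip(texts, embeddings) if e is not None]
    kept_texts = [t for t, _ in kept]
    kept_embeddings = [e for _, e in kept]
    return kept_texts, kept_embeddings


# Example: the second input failed and came back as None.
texts = ["hello", "", "world"]
embeddings = [[0.1, 0.2], None, [0.3, 0.4]]
good_texts, good_embeddings = drop_failed_embeddings(texts, embeddings)
```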

View File

@@ -40,6 +40,11 @@ class TransformersEmbeddingFunction(EmbeddingFunction):
The device to use for the model. Default is "cpu".
show_progress_bar : bool
Whether to show a progress bar when loading the model. Default is True.
trust_remote_code : bool
Whether or not to allow for custom models defined on the HuggingFace
Hub in their own modeling files. This option should only be set to True
for repositories you trust and in which you have read the code, as it
will execute code present on the Hub on your local machine.
To download the package, run:
`pip install transformers`
@@ -49,6 +54,7 @@ class TransformersEmbeddingFunction(EmbeddingFunction):
name: str = "colbert-ir/colbertv2.0"
device: str = "cpu"
trust_remote_code: bool = False
_tokenizer: Any = PrivateAttr()
_model: Any = PrivateAttr()
@@ -57,7 +63,9 @@ class TransformersEmbeddingFunction(EmbeddingFunction):
self._ndims = None
transformers = attempt_import_or_raise("transformers")
self._tokenizer = transformers.AutoTokenizer.from_pretrained(self.name)
self._model = transformers.AutoModel.from_pretrained(self.name)
self._model = transformers.AutoModel.from_pretrained(
self.name, trust_remote_code=self.trust_remote_code
)
self._model.to(self.device)
if PYDANTIC_VERSION.major < 2: # Pydantic 1.x compat

View File

@@ -21,14 +21,35 @@ import time
import urllib.error
import weakref
import logging
from functools import wraps
from typing import Callable, List, Union
import numpy as np
import pyarrow as pa
from lance.vector import vec_to_table
from retry import retry
from ..util import deprecated, safe_import_pandas
# ruff: noqa: PERF203
def retry(tries=10, delay=1, max_delay=30, backoff=3, jitter=1):
def wrapper(fn):
@wraps(fn)
def wrapped(*args, **kwargs):
for i in range(tries):
try:
return fn(*args, **kwargs)
except Exception:
if i + 1 == tries:
raise
else:
sleep = min(delay * (backoff**i) + jitter, max_delay)
time.sleep(sleep)
return wrapped
return wrapper
pd = safe_import_pandas()
DATA = Union[pa.Table, "pd.DataFrame"]
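
To see what the backported `retry` decorator above does with its defaults, the sleep schedule can be computed on its own. This sketch copies the expression `min(delay * (backoff**i) + jitter, max_delay)` from the diff; the function name `retry_sleep_schedule` is hypothetical. Note that `jitter` here is a fixed addend rather than a random value.

```python
def retry_sleep_schedule(tries=10, delay=1, max_delay=30, backoff=3, jitter=1):
    """Return the sleeps (in seconds) between attempts for the given settings.

    Hypothetical helper: the last attempt re-raises instead of sleeping,
    so there are tries - 1 sleeps in the worst case.
    """
    return [
        min(delay * (backoff ** i) + jitter, max_delay)
        for i in range(tries - 1)
    ]


schedule = retry_sleep_schedule()
worst_case_total = sum(schedule)
```

With the default arguments, the schedule grows quickly and then stays at `max_delay`, so a call that fails on every attempt would spend a little over three minutes sleeping.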

View File

@@ -104,4 +104,4 @@ class LanceMergeInsertBuilder(object):
fill_value: float, default 0.
The value to use when filling vectors. Only used if on_bad_vectors="fill".
"""
self._table._do_merge(self, new_data, on_bad_vectors, fill_value)
return self._table._do_merge(self, new_data, on_bad_vectors, fill_value)

View File

@@ -36,6 +36,7 @@ from . import __version__
from .arrow import AsyncRecordBatchReader
from .rerankers.base import Reranker
from .rerankers.rrf import RRFReranker
from .rerankers.util import check_reranker_result
from .util import safe_import_pandas
if TYPE_CHECKING:
@@ -87,6 +88,11 @@ class Query(pydantic.BaseModel):
tuning advice.
offset: int
The offset to start fetching results from
fast_search: bool
Skip a flat search of unindexed data. This will improve
search performance but search results will not include unindexed data.
- *default False*.
"""
vector_column: Optional[str] = None
@@ -123,6 +129,8 @@ class Query(pydantic.BaseModel):
offset: int = 0
fast_search: bool = False
class LanceQueryBuilder(ABC):
"""An abstract query builder. Subclasses are defined for vector search,
@@ -138,6 +146,7 @@ class LanceQueryBuilder(ABC):
vector_column_name: str,
ordering_field_name: Optional[str] = None,
fts_columns: Union[str, List[str]] = [],
fast_search: bool = False,
) -> LanceQueryBuilder:
"""
Create a query builder based on the given query and query type.
@@ -154,6 +163,8 @@ class LanceQueryBuilder(ABC):
If "auto", the query type is inferred based on the query.
vector_column_name: str
The name of the vector column to use for vector search.
fast_search: bool
Skip flat search of unindexed data.
"""
# Check hybrid search first as it supports empty query pattern
if query_type == "hybrid":
@@ -195,7 +206,9 @@ class LanceQueryBuilder(ABC):
else:
raise TypeError(f"Unsupported query type: {type(query)}")
return LanceVectorQueryBuilder(table, query, vector_column_name, str_query)
return LanceVectorQueryBuilder(
table, query, vector_column_name, str_query, fast_search
)
@classmethod
def _resolve_query(cls, table, query, query_type, vector_column_name):
@@ -564,6 +577,7 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
query: Union[np.ndarray, list, "PIL.Image.Image"],
vector_column: str,
str_query: Optional[str] = None,
fast_search: bool = False,
):
super().__init__(table)
self._query = query
@@ -574,13 +588,14 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
self._prefilter = False
self._reranker = None
self._str_query = str_query
self._fast_search = fast_search
def metric(self, metric: Literal["L2", "cosine"]) -> LanceVectorQueryBuilder:
def metric(self, metric: Literal["L2", "cosine", "dot"]) -> LanceVectorQueryBuilder:
"""Set the distance metric to use.
Parameters
----------
metric: "L2" or "cosine"
metric: "L2" or "cosine" or "dot"
The distance metric to use. By default "L2" is used.
Returns
@@ -588,7 +603,7 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
LanceVectorQueryBuilder
The LanceQueryBuilder object.
"""
self._metric = metric
self._metric = metric.lower()
return self
def nprobes(self, nprobes: int) -> LanceVectorQueryBuilder:
@@ -674,11 +689,13 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
vector_column=self._vector_column,
with_row_id=self._with_row_id,
offset=self._offset,
fast_search=self._fast_search,
)
result_set = self._table._execute_query(query, batch_size)
if self._reranker is not None:
rs_table = result_set.read_all()
result_set = self._reranker.rerank_vector(self._str_query, rs_table)
check_reranker_result(result_set)
# convert result_set back to RecordBatchReader
result_set = pa.RecordBatchReader.from_batches(
result_set.schema, result_set.to_batches()
@@ -811,6 +828,7 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
results = results.read_all()
if self._reranker is not None:
results = self._reranker.rerank_fts(self._query, results)
check_reranker_result(results)
return results
def tantivy_to_arrow(self) -> pa.Table:
@@ -953,8 +971,8 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
def __init__(
self,
table: "Table",
query: str = None,
vector_column: str = None,
query: Optional[str] = None,
vector_column: Optional[str] = None,
fts_columns: Union[str, List[str]] = [],
):
super().__init__(table)
@@ -1060,10 +1078,7 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
self._fts_query._query, vector_results, fts_results
)
if not isinstance(results, pa.Table): # Enforce type
raise TypeError(
f"rerank_hybrid must return a pyarrow.Table, got {type(results)}"
)
check_reranker_result(results)
# apply limit after reranking
results = results.slice(length=self._limit)
@@ -1112,8 +1127,8 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
def rerank(
self,
normalize="score",
reranker: Reranker = RRFReranker(),
normalize: str = "score",
) -> LanceHybridQueryBuilder:
"""
Rerank the hybrid search results using the specified reranker. The reranker
@@ -1121,12 +1136,12 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
Parameters
----------
reranker: Reranker, default RRFReranker()
The reranker to use. Must be an instance of Reranker class.
normalize: str, default "score"
The method to normalize the scores. Can be "rank" or "score". If "rank",
the scores are converted to ranks and then normalized. If "score", the
scores are normalized directly.
reranker: Reranker, default RRFReranker()
The reranker to use. Must be an instance of Reranker class.
Returns
-------
LanceHybridQueryBuilder

View File

@@ -12,9 +12,12 @@
# limitations under the License.
import abc
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional
import attrs
from lancedb import __version__
import pyarrow as pa
from pydantic import BaseModel
@@ -47,6 +50,8 @@ class VectorQuery(BaseModel):
vector_column: str = VECTOR_COLUMN_NAME
fast_search: bool = False
@attrs.define
class VectorQueryResult:
@@ -62,3 +67,109 @@ class LanceDBClient(abc.ABC):
def query(self, table_name: str, query: VectorQuery) -> VectorQueryResult:
"""Query the LanceDB server for the given table and query."""
pass
@dataclass
class TimeoutConfig:
"""Timeout configuration for remote HTTP client.
Attributes
----------
connect_timeout: Optional[timedelta]
The timeout for establishing a connection. Default is 120 seconds (2 minutes).
This can also be set via the environment variable
`LANCE_CLIENT_CONNECT_TIMEOUT`, as an integer number of seconds.
read_timeout: Optional[timedelta]
The timeout for reading data from the server. Default is 300 seconds
(5 minutes). This can also be set via the environment variable
`LANCE_CLIENT_READ_TIMEOUT`, as an integer number of seconds.
pool_idle_timeout: Optional[timedelta]
The timeout for keeping idle connections in the connection pool. Default
is 300 seconds (5 minutes). This can also be set via the environment variable
`LANCE_CLIENT_CONNECTION_TIMEOUT`, as an integer number of seconds.
"""
connect_timeout: Optional[timedelta] = None
read_timeout: Optional[timedelta] = None
pool_idle_timeout: Optional[timedelta] = None
@staticmethod
def __to_timedelta(value) -> Optional[timedelta]:
if value is None:
return None
elif isinstance(value, timedelta):
return value
elif isinstance(value, (int, float)):
return timedelta(seconds=value)
else:
raise ValueError(
f"Invalid value for timeout: {value}, must be a timedelta "
"or number of seconds"
)
def __post_init__(self):
self.connect_timeout = self.__to_timedelta(self.connect_timeout)
self.read_timeout = self.__to_timedelta(self.read_timeout)
self.pool_idle_timeout = self.__to_timedelta(self.pool_idle_timeout)
@dataclass
class RetryConfig:
"""Retry configuration for the remote HTTP client.
Attributes
----------
retries: Optional[int]
The maximum number of retries for a request. Default is 3. You can also set this
via the environment variable `LANCE_CLIENT_MAX_RETRIES`.
connect_retries: Optional[int]
The maximum number of retries for connection errors. Default is 3. You can also
set this via the environment variable `LANCE_CLIENT_CONNECT_RETRIES`.
read_retries: Optional[int]
The maximum number of retries for read errors. Default is 3. You can also set
this via the environment variable `LANCE_CLIENT_READ_RETRIES`.
backoff_factor: Optional[float]
The backoff factor to apply between retries. Default is 0.25. Before each
retry, the client waits for this many seconds:
`{backoff factor} * (2 ** ({number of previous retries}))`. So for the default
of 0.25, the first retry will wait 0.25 seconds, the second retry will wait 0.5
seconds, the third retry will wait 1 second, etc.
You can also set this via the environment variable
`LANCE_CLIENT_RETRY_BACKOFF_FACTOR`.
backoff_jitter: Optional[float]
The jitter to apply to the backoff factor, in seconds. Default is 0.25.
A random value between 0 and `backoff_jitter` will be added to the backoff
factor in seconds. So for the default of 0.25 seconds, between 0 and 250
milliseconds will be added to the sleep between each retry.
You can also set this via the environment variable
`LANCE_CLIENT_RETRY_BACKOFF_JITTER`.
statuses: Optional[List[int]]
The HTTP status codes for which to retry the request. Default is
[429, 500, 502, 503].
You can also set this via the environment variable
`LANCE_CLIENT_RETRY_STATUSES`. Use a comma-separated list of integers.
"""
retries: Optional[int] = None
connect_retries: Optional[int] = None
read_retries: Optional[int] = None
backoff_factor: Optional[float] = None
backoff_jitter: Optional[float] = None
statuses: Optional[List[int]] = None
@dataclass
class ClientConfig:
user_agent: str = f"LanceDB-Python-Client/{__version__}"
retry_config: Optional[RetryConfig] = None
timeout_config: Optional[TimeoutConfig] = None
def __post_init__(self):
if isinstance(self.retry_config, dict):
self.retry_config = RetryConfig(**self.retry_config)
if isinstance(self.timeout_config, dict):
self.timeout_config = TimeoutConfig(**self.timeout_config)
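
The dict-to-dataclass coercion used by `ClientConfig` and the number-to-`timedelta` coercion used by `TimeoutConfig` can be tried in isolation. The sketch below uses simplified stand-in classes (`TimeoutSketch`, `ClientSketch`) so that it runs without `lancedb` installed; they are assumptions made for illustration and omit most fields of the real classes.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Union

Seconds = Union[int, float, timedelta, None]


def to_timedelta(value: Seconds) -> Optional[timedelta]:
    """Accept None, a timedelta, or a number of seconds."""
    if value is None or isinstance(value, timedelta):
        return value
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    raise ValueError(f"Invalid value for timeout: {value}")


@dataclass
class TimeoutSketch:  # stand-in for TimeoutConfig
    connect_timeout: Seconds = None
    read_timeout: Seconds = None

    def __post_init__(self):
        self.connect_timeout = to_timedelta(self.connect_timeout)
        self.read_timeout = to_timedelta(self.read_timeout)


@dataclass
class ClientSketch:  # stand-in for ClientConfig
    timeout_config: Union[TimeoutSketch, dict, None] = None

    def __post_init__(self):
        # A nested dict is expanded into the dataclass, as in the diff.
        if isinstance(self.timeout_config, dict):
            self.timeout_config = TimeoutSketch(**self.timeout_config)


config = ClientSketch(
    **{"timeout_config": {"connect_timeout": 30, "read_timeout": 1.5}}
)
```

This is the same shape of input that `connect_async` accepts through `client_config` when a plain dict is passed.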

View File

@@ -103,19 +103,29 @@ class RestfulLanceDBClient:
@staticmethod
def _check_status(resp: requests.Response):
# Leaving request id empty for now, as we'll be replacing this impl
# with the Rust one shortly.
if resp.status_code == 404:
raise LanceDBClientError(f"Not found: {resp.text}")
raise LanceDBClientError(
f"Not found: {resp.text}", request_id="", status_code=404
)
elif 400 <= resp.status_code < 500:
raise LanceDBClientError(
f"Bad Request: {resp.status_code}, error: {resp.text}"
f"Bad Request: {resp.status_code}, error: {resp.text}",
request_id="",
status_code=resp.status_code,
)
elif 500 <= resp.status_code < 600:
raise LanceDBClientError(
f"Internal Server Error: {resp.status_code}, error: {resp.text}"
f"Internal Server Error: {resp.status_code}, error: {resp.text}",
request_id="",
status_code=resp.status_code,
)
elif resp.status_code != 200:
raise LanceDBClientError(
f"Unknown Error: {resp.status_code}, error: {resp.text}"
f"Unknown Error: {resp.status_code}, error: {resp.text}",
request_id="",
status_code=resp.status_code,
)
@_check_not_closed
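
The branching in `_check_status` above can be summarised as a mapping from status code to error label. This is a rough sketch of that mapping only; `classify_status` is a hypothetical name, and the real method raises `LanceDBClientError` rather than returning a string.

```python
def classify_status(status_code: int) -> str:
    """Return the error label that the branches in _check_status would use.

    Hypothetical helper for illustration; returns "OK" where the real
    method would simply not raise.
    """
    if status_code == 404:
        return "Not found"
    if 400 <= status_code < 500:
        return "Bad Request"
    if 500 <= status_code < 600:
        return "Internal Server Error"
    if status_code != 200:
        # Any other non-200 code, including other 2xx and 3xx responses.
        return "Unknown Error"
    return "OK"


labels = {code: classify_status(code) for code in (200, 302, 404, 429, 503)}
```

One consequence of the order of checks is that only an exact 200 passes; other success codes fall through to the "Unknown Error" branch.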

View File

@@ -12,5 +12,102 @@
# limitations under the License.
from typing import Optional
class LanceDBClientError(RuntimeError):
"""An error that occurred in the LanceDB client.
Attributes
----------
message: str
The error message.
request_id: str
The id of the request that failed. This can be provided in error reports
to help diagnose the issue.
status_code: Optional[int]
The HTTP status code of the response. May be None if the request
failed before the response was received.
"""
def __init__(
self, message: str, request_id: str, status_code: Optional[int] = None
):
super().__init__(message)
self.request_id = request_id
self.status_code = status_code
class HttpError(LanceDBClientError):
"""An error that occurred during an HTTP request.
Attributes
----------
message: str
The error message.
request_id: str
The id of the request that failed. This can be provided in error reports
to help diagnose the issue.
status_code: Optional[int]
The HTTP status code of the response. May be None if the request
failed before the response was received.
"""
pass
class RetryError(LanceDBClientError):
"""An error that occurs when the client has exceeded the maximum number of retries.
The retry strategy can be adjusted by setting the
[retry_config][lancedb.remote.ClientConfig.retry_config] in the client
configuration. This is passed in the `client_config` argument of
[connect][lancedb.connect] and [connect_async][lancedb.connect_async].
The __cause__ attribute of this exception will be the last exception that
caused the retry to fail. It will be an
[HttpError][lancedb.remote.errors.HttpError] instance.
Attributes
----------
message: str
The retry error message, which will describe which retry limit was hit.
request_id: str
The id of the request that failed. This can be provided in error reports
to help diagnose the issue.
request_failures: int
The number of request failures.
connect_failures: int
The number of connect failures.
read_failures: int
The number of read failures.
max_request_failures: int
The maximum number of request failures.
max_connect_failures: int
The maximum number of connect failures.
max_read_failures: int
The maximum number of read failures.
status_code: Optional[int]
The HTTP status code of the last response. May be None if the request
failed before the response was received.
"""
def __init__(
self,
message: str,
request_id: str,
request_failures: int,
connect_failures: int,
read_failures: int,
max_request_failures: int,
max_connect_failures: int,
max_read_failures: int,
status_code: Optional[int],
):
super().__init__(message, request_id, status_code)
self.request_failures = request_failures
self.connect_failures = connect_failures
self.read_failures = read_failures
self.max_request_failures = max_request_failures
self.max_connect_failures = max_connect_failures
self.max_read_failures = max_read_failures
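
The docstring above says that a `RetryError` carries the last `HttpError` in its `__cause__` attribute. The sketch below shows how a caller might inspect both. It uses cut-down stand-in classes (`ClientErrorSketch`, `HttpErrorSketch`, `RetryErrorSketch`) so it runs without `lancedb`; the stand-ins keep only a few of the attributes listed above, and `describe_failure` is a hypothetical helper.

```python
from typing import Optional


class ClientErrorSketch(RuntimeError):  # stand-in for LanceDBClientError
    def __init__(self, message: str, request_id: str,
                 status_code: Optional[int] = None):
        super().__init__(message)
        self.request_id = request_id
        self.status_code = status_code


class HttpErrorSketch(ClientErrorSketch):  # stand-in for HttpError
    pass


class RetryErrorSketch(ClientErrorSketch):  # stand-in for RetryError
    def __init__(self, message, request_id, request_failures,
                 max_request_failures, status_code=None):
        super().__init__(message, request_id, status_code)
        self.request_failures = request_failures
        self.max_request_failures = max_request_failures


def describe_failure(exc: RetryErrorSketch) -> str:
    """Summarise a retry failure, including the type of its cause."""
    cause = exc.__cause__
    return (
        f"request {exc.request_id}: "
        f"{exc.request_failures}/{exc.max_request_failures} failures, "
        f"last status {exc.status_code}, cause {type(cause).__name__}"
    )


def failing_call():
    # Simulate the client giving up after the last HTTP error.
    try:
        raise HttpErrorSketch("service unavailable", "req-123", 503)
    except HttpErrorSketch as last_error:
        raise RetryErrorSketch(
            "Hit retry limit", "req-123", 3, 3, 503
        ) from last_error


try:
    failing_call()
except RetryErrorSketch as exc:
    summary = describe_failure(exc)
```

Because `raise ... from ...` sets `__cause__`, the original HTTP error stays available for logging alongside the `request_id`.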

View File

@@ -26,7 +26,7 @@ from lancedb.embeddings import EmbeddingFunctionRegistry
from ..query import LanceVectorQueryBuilder, LanceQueryBuilder
from ..table import Query, Table, _sanitize_data
from ..util import inf_vector_column_query, value_to_sql
from ..util import value_to_sql, infer_vector_column_name
from .arrow import to_ipc_binary
from .client import ARROW_STREAM_CONTENT_TYPE
from .db import RemoteDBConnection
@@ -266,10 +266,11 @@ class RemoteTable(Table):
def search(
self,
query: Union[VEC, str],
query: Union[VEC, str] = None,
vector_column_name: Optional[str] = None,
query_type="auto",
fts_columns: Optional[Union[str, List[str]]] = None,
fast_search: bool = False,
) -> LanceVectorQueryBuilder:
"""Create a search query to find the nearest neighbors
of the given query vector. We currently support [vector search][search]
@@ -305,8 +306,6 @@ class RemoteTable(Table):
- *default None*.
Acceptable types are: list, np.ndarray, PIL.Image.Image
- If None then the select/where/limit clauses are applied to filter
the table
vector_column_name: str, optional
The name of the vector column to search.
@@ -316,6 +315,12 @@ class RemoteTable(Table):
- If the table has multiple vector columns then the *vector_column_name*
needs to be specified. Otherwise, an error is raised.
fast_search: bool, optional
Skip a flat search of unindexed data. This may improve
search performance but search results will not include unindexed data.
- *default False*.
Returns
-------
LanceQueryBuilder
@@ -329,11 +334,15 @@ class RemoteTable(Table):
- and also the "_distance" column which is the distance between the query
vector and the returned vector.
"""
if vector_column_name is None and query is not None and query_type != "fts":
try:
vector_column_name = inf_vector_column_query(self.schema)
except Exception as e:
raise e
# empty query builder is not supported in saas, raise error
if query is None and query_type != "hybrid":
raise ValueError("Empty query is not supported")
vector_column_name = infer_vector_column_name(
schema=self.schema,
query_type=query_type,
query=query,
vector_column_name=vector_column_name,
)
return LanceQueryBuilder.create(
self,
@@ -341,6 +350,7 @@ class RemoteTable(Table):
query_type,
vector_column_name=vector_column_name,
fts_columns=fts_columns,
fast_search=fast_search,
)
def _execute_query(

View File

@@ -105,7 +105,7 @@ class Reranker(ABC):
query: str,
vector_results: pa.Table,
fts_results: pa.Table,
):
) -> pa.Table:
"""
Rerank function receives the individual results from the vector and FTS search
results. You can choose to use any of the results to generate the final results,

View File

@@ -11,6 +11,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from numpy import nan as NaN  # the `NaN` alias was removed in NumPy 2.0
import pyarrow as pa
from .base import Reranker
@@ -58,14 +59,42 @@ class LinearCombinationReranker(Reranker):
def merge_results(
self, vector_results: pa.Table, fts_results: pa.Table, fill: float
):
# If both are empty then just return an empty table
if len(vector_results) == 0 and len(fts_results) == 0:
return vector_results
# If one is empty then return the other
# If one is empty then return the other and add a _relevance_score
# column derived from the existing vector or FTS score
if len(vector_results) == 0:
return fts_results
results = fts_results.append_column(
"_relevance_score",
pa.array(fts_results["_score"], type=pa.float32()),
)
if self.score == "relevance":
results = self._keep_relevance_score(results)
elif self.score == "all":
results = results.append_column(
"_distance",
pa.array([NaN] * len(fts_results), type=pa.float32()),
)
return results
if len(fts_results) == 0:
return vector_results
# invert the distance to relevance score
results = vector_results.append_column(
"_relevance_score",
pa.array(
[
self._invert_score(distance)
for distance in vector_results["_distance"].to_pylist()
],
type=pa.float32(),
),
)
if self.score == "relevance":
results = self._keep_relevance_score(results)
elif self.score == "all":
results = results.append_column(
"_score",
pa.array([NaN] * len(vector_results), type=pa.float32()),
)
return results
# sort both input tables on _rowid
combined_list = []

View File

@@ -0,0 +1,19 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The Lance Authors
import pyarrow as pa
def check_reranker_result(result):
if not isinstance(result, pa.Table): # Enforce type
raise TypeError(
f"rerank_hybrid must return a pyarrow.Table, got {type(result)}"
)
# Enforce that `_relevance_score` column is present in the result of every
# rerank_hybrid method
if "_relevance_score" not in result.column_names:
raise ValueError(
"rerank_hybrid must return a pyarrow.Table with a column "
"named `_relevance_score`"
)

View File

@@ -31,7 +31,6 @@ import pyarrow.compute as pc
import pyarrow.fs as pa_fs
from lance import LanceDataset
from lance.dependencies import _check_for_hugging_face
from lance.vector import vec_to_table
from .common import DATA, VEC, VECTOR_COLUMN_NAME
from .embeddings import EmbeddingFunctionConfig, EmbeddingFunctionRegistry
@@ -50,7 +49,7 @@ from .query import (
from .util import (
fs_from_uri,
get_uri_scheme,
inf_vector_column_query,
infer_vector_column_name,
join_uri,
safe_import_pandas,
safe_import_polars,
@@ -87,6 +86,9 @@ def _coerce_to_table(data, schema: Optional[pa.Schema] = None) -> pa.Table:
if isinstance(data, LanceModel):
raise ValueError("Cannot add a single LanceModel to a table. Use a list.")
if isinstance(data, dict):
raise ValueError("Cannot add a single dictionary to a table. Use a list.")
if isinstance(data, list):
# convert to list of dict if data is a bunch of LanceModels
if isinstance(data[0], LanceModel):
@@ -98,8 +100,6 @@ def _coerce_to_table(data, schema: Optional[pa.Schema] = None) -> pa.Table:
return pa.Table.from_batches(data, schema=schema)
else:
return pa.Table.from_pylist(data)
elif isinstance(data, dict):
return vec_to_table(data)
elif _check_for_pandas(data) and isinstance(data, pd.DataFrame):
# Do not add schema here, since schema may contain the vector column
table = pa.Table.from_pandas(data, preserve_index=False)
@@ -554,7 +554,7 @@ class Table(ABC):
data: DATA
The data to insert into the table. Acceptable types are:
- dict or list-of-dict
- list-of-dict
- pandas.DataFrame
@@ -1409,7 +1409,7 @@ class LanceTable(Table):
Parameters
----------
data: list-of-dict, dict, pd.DataFrame
data: list-of-dict, pd.DataFrame
The data to insert into the table.
mode: str
The mode to use when writing the data. Valid values are
@@ -1630,11 +1630,12 @@ class LanceTable(Table):
and also the "_distance" column which is the distance between the query
vector and the returned vector.
"""
if vector_column_name is None and query is not None and query_type != "fts":
try:
vector_column_name = inf_vector_column_query(self.schema)
except Exception as e:
raise e
vector_column_name = infer_vector_column_name(
schema=self.schema,
query_type=query_type,
query=query,
vector_column_name=vector_column_name,
)
return LanceQueryBuilder.create(
self,
@@ -1998,22 +1999,26 @@ def _sanitize_vector_column(
data, fill_value, on_bad_vectors, vec_arr, vector_column_name
)
vec_arr = data[vector_column_name].combine_chunks()
vec_arr = ensure_fixed_size_list(vec_arr)
data = data.set_column(
data.column_names.index(vector_column_name), vector_column_name, vec_arr
)
elif not pa.types.is_fixed_size_list(vec_arr.type):
raise TypeError(f"Unsupported vector column type: {vec_arr.type}")
vec_arr = ensure_fixed_size_list(vec_arr)
data = data.set_column(
data.column_names.index(vector_column_name), vector_column_name, vec_arr
)
# Use numpy to check for NaNs, because as pyarrow 14.0.2 does not have `is_nan`
# kernel over f16 types.
values_np = vec_arr.values.to_numpy(zero_copy_only=False)
if np.isnan(values_np).any():
data = _sanitize_nans(
data, fill_value, on_bad_vectors, vec_arr, vector_column_name
)
if pa.types.is_float16(vec_arr.values.type):
# Use numpy to check for NaNs, because pyarrow does not yet have an
# `is_nan` kernel over f16 types.
values_np = vec_arr.values.to_numpy(zero_copy_only=True)
if np.isnan(values_np).any():
data = _sanitize_nans(
data, fill_value, on_bad_vectors, vec_arr, vector_column_name
)
else:
if pc.any(pc.is_null(vec_arr.values, nan_is_null=True)).as_py():
data = _sanitize_nans(
data, fill_value, on_bad_vectors, vec_arr, vector_column_name
)
return data
@@ -2057,8 +2062,15 @@ def _sanitize_jagged(data, fill_value, on_bad_vectors, vec_arr, vector_column_na
return data
def _sanitize_nans(data, fill_value, on_bad_vectors, vec_arr, vector_column_name):
def _sanitize_nans(
data,
fill_value,
on_bad_vectors,
vec_arr: pa.FixedSizeListArray,
vector_column_name: str,
):
"""Sanitize NaNs in vectors"""
assert pa.types.is_fixed_size_list(vec_arr.type)
if on_bad_vectors == "error":
raise ValueError(
f"Vector column {vector_column_name} has NaNs. "
@@ -2078,9 +2090,11 @@ def _sanitize_nans(data, fill_value, on_bad_vectors, vec_arr, vector_column_name
data.column_names.index(vector_column_name), vector_column_name, vec_arr
)
elif on_bad_vectors == "drop":
is_value_nan = pc.is_nan(vec_arr.values).to_numpy(zero_copy_only=False)
is_full = np.any(~is_value_nan.reshape(-1, vec_arr.type.list_size), axis=1)
data = data.filter(is_full)
# Dropping is slow: filtering NaNs out of a fixed size list array
# requires a round trip through numpy.
np_arr = np.isnan(vec_arr.values.to_numpy(zero_copy_only=False))
np_arr = np_arr.reshape(-1, vec_arr.type.list_size)
has_nan = np.any(np_arr, axis=1)
data = data.filter(~has_nan)
return data
@@ -2334,7 +2348,7 @@ class AsyncTable:
data: DATA
The data to insert into the table. Acceptable types are:
- dict or list-of-dict
- list-of-dict
- pandas.DataFrame
@@ -2450,7 +2464,31 @@ class AsyncTable:
on_bad_vectors: str,
fill_value: float,
):
pass
schema = await self.schema()
if on_bad_vectors is None:
on_bad_vectors = "error"
if fill_value is None:
fill_value = 0.0
data, _ = _sanitize_data(
new_data,
schema,
metadata=schema.metadata,
on_bad_vectors=on_bad_vectors,
fill_value=fill_value,
)
if isinstance(data, pa.Table):
data = pa.RecordBatchReader.from_batches(data.schema, data.to_batches())
await self._inner.execute_merge_insert(
data,
dict(
on=merge._on,
when_matched_update_all=merge._when_matched_update_all,
when_matched_update_all_condition=merge._when_matched_update_all_condition,
when_not_matched_insert_all=merge._when_not_matched_insert_all,
when_not_matched_by_source_delete=merge._when_not_matched_by_source_delete,
when_not_matched_by_source_condition=merge._when_not_matched_by_source_condition,
),
)
async def delete(self, where: str):
"""Delete rows from the table.
@@ -2669,6 +2707,26 @@ class AsyncTable:
"""
return await self._inner.list_indices()
async def index_stats(self, index_name: str) -> Optional[IndexStatistics]:
"""
Retrieve statistics about an index
Parameters
----------
index_name: str
The name of the index to retrieve statistics for
Returns
-------
IndexStatistics or None
The statistics about the index. Returns None if the index does not exist.
"""
stats = await self._inner.index_stats(index_name)
if stats is None:
return None
else:
return IndexStatistics(**stats)
async def uses_v2_manifest_paths(self) -> bool:
"""
Check if the table is using the new v2 manifest paths.
@@ -2699,3 +2757,31 @@ class AsyncTable:
to check if the table is already using the new path style.
"""
await self._inner.migrate_manifest_paths_v2()
@dataclass
class IndexStatistics:
"""
Statistics about an index.
Attributes
----------
num_indexed_rows: int
The number of rows that are covered by this index.
num_unindexed_rows: int
The number of rows that are not covered by this index.
index_type: str
The type of index that was created.
distance_type: Optional[str]
The distance type used by the index.
num_indices: Optional[int]
The number of parts the index is split into.
"""
num_indexed_rows: int
num_unindexed_rows: int
index_type: Literal[
"IVF_PQ", "IVF_HNSW_PQ", "IVF_HNSW_SQ", "FTS", "BTREE", "BITMAP", "LABEL_LIST"
]
distance_type: Optional[Literal["l2", "cosine", "dot"]] = None
num_indices: Optional[int] = None
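
The `index_stats` wrapper above turns a raw mapping into an `IndexStatistics` instance and passes `None` through when the index does not exist. A small stdlib-only sketch of that pattern follows. The dataclass here is a simplified copy (plain `str` instead of `Literal`), `parse_stats` is a hypothetical helper, and the `raw_bitmap` dict is made-up sample data, not real output from the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexStatistics:
    # Simplified copy of the dataclass in the diff.
    num_indexed_rows: int
    num_unindexed_rows: int
    index_type: str
    distance_type: Optional[str] = None
    num_indices: Optional[int] = None


def parse_stats(raw: Optional[dict]) -> Optional[IndexStatistics]:
    # Missing index: propagate None instead of raising.
    if raw is None:
        return None
    # Keys map onto dataclass fields; omitted optional keys keep their defaults.
    return IndexStatistics(**raw)


# Made-up sample payload for a scalar index, which has no distance type.
raw_bitmap = {
    "num_indexed_rows": 100,
    "num_unindexed_rows": 0,
    "index_type": "BITMAP",
    "num_indices": 1,
}
bitmap_stats = parse_stats(raw_bitmap)
missing_stats = parse_stats(None)
print(bitmap_stats, missing_stats)
```

Defaulting `distance_type` to `None` is what lets scalar indexes such as bitmap omit the key entirely.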

View File

@@ -9,7 +9,7 @@ import pathlib
import warnings
from datetime import date, datetime
from functools import singledispatch
from typing import Tuple, Union
from typing import Tuple, Union, Optional, Any
from urllib.parse import urlparse
import numpy as np
@@ -212,6 +212,23 @@ def inf_vector_column_query(schema: pa.Schema) -> str:
return vector_col_name
def infer_vector_column_name(
schema: pa.Schema,
query_type: str,
query: Optional[Any], # inferred later in query builder
vector_column_name: Optional[str],
):
if (vector_column_name is None and query is not None and query_type != "fts") or (
vector_column_name is None and query_type == "hybrid"
):
try:
vector_column_name = inf_vector_column_query(schema)
except Exception as e:
raise e
return vector_column_name
@singledispatch
def value_to_sql(value):
raise NotImplementedError("SQL conversion is not implemented for this type")
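
The condition inside `infer_vector_column_name` is easier to follow as a truth table. The sketch below re-expresses only that predicate in plain Python. `needs_inference` is a hypothetical name used for illustration and is not part of the library.

```python
from typing import Any, Optional


def needs_inference(
    query_type: str, query: Optional[Any], vector_column_name: Optional[str]
) -> bool:
    # An explicit column name always wins; nothing to infer.
    if vector_column_name is not None:
        return False
    # Infer for any non-FTS query that has a value, and always for hybrid
    # queries (whose query may be supplied later on the builder).
    return (query is not None and query_type != "fts") or query_type == "hybrid"


cases = [
    ("auto", [0.1, 0.2], None),      # vector query, no column given
    ("fts", "hello", None),          # full-text search never needs a vector column
    ("hybrid", None, None),          # hybrid infers even without a query yet
    ("auto", None, None),            # empty query, nothing to infer
    ("auto", [0.1, 0.2], "vector"),  # caller already chose a column
]
results = [needs_inference(*case) for case in cases]
print(results)
```

The hybrid branch is the behavioral addition compared with the inline check that the two `search` methods used before this change.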

View File

@@ -354,7 +354,7 @@ async def test_create_mode_async(tmp_path):
)
await db.create_table("test", data=data)
with pytest.raises(RuntimeError):
with pytest.raises(ValueError, match="already exists"):
await db.create_table("test", data=data)
new_data = pd.DataFrame(
@@ -382,7 +382,7 @@ async def test_create_exist_ok_async(tmp_path):
)
tbl = await db.create_table("test", data=data)
with pytest.raises(RuntimeError):
with pytest.raises(ValueError, match="already exists"):
await db.create_table("test", data=data)
# open the table but don't add more rows

View File

@@ -11,6 +11,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Union
from unittest.mock import MagicMock, patch
import lance
import lancedb
@@ -25,6 +26,7 @@ from lancedb.embeddings import (
)
from lancedb.embeddings.base import TextEmbeddingFunction
from lancedb.embeddings.registry import get_registry, register
from lancedb.embeddings.utils import retry
from lancedb.pydantic import LanceModel, Vector
@@ -86,6 +88,47 @@ def test_embedding_function(tmp_path):
assert np.allclose(actual, expected)
def test_embedding_with_bad_results(tmp_path):
@register("mock-embedding")
class MockEmbeddingFunction(TextEmbeddingFunction):
def ndims(self):
return 128
def generate_embeddings(
self, texts: Union[List[str], np.ndarray]
) -> list[Union[np.ndarray, None]]:
return [
None if i % 2 == 0 else np.random.randn(self.ndims())
for i in range(len(texts))
]
db = lancedb.connect(tmp_path)
registry = EmbeddingFunctionRegistry.get_instance()
model = registry.get("mock-embedding").create()
class Schema(LanceModel):
text: str = model.SourceField()
vector: Vector(model.ndims()) = model.VectorField()
table = db.create_table("test", schema=Schema, mode="overwrite")
table.add(
[{"text": "hello world"}, {"text": "bar"}],
on_bad_vectors="drop",
)
df = table.to_pandas()
assert len(table) == 1
assert df.iloc[0]["text"] == "bar"
# table = db.create_table("test2", schema=Schema, mode="overwrite")
# table.add(
# [{"text": "hello world"}, {"text": "bar"}],
# )
# assert len(table) == 2
# tbl = table.to_arrow()
# assert tbl["vector"].null_count == 1
@pytest.mark.slow
def test_embedding_function_rate_limit(tmp_path):
def _get_schema_from_model(model):
@@ -142,3 +185,54 @@ def test_add_optional_vector(tmp_path):
expected = LanceSchema(id="id", text="text")
tbl.add([expected])
assert not (np.abs(tbl.to_pandas()["vector"][0]) < 1e-6).all()
@pytest.mark.parametrize(
"embedding_type",
[
"openai",
"sentence-transformers",
"huggingface",
"ollama",
"cohere",
"instructor",
],
)
def test_embedding_function_safe_model_dump(embedding_type):
registry = get_registry()
# Note: Some embedding types might require specific parameters
try:
model = registry.get(embedding_type).create()
except Exception as e:
pytest.skip(f"Skipping {embedding_type} due to error: {str(e)}")
dumped_model = model.safe_model_dump()
assert all(
not k.startswith("_") for k in dumped_model.keys()
), f"{embedding_type}: Dumped model contains keys starting with underscore"
assert (
"max_retries" in dumped_model
), f"{embedding_type}: Essential field 'max_retries' is missing from dumped model"
assert isinstance(
dumped_model, dict
), f"{embedding_type}: Dumped model is not a dictionary"
for key in model.__dict__:
if key.startswith("_"):
assert key not in dumped_model, (
f"{embedding_type}: Private attribute '{key}' "
f"is present in dumped model"
)
@patch("time.sleep")
def test_retry(mock_sleep):
test_function = MagicMock(side_effect=[Exception] * 9 + ["result"])
test_function = retry()(test_function)
result = test_function()
assert mock_sleep.call_count == 9
assert result == "result"
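
`test_retry` above expects nine sleeps before the tenth call succeeds. A minimal sketch of a decorator with that behavior follows. `retry_sketch` and its `tries`, `delay` and `backoff` parameters are assumptions made for this example and may not match the signature of the backported `lancedb.embeddings.utils.retry`.

```python
import functools
import time
from unittest.mock import patch


def retry_sketch(tries: int = 10, delay: float = 1.0, backoff: float = 2.0):
    """Retry the wrapped call up to `tries` times, sleeping between attempts."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    # Out of attempts: surface the last error to the caller.
                    if attempt == tries:
                        raise
                    time.sleep(wait)
                    wait *= backoff  # grow the wait after each failure

        return wrapper

    return decorator


calls = {"n": 0}


@retry_sketch(tries=10, delay=0.5)
def flaky():
    # Fails nine times, then succeeds on the tenth call.
    calls["n"] += 1
    if calls["n"] < 10:
        raise RuntimeError("transient failure")
    return "result"


# Patch time.sleep so the demo does not actually wait.
with patch("time.sleep") as fake_sleep:
    outcome = flaky()
    sleep_count = fake_sleep.call_count

print(outcome, sleep_count, calls["n"])
```

Sleeping only between attempts, and not after the final failure, is what makes the sleep count one less than the number of calls.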

View File

@@ -442,3 +442,42 @@ def test_watsonx_embedding(tmp_path):
tbl.add(df)
assert len(tbl.to_pandas()["vector"][0]) == model.ndims()
assert tbl.search("hello").limit(1).to_pandas()["text"][0] == "hello world"
@pytest.mark.slow
@pytest.mark.skipif(
importlib.util.find_spec("ollama") is None, reason="Ollama not installed"
)
def test_ollama_embedding(tmp_path):
model = get_registry().get("ollama").create(max_retries=0)
class TextModel(LanceModel):
text: str = model.SourceField()
vector: Vector(model.ndims()) = model.VectorField()
df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
db = lancedb.connect(tmp_path)
tbl = db.create_table("test", schema=TextModel, mode="overwrite")
tbl.add(df)
assert len(tbl.to_pandas()["vector"][0]) == model.ndims()
result = tbl.search("hello").limit(1).to_pandas()
assert result["text"][0] == "hello world"
# Test safe_model_dump
dumped_model = model.safe_model_dump()
assert isinstance(dumped_model, dict)
assert "name" in dumped_model
assert "max_retries" in dumped_model
assert dumped_model["max_retries"] == 0
assert all(not k.startswith("_") for k in dumped_model.keys())
# Test serialization of the dumped model
import json
try:
json.dumps(dumped_model)
except TypeError:
pytest.fail("Failed to JSON serialize the dumped model")

View File

@@ -63,17 +63,24 @@ async def test_create_scalar_index(some_table: AsyncTable):
@pytest.mark.asyncio
async def test_create_bitmap_index(some_table: AsyncTable):
await some_table.create_index("id", config=Bitmap())
# TODO: Fix via https://github.com/lancedb/lance/issues/2039
# indices = await some_table.list_indices()
# assert str(indices) == '[Index(Bitmap, columns=["id"])]'
indices = await some_table.list_indices()
assert str(indices) == '[Index(Bitmap, columns=["id"])]'
indices = await some_table.list_indices()
assert len(indices) == 1
index_name = indices[0].name
stats = await some_table.index_stats(index_name)
assert stats.index_type == "BITMAP"
assert stats.distance_type is None
assert stats.num_indexed_rows == await some_table.count_rows()
assert stats.num_unindexed_rows == 0
assert stats.num_indices == 1
@pytest.mark.asyncio
async def test_create_label_list_index(some_table: AsyncTable):
await some_table.create_index("tags", config=LabelList())
# TODO: Fix via https://github.com/lancedb/lance/issues/2039
# indices = await some_table.list_indices()
# assert str(indices) == '[Index(LabelList, columns=["id"])]'
indices = await some_table.list_indices()
assert str(indices) == '[Index(LabelList, columns=["tags"])]'
@pytest.mark.asyncio
@@ -91,6 +98,14 @@ async def test_create_vector_index(some_table: AsyncTable):
assert len(indices) == 1
assert indices[0].index_type == "IvfPq"
assert indices[0].columns == ["vector"]
assert indices[0].name == "vector_idx"
stats = await some_table.index_stats("vector_idx")
assert stats.index_type == "IVF_PQ"
assert stats.distance_type == "l2"
assert stats.num_indexed_rows == await some_table.count_rows()
assert stats.num_unindexed_rows == 0
assert stats.num_indices == 1
@pytest.mark.asyncio

View File

@@ -1,11 +1,17 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
import contextlib
import http.server
import threading
from unittest.mock import MagicMock
import uuid
import lancedb
from lancedb.remote.errors import HttpError, RetryError
import pyarrow as pa
from lancedb.remote.client import VectorQuery, VectorQueryResult
import pytest
class FakeLanceDBClient:
@@ -81,3 +87,106 @@ def test_create_table_with_recordbatches():
table = conn.create_table("test", [batch], schema=batch.schema)
assert table.name == "test"
assert client.post.call_args[0][0] == "/v1/table/test/create/"
def make_mock_http_handler(handler):
class MockLanceDBHandler(http.server.BaseHTTPRequestHandler):
def do_GET(self):
handler(self)
def do_POST(self):
handler(self)
return MockLanceDBHandler
@contextlib.asynccontextmanager
async def mock_lancedb_connection(handler):
with http.server.HTTPServer(
("localhost", 8080), make_mock_http_handler(handler)
) as server:
handle = threading.Thread(target=server.serve_forever)
handle.start()
db = await lancedb.connect_async(
"db://dev",
api_key="fake",
host_override="http://localhost:8080",
client_config={
"retry_config": {"retries": 2},
"timeout_config": {
"connect_timeout": 1,
},
},
)
try:
yield db
finally:
server.shutdown()
handle.join()
@pytest.mark.asyncio
async def test_async_remote_db():
def handler(request):
# We created a UUID request id
request_id = request.headers["x-request-id"]
assert uuid.UUID(request_id).version == 4
# We set a user agent with the current library version
user_agent = request.headers["User-Agent"]
assert user_agent == f"LanceDB-Python-Client/{lancedb.__version__}"
request.send_response(200)
request.send_header("Content-Type", "application/json")
request.end_headers()
request.wfile.write(b'{"tables": []}')
async with mock_lancedb_connection(handler) as db:
table_names = await db.table_names()
assert table_names == []
@pytest.mark.asyncio
async def test_http_error():
request_id_holder = {"request_id": None}
def handler(request):
request_id_holder["request_id"] = request.headers["x-request-id"]
request.send_response(507)
request.end_headers()
request.wfile.write(b"Internal Server Error")
async with mock_lancedb_connection(handler) as db:
with pytest.raises(HttpError, match="Internal Server Error") as exc_info:
await db.table_names()
assert exc_info.value.request_id == request_id_holder["request_id"]
assert exc_info.value.status_code == 507
@pytest.mark.asyncio
async def test_retry_error():
request_id_holder = {"request_id": None}
def handler(request):
request_id_holder["request_id"] = request.headers["x-request-id"]
request.send_response(429)
request.end_headers()
request.wfile.write(b"Try again later")
async with mock_lancedb_connection(handler) as db:
with pytest.raises(RetryError, match="Hit retry limit") as exc_info:
await db.table_names()
assert exc_info.value.request_id == request_id_holder["request_id"]
assert exc_info.value.status_code == 429
cause = exc_info.value.__cause__
assert isinstance(cause, HttpError)
assert "Try again later" in str(cause)
assert cause.request_id == request_id_holder["request_id"]
assert cause.status_code == 429
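
The remote tests above run a throwaway `http.server` instance in a background thread. A stdlib-only sketch of the same pattern follows. It binds to port 0 so the operating system picks a free port, which is a variation on the fixed port 8080 used in the test file. `StubHandler` and the request path are hypothetical.

```python
import http.server
import threading
import urllib.request


class StubHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"tables": []}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging to keep test output clean.
        pass


# Port 0 asks the OS for any free port, avoiding clashes between test runs.
with http.server.HTTPServer(("127.0.0.1", 0), StubHandler) as server:
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as resp:
            status = resp.status
            payload = resp.read()
    finally:
        # shutdown() stops serve_forever; join() waits for the thread to exit.
        server.shutdown()
        thread.join()

print(status, payload)
```

Calling `shutdown()` in a `finally` block mirrors the fixture above, so the server thread stops even when an assertion fails.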

Some files were not shown because too many files have changed in this diff