Compare commits

...

19 Commits

Author SHA1 Message Date
Lance Release
e702e0e537 Bump version: 0.21.0 → 0.21.1-beta.0 2025-07-07 21:00:58 +00:00
Lance Release
d4bb59b542 Bump version: 0.24.0 → 0.24.1-beta.0 2025-07-07 21:00:38 +00:00
Wyatt Alt
6b2dd6de51 chore: update lance to 0.31.1-beta.2 (#2487) 2025-07-07 12:53:16 -07:00
BubbleCal
dbccd9e4f1 chore: upgrade lance to 0.31.1-beta.1 (#2486)
This also upgrades:
- datafusion 47.0 -> 48.0
- half 2.5.0 -> 2.6.0

to be consistent with lance.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-07 22:16:43 +08:00
Will Jones
b12ebfed4c fix: only monotonically update dataset (#2479)
Make sure we only update the latest version if it's actually newer. This
is important if there are concurrent queries, as they can take different
amounts of time.
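
A minimal sketch of the monotonic-update idea (hypothetical Python with invented
names; the actual fix lives in the Rust dataset handling): a result from a slower
concurrent query must never roll the cached dataset back to an older version.

```python
import threading


class CachedDatasetVersion:
    """Illustrative only: keep the newest observed dataset version."""

    def __init__(self, version: int):
        self._version = version
        self._lock = threading.Lock()

    def maybe_advance(self, loaded_version: int) -> bool:
        """Adopt loaded_version only if it is strictly newer than the cached one."""
        with self._lock:
            if loaded_version > self._version:
                self._version = loaded_version
                return True
            # Stale result from a slower concurrent query; keep the newer version.
            return False
```
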
2025-07-01 08:29:37 -07:00
Weston Pace
1dadb2aefa feat: upgrade to lance 0.31.0-beta.1 (#2469)
## Summary by CodeRabbit

* **Chores**
  * Updated dependencies to newer versions for improved compatibility and stability.

* **Refactor**
  * Improved internal handling of data ranges and stream lifetimes for enhanced performance and reliability.
  * Simplified code style for Python query object conversions without affecting functionality.
2025-06-30 11:10:53 -07:00
Haoyu Weng
eb9784d7f2 feat(python): batch Ollama embed calls (#2453)
Other embedding integrations such as Cohere and OpenAI already send
requests in batches. We should do that for Ollama too to improve
throughput.

The Ollama [`.embed`
API](63ca747622/ollama/_client.py (L359-L378))
was added in version 0.3.0 (almost a year ago) so I updated the version
requirement in pyproject.
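
For reference, a small sketch of the batched call this change adopts, using the
`ollama` Python client's `.embed` API (the model name and inputs here are
placeholders):

```python
import ollama

client = ollama.Client()  # requires ollama >= 0.3.0 for the batch .embed API
texts = ["a cat sleeps on the couch", "a dog barks at the mailman"]  # sample inputs

# One request embeds the whole batch instead of one request per text.
response = client.embed(model="nomic-embed-text", input=texts)  # model name assumed
vectors = response.embeddings  # one embedding vector per input text
```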

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved compatibility with newer versions of the "ollama" package by requiring version 0.3.0 or higher.
  - Enhanced embedding generation to process batches of texts more efficiently and reliably.
- **Refactor**
  - Improved type consistency and clarity for embedding-related methods.
2025-06-30 08:28:14 -07:00
Kilerd Chan
ba755626cc fix: expose parsing error coming from invalid object store uri (#2475)
This PR exposes the error from `ListingCatalog::open_path`, which previously
unwrapped the Result coming from `ObjectStore::from_uri`, to avoid a panic.
Keming
7760799cb8 docs: fix multivector notebook markdown style (#2447)
## Summary by CodeRabbit

- **Documentation**
  - Improved formatting and clarity in instructional text within the Multivector on LanceDB notebook.
2025-06-27 15:34:01 -07:00
Will Jones
4beb2d2877 fix(python): make sure explain_plan works with FTS queries (#2466)
## Summary

Fixes issue #2465 where FTS explain plans only showed basic `LanceScan`
instead of detailed execution plans with FTS query details, limits, and
offsets.

## Root Cause

The `FTSQuery::explain_plan()` and `analyze_plan()` methods were missing
the `.full_text_search()` call before calling explain/analyze plan,
causing them to operate on the base query without FTS context.
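
A short way to see the fixed behavior, modeled on the async test added in this
PR (database path and table name are assumed):

```python
import asyncio

import lancedb
from lancedb.index import FTS


async def main():
    db = await lancedb.connect_async("data/sample-lancedb")  # path assumed
    table = await db.open_table("my_table")  # table name assumed
    await table.create_index("text", config=FTS())

    query = await table.search("dog", query_type="fts", fts_columns="text")
    plan = await query.limit(1).explain_plan()
    # The plan now includes MatchQuery and GlobalLimitExec nodes instead of a
    # bare LanceScan.
    print(plan)


asyncio.run(main())
```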

## Changes

- **Fixed** `explain_plan()` and `analyze_plan()` in `src/query.rs` to
call `.full_text_search()`
- **Added comprehensive test coverage** for FTS explain plans with
limits, offsets, and filters
- **Updated existing tests** to expect correct behavior instead of buggy
behavior

## Before/After

**Before (broken):**
```
LanceScan: uri=..., projection=[...], row_id=false, row_addr=false, ordered=true
```

**After (fixed):**
```
ProjectionExec: expr=[id@2 as id, text@3 as text, _score@1 as _score]
  Take: columns="_rowid, _score, (id), (text)"
    CoalesceBatchesExec: target_batch_size=1024
      GlobalLimitExec: skip=2, fetch=4
        MatchQuery: query=test
```

## Test Plan

- [x] All new FTS explain plan tests pass 
- [x] Existing tests continue to pass
- [x] FTS queries now show proper execution plans with MatchQuery,
limits, filters

Closes #2465

🤖 Generated with [Claude Code](https://claude.ai/code)

## Summary by CodeRabbit

* **Tests**
  * Added new test cases to verify explain plan output for full-text search, vector queries with pagination, and queries with filters.

* **Bug Fixes**
  * Improved the accuracy of explain plan and analysis output for full-text search queries, ensuring the correct query details are reflected.

* **Refactor**
  * Enhanced the formatting and hierarchical structure of execution plans for hybrid queries, providing clearer and more detailed plan representations.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-26 23:35:14 -07:00
Lance Release
a00b8595d1 Bump version: 0.21.0-beta.0 → 0.21.0 2025-06-20 05:47:06 +00:00
Lance Release
9c8314b4fd Bump version: 0.20.1-beta.2 → 0.21.0-beta.0 2025-06-20 05:46:27 +00:00
Lance Release
c625b6f2b2 Bump version: 0.24.0-beta.0 → 0.24.0 2025-06-20 05:46:05 +00:00
Lance Release
bec8fe6547 Bump version: 0.23.1-beta.2 → 0.24.0-beta.0 2025-06-20 05:46:04 +00:00
BubbleCal
dc1150c011 chore: upgrade lance to 0.30.0 (#2451)
lance [release
details](https://github.com/lancedb/lance/releases/tag/v0.30.0)
## Summary by CodeRabbit

- **Chores**
  - Updated dependency specifications to use exact version numbers instead of referencing a git repository and tag.

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-20 11:27:20 +08:00
Will Jones
afaefc6264 ci: fix package lock again (#2449)
We are able to push commits over here:
cb7293e073/.github/workflows/make-release-commit.yml (L88-L95)

So I think it's safe to assume this will work.

## Summary by CodeRabbit

- **Chores**
  - Updated workflow configuration to improve authentication and branch targeting for automated release processes.
2025-06-19 08:51:48 -07:00
BubbleCal
cb70ff8cee feat!: switch default FTS to native lance FTS (#2428)
This switches the default FTS to the native Lance FTS for the Python sync table
API; the other APIs have already switched to the native implementation.
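
In practice this means the sync Python API below now builds the native Lance FTS
index by default, while `use_tantivy=True` opts back into the legacy
implementation (connection path and table name are assumed):

```python
import lancedb

db = lancedb.connect("data/sample-lancedb")  # path assumed
table = db.open_table("my_table")  # table name assumed

# New default: native Lance FTS (equivalent to use_tantivy=False).
table.create_fts_index("text", replace=True)

# The legacy tantivy-based index is still available via an explicit opt-in.
table.create_fts_index("text", use_tantivy=True, replace=True)
```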

## Summary by CodeRabbit

- **New Features**
  - The default behavior for creating a full-text search index now uses the new implementation rather than the legacy one.
- **Bug Fixes**
  - Improved handling and error messages for phrase queries in full-text search.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-19 10:38:34 +08:00
BubbleCal
cbb5a841b1 feat: support prefix matching and must_not clause (#2441) 2025-06-19 10:32:32 +08:00
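
A brief illustration of the new prefix matching (`prefix_length`) and `must_not`
clause, based on the Python tests added in this compare (connection path and
table name are assumed):

```python
import lancedb
from lancedb.query import BooleanQuery, MatchQuery, Occur

db = lancedb.connect("data/sample-lancedb")  # path assumed
table = db.create_table(
    "fts_demo",  # table name assumed
    data=[
        {"text": "The cat and dog are playing"},
        {"text": "The cat is sleeping"},
        {"text": "foo"},
        {"text": "food"},
    ],
    mode="overwrite",
)
table.create_fts_index("text", use_tantivy=False)

# Fuzzy match, but keep the first 3 characters fixed (prefix matching).
table.search(MatchQuery("foo", "text", fuzziness=1, prefix_length=3)).to_pandas()

# Require "cat" while excluding any row that also mentions "dog".
table.search(
    BooleanQuery(
        [
            (Occur.MUST, MatchQuery("cat", "text")),
            (Occur.MUST_NOT, MatchQuery("dog", "text")),
        ]
    )
).to_pandas()
```
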
Lance Release
c72f6770fd Bump version: 0.20.1-beta.1 → 0.20.1-beta.2 2025-06-18 23:33:57 +00:00
40 changed files with 507 additions and 309 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.20.1-beta.1"
current_version = "0.21.1-beta.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -550,6 +550,9 @@ jobs:
bash ci/update_lockfiles.sh
- name: Push new commit
uses: ad-m/github-push-action@master
with:
github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}
branch: main
- name: Notify Slack Action
uses: ravsamhq/notify-slack-action@2.3.0
if: ${{ always() }}

Cargo.lock (generated, 306 changed lines)
View File

@@ -357,6 +357,8 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "73a47aa0c771b5381de2b7f16998d351a6f4eb839f1e13d48353e17e873d969b"
dependencies = [
"bitflags 2.9.1",
"serde",
"serde_json",
]
[[package]]
@@ -1816,9 +1818,9 @@ dependencies = [
[[package]]
name = "datafusion"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ffe060b978f74ab446be722adb8a274e052e005bf6dfd171caadc3abaad10080"
checksum = "8a11e19a7ccc5bb979c95c1dceef663eab39c9061b3bbf8d1937faf0f03bf41f"
dependencies = [
"arrow",
"arrow-ipc",
@@ -1841,7 +1843,6 @@ dependencies = [
"datafusion-functions-nested",
"datafusion-functions-table",
"datafusion-functions-window",
"datafusion-macros",
"datafusion-optimizer",
"datafusion-physical-expr",
"datafusion-physical-expr-common",
@@ -1852,9 +1853,9 @@ dependencies = [
"futures",
"itertools 0.14.0",
"log",
"object_store 0.12.2",
"object_store",
"parking_lot",
"rand 0.8.5",
"rand 0.9.1",
"regex",
"sqlparser 0.55.0",
"tempfile",
@@ -1865,9 +1866,9 @@ dependencies = [
[[package]]
name = "datafusion-catalog"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "61fe34f401bd03724a1f96d12108144f8cd495a3cdda2bf5e091822fb80b7e66"
checksum = "94985e67cab97b1099db2a7af11f31a45008b282aba921c1e1d35327c212ec18"
dependencies = [
"arrow",
"async-trait",
@@ -1884,16 +1885,16 @@ dependencies = [
"futures",
"itertools 0.14.0",
"log",
"object_store 0.12.2",
"object_store",
"parking_lot",
"tokio",
]
[[package]]
name = "datafusion-catalog-listing"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a4411b8e3bce5e0fc7521e44f201def2e2d5d1b5f176fb56e8cdc9942c890f00"
checksum = "e002df133bdb7b0b9b429d89a69aa77b35caeadee4498b2ce1c7c23a99516988"
dependencies = [
"arrow",
"async-trait",
@@ -1908,15 +1909,15 @@ dependencies = [
"datafusion-session",
"futures",
"log",
"object_store 0.12.2",
"object_store",
"tokio",
]
[[package]]
name = "datafusion-common"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0734015d81c8375eb5d4869b7f7ecccc2ee8d6cb81948ef737cd0e7b743bd69c"
checksum = "e13242fc58fd753787b0a538e5ae77d356cb9d0656fa85a591a33c5f106267f6"
dependencies = [
"ahash",
"arrow",
@@ -1927,7 +1928,7 @@ dependencies = [
"indexmap 2.9.0",
"libc",
"log",
"object_store 0.12.2",
"object_store",
"paste",
"sqlparser 0.55.0",
"tokio",
@@ -1936,9 +1937,9 @@ dependencies = [
[[package]]
name = "datafusion-common-runtime"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5167bb1d2ccbb87c6bc36c295274d7a0519b14afcfdaf401d53cbcaa4ef4968b"
checksum = "d2239f964e95c3a5d6b4a8cde07e646de8995c1396a7fd62c6e784f5341db499"
dependencies = [
"futures",
"log",
@@ -1947,9 +1948,9 @@ dependencies = [
[[package]]
name = "datafusion-datasource"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "04e602dcdf2f50c2abf297cc2203c73531e6f48b29516af7695d338cf2a778b1"
checksum = "2cf792579bc8bf07d1b2f68c2d5382f8a63679cce8fbebfd4ba95742b6e08864"
dependencies = [
"arrow",
"async-trait",
@@ -1967,17 +1968,17 @@ dependencies = [
"glob",
"itertools 0.14.0",
"log",
"object_store 0.12.2",
"rand 0.8.5",
"object_store",
"rand 0.9.1",
"tokio",
"url",
]
[[package]]
name = "datafusion-datasource-csv"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e3bb2253952dc32296ed5b84077cb2e0257fea4be6373e1c376426e17ead4ef6"
checksum = "cfc114f9a1415174f3e8d2719c371fc72092ef2195a7955404cfe6b2ba29a706"
dependencies = [
"arrow",
"async-trait",
@@ -1993,16 +1994,16 @@ dependencies = [
"datafusion-physical-plan",
"datafusion-session",
"futures",
"object_store 0.12.2",
"object_store",
"regex",
"tokio",
]
[[package]]
name = "datafusion-datasource-json"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b8c7f47a5d2fe03bfa521ec9bafdb8a5c82de8377f60967c3663f00c8790352"
checksum = "d88dd5e215c420a52362b9988ecd4cefd71081b730663d4f7d886f706111fc75"
dependencies = [
"arrow",
"async-trait",
@@ -2018,22 +2019,22 @@ dependencies = [
"datafusion-physical-plan",
"datafusion-session",
"futures",
"object_store 0.12.2",
"object_store",
"serde_json",
"tokio",
]
[[package]]
name = "datafusion-doc"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a91f8c2c5788ef32f48ff56c68e5b545527b744822a284373ac79bba1ba47292"
checksum = "e0e7b648387b0c1937b83cb328533c06c923799e73a9e3750b762667f32662c0"
[[package]]
name = "datafusion-execution"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "06f004d100f49a3658c9da6fb0c3a9b760062d96cd4ad82ccc3b7b69a9fb2f84"
checksum = "9609d83d52ff8315283c6dad3b97566e877d8f366fab4c3297742f33dcd636c7"
dependencies = [
"arrow",
"dashmap",
@@ -2041,18 +2042,18 @@ dependencies = [
"datafusion-expr",
"futures",
"log",
"object_store 0.12.2",
"object_store",
"parking_lot",
"rand 0.8.5",
"rand 0.9.1",
"tempfile",
"url",
]
[[package]]
name = "datafusion-expr"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7a4e4ce3802609be38eeb607ee72f6fe86c3091460de9dbfae9e18db423b3964"
checksum = "e75230cd67f650ef0399eb00f54d4a073698f2c0262948298e5299fc7324da63"
dependencies = [
"arrow",
"chrono",
@@ -2070,9 +2071,9 @@ dependencies = [
[[package]]
name = "datafusion-expr-common"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "422ac9cf3b22bbbae8cdf8ceb33039107fde1b5492693168f13bd566b1bcc839"
checksum = "70fafb3a045ed6c49cfca0cd090f62cf871ca6326cc3355cb0aaf1260fa760b6"
dependencies = [
"arrow",
"datafusion-common",
@@ -2083,9 +2084,9 @@ dependencies = [
[[package]]
name = "datafusion-functions"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2ddf0a0a2db5d2918349c978d42d80926c6aa2459cd8a3c533a84ec4bb63479e"
checksum = "cdf9a9cf655265861a20453b1e58357147eab59bdc90ce7f2f68f1f35104d3bb"
dependencies = [
"arrow",
"arrow-buffer",
@@ -2103,7 +2104,7 @@ dependencies = [
"itertools 0.14.0",
"log",
"md-5",
"rand 0.8.5",
"rand 0.9.1",
"regex",
"sha2",
"unicode-segmentation",
@@ -2112,9 +2113,9 @@ dependencies = [
[[package]]
name = "datafusion-functions-aggregate"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "408a05dafdc70d05a38a29005b8b15e21b0238734dab1e98483fcb58038c5aba"
checksum = "7f07e49733d847be0a05235e17b884d326a2fd402c97a89fe8bcf0bfba310005"
dependencies = [
"ahash",
"arrow",
@@ -2133,9 +2134,9 @@ dependencies = [
[[package]]
name = "datafusion-functions-aggregate-common"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "756d21da2dd6c9bef97af1504970ff56cbf35d03fbd4ffd62827f02f4d2279d4"
checksum = "4512607e10d72b0b0a1dc08f42cb5bd5284cb8348b7fea49dc83409493e32b1b"
dependencies = [
"ahash",
"arrow",
@@ -2146,9 +2147,9 @@ dependencies = [
[[package]]
name = "datafusion-functions-nested"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8d8d50f6334b378930d992d801a10ac5b3e93b846b39e4a05085742572844537"
checksum = "2ab331806e34f5545e5f03396e4d5068077395b1665795d8f88c14ec4f1e0b7a"
dependencies = [
"arrow",
"arrow-ord",
@@ -2167,9 +2168,9 @@ dependencies = [
[[package]]
name = "datafusion-functions-table"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cc9a97220736c8fff1446e936be90d57216c06f28969f9ffd3b72ac93c958c8a"
checksum = "d4ac2c0be983a06950ef077e34e0174aa0cb9e346f3aeae459823158037ade37"
dependencies = [
"arrow",
"async-trait",
@@ -2183,10 +2184,11 @@ dependencies = [
[[package]]
name = "datafusion-functions-window"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cefc2d77646e1aadd1d6a9c40088937aedec04e68c5f0465939912e1291f8193"
checksum = "36f3d92731de384c90906941d36dcadf6a86d4128409a9c5cd916662baed5f53"
dependencies = [
"arrow",
"datafusion-common",
"datafusion-doc",
"datafusion-expr",
@@ -2200,9 +2202,9 @@ dependencies = [
[[package]]
name = "datafusion-functions-window-common"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dd4aff082c42fa6da99ce0698c85addd5252928c908eb087ca3cfa64ff16b313"
checksum = "c679f8bf0971704ec8fd4249fcbb2eb49d6a12cc3e7a840ac047b4928d3541b5"
dependencies = [
"datafusion-common",
"datafusion-physical-expr-common",
@@ -2210,9 +2212,9 @@ dependencies = [
[[package]]
name = "datafusion-macros"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "df6f88d7ee27daf8b108ba910f9015176b36fbc72902b1ca5c2a5f1d1717e1a1"
checksum = "2821de7cb0362d12e75a5196b636a59ea3584ec1e1cc7dc6f5e34b9e8389d251"
dependencies = [
"datafusion-expr",
"quote",
@@ -2221,9 +2223,9 @@ dependencies = [
[[package]]
name = "datafusion-optimizer"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "084d9f979c4b155346d3c34b18f4256e6904ded508e9554d90fed416415c3515"
checksum = "1594c7a97219ede334f25347ad8d57056621e7f4f35a0693c8da876e10dd6a53"
dependencies = [
"arrow",
"chrono",
@@ -2239,9 +2241,9 @@ dependencies = [
[[package]]
name = "datafusion-physical-expr"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "64c536062b0076f4e30084065d805f389f9fe38af0ca75bcbac86bc5e9fbab65"
checksum = "dc6da0f2412088d23f6b01929dedd687b5aee63b19b674eb73d00c3eb3c883b7"
dependencies = [
"ahash",
"arrow",
@@ -2256,14 +2258,14 @@ dependencies = [
"itertools 0.14.0",
"log",
"paste",
"petgraph",
"petgraph 0.8.2",
]
[[package]]
name = "datafusion-physical-expr-common"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f8a92b53b3193fac1916a1c5b8e3f4347c526f6822e56b71faa5fb372327a863"
checksum = "dcb0dbd9213078a593c3fe28783beaa625a4e6c6a6c797856ee2ba234311fb96"
dependencies = [
"ahash",
"arrow",
@@ -2275,9 +2277,9 @@ dependencies = [
[[package]]
name = "datafusion-physical-optimizer"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6fa0a5ac94c7cf3da97bedabd69d6bbca12aef84b9b37e6e9e8c25286511b5e2"
checksum = "6d140854b2db3ef8ac611caad12bfb2e1e1de827077429322a6188f18fc0026a"
dependencies = [
"arrow",
"datafusion-common",
@@ -2293,9 +2295,9 @@ dependencies = [
[[package]]
name = "datafusion-physical-plan"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "690c615db468c2e5fe5085b232d8b1c088299a6c63d87fd960a354a71f7acb55"
checksum = "b46cbdf21a01206be76d467f325273b22c559c744a012ead5018dfe79597de08"
dependencies = [
"ahash",
"arrow",
@@ -2323,9 +2325,9 @@ dependencies = [
[[package]]
name = "datafusion-session"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ad229a134c7406c057ece00c8743c0c34b97f4e72f78b475fe17b66c5e14fa4f"
checksum = "3a72733766ddb5b41534910926e8da5836622316f6283307fd9fb7e19811a59c"
dependencies = [
"arrow",
"async-trait",
@@ -2340,16 +2342,16 @@ dependencies = [
"futures",
"itertools 0.14.0",
"log",
"object_store 0.12.2",
"object_store",
"parking_lot",
"tokio",
]
[[package]]
name = "datafusion-sql"
version = "47.0.0"
version = "48.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "64f6ab28b72b664c21a27b22a2ff815fd390ed224c26e89a93b5a8154a4e8607"
checksum = "c5162338cdec9cc7ea13a0e6015c361acad5ec1d88d83f7c86301f789473971f"
dependencies = [
"arrow",
"bigdecimal",
@@ -2813,8 +2815,8 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"rand 0.8.5",
]
@@ -3280,9 +3282,9 @@ dependencies = [
[[package]]
name = "half"
version = "2.5.0"
version = "2.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7db2ff139bba50379da6aa0766b52fdcb62cb5b263009b09ed58ba604e14bbd1"
checksum = "459196ed295495a68f7d7fe1d84f6c4b7ff0e21fe3017b2f283c6fac3ad803c9"
dependencies = [
"bytemuck",
"cfg-if",
@@ -3906,8 +3908,8 @@ dependencies = [
[[package]]
name = "lance"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow",
"arrow-arith",
@@ -3947,10 +3949,9 @@ dependencies = [
"lance-io",
"lance-linalg",
"lance-table",
"lazy_static",
"log",
"moka",
"object_store 0.11.2",
"object_store",
"permutation",
"pin-project",
"prost",
@@ -3970,8 +3971,8 @@ dependencies = [
[[package]]
name = "lance-arrow"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -3988,8 +3989,8 @@ dependencies = [
[[package]]
name = "lance-core"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4003,13 +4004,12 @@ dependencies = [
"deepsize",
"futures",
"lance-arrow",
"lazy_static",
"libc",
"log",
"mock_instant",
"moka",
"num_cpus",
"object_store 0.11.2",
"object_store",
"pin-project",
"prost",
"rand 0.8.5",
@@ -4025,8 +4025,8 @@ dependencies = [
[[package]]
name = "lance-datafusion"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow",
"arrow-array",
@@ -4043,7 +4043,6 @@ dependencies = [
"lance-arrow",
"lance-core",
"lance-datagen",
"lazy_static",
"log",
"pin-project",
"prost",
@@ -4055,8 +4054,8 @@ dependencies = [
[[package]]
name = "lance-datagen"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow",
"arrow-array",
@@ -4067,12 +4066,13 @@ dependencies = [
"hex",
"rand 0.8.5",
"rand_xoshiro",
"random_word 0.5.0",
]
[[package]]
name = "lance-encoding"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrayref",
"arrow",
@@ -4093,7 +4093,6 @@ dependencies = [
"itertools 0.13.0",
"lance-arrow",
"lance-core",
"lazy_static",
"log",
"lz4",
"num-traits",
@@ -4106,13 +4105,14 @@ dependencies = [
"snafu",
"tokio",
"tracing",
"xxhash-rust",
"zstd",
]
[[package]]
name = "lance-file"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4133,7 +4133,7 @@ dependencies = [
"lance-io",
"log",
"num-traits",
"object_store 0.11.2",
"object_store",
"prost",
"prost-build",
"prost-types",
@@ -4146,8 +4146,8 @@ dependencies = [
[[package]]
name = "lance-index"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow",
"arrow-array",
@@ -4180,11 +4180,10 @@ dependencies = [
"lance-io",
"lance-linalg",
"lance-table",
"lazy_static",
"log",
"moka",
"num-traits",
"object_store 0.11.2",
"object_store",
"prost",
"prost-build",
"rand 0.8.5",
@@ -4202,8 +4201,8 @@ dependencies = [
[[package]]
name = "lance-io"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow",
"arrow-arith",
@@ -4225,9 +4224,8 @@ dependencies = [
"futures",
"lance-arrow",
"lance-core",
"lazy_static",
"log",
"object_store 0.11.2",
"object_store",
"path_abs",
"pin-project",
"prost",
@@ -4242,8 +4240,8 @@ dependencies = [
[[package]]
name = "lance-linalg"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow-array",
"arrow-ord",
@@ -4255,7 +4253,6 @@ dependencies = [
"half",
"lance-arrow",
"lance-core",
"lazy_static",
"log",
"num-traits",
"rand 0.8.5",
@@ -4266,8 +4263,8 @@ dependencies = [
[[package]]
name = "lance-table"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow",
"arrow-array",
@@ -4286,9 +4283,8 @@ dependencies = [
"lance-core",
"lance-file",
"lance-io",
"lazy_static",
"log",
"object_store 0.11.2",
"object_store",
"prost",
"prost-build",
"prost-types",
@@ -4306,8 +4302,8 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "0.30.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.30.0-beta.1#a499cfa06b7221b895bc13908cfc2ee7aadba46e"
version = "0.31.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.31.1-beta.2#dff098a5aa66866197cfcd7ae7ca004aed02928f"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -4318,7 +4314,7 @@ dependencies = [
[[package]]
name = "lancedb"
version = "0.20.1-beta.1"
version = "0.21.0"
dependencies = [
"arrow",
"arrow-array",
@@ -4365,12 +4361,12 @@ dependencies = [
"log",
"moka",
"num-traits",
"object_store 0.11.2",
"object_store",
"pin-project",
"polars",
"polars-arrow",
"rand 0.9.1",
"random_word",
"random_word 0.4.3",
"regex",
"reqwest",
"rstest",
@@ -4405,7 +4401,7 @@ dependencies = [
[[package]]
name = "lancedb-node"
version = "0.20.1-beta.1"
version = "0.21.0"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4422,7 +4418,7 @@ dependencies = [
"lancedb",
"lzma-sys",
"neon",
"object_store 0.11.2",
"object_store",
"once_cell",
"snafu",
"tokio",
@@ -4430,7 +4426,7 @@ dependencies = [
[[package]]
name = "lancedb-nodejs"
version = "0.20.1-beta.1"
version = "0.21.0"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4450,7 +4446,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
version = "0.23.1-beta.1"
version = "0.24.0"
dependencies = [
"arrow",
"env_logger",
@@ -5159,38 +5155,6 @@ dependencies = [
"memchr",
]
[[package]]
name = "object_store"
version = "0.11.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3cfccb68961a56facde1163f9319e0d15743352344e7808a11795fb99698dcaf"
dependencies = [
"async-trait",
"base64 0.22.1",
"bytes",
"chrono",
"futures",
"httparse",
"humantime",
"hyper 1.6.0",
"itertools 0.13.0",
"md-5",
"parking_lot",
"percent-encoding",
"quick-xml",
"rand 0.8.5",
"reqwest",
"ring",
"rustls-pemfile 2.2.0",
"serde",
"serde_json",
"snafu",
"tokio",
"tracing",
"url",
"walkdir",
]
[[package]]
name = "object_store"
version = "0.12.2"
@@ -5198,14 +5162,28 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7781f96d79ed0f961a7021424ab01840efbda64ae7a505aaea195efc91eaaec4"
dependencies = [
"async-trait",
"base64 0.22.1",
"bytes",
"chrono",
"form_urlencoded",
"futures",
"http 1.3.1",
"http-body-util",
"httparse",
"humantime",
"hyper 1.6.0",
"itertools 0.14.0",
"md-5",
"parking_lot",
"percent-encoding",
"quick-xml",
"rand 0.9.1",
"reqwest",
"ring",
"rustls-pemfile 2.2.0",
"serde",
"serde_json",
"serde_urlencoded",
"thiserror 2.0.12",
"tokio",
"tracing",
@@ -5383,6 +5361,18 @@ dependencies = [
"indexmap 2.9.0",
]
[[package]]
name = "petgraph"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "54acf3a685220b533e437e264e4d932cfbdc4cc7ec0cd232ed73c08d03b8a7ca"
dependencies = [
"fixedbitset",
"hashbrown 0.15.4",
"indexmap 2.9.0",
"serde",
]
[[package]]
name = "phf"
version = "0.11.3"
@@ -5915,7 +5905,7 @@ dependencies = [
"log",
"multimap",
"once_cell",
"petgraph",
"petgraph 0.7.1",
"prettyplease",
"prost",
"prost-types",
@@ -6257,6 +6247,20 @@ dependencies = [
"unicase",
]
[[package]]
name = "random_word"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bcd87d2e3f99cc11e6c7fc518f09e63e194f7243b4cf30c979b0c524d04fbd90"
dependencies = [
"ahash",
"brotli",
"once_cell",
"paste",
"rand 0.8.5",
"unicase",
]
[[package]]
name = "rangemap"
version = "1.5.1"

View File

@@ -21,14 +21,14 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.30.0", "features" = ["dynamodb"], tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.30.0", tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.30.0", tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.30.0", tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.30.0", tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.30.0", tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.30.0", tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.30.0", tag = "v0.30.0-beta.1", git="https://github.com/lancedb/lance.git" }
lance = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git", features = ["dynamodb"] }
lance-io = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-index = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-linalg = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-table = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-testing = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-datafusion = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-encoding = { "version" = "=0.31.1", tag="v0.31.1-beta.2", git="https://github.com/lancedb/lance.git" }
# Note that this one does not include pyarrow
arrow = { version = "55.1", optional = false }
arrow-array = "55.1"
@@ -39,20 +39,20 @@ arrow-schema = "55.1"
arrow-arith = "55.1"
arrow-cast = "55.1"
async-trait = "0"
datafusion = { version = "47.0", default-features = false }
datafusion-catalog = "47.0"
datafusion-common = { version = "47.0", default-features = false }
datafusion-execution = "47.0"
datafusion-expr = "47.0"
datafusion-physical-plan = "47.0"
datafusion = { version = "48.0", default-features = false }
datafusion-catalog = "48.0"
datafusion-common = { version = "48.0", default-features = false }
datafusion-execution = "48.0"
datafusion-expr = "48.0"
datafusion-physical-plan = "48.0"
env_logger = "0.11"
half = { "version" = "=2.5.0", default-features = false, features = [
half = { "version" = "=2.6.0", default-features = false, features = [
"num-traits",
] }
futures = "0"
log = "0.4"
moka = { version = "0.12", features = ["future"] }
object_store = "0.11.0"
object_store = "0.12.0"
pin-project = "1.0.7"
snafu = "0.8"
url = "2"

View File

@@ -428,7 +428,7 @@
"\n",
"**Why?** \n",
"Embedding the UFO dataset and ingesting it into LanceDB takes **~2 hours on a T4 GPU**. To save time: \n",
"- **Use the pre-prepared table with index created ** (provided below) to proceed directly to step7: search. \n",
"- **Use the pre-prepared table with index created** (provided below) to proceed directly to **Step 7**: search. \n",
"- **Step 5a** contains the full ingestion code for reference (run it only if necessary). \n",
"- **Step 6** contains the details on creating the index on the multivector column"
]

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.20.1-beta.1</version>
<version>0.21.1-beta.0</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.20.1-beta.1</version>
<version>0.21.1-beta.0</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>

node/package-lock.json (generated, 44 changed lines)
View File

@@ -1,12 +1,12 @@
{
"name": "vectordb",
"version": "0.20.1-beta.1",
"version": "0.21.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "vectordb",
"version": "0.20.1-beta.1",
"version": "0.21.0",
"cpu": [
"x64",
"arm64"
@@ -52,11 +52,11 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.20.1-beta.1",
"@lancedb/vectordb-darwin-x64": "0.20.1-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.20.1-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.20.1-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.20.1-beta.1"
"@lancedb/vectordb-darwin-arm64": "0.21.0",
"@lancedb/vectordb-darwin-x64": "0.21.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.21.0",
"@lancedb/vectordb-linux-x64-gnu": "0.21.0",
"@lancedb/vectordb-win32-x64-msvc": "0.21.0"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",
@@ -327,9 +327,9 @@
}
},
"node_modules/@lancedb/vectordb-darwin-arm64": {
"version": "0.20.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.20.1-beta.1.tgz",
"integrity": "sha512-DPD8gwFQz5aENYYbTFS/l3YX/rqzS6Kj2B4IZERccVFULQsdR5YwtaAfFwTMp7NSnsjWKwJAknohiMZlJr4njQ==",
"version": "0.21.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.21.0.tgz",
"integrity": "sha512-FTKbdYG36mvQ75tId+esyRfRjIBzryRhAp/6h51tiXy8gsq/TButuiPdqIXeonNModEjhu8wkzsGFwgjCcePow==",
"cpu": [
"arm64"
],
@@ -339,9 +339,9 @@
]
},
"node_modules/@lancedb/vectordb-darwin-x64": {
"version": "0.20.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.20.1-beta.1.tgz",
"integrity": "sha512-lTPtlRSTC08UgQW5Bv8WYhdbogAgUJ+9ejg+UE+fwP9gEsgEKXL/SHBm+9gmAlTo7LbrxJjg0CtCde/mW68UTw==",
"version": "0.21.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.21.0.tgz",
"integrity": "sha512-vGaFBr2sQZWE0mudg3LGTHiRE7p2Qce2ogiE2VAf1DLAJ4MrIhgVmEttf966ausIwNCgml+5AzUntw6zC0Oyuw==",
"cpu": [
"x64"
],
@@ -351,9 +351,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-arm64-gnu": {
"version": "0.20.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.20.1-beta.1.tgz",
"integrity": "sha512-w/3O9FvwQiGegYsM21yZ0FezfOFVsW7HttYwwPzZMZaCpK3/i+LvZVSqwO4qXHHJBtHgKevonINyvVlg5487aQ==",
"version": "0.21.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.21.0.tgz",
"integrity": "sha512-KlxqhnX4eBN6rDqrPgf/x/vLpnHK2UcIzNLpiOZzSAhooCmKmnNpfs/EXt+KRFloEQMy25AHpMpqkSPv1Q2oDA==",
"cpu": [
"arm64"
],
@@ -363,9 +363,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-x64-gnu": {
"version": "0.20.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.20.1-beta.1.tgz",
"integrity": "sha512-rq7Q6Lq9kJmBcgwplYQVJmRbyeP+xPVmXyyQfAO3IjekqeSsyjj1HoCZYqZIfBZyN5ELiSvIJB0731aKf9pr1A==",
"version": "0.21.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.21.0.tgz",
"integrity": "sha512-t7dkFV6kga3rqXR1rH460GdpSVuY0tw7CIc0KqsIIkBcXzUPA1n0QDoazdwPQ1MXzG/+F5WWCTp3dYWx2vP0Lw==",
"cpu": [
"x64"
],
@@ -375,9 +375,9 @@
]
},
"node_modules/@lancedb/vectordb-win32-x64-msvc": {
"version": "0.20.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.20.1-beta.1.tgz",
"integrity": "sha512-kHra0SEXeMKdgqi5h0igsqHcBr73hKBhEVJBa8VTv1DUv6Jvazwl4B4ueqllcyD4k3vvOTb2XzZomm7dhQ9QnA==",
"version": "0.21.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.21.0.tgz",
"integrity": "sha512-yovkW61RECBTsu0S527BX1uW0jCAZK9MAsJTknXmDjp78figx4/AyI5ajT63u/Uo4EKoheeNiiLdyU4v+A9YVw==",
"cpu": [
"x64"
],

View File

@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"description": " Serverless, low-latency vector database for AI applications",
"private": false,
"main": "dist/index.js",
@@ -89,10 +89,10 @@
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-x64": "0.20.1-beta.1",
"@lancedb/vectordb-darwin-arm64": "0.20.1-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.20.1-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.20.1-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.20.1-beta.1"
"@lancedb/vectordb-darwin-x64": "0.21.1-beta.0",
"@lancedb/vectordb-darwin-arm64": "0.21.1-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.21.1-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.21.1-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.21.1-beta.0"
}
}

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.20.1-beta.1"
version = "0.21.1-beta.0"
license.workspace = true
description.workspace = true
repository.workspace = true

View File

@@ -368,9 +368,9 @@ describe("merge insert", () => {
{ a: 4, b: "z" },
];
expect(
JSON.parse(JSON.stringify((await table.toArrow()).toArray())),
).toEqual(expected);
const result = (await table.toArrow()).toArray().sort((a, b) => a.a - b.a);
expect(result.map((row) => ({ ...row }))).toEqual(expected);
});
test("conditional update", async () => {
const newData = [
@@ -1650,13 +1650,25 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(resultSet.has("fob")).toBe(true);
expect(resultSet.has("fo")).toBe(true);
expect(resultSet.has("food")).toBe(true);
const prefixResults = await table
.search(
new MatchQuery("foo", "text", { fuzziness: 3, prefixLength: 3 }),
)
.toArray();
expect(prefixResults.length).toBe(2);
const resultSet2 = new Set(prefixResults.map((r) => r.text));
expect(resultSet2.has("foo")).toBe(true);
expect(resultSet2.has("food")).toBe(true);
});
test("full text search boolean query", async () => {
const db = await connect(tmpDir.name);
const data = [
{ text: "hello world", vector: [0.1, 0.2, 0.3] },
{ text: "goodbye world", vector: [0.4, 0.5, 0.6] },
{ text: "The cat and dog are playing" },
{ text: "The cat is sleeping" },
{ text: "The dog is barking" },
{ text: "The dog chases the cat" },
];
const table = await db.createTable("test", data);
await table.createIndex("text", {
@@ -1666,22 +1678,32 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
const shouldResults = await table
.search(
new BooleanQuery([
[Occur.Should, new MatchQuery("hello", "text")],
[Occur.Should, new MatchQuery("goodbye", "text")],
[Occur.Should, new MatchQuery("cat", "text")],
[Occur.Should, new MatchQuery("dog", "text")],
]),
)
.toArray();
expect(shouldResults.length).toBe(2);
expect(shouldResults.length).toBe(4);
const mustResults = await table
.search(
new BooleanQuery([
[Occur.Must, new MatchQuery("hello", "text")],
[Occur.Must, new MatchQuery("world", "text")],
[Occur.Must, new MatchQuery("cat", "text")],
[Occur.Must, new MatchQuery("dog", "text")],
]),
)
.toArray();
expect(mustResults.length).toBe(1);
expect(mustResults.length).toBe(2);
const mustNotResults = await table
.search(
new BooleanQuery([
[Occur.Must, new MatchQuery("cat", "text")],
[Occur.MustNot, new MatchQuery("dog", "text")],
]),
)
.toArray();
expect(mustNotResults.length).toBe(1);
});
test.each([

View File

@@ -812,10 +812,12 @@ export enum Operator {
*
* - `Must`: The term must be present in the document.
* - `Should`: The term should contribute to the document score, but is not required.
* - `MustNot`: The term must not be present in the document.
*/
export enum Occur {
Must = "MUST",
Should = "SHOULD",
Must = "MUST",
MustNot = "MUST_NOT",
}
/**
@@ -856,6 +858,7 @@ export class MatchQuery implements FullTextQuery {
* - `fuzziness`: The fuzziness level for the query (default is 0).
* - `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
* - `operator`: The logical operator to use for combining terms in the query (default is "OR").
* - `prefixLength`: The number of beginning characters being unchanged for fuzzy matching.
*/
constructor(
query: string,
@@ -865,6 +868,7 @@ export class MatchQuery implements FullTextQuery {
fuzziness?: number;
maxExpansions?: number;
operator?: Operator;
prefixLength?: number;
},
) {
let fuzziness = options?.fuzziness;
@@ -878,6 +882,7 @@ export class MatchQuery implements FullTextQuery {
fuzziness,
options?.maxExpansions ?? 50,
options?.operator ?? Operator.Or,
options?.prefixLength ?? 0,
);
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.20.1-beta.1",
"version": "0.21.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.20.1-beta.1",
"version": "0.21.0",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.20.1-beta.1",
"version": "0.21.1-beta.0",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -335,6 +335,7 @@ impl JsFullTextQuery {
fuzziness: Option<u32>,
max_expansions: u32,
operator: String,
prefix_length: u32,
) -> napi::Result<Self> {
Ok(Self {
inner: MatchQuery::new(query)
@@ -347,6 +348,7 @@ impl JsFullTextQuery {
napi::Error::from_reason(format!("Invalid operator: {}", e))
})?,
)
.with_prefix_length(prefix_length)
.into(),
})
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.23.1-beta.2"
current_version = "0.24.1-beta.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.23.1-beta.2"
version = "0.24.1-beta.0"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true

View File

@@ -85,7 +85,7 @@ embeddings = [
"boto3>=1.28.57",
"awscli>=1.29.57",
"botocore>=1.31.57",
"ollama",
"ollama>=0.3.0",
"ibm-watsonx-ai>=1.1.2",
]
azure = ["adlfs>=2024.2.0"]

View File

@@ -2,14 +2,15 @@
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from functools import cached_property
from typing import TYPE_CHECKING, List, Optional, Union
from typing import TYPE_CHECKING, List, Optional, Sequence, Union
import numpy as np
from ..util import attempt_import_or_raise
from .base import TextEmbeddingFunction
from .registry import register
if TYPE_CHECKING:
import numpy as np
import ollama
@@ -28,23 +29,21 @@ class OllamaEmbeddings(TextEmbeddingFunction):
keep_alive: Optional[Union[float, str]] = None
ollama_client_kwargs: Optional[dict] = {}
def ndims(self):
def ndims(self) -> int:
return len(self.generate_embeddings(["foo"])[0])
def _compute_embedding(self, text) -> Union["np.array", None]:
return (
self._ollama_client.embeddings(
model=self.name,
prompt=text,
options=self.options,
keep_alive=self.keep_alive,
)["embedding"]
or None
def _compute_embedding(self, text: Sequence[str]) -> Sequence[Sequence[float]]:
response = self._ollama_client.embed(
model=self.name,
input=text,
options=self.options,
keep_alive=self.keep_alive,
)
return response.embeddings
def generate_embeddings(
self, texts: Union[List[str], "np.ndarray"]
) -> list[Union["np.array", None]]:
self, texts: Union[List[str], np.ndarray]
) -> list[Union[np.array, None]]:
"""
Get the embeddings for the given texts
@@ -54,8 +53,8 @@ class OllamaEmbeddings(TextEmbeddingFunction):
The texts to embed
"""
# TODO retry, rate limit, token limit
embeddings = [self._compute_embedding(text) for text in texts]
return embeddings
embeddings = self._compute_embedding(texts)
return list(embeddings)
@cached_property
def _ollama_client(self) -> "ollama.Client":

View File

@@ -101,8 +101,9 @@ class FullTextOperator(str, Enum):
class Occur(str, Enum):
MUST = "MUST"
SHOULD = "SHOULD"
MUST = "MUST"
MUST_NOT = "MUST_NOT"
@pydantic.dataclasses.dataclass
@@ -181,6 +182,9 @@ class MatchQuery(FullTextQuery):
Can be either `AND` or `OR`.
If `AND`, all terms in the query must match.
If `OR`, at least one term in the query must match.
prefix_length : int, optional
The number of beginning characters being unchanged for fuzzy matching.
This is useful to achieve prefix matching.
"""
query: str
@@ -189,6 +193,7 @@ class MatchQuery(FullTextQuery):
fuzziness: int = pydantic.Field(0, kw_only=True)
max_expansions: int = pydantic.Field(50, kw_only=True)
operator: FullTextOperator = pydantic.Field(FullTextOperator.OR, kw_only=True)
prefix_length: int = pydantic.Field(0, kw_only=True)
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.MATCH
@@ -1446,10 +1451,13 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
query = self._query
if self._phrase_query:
raise NotImplementedError(
"Phrase query is not yet supported in Lance FTS. "
"Use tantivy-based index instead for now."
)
if isinstance(query, str):
if not query.startswith('"') or not query.endswith('"'):
query = f'"{query}"'
elif isinstance(query, FullTextQuery) and not isinstance(
query, PhraseQuery
):
raise TypeError("Please use PhraseQuery for phrase queries.")
query = self.to_query_object()
results = self._table._execute_query(query, timeout=timeout)
results = results.read_all()
@@ -3034,15 +3042,21 @@ class AsyncHybridQuery(AsyncQueryBase, AsyncVectorQueryBase):
>>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
Vector Search Plan:
ProjectionExec: expr=[vector@0 as vector, text@3 as text, _distance@2 as _distance]
Take: columns="vector, _rowid, _distance, (text)"
CoalesceBatchesExec: target_batch_size=1024
GlobalLimitExec: skip=0, fetch=10
FilterExec: _distance@2 IS NOT NULL
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], preserve_partitioning=[false]
KNNVectorDistance: metric=l2
LanceScan: uri=..., projection=[vector], row_id=true, row_addr=false, ordered=false
Take: columns="vector, _rowid, _distance, (text)"
CoalesceBatchesExec: target_batch_size=1024
GlobalLimitExec: skip=0, fetch=10
FilterExec: _distance@2 IS NOT NULL
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], preserve_partitioning=[false]
KNNVectorDistance: metric=l2
LanceScan: uri=..., projection=[vector], row_id=true, row_addr=false, ordered=false
<BLANKLINE>
FTS Search Plan:
LanceScan: uri=..., projection=[vector, text], row_id=false, row_addr=false, ordered=true
ProjectionExec: expr=[vector@2 as vector, text@3 as text, _score@1 as _score]
Take: columns="_rowid, _score, (vector), (text)"
CoalesceBatchesExec: target_batch_size=1024
GlobalLimitExec: skip=0, fetch=10
MatchQuery: query=hello
<BLANKLINE>
Parameters
----------

View File

@@ -827,7 +827,7 @@ class Table(ABC):
ordering_field_names: Optional[Union[str, List[str]]] = None,
replace: bool = False,
writer_heap_size: Optional[int] = 1024 * 1024 * 1024,
use_tantivy: bool = True,
use_tantivy: bool = False,
tokenizer_name: Optional[str] = None,
with_position: bool = False,
# tokenizer configs:
@@ -864,7 +864,7 @@ class Table(ABC):
The tokenizer to use for the index. Can be "raw", "default" or the 2 letter
language code followed by "_stem". So for english it would be "en_stem".
For available languages see: https://docs.rs/tantivy/latest/tantivy/tokenizer/enum.Language.html
use_tantivy: bool, default True
use_tantivy: bool, default False
If True, use the legacy full-text search implementation based on tantivy.
If False, use the new full-text search implementation based on lance-index.
with_position: bool, default False
@@ -1970,7 +1970,7 @@ class LanceTable(Table):
ordering_field_names: Optional[Union[str, List[str]]] = None,
replace: bool = False,
writer_heap_size: Optional[int] = 1024 * 1024 * 1024,
use_tantivy: bool = True,
use_tantivy: bool = False,
tokenizer_name: Optional[str] = None,
with_position: bool = False,
# tokenizer configs:

View File

@@ -6,7 +6,7 @@ import lancedb
# --8<-- [end:import-lancedb]
# --8<-- [start:import-numpy]
from lancedb.query import BoostQuery, MatchQuery
from lancedb.query import BooleanQuery, BoostQuery, MatchQuery, Occur
import numpy as np
import pyarrow as pa
@@ -191,6 +191,15 @@ def test_fts_fuzzy_query():
"food", # 1 insertion
}
results = table.search(
MatchQuery("foo", "text", fuzziness=1, prefix_length=3)
).to_pandas()
assert len(results) == 2
assert set(results["text"].to_list()) == {
"foo",
"food",
}
@pytest.mark.skipif(
os.name == "nt", reason="Need to fix https://github.com/lancedb/lance/issues/3905"
@@ -240,6 +249,60 @@ def test_fts_boost_query():
)
@pytest.mark.skipif(
os.name == "nt", reason="Need to fix https://github.com/lancedb/lance/issues/3905"
)
def test_fts_boolean_query(tmp_path):
uri = tmp_path / "boolean-example"
db = lancedb.connect(uri)
table = db.create_table(
"my_table_fts_boolean",
data=[
{"text": "The cat and dog are playing"},
{"text": "The cat is sleeping"},
{"text": "The dog is barking"},
{"text": "The dog chases the cat"},
],
mode="overwrite",
)
table.create_fts_index("text", use_tantivy=False, replace=True)
# SHOULD
results = table.search(
MatchQuery("cat", "text") | MatchQuery("dog", "text")
).to_pandas()
assert len(results) == 4
assert set(results["text"].to_list()) == {
"The cat and dog are playing",
"The cat is sleeping",
"The dog is barking",
"The dog chases the cat",
}
# MUST
results = table.search(
MatchQuery("cat", "text") & MatchQuery("dog", "text")
).to_pandas()
assert len(results) == 2
assert set(results["text"].to_list()) == {
"The cat and dog are playing",
"The dog chases the cat",
}
# MUST NOT
results = table.search(
BooleanQuery(
[
(Occur.MUST, MatchQuery("cat", "text")),
(Occur.MUST_NOT, MatchQuery("dog", "text")),
]
)
).to_pandas()
assert len(results) == 1
assert set(results["text"].to_list()) == {
"The cat is sleeping",
}
@pytest.mark.skipif(
os.name == "nt", reason="Need to fix https://github.com/lancedb/lance/issues/3905"
)

View File

@@ -775,6 +775,82 @@ async def test_explain_plan_async(table_async: AsyncTable):
assert "KNN" in plan
@pytest.mark.asyncio
async def test_explain_plan_fts(table_async: AsyncTable):
"""Test explain plan for FTS queries"""
# Create FTS index
from lancedb.index import FTS
await table_async.create_index("text", config=FTS())
# Test pure FTS query
query = await table_async.search("dog", query_type="fts", fts_columns="text")
plan = await query.explain_plan()
# Should show FTS details (issue #2465 is now fixed)
assert "MatchQuery: query=dog" in plan
assert "GlobalLimitExec" in plan # Default limit
# Test FTS query with limit
query_with_limit = await table_async.search(
"dog", query_type="fts", fts_columns="text"
)
plan_with_limit = await query_with_limit.limit(1).explain_plan()
assert "MatchQuery: query=dog" in plan_with_limit
assert "GlobalLimitExec: skip=0, fetch=1" in plan_with_limit
# Test FTS query with offset and limit
query_with_offset = await table_async.search(
"dog", query_type="fts", fts_columns="text"
)
plan_with_offset = await query_with_offset.offset(1).limit(1).explain_plan()
assert "MatchQuery: query=dog" in plan_with_offset
assert "GlobalLimitExec: skip=1, fetch=1" in plan_with_offset
@pytest.mark.asyncio
async def test_explain_plan_vector_with_limit_offset(table_async: AsyncTable):
"""Test explain plan for vector queries with limit and offset"""
# Test vector query with limit
plan_with_limit = await (
table_async.query().nearest_to(pa.array([1, 2])).limit(1).explain_plan()
)
assert "KNN" in plan_with_limit
assert "GlobalLimitExec: skip=0, fetch=1" in plan_with_limit
# Test vector query with offset and limit
plan_with_offset = await (
table_async.query()
.nearest_to(pa.array([1, 2]))
.offset(1)
.limit(1)
.explain_plan()
)
assert "KNN" in plan_with_offset
assert "GlobalLimitExec: skip=1, fetch=1" in plan_with_offset
@pytest.mark.asyncio
async def test_explain_plan_with_filters(table_async: AsyncTable):
"""Test explain plan for queries with filters"""
# Test vector query with filter
plan_with_filter = await (
table_async.query().nearest_to(pa.array([1, 2])).where("id = 1").explain_plan()
)
assert "KNN" in plan_with_filter
assert "FilterExec" in plan_with_filter
# Test FTS query with filter
from lancedb.index import FTS
await table_async.create_index("text", config=FTS())
query_fts_filter = await table_async.search(
"dog", query_type="fts", fts_columns="text"
)
plan_fts_filter = await query_fts_filter.where("id = 1").explain_plan()
assert "MatchQuery: query=dog" in plan_fts_filter
assert "FilterExec: id@" in plan_fts_filter # Should show filter details
@pytest.mark.asyncio
async def test_query_camelcase_async(tmp_path):
db = await lancedb.connect_async(tmp_path)

View File

@@ -245,7 +245,7 @@ def test_s3_dynamodb_sync(s3_bucket: str, commit_table: str, monkeypatch):
NotImplementedError,
match="Full-text search is only supported on the local filesystem",
):
table.create_fts_index("x")
table.create_fts_index("x", use_tantivy=True)
# make sure list tables still works
assert db.table_names() == ["test_ddb_sync"]

View File

@@ -50,8 +50,9 @@ impl FromPyObject<'_> for PyLanceDB<FtsQuery> {
let fuzziness = ob.getattr("fuzziness")?.extract()?;
let max_expansions = ob.getattr("max_expansions")?.extract()?;
let operator = ob.getattr("operator")?.extract::<String>()?;
let prefix_length = ob.getattr("prefix_length")?.extract()?;
Ok(PyLanceDB(
Ok(Self(
MatchQuery::new(query)
.with_column(Some(column))
.with_boost(boost)
@@ -60,6 +61,7 @@ impl FromPyObject<'_> for PyLanceDB<FtsQuery> {
.with_operator(Operator::try_from(operator.as_str()).map_err(|e| {
PyValueError::new_err(format!("Invalid operator: {}", e))
})?)
.with_prefix_length(prefix_length)
.into(),
))
}
@@ -68,7 +70,7 @@ impl FromPyObject<'_> for PyLanceDB<FtsQuery> {
let column = ob.getattr("column")?.extract()?;
let slop = ob.getattr("slop")?.extract()?;
Ok(PyLanceDB(
Ok(Self(
PhraseQuery::new(query)
.with_column(Some(column))
.with_slop(slop)
@@ -76,10 +78,10 @@ impl FromPyObject<'_> for PyLanceDB<FtsQuery> {
))
}
"BoostQuery" => {
let positive: PyLanceDB<FtsQuery> = ob.getattr("positive")?.extract()?;
let negative: PyLanceDB<FtsQuery> = ob.getattr("negative")?.extract()?;
let positive: Self = ob.getattr("positive")?.extract()?;
let negative: Self = ob.getattr("negative")?.extract()?;
let negative_boost = ob.getattr("negative_boost")?.extract()?;
Ok(PyLanceDB(
Ok(Self(
BoostQuery::new(positive.0, negative.0, negative_boost).into(),
))
}
@@ -101,18 +103,17 @@ impl FromPyObject<'_> for PyLanceDB<FtsQuery> {
let op = Operator::try_from(operator.as_str())
.map_err(|e| PyValueError::new_err(format!("Invalid operator: {}", e)))?;
Ok(PyLanceDB(q.with_operator(op).into()))
Ok(Self(q.with_operator(op).into()))
}
"BooleanQuery" => {
let queries: Vec<(String, PyLanceDB<FtsQuery>)> =
ob.getattr("queries")?.extract()?;
let queries: Vec<(String, Self)> = ob.getattr("queries")?.extract()?;
let mut sub_queries = Vec::with_capacity(queries.len());
for (occur, q) in queries {
let occur = Occur::try_from(occur.as_str())
.map_err(|e| PyValueError::new_err(e.to_string()))?;
sub_queries.push((occur, q.0));
}
Ok(PyLanceDB(BooleanQuery::new(sub_queries).into()))
Ok(Self(BooleanQuery::new(sub_queries).into()))
}
name => Err(PyValueError::new_err(format!(
"Unsupported FTS query type: {}",
@@ -139,7 +140,8 @@ impl<'py> IntoPyObject<'py> for PyLanceDB<FtsQuery> {
kwargs.set_item("boost", query.boost)?;
kwargs.set_item("fuzziness", query.fuzziness)?;
kwargs.set_item("max_expansions", query.max_expansions)?;
kwargs.set_item("operator", operator_to_str(query.operator))?;
kwargs.set_item::<_, &str>("operator", query.operator.into())?;
kwargs.set_item("prefix_length", query.prefix_length)?;
namespace
.getattr(intern!(py, "MatchQuery"))?
.call((query.terms, query.column.unwrap()), Some(&kwargs))
@@ -152,8 +154,8 @@ impl<'py> IntoPyObject<'py> for PyLanceDB<FtsQuery> {
.call((query.terms, query.column.unwrap()), Some(&kwargs))
}
FtsQuery::Boost(query) => {
let positive = PyLanceDB(query.positive.as_ref().clone()).into_pyobject(py)?;
let negative = PyLanceDB(query.negative.as_ref().clone()).into_pyobject(py)?;
let positive = Self(query.positive.as_ref().clone()).into_pyobject(py)?;
let negative = Self(query.negative.as_ref().clone()).into_pyobject(py)?;
let kwargs = PyDict::new(py);
kwargs.set_item("negative_boost", query.negative_boost)?;
namespace
@@ -169,19 +171,25 @@ impl<'py> IntoPyObject<'py> for PyLanceDB<FtsQuery> {
.unzip();
let kwargs = PyDict::new(py);
kwargs.set_item("boosts", boosts)?;
kwargs.set_item("operator", operator_to_str(first.operator))?;
kwargs.set_item::<_, &str>("operator", first.operator.into())?;
namespace
.getattr(intern!(py, "MultiMatchQuery"))?
.call((first.terms.clone(), columns), Some(&kwargs))
}
FtsQuery::Boolean(query) => {
let mut queries = Vec::with_capacity(query.must.len() + query.should.len());
for q in query.must {
queries.push((occur_to_str(Occur::Must), PyLanceDB(q).into_pyobject(py)?));
}
let mut queries: Vec<(&str, Bound<'py, PyAny>)> = Vec::with_capacity(
query.should.len() + query.must.len() + query.must_not.len(),
);
for q in query.should {
queries.push((occur_to_str(Occur::Should), PyLanceDB(q).into_pyobject(py)?));
queries.push((Occur::Should.into(), Self(q).into_pyobject(py)?));
}
for q in query.must {
queries.push((Occur::Must.into(), Self(q).into_pyobject(py)?));
}
for q in query.must_not {
queries.push((Occur::MustNot.into(), Self(q).into_pyobject(py)?));
}
namespace
.getattr(intern!(py, "BooleanQuery"))?
.call1((queries,))
@@ -190,21 +198,6 @@ impl<'py> IntoPyObject<'py> for PyLanceDB<FtsQuery> {
}
}
fn operator_to_str(op: Operator) -> &'static str {
match op {
Operator::And => "AND",
Operator::Or => "OR",
}
}
fn occur_to_str(occur: Occur) -> &'static str {
match occur {
Occur::Must => "MUST",
Occur::Should => "SHOULD",
Occur::MustNot => "MUST NOT",
}
}
// Python representation of query vector(s)
#[derive(Clone)]
pub struct PyQueryVectors(Vec<Arc<dyn Array>>);
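
The hunk above drops the local operator_to_str and occur_to_str helpers; the updated call sites rely on plain .into() conversions instead, which presumably come from From/Into impls to &'static str on the upstream lance Operator and Occur types. A minimal standalone sketch of that pattern, using a stand-in enum rather than the real lance types:

// Standalone sketch of the .into() conversion pattern; not the actual lance types.
#[derive(Clone, Copy)]
enum Operator {
    And,
    Or,
}

// A From impl lets call sites write `op.into()` instead of routing through a
// free helper function like operator_to_str.
impl From<Operator> for &'static str {
    fn from(op: Operator) -> Self {
        match op {
            Operator::And => "AND",
            Operator::Or => "OR",
        }
    }
}

fn main() {
    let s: &'static str = Operator::Or.into();
    assert_eq!(s, "OR");
}
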
@@ -569,7 +562,10 @@ impl FTSQuery {
}
pub fn explain_plan(self_: PyRef<'_, Self>, verbose: bool) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner.clone();
let inner = self_
.inner
.clone()
.full_text_search(self_.fts_query.clone());
future_into_py(self_.py(), async move {
inner
.explain_plan(verbose)
@@ -579,7 +575,10 @@ impl FTSQuery {
}
pub fn analyze_plan(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner.clone();
let inner = self_
.inner
.clone()
.full_text_search(self_.fts_query.clone());
future_into_py(self_.py(), async move {
inner
.analyze_plan()
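
The two changes just above apply the stored FTS clause to the cloned inner query before asking for a plan, matching the assertions in the Python tests earlier in this diff. A toy, self-contained sketch (a stand-in builder, not the real lancedb query types) of why that ordering matters:

// Standalone sketch: a plan requested from the base builder reflects only the
// scan, while applying the FTS clause first surfaces the match details.
#[derive(Clone, Default)]
struct QueryBuilder {
    fts: Option<String>,
}

impl QueryBuilder {
    fn full_text_search(mut self, query: String) -> Self {
        self.fts = Some(query);
        self
    }

    fn explain_plan(&self) -> String {
        match &self.fts {
            Some(q) => format!("MatchQuery: query={q}\n  LanceScan: ..."),
            None => "LanceScan: ...".to_string(),
        }
    }
}

fn main() {
    let base = QueryBuilder::default();
    assert_eq!(base.explain_plan(), "LanceScan: ...");
    let with_fts = base.clone().full_text_search("dog".to_string());
    assert!(with_fts.explain_plan().contains("MatchQuery: query=dog"));
}
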

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-node"
version = "0.20.1-beta.1"
version = "0.21.1-beta.0"
description = "Serverless, low-latency vector database for AI applications"
license.workspace = true
edition.workspace = true

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.20.1-beta.1"
version = "0.21.1-beta.0"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true

View File

@@ -105,7 +105,7 @@ impl ListingCatalog {
}
async fn open_path(path: &str) -> Result<Self> {
let (object_store, base_path) = ObjectStore::from_uri(path).await.unwrap();
let (object_store, base_path) = ObjectStore::from_uri(path).await?;
if object_store.is_local() {
Self::try_create_dir(path).context(CreateDirSnafu { path })?;
}
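
With the unwrap replaced by ?, a malformed object store URI now comes back to the caller as an error instead of a panic. A simplified standalone sketch of the same propagation pattern, with hypothetical types rather than the real ObjectStore::from_uri:

// Standalone sketch: return the parsing failure instead of unwrapping it.
#[derive(Debug)]
struct ParseError(String);

fn parse_scheme(uri: &str) -> Result<&str, ParseError> {
    uri.split_once("://")
        .map(|(scheme, _)| scheme)
        .ok_or_else(|| ParseError(format!("invalid object store uri: {uri}")))
}

fn open_path(path: &str) -> Result<String, ParseError> {
    // `?` turns a parse failure into an early return, so callers can handle it.
    let scheme = parse_scheme(path)?;
    Ok(format!("catalog at {path} (scheme: {scheme})"))
}

fn main() {
    assert!(open_path("s3://bucket/catalog").is_ok());
    assert!(open_path("not a uri").is_err());
}
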

View File

@@ -107,7 +107,7 @@ impl ObjectStore for MirroringObjectStore {
self.primary.delete(location).await
}
fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, Result<ObjectMeta>> {
fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, Result<ObjectMeta>> {
self.primary.list(prefix)
}

View File

@@ -119,7 +119,7 @@ impl ObjectStore for IoTrackingStore {
let result = self.target.get(location).await;
if let Ok(result) = &result {
let num_bytes = result.range.end - result.range.start;
self.record_read(num_bytes as u64);
self.record_read(num_bytes);
}
result
}
@@ -128,12 +128,12 @@ impl ObjectStore for IoTrackingStore {
let result = self.target.get_opts(location, options).await;
if let Ok(result) = &result {
let num_bytes = result.range.end - result.range.start;
self.record_read(num_bytes as u64);
self.record_read(num_bytes);
}
result
}
async fn get_range(&self, location: &Path, range: std::ops::Range<usize>) -> OSResult<Bytes> {
async fn get_range(&self, location: &Path, range: std::ops::Range<u64>) -> OSResult<Bytes> {
let result = self.target.get_range(location, range).await;
if let Ok(result) = &result {
self.record_read(result.len() as u64);
@@ -144,7 +144,7 @@ impl ObjectStore for IoTrackingStore {
async fn get_ranges(
&self,
location: &Path,
ranges: &[std::ops::Range<usize>],
ranges: &[std::ops::Range<u64>],
) -> OSResult<Vec<Bytes>> {
let result = self.target.get_ranges(location, ranges).await;
if let Ok(result) = &result {
@@ -170,7 +170,7 @@ impl ObjectStore for IoTrackingStore {
self.target.delete_stream(locations)
}
fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, OSResult<ObjectMeta>> {
fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, OSResult<ObjectMeta>> {
self.record_read(0);
self.target.list(prefix)
}
@@ -179,7 +179,7 @@ impl ObjectStore for IoTrackingStore {
&self,
prefix: Option<&Path>,
offset: &Path,
) -> BoxStream<'_, OSResult<ObjectMeta>> {
) -> BoxStream<'static, OSResult<ObjectMeta>> {
self.record_read(0);
self.target.list_with_offset(prefix, offset)
}
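
These signature updates appear to track an object_store API where byte ranges are expressed as Range<u64> and listing streams are 'static, so the `as u64` casts drop out of the read accounting. A minimal sketch of that accounting, assuming only the standard library:

// Standalone sketch of byte accounting over u64 ranges; not the real IoTrackingStore.
use std::ops::Range;
use std::sync::atomic::{AtomicU64, Ordering};

struct ReadStats {
    bytes_read: AtomicU64,
}

impl ReadStats {
    fn record_read(&self, num_bytes: u64) {
        self.bytes_read.fetch_add(num_bytes, Ordering::Relaxed);
    }

    fn record_range(&self, range: &Range<u64>) {
        // The range is already u64, so no cast is needed before recording it.
        self.record_read(range.end - range.start);
    }
}

fn main() {
    let stats = ReadStats {
        bytes_read: AtomicU64::new(0),
    };
    stats.record_range(&(0..4096));
    stats.record_range(&(4096..8192));
    assert_eq!(stats.bytes_read.load(Ordering::Relaxed), 8192);
}
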

View File

@@ -392,9 +392,18 @@ pub mod tests {
} else {
expected_line.trim()
};
assert_eq!(&actual_trimmed[..expected_trimmed.len()], expected_trimmed);
assert_eq!(
&actual_trimmed[..expected_trimmed.len()],
expected_trimmed,
"\nactual:\n{physical_plan}\nexpected:\n{expected}"
);
}
assert_eq!(lines_checked, expected.lines().count());
assert_eq!(
lines_checked,
expected.lines().count(),
"\nlines_checked:\n{lines_checked}\nexpected:\n{}",
expected.lines().count()
);
}
}
@@ -477,9 +486,9 @@ pub mod tests {
TestFixture::check_plan(
plan,
"MetadataEraserExec
RepartitionExec:...
CoalesceBatchesExec:...
FilterExec: i@0 >= 5
RepartitionExec:...
ProjectionExec:...
LanceScan:...",
)

View File

@@ -129,7 +129,9 @@ impl DatasetRef {
dataset: ref mut ds,
..
} => {
*ds = dataset;
if dataset.manifest().version > ds.manifest().version {
*ds = dataset;
}
}
_ => unreachable!("Dataset should be in latest mode at this point"),
}
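
The guard above swaps in the candidate dataset only when its manifest version is strictly newer, so the cached handle can only move forward. A simplified standalone sketch of that monotonic-update pattern, with stand-in types rather than the real DatasetRef:

// Standalone sketch: only accept a strictly newer version of the dataset.
struct Dataset {
    version: u64,
}

struct Latest {
    dataset: Dataset,
}

impl Latest {
    fn set_latest(&mut self, candidate: Dataset) {
        // A stale candidate (e.g. from a slower concurrent load) is ignored.
        if candidate.version > self.dataset.version {
            self.dataset = candidate;
        }
    }
}

fn main() {
    let mut latest = Latest { dataset: Dataset { version: 5 } };
    latest.set_latest(Dataset { version: 4 }); // older: ignored
    assert_eq!(latest.dataset.version, 5);
    latest.set_latest(Dataset { version: 6 }); // newer: applied
    assert_eq!(latest.dataset.version, 6);
}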