Compare commits

..

114 Commits

Author SHA1 Message Date
Lance Release
27d9e5c596 Bump version: 0.22.0-beta.5 → 0.22.0-beta.6 2025-04-08 06:16:14 +00:00
BubbleCal
ec8271931f feat: support to create FTS index on list of strings (#2317)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal library dependencies to the latest beta version for
improved system stability.
- **Tests**
- Added automated tests to validate full-text search functionality on
list-based text fields.
- **Refactor**
- Enhanced the search processing logic to provide robust support for
list-type text data, ensuring more reliable results.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-08 14:12:35 +08:00
Lance Release
6c6966600c Updating package-lock.json 2025-04-04 22:56:57 +00:00
Lance Release
2e170c3c7b Updating package-lock.json 2025-04-04 21:50:28 +00:00
Lance Release
fd92e651d1 Updating package-lock.json 2025-04-04 21:50:12 +00:00
Lance Release
c298482ee1 Bump version: 0.19.0-beta.4 → 0.19.0-beta.5 2025-04-04 21:49:53 +00:00
Lance Release
d59f64b5a3 Bump version: 0.22.0-beta.4 → 0.22.0-beta.5 2025-04-04 21:49:34 +00:00
fzowl
30ed8c4c43 fix: voyageai regression multimodal supercedes text models (#2268)
fix #2160
2025-04-04 14:45:56 -07:00
Will Jones
4a2cdbf299 ci: provide token for deprecate call (#2309)
This should prevent the failures we are seeing in Node release.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chore**
- Enhanced the package deprecation process with improved security
measures, ensuring smoother and more reliable updates during package
deprecation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 14:44:58 -07:00
Will Jones
657843d9e9 perf: remove redundant checkout latest (#2310)
This bug was introduced in https://github.com/lancedb/lancedb/pull/2281

Likely introduced during a rebase when fixing merge conflicts.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Updated the refresh process so that reloading now uses the existing
dataset version instead of automatically updating to the latest version.
This change may affect workflows that rely on immediate data updates
during refresh.
  
- **New Features**
- Introduced a new module for tracking I/O statistics in object store
operations, enhancing monitoring capabilities.
- Added a new test module to validate the functionality of the dataset
operations.

- **Bug Fixes**
- Reintroduced the `write_options` method in the `CreateTableBuilder`,
ensuring consistent functionality across different builder variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 12:56:02 -07:00
Will Jones
1cd76b8498 feat: add timeout to query execution options (#2288)
Closes #2287


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added configurable timeout support for query executions. Users can now
specify maximum wait times for queries, enhancing control over
long-running operations across various integrations.
- **Tests**
- Expanded test coverage to validate timeout behavior in both
synchronous and asynchronous query flows, ensuring timely error
responses when query execution exceeds the specified limit.
- Introduced a new test suite to verify query operations when a timeout
is reached, checking for appropriate error handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 12:34:41 -07:00
Lei Xu
a38f784081 chore: add numpy as dependency (#2308) 2025-04-04 10:33:39 -07:00
Will Jones
647dee4e94 ci: check release builds when we change dependencies (#2299)
The issue we fixed in https://github.com/lancedb/lancedb/pull/2296 was
caused by an upgrade in dependencies. This could have been caught if we
had run these CI jobs when we did the dependency change.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated our automated pipeline to trigger additional stability checks
when dependency configurations change, ensuring smoother build and
release processes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-03 16:19:00 -07:00
Lance Release
0844c2dd64 Updating package-lock.json 2025-04-02 21:23:50 +00:00
Lance Release
fd2692295c Updating package-lock.json 2025-04-02 21:23:34 +00:00
Lance Release
d4ea50fba1 Bump version: 0.19.0-beta.3 → 0.19.0-beta.4 2025-04-02 21:23:19 +00:00
Lance Release
0d42297cf8 Bump version: 0.22.0-beta.3 → 0.22.0-beta.4 2025-04-02 21:23:02 +00:00
Weston Pace
a6d4125cbf feat: upgrade lance to 0.25.3b2 (#2304)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
	- Updated core dependency versions to v0.25.3-beta.2.
	- Enabled additional functionality with a new "dynamodb" feature.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-02 14:22:30 -07:00
Lance Release
5c32a99e61 Updating package-lock.json 2025-04-02 09:28:46 +00:00
Lance Release
cefaa75b24 Updating package-lock.json 2025-04-02 09:28:30 +00:00
Lance Release
bd62c2384f Bump version: 0.19.0-beta.2 → 0.19.0-beta.3 2025-04-02 09:28:14 +00:00
Lance Release
f0bc08c0d7 Bump version: 0.22.0-beta.2 → 0.22.0-beta.3 2025-04-02 09:27:55 +00:00
BubbleCal
e52ac79c69 fix: can't do structured FTS in python (#2300)
missed to support it in `search()` API and there were some pydantic
errors

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced full-text search capabilities by incorporating additional
parameters, enabling more flexible query definitions.
- Extended table search functionality to support full-text queries
alongside existing search types.

- **Tests**
- Introduced new tests that validate both structured and conditional
full-text search behaviors.
- Expanded test coverage for various query types, including MatchQuery,
BoostQuery, MultiMatchQuery, and PhraseQuery.

- **Bug Fixes**
- Fixed a logic issue in query processing to ensure correct handling of
full-text search queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-02 17:27:15 +08:00
Will Jones
f091f57594 ci: fix lancedb musl builds (#2296)
Fixes #2255


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Enhanced the build process to improve performance and reliability
across Linux platforms.
  - Updated environment settings for more accurate compiler integration.
- Activated previously inactive build configurations to support advanced
feature support.
- Added support for the x86_64 architecture on Linux systems utilizing
the musl C library.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-01 14:44:27 -07:00
Lance Release
a997fd4108 Updating package-lock.json 2025-04-01 17:28:57 +00:00
Lance Release
1486514ccc Updating package-lock.json 2025-04-01 17:28:40 +00:00
Lance Release
a505bc3965 Bump version: 0.19.0-beta.1 → 0.19.0-beta.2 2025-04-01 17:28:21 +00:00
Lance Release
c1738250a3 Bump version: 0.22.0-beta.1 → 0.22.0-beta.2 2025-04-01 17:27:57 +00:00
Weston Pace
1ee63984f5 feat: allow FSB to be used for btree indices (#2297)
We recently allowed this for lance but there was a check in lancedb as
well that was preventing it

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added support for indexing fixed-size binary data using B-tree
structures for efficient data storage and retrieval.
- **Tests**
- Implemented automated tests to ensure the new binary indexing works
correctly and meets the expected configuration.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-01 10:27:22 -07:00
Lance Release
2eb2c8862a Updating package-lock.json 2025-04-01 14:27:26 +00:00
Lance Release
4ea8e178d3 Updating package-lock.json 2025-04-01 14:27:07 +00:00
Lance Release
e4485a630e Bump version: 0.19.0-beta.0 → 0.19.0-beta.1 2025-04-01 14:26:47 +00:00
Lance Release
fb95f9b3bd Bump version: 0.22.0-beta.0 → 0.22.0-beta.1 2025-04-01 14:26:28 +00:00
Weston Pace
625bab3f21 feat: update to lance 0.25.3b1 (#2294)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated dependency versions for improved performance and
compatibility.

- **New Features**
- Added support for structured full-text search with expanded query
types (e.g., match, phrase, boost, multi-match) and flexible input
formats.
- Introduced a new method to check server support for structural
full-text search features.
- Enhanced the query system with new classes and interfaces for handling
various full-text queries.
- Expanded the functionality of existing methods to accept more complex
query structures, including updates to method signatures.

- **Bug Fixes**
  - Improved error handling and reporting for full-text search queries.

- **Refactor**
- Enhanced query processing with streamlined input handling and improved
error reporting, ensuring more robust and consistent search results
across platforms.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>
2025-04-01 06:36:42 -07:00
Will Jones
e59f9382a0 ci: deprecate vectordb each release (#2292)
I released each time we published, the new package was no longer
deprecated. This re-deprecated the package after a new publish.
2025-03-31 12:03:04 -07:00
Lance Release
fdee7ba477 Updating package-lock.json 2025-03-30 19:09:17 +00:00
Lance Release
c44fa3abc4 Updating package-lock.json 2025-03-30 18:05:07 +00:00
Lance Release
fc43aac0ed Updating package-lock.json 2025-03-30 18:04:51 +00:00
Lance Release
e67cd0baf9 Bump version: 0.18.3-beta.0 → 0.19.0-beta.0 2025-03-30 18:04:32 +00:00
Lance Release
26dab93f2a Bump version: 0.21.3-beta.0 → 0.22.0-beta.0 2025-03-30 18:04:14 +00:00
LuQQiu
b9bdb8d937 fix: fix remote restore api to always checkout latest version (#2291)
Fix restore to always checkout latest version, following local restore
api implementation

a1d1833a40/rust/lancedb/src/table.rs (L1910)
Otherwise
table.create_table -> version 1
table.add_table -> version 2
table.checkout(1), table.restore() -> the version remains at 1 (should
checkout_latest inside restore method to update version to latest
version and allow write operation)
table.checkout_latest() -> version is 3
can do write operations
2025-03-29 22:46:57 -07:00
LuQQiu
a1d1833a40 feat: add analyze_plan api (#2280)
add analyze plan api to allow executing the queries and see runtime
metrics.
Which help identify the query IO overhead and help identify query
slowness
2025-03-28 14:28:52 -07:00
Will Jones
a547c523c2 feat!: change default read_consistency_interval=5s (#2281)
Previously, when we loaded the next version of the table, we would block
all reads with a write lock. Now, we only do that if
`read_consistency_interval=0`. Otherwise, we load the next version
asynchronously in the background. This should mean that
`read_consistency_interval > 0` won't have a meaningful impact on
latency.

Along with this change, I felt it was safe to change the default
consistency interval to 5 seconds. The current default is `None`, which
means we will **never** check for a new version by default. I think that
default is contrary to most users expectations.
2025-03-28 11:04:31 -07:00
Lance Release
dc8b75feab Updating package-lock.json 2025-03-28 17:15:17 +00:00
Lance Release
c1600cdc06 Updating package-lock.json 2025-03-28 16:04:01 +00:00
Lance Release
f5dee46970 Updating package-lock.json 2025-03-28 16:03:46 +00:00
Lance Release
346cbf8bf7 Bump version: 0.18.2-beta.0 → 0.18.3-beta.0 2025-03-28 16:03:31 +00:00
Lance Release
3c7dfe9f28 Bump version: 0.21.2-beta.0 → 0.21.3-beta.0 2025-03-28 16:03:17 +00:00
Lei Xu
f52d05d3fa feat: add columns using pyarrow schema (#2284) 2025-03-28 08:51:50 -07:00
vinoyang
c321cccc12 chore(java): make rust release to be a switch option (#2277) 2025-03-28 11:26:24 +08:00
LuQQiu
cba14a5743 feat: add restore remote api (#2282) 2025-03-27 16:33:52 -07:00
vinoyang
72057b743d chore(java): introduce spotless plugin (#2278) 2025-03-27 10:38:39 +08:00
LuQQiu
698f329598 feat: add explain plan remote api (#2263)
Add explain plan remote api
2025-03-26 11:22:40 -07:00
BubbleCal
79fa745130 feat: upgrade lance to v0.25.1-beta.3 (#2276)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-26 23:14:27 +08:00
vinoyang
2ad71bdeca fix(java): make test work for jdk8 (#2269) 2025-03-25 10:57:49 -07:00
vinoyang
7c13615096 fix(java): add .gitignore file (#2270) 2025-03-25 10:56:08 -07:00
Wyatt Alt
f882f5b69a fix: update Query pydoc (#2273)
Removes reference of nonexistent method.
2025-03-25 08:50:23 -07:00
Benjamin Clavié
a68311a893 fix: answerdotai rerankers argument passing (#2117)
This fixes an issue for people wishing to use different kinds of
rerankers in lancedb via AnswerDotAI rerankers. Currently, the arguments
are passed sequentially, but they don't match the[Reranker class
implementation](d604a8c47d/rerankers/reranker.py (L179)):
the second argument is expected to be an optional "lang" for default
models, while model_type should be passed explicitly.

The one line changes in this PR fixes it and enables the use of other
methods (eg LLMs-as-rerankers)
2025-03-24 12:31:59 +05:30
Ayush Chaurasia
846a5cea33 fix: handle light and dark mode logo (#2265) 2025-03-22 10:21:05 -07:00
QianZhu
e3dec647b5 docs: replace banner as an image (#2262) 2025-03-21 18:35:35 -07:00
QianZhu
c58104cecc docs: add banner for LanceDB Cloud in public beta (#2261) 2025-03-21 17:54:34 -07:00
QianZhu
b3b5362632 docs: replace Lancedb Cloud link (#2259)
* direct users to cloud.lancedb.com since LanceDB Cloud is in public
beta
* removed the `cast vector dimension` from alter columns as we don't
support it
2025-03-21 17:43:00 -07:00
Will Jones
abe06fee3d feat(python): warn on fork (#2258)
Closes #768
2025-03-21 17:18:10 -07:00
Will Jones
93a82fd371 ci: allow dry run on PR to Python release (#2245)
This just makes it easier to test in the future.
2025-03-21 16:14:32 -07:00
Will Jones
0d379e6ffa ci(node): setup URL so auth token is picked up (#2257)
Should fix failure seen here:
https://github.com/lancedb/lancedb/actions/runs/13999958170/job/39207039825
2025-03-21 16:14:24 -07:00
Lance Release
e1388bdfdd Updating package-lock.json 2025-03-21 20:46:53 +00:00
Lance Release
315a24c2bc Updating package-lock.json 2025-03-21 20:03:43 +00:00
Lance Release
6dd4cf6038 Updating package-lock.json 2025-03-21 20:03:27 +00:00
Lance Release
f97e751b3c Bump version: 0.18.1 → 0.18.2-beta.0 2025-03-21 20:02:59 +00:00
Lance Release
e803a626a1 Bump version: 0.21.1 → 0.21.2-beta.0 2025-03-21 20:02:25 +00:00
Weston Pace
9403254442 feat: add to_query_object method (#2239)
This PR adds a `to_query_object` method to the various query builders
(except not hybrid queries yet). This makes it possible to inspect the
query that is built.

In addition this PR does some normalization between the sync and async
query paths. A few custom defaults were removed in favor of None (with
the default getting set once, in rust).

Also, the synchronous to_batches method will now actually stream results

Also, the remote API now defaults to prefiltering
2025-03-21 13:01:51 -07:00
Will Jones
b2a38ac366 fix: make pylance optional again (#2209)
The two remaining blockers were:

* A method `with_embeddings` that was deprecated a year ago
* A typecheck for `LanceDataset`
2025-03-21 11:26:32 -07:00
BubbleCal
bdb6c09c3b feat: support binary vector and IVF_FLAT in TypeScript (#2221)
resolve #2218

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-21 10:57:08 -07:00
Will Jones
2bfdef2624 ci: refactor node releases (#2223)
This PR fixes build issues associated with `aws-lc-rs`, while
simplifying the build process. Previously, we used custom scripts for
the musl and Windows ARM builds. These were complicated and prone to
breaking. This PR switches to a setup that mirrors
https://github.com/napi-rs/package-template/blob/main/.github/workflows/CI.yml.

* linux glibc and musl builds now use the Docker images provided by the
napi project
* Windows ARM build now just cross compiles from Windows x64, which
turns out to work quite well.
2025-03-21 10:56:29 -07:00
Samuel Colvin
7982d5c082 fix: correct rust install docs (#2253)
I'm pretty sure you mean `cargo add lancedb` here, `cargo install
lancedb` fails right now.
2025-03-21 10:12:53 -07:00
BubbleCal
7ff6ec7fe3 feat: upgrade to lance v0.25.0-beta.5 (#2248)
- adds `loss` into the index stats for vector index
- now `optimize` can retrain the vector index

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-21 10:12:23 -07:00
Ayush Chaurasia
ba1ded933a fix: add better check for empty results in hybrid search (#2252)
fixes: https://github.com/lancedb/lancedb/issues/2249
2025-03-21 13:05:05 +05:30
Will Jones
b595d8a579 fix(nodejs): workaround for apache-arrow null vector issue (#2244)
Fixes #2240
2025-03-20 08:07:10 -07:00
Will Jones
2a1d6d8abf ci: simplify windows builds (#2243)
We soon won't rely on cross compiling from Linux to windows, so can
remove this check. Instead, check that we can cross compile from Windows
between architectures.
2025-03-20 08:06:56 -07:00
Will Jones
440a466a13 ci: remove OpenSSL as dependency in favor of rustls (#2242)
`object_store` already hard codes `rustls` as the TLS implementation, so
we have been shipping a mix of `rustls` and `openssl`. For simplicity of
builds, we should consolidate to one, and that has to be `rustls`.
2025-03-20 08:06:45 -07:00
Ayush Chaurasia
b9afd9c860 docs: add late interaction, multi-vector guide & link example (#2231)
1/2 docs update for this week. Addesses issues from this docs epic -
https://github.com/lancedb/lancedb/issues/1476
2025-03-20 20:29:32 +05:30
Will Jones
a6b6f6a806 ci: drop vectordb support for musl, windows ARM (#2241)
vectordb is deprecated, and these platforms are particularly difficult
to maintain. Removing now to prevent further headaches.

We will keep these platforms supported on `@lancedb/lancedb`.
2025-03-19 12:23:46 -07:00
Ayush Chaurasia
ae1548b507 docs: add cloud & enterprise cta (#2235)
2/2 docs update this week
- Add cloud & enterprise CTA
- remove outdated projects/examples from landing page
2025-03-19 10:55:05 -07:00
Weston Pace
4e03ee82bc refactor: rework catalog/database options (#2213)
The `ConnectRequest` has a set of properties that only make sense for
listing databases / catalogs and a set of properties that only make
sense for remote databases.

This PR reduces all options to a single `HashMap<String, String>`. This
makes it easier to add new database / catalog implementations and makes
it clearer to users which options are applicable in which situations.

I don't believe there are any breaking changes here. The closest thing
is that I placed the `ConnectBuilder` methods `api_key`, `region`, and
`host_override` behind a `remote` feature gate. This is not strictly
needed and I could remove the feature gate but it seemed appropriate.
Since using these methods without the remote feature would have been
meaningless I don't feel this counts as a breaking change.

We could look at removing these methods entirely from the
`ConnectBuilder` (and encouraging users to use `RemoteDatabaseOptions`
instead) but I'm not sure how I feel about that.

Another approach we could take is to move these methods into a
`RemoteConnectBuilderExt` trait (and there could be a similar
`ListingConnectBuilderExt` trait to add methods for the listing database
/ catalog).

For now though my main goal is to simplify `ConnectRequest` as much as
possible (I see this being part of the key public API for database /
catalog integrations, similar to the `BaseTable`, `Catalog`, and
`Database` traits and I'd like it to be simple).
2025-03-18 10:13:59 -07:00
Weston Pace
46a6846d07 refactor: remove dataset reference from base table (#2226) 2025-03-17 06:27:33 -07:00
Will Jones
a207213358 fix: insert structs in non-alphabetical order (#2222)
Closes #2114

Starting in #1965, we no longer pass the table schema into
`pa.Table.from_pylist()`. This means PyArrow is choosing the order of
the struct subfields, and apparently it does them in alphabetical order.
This is fine in theory, since in Lance we support providing fields in
any order. However, before we pass it to Lance, we call
`pa.Table.cast()` to align column types to the table types.
`pa.Table.cast()` is strict about field order, so we need to create a
cast target schema that aligns with the input data. We were doing this
at the top-level fields, but weren't doing this in nested fields. This
PR adds support to do this for nested ones.
2025-03-13 14:46:05 -07:00
BubbleCal
6c321c694a feat: upgrade lance to 0.25.0-beta2 (#2220)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-13 14:12:54 -07:00
Bob Liu
5c00b2904c feat: add get dataset method on NativeTable (#2021)
I want to public the dataset method from native table, then I can use
more lance method like order_by which is not exposed in the lancedb
crate.
2025-03-13 11:15:28 -07:00
Gagan Bhullar
14677d7c18 fix: metric type inconsistency (#2122)
PR fixes #2113

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-03-12 10:28:37 -07:00
Martin Schorfmann
dd22a379b2 fix: use Self return type annotation for abstract query builder (#2127)
Hello LanceDB team,

while developing using `lancedb` as a library I encountered a typing
problem affecting IDE hints and completions during development.

---

## Current Situation

Currently, the abstract base class `lancedb.query:LanceQueryBuilder`
uses method chaining to build up the search parameters, where the
methods have `LanceQueryBuilder` as a return type hint.

This leads to two issues:
1. Implementing subclasses of `LanceQueryBuilder` need to override
methods to modify the return type hint, even when they don't need to
change its implementation, just to ensure adequate IDE hints and
completions.
2. When using method chaining the first method directly inherited from
the abstract `LanceQueryBuilder` causes the inferred type to switch back
to `LanceQueryBuilder`. So even when the type starts from
`lancdb.table:LanceTable.search(query_type="vector", ...)` and therefor
correctly is inferred as `LanceVectorQueryBuilder`, after calling e.g.
`LanceVectorQueryBuilder.limit(...)` it is seen as the abstract
`LanceQueryBuilder` from that point on.

### Example of current situation


![image](https://github.com/user-attachments/assets/09678727-8722-43bd-a8a2-67d9b5fc0db5)

## Proposed changes

I propose to change the return type hints of the corresponding methods
(including classmethod `create()`) in the abstract base class
`LanceQueryBuilder` from `LanceQueryBuilder` to `Self`.
`Self` is already imported in the module:

```py
    if sys.version_info >= (3, 11):
        from typing import Self
    else:
        from typing_extensions import Self
```

### Further possible changes

Additionally, the implementing subclasses could also change the return
type hints to `Self` to potentially allow for further inheritance
easily.
> [!NOTE]
> **However this is not part of this pull request as of writing.**

### Example after proposed changes


![image](https://github.com/user-attachments/assets/a9aea636-e426-477a-86ee-2dad3af2876f)

---

Best regards
Martin
2025-03-12 10:08:25 -07:00
Will Jones
7747c9bcbf feat(node): parse arrow types in alterColumns() (#2208)
Previously, users could only specify new data types in `alterColumns` as
strings:

```ts
await tbl.alterColumns([
  path: "price",
  dataType: "float"
]);
```

But this has some problems:

1. It wasn't clear what were valid types
2. It was impossible to specify nested types, like lists and vector
columns.

This PR changes it to take an Arrow data type, similar to how the Python
API works. This allows casting vector types:

```ts
await tbl.alterColumns([
  {
    path: "vector",
    dataType: new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float16(), false),
    ),
  },
]);
```

Closes #2185
2025-03-12 09:57:36 -07:00
QianZhu
c9d6fc43a6 docs: use bypass_vector_index() instead of use_index=false (#2115) 2025-03-12 09:31:09 -07:00
Martin Schorfmann
581bcfbb88 docs: fix docstring of EmbeddingFunction (#2118)
Hello LanceDB team,

---

I have fixed a discrepancy in the class docstring of
`lancedb.embeddings.base:EmbeddingFunction` and made consistency
alignments to that docstring.

### Changes made

1. The docstring referred to the abstract method
`get_source_embeddings()`.
  This method does not exist in the repository at the current state.
I have changed the mention to refer to the actual abstract method
`compute_source_embeddings()`.
2. Also, I aligned the consistency within the ordered list which is
describing the methods to be implemented by concrete embedding
functions.

---

Thank you for developing this useful library. 👍

Best regards
Martin
2025-03-12 09:30:01 -07:00
vinoyang
3750639b5f feat(rust): add connect_catalog method to support connect catalog via url (#2177) 2025-03-12 05:19:03 -07:00
Lance Release
e744d54460 Updating package-lock.json 2025-03-11 14:00:55 +00:00
Lance Release
9d1ce4b5a5 Updating package-lock.json 2025-03-11 13:15:18 +00:00
Lance Release
729ce5e542 Updating package-lock.json 2025-03-11 13:15:03 +00:00
Lance Release
de6739e7ec Bump version: 0.18.1-beta.0 → 0.18.1 2025-03-11 13:14:49 +00:00
Lance Release
495216efdb Bump version: 0.18.0 → 0.18.1-beta.0 2025-03-11 13:14:44 +00:00
Lance Release
a3b45a4d00 Bump version: 0.21.1-beta.0 → 0.21.1 2025-03-11 13:14:30 +00:00
Lance Release
c316c2f532 Bump version: 0.21.0 → 0.21.1-beta.0 2025-03-11 13:14:29 +00:00
Weston Pace
3966b16b63 fix: restore pylance as mandatory dependency (#2204)
We attempted to make pylance optional in
https://github.com/lancedb/lancedb/pull/2156 but it appears this did not
quite work. Users are unable to use lancedb from a fresh install. This
reverts the optional-ness so we can get back in a working state while we
fix the issue.
2025-03-11 06:13:52 -07:00
Lance Release
5661cc15ac Updating package-lock.json 2025-03-10 23:53:56 +00:00
Lance Release
4e7220400f Updating package-lock.json 2025-03-10 23:13:52 +00:00
Lance Release
ae4928fe77 Updating package-lock.json 2025-03-10 23:13:36 +00:00
Lance Release
e80a405dee Bump version: 0.18.0-beta.1 → 0.18.0 2025-03-10 23:13:18 +00:00
Lance Release
a53e19e386 Bump version: 0.18.0-beta.0 → 0.18.0-beta.1 2025-03-10 23:13:13 +00:00
Lance Release
c0097c5f0a Bump version: 0.21.0-beta.2 → 0.21.0 2025-03-10 23:12:56 +00:00
Lance Release
c199708e64 Bump version: 0.21.0-beta.1 → 0.21.0-beta.2 2025-03-10 23:12:56 +00:00
Weston Pace
4a47150ae7 feat: upgrade to lance 0.24.1 (#2199) 2025-03-10 15:18:37 -07:00
Wyatt Alt
f86b20a564 fix: delete tables from DDB on drop_all_tables (#2194)
Prior to this commit, issuing drop_all_tables on a listing database with
an external manifest store would delete physical tables but leave
references behind in the manifest store. The table drop would succeed,
but subsequent creation of a table with the same name would fail with a
conflict.

With this patch, the external manifest store is updated to account for
the dropped tables so that dropped table names can be reused.
2025-03-10 15:00:53 -07:00
msu-reevo
cc81f3e1a5 fix(python): typing (#2167)
@wjones127 is there a standard way you guys setup your virtualenv? I can
either relist all the dependencies in the pyright precommit section, or
specify a venv, or the user has to be in the virtual environment when
they run git commit. If the venv location was standardized or a python
manager like `uv` was used it would be easier to avoid duplicating the
pyright dependency list.

Per your suggestion, in `pyproject.toml` I added in all the passing
files to the `includes` section.

For ruff I upgraded the version and removed "TCH" which doesn't exist as
an option.

I added a `pyright_report.csv` which contains a list of all files sorted
by pyright errors ascending as a todo list to work on.

I fixed about 30 issues in `table.py` stemming from str's being passed
into methods that required a string within a set of string Literals by
extracting them into `types.py`

Can you verify in the rust bridge that the schema should be a property
and not a method here? If it's a method, then there's another place in
the code where `inner.schema` should be `inner.schema()`
``` python
class RecordBatchStream:
    @property
    def schema(self) -> pa.Schema: ...
```

Also unless the `_lancedb.pyi` file is wrong, then there is no
`__anext__` here for `__inner` when it's not an `AsyncGenerator` and
only `next` is defined:
``` python
    async def __anext__(self) -> pa.RecordBatch:
        return await self._inner.__anext__()
        if isinstance(self._inner, AsyncGenerator):
            batch = await self._inner.__anext__()
        else:
            batch = await self._inner.next()
        if batch is None:
            raise StopAsyncIteration
        return batch
```
in the else statement, `_inner` is a `RecordBatchStream`
```python
class RecordBatchStream:
    @property
    def schema(self) -> pa.Schema: ...
    async def next(self) -> Optional[pa.RecordBatch]: ...
```

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-03-10 09:01:23 -07:00
Weston Pace
bc49c4db82 feat: respect datafusion's batch size when running as a table provider (#2187)
Datafusion makes the batch size available as part of the `SessionState`.
We should use that to set the `max_batch_length` property in the
`QueryExecutionOptions`.
2025-03-07 05:53:36 -08:00
Weston Pace
d2eec46f17 feat: add support for streaming input to create_table (#2175)
This PR makes it possible to create a table using an asynchronous stream
of input data. Currently only a synchronous iterator is supported. There
are a number of follow-ups not yet tackled:

* Support for embedding functions (the embedding functions wrapper needs
to be re-written to be async, should be an easy lift)
* Support for async input into the remote table (the make_ipc_batch
needs to change to accept async input, leaving undone for now because I
think we want to support actual streaming uploads into the remote table
soon)
* Support for async input into the add function (pretty essential, but
it is a fairly distinct code path, so saving for a different PR)
2025-03-06 11:55:00 -08:00
168 changed files with 9297 additions and 2979 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.18.0-beta.0"
current_version = "0.19.0-beta.5"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.
@@ -87,26 +87,11 @@ glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-x64-gnu\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-x64-gnu\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-arm64-musl\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-arm64-musl\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-x64-musl\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-x64-musl\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-win32-x64-msvc\": \"{new_version}\""
search = "\"@lancedb/vectordb-win32-x64-msvc\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-win32-arm64-msvc\": \"{new_version}\""
search = "\"@lancedb/vectordb-win32-arm64-msvc\": \"{current_version}\""
# Cargo files
# ------------
[[tool.bumpversion.files]]

View File

@@ -34,6 +34,10 @@ rustflags = ["-C", "target-cpu=haswell", "-C", "target-feature=+avx2,+fma,+f16c"
[target.x86_64-unknown-linux-musl]
rustflags = ["-C", "target-cpu=haswell", "-C", "target-feature=-crt-static,+avx2,+fma,+f16c"]
[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-musl-gcc"
rustflags = ["-C", "target-feature=-crt-static"]
[target.aarch64-apple-darwin]
rustflags = ["-C", "target-cpu=apple-m1", "-C", "target-feature=+neon,+fp16,+fhm,+dotprod"]
@@ -44,4 +48,4 @@ rustflags = ["-Ctarget-feature=+crt-static"]
# Experimental target for Arm64 Windows
[target.aarch64-pc-windows-msvc]
rustflags = ["-Ctarget-feature=+crt-static"]
rustflags = ["-Ctarget-feature=+crt-static"]

View File

@@ -36,8 +36,7 @@ runs:
args: ${{ inputs.args }}
before-script-linux: |
set -e
yum install -y openssl-devel \
&& curl -L https://github.com/protocolbuffers/protobuf/releases/download/v24.4/protoc-24.4-linux-$(uname -m).zip > /tmp/protoc.zip \
curl -L https://github.com/protocolbuffers/protobuf/releases/download/v24.4/protoc-24.4-linux-$(uname -m).zip > /tmp/protoc.zip \
&& unzip /tmp/protoc.zip -d /usr/local \
&& rm /tmp/protoc.zip
- name: Build Arm Manylinux Wheel
@@ -52,7 +51,7 @@ runs:
args: ${{ inputs.args }}
before-script-linux: |
set -e
yum install -y openssl-devel clang \
yum install -y clang \
&& curl -L https://github.com/protocolbuffers/protobuf/releases/download/v24.4/protoc-24.4-linux-aarch_64.zip > /tmp/protoc.zip \
&& unzip /tmp/protoc.zip -d /usr/local \
&& rm /tmp/protoc.zip

View File

@@ -43,7 +43,7 @@ jobs:
- uses: Swatinem/rust-cache@v2
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: "1.79.0"
toolchain: "1.81.0"
cache-workspaces: "./java/core/lancedb-jni"
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
@@ -97,7 +97,7 @@ jobs:
- name: Dry run
if: github.event_name == 'pull_request'
run: |
mvn --batch-mode -DskipTests package
mvn --batch-mode -DskipTests -Drust.release.build=true package
- name: Set github
run: |
git config --global user.email "LanceDB Github Runner"
@@ -108,7 +108,7 @@ jobs:
echo "use-agent" >> ~/.gnupg/gpg.conf
echo "pinentry-mode loopback" >> ~/.gnupg/gpg.conf
export GPG_TTY=$(tty)
mvn --batch-mode -DskipTests -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -P deploy-to-ossrh
mvn --batch-mode -DskipTests -Drust.release.build=true -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -P deploy-to-ossrh
env:
SONATYPE_USER: ${{ secrets.SONATYPE_USER }}
SONATYPE_TOKEN: ${{ secrets.SONATYPE_TOKEN }}

File diff suppressed because it is too large Load Diff

View File

@@ -4,6 +4,11 @@ on:
push:
tags:
- 'python-v*'
pull_request:
# This should trigger a dry run (we skip the final publish step)
paths:
- .github/workflows/pypi-publish.yml
- Cargo.toml # Change in dependency frequently breaks builds
jobs:
linux:
@@ -46,6 +51,7 @@ jobs:
arm-build: ${{ matrix.config.platform == 'aarch64' }}
manylinux: ${{ matrix.config.manylinux }}
- uses: ./.github/workflows/upload_wheel
if: startsWith(github.ref, 'refs/tags/python-v')
with:
pypi_token: ${{ secrets.LANCEDB_PYPI_API_TOKEN }}
fury_token: ${{ secrets.FURY_TOKEN }}
@@ -75,6 +81,7 @@ jobs:
python-minor-version: 8
args: "--release --strip --target ${{ matrix.config.target }} --features fp16kernels"
- uses: ./.github/workflows/upload_wheel
if: startsWith(github.ref, 'refs/tags/python-v')
with:
pypi_token: ${{ secrets.LANCEDB_PYPI_API_TOKEN }}
fury_token: ${{ secrets.FURY_TOKEN }}
@@ -96,10 +103,12 @@ jobs:
args: "--release --strip"
vcpkg_token: ${{ secrets.VCPKG_GITHUB_PACKAGES }}
- uses: ./.github/workflows/upload_wheel
if: startsWith(github.ref, 'refs/tags/python-v')
with:
pypi_token: ${{ secrets.LANCEDB_PYPI_API_TOKEN }}
fury_token: ${{ secrets.FURY_TOKEN }}
gh-release:
if: startsWith(github.ref, 'refs/tags/python-v')
runs-on: ubuntu-latest
permissions:
contents: write

View File

@@ -13,6 +13,11 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
# Color output for pytest is off by default.
PYTEST_ADDOPTS: "--color=yes"
FORCE_COLOR: "1"
jobs:
lint:
name: "Lint"
@@ -33,11 +38,41 @@ jobs:
python-version: "3.12"
- name: Install ruff
run: |
pip install ruff==0.8.4
pip install ruff==0.9.9
- name: Format check
run: ruff format --check .
- name: Lint
run: ruff check .
type-check:
name: "Type Check"
timeout-minutes: 30
runs-on: "ubuntu-22.04"
defaults:
run:
shell: bash
working-directory: python
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
lfs: true
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Install protobuf compiler
run: |
sudo apt update
sudo apt install -y protobuf-compiler
pip install toml
- name: Install dependencies
run: |
python ../ci/parse_requirements.py pyproject.toml --extras dev,tests,embeddings > requirements.txt
pip install -r requirements.txt
- name: Run pyright
run: pyright
doctest:
name: "Doctest"
timeout-minutes: 30
@@ -101,6 +136,10 @@ jobs:
- uses: ./.github/workflows/run_tests
with:
integration: true
- name: Test without pylance
run: |
pip uninstall -y pylance
pytest -vv python/tests/test_table.py
# Make sure wheels are not included in the Rust cache
- name: Delete wheels
run: rm -rf target/wheels

View File

@@ -157,153 +157,33 @@ jobs:
windows:
runs-on: windows-2022
strategy:
matrix:
target:
- x86_64-pc-windows-msvc
- aarch64-pc-windows-msvc
defaults:
run:
working-directory: rust/lancedb
steps:
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install Protoc v21.12
working-directory: C:\
run: choco install --no-progress protoc
- name: Build
run: |
New-Item -Path 'C:\protoc' -ItemType Directory
Set-Location C:\protoc
Invoke-WebRequest https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-win64.zip -OutFile C:\protoc\protoc.zip
7z x protoc.zip
Add-Content $env:GITHUB_PATH "C:\protoc\bin"
shell: powershell
rustup target add ${{ matrix.target }}
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo build --features remote --tests --locked --target ${{ matrix.target }}
- name: Run tests
# Can only run tests when target matches host
if: ${{ matrix.target == 'x86_64-pc-windows-msvc' }}
run: |
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo test --features remote --locked
windows-arm64-cross:
# We cross compile in Node releases, so we want to make sure
# this can run successfully.
runs-on: ubuntu-latest
container: alpine:edge
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install dependencies (part 1)
run: |
set -e
apk add protobuf-dev curl clang lld llvm19 grep npm bash msitools sed
- name: Install rust
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
target: aarch64-pc-windows-msvc
- name: Install dependencies (part 2)
run: |
set -e
mkdir -p sysroot
cd sysroot
sh ../ci/sysroot-aarch64-pc-windows-msvc.sh
- name: Check
env:
CC: clang
AR: llvm-ar
C_INCLUDE_PATH: /usr/aarch64-pc-windows-msvc/usr/include
CARGO_BUILD_TARGET: aarch64-pc-windows-msvc
RUSTFLAGS: -Ctarget-feature=+crt-static,+neon,+fp16,+fhm,+dotprod -Clinker=lld -Clink-arg=/LIBPATH:/usr/aarch64-pc-windows-msvc/usr/lib -Clink-arg=arm64rt.lib
run: |
source $HOME/.cargo/env
cargo check --features remote --locked
windows-arm64:
runs-on: windows-4x-arm
steps:
- name: Install Git
run: |
Invoke-WebRequest -Uri "https://github.com/git-for-windows/git/releases/download/v2.44.0.windows.1/Git-2.44.0-64-bit.exe" -OutFile "git-installer.exe"
Start-Process -FilePath "git-installer.exe" -ArgumentList "/VERYSILENT", "/NORESTART" -Wait
shell: powershell
- name: Add Git to PATH
run: |
Add-Content $env:GITHUB_PATH "C:\Program Files\Git\bin"
$env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")
shell: powershell
- name: Configure Git symlinks
run: git config --global core.symlinks true
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.13"
- name: Install Visual Studio Build Tools
run: |
Invoke-WebRequest -Uri "https://aka.ms/vs/17/release/vs_buildtools.exe" -OutFile "vs_buildtools.exe"
Start-Process -FilePath "vs_buildtools.exe" -ArgumentList "--quiet", "--wait", "--norestart", "--nocache", `
"--installPath", "C:\BuildTools", `
"--add", "Microsoft.VisualStudio.Component.VC.Tools.ARM64", `
"--add", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64", `
"--add", "Microsoft.VisualStudio.Component.Windows11SDK.22621", `
"--add", "Microsoft.VisualStudio.Component.VC.ATL", `
"--add", "Microsoft.VisualStudio.Component.VC.ATLMFC", `
"--add", "Microsoft.VisualStudio.Component.VC.Llvm.Clang" -Wait
shell: powershell
- name: Add Visual Studio Build Tools to PATH
run: |
$vsPath = "C:\BuildTools\VC\Tools\MSVC"
$latestVersion = (Get-ChildItem $vsPath | Sort-Object {[version]$_.Name} -Descending)[0].Name
Add-Content $env:GITHUB_PATH "C:\BuildTools\VC\Tools\MSVC\$latestVersion\bin\Hostx64\arm64"
Add-Content $env:GITHUB_PATH "C:\BuildTools\VC\Tools\MSVC\$latestVersion\bin\Hostx64\x64"
Add-Content $env:GITHUB_PATH "C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\arm64"
Add-Content $env:GITHUB_PATH "C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64"
Add-Content $env:GITHUB_PATH "C:\BuildTools\VC\Tools\Llvm\x64\bin"
# Add MSVC runtime libraries to LIB
$env:LIB = "C:\BuildTools\VC\Tools\MSVC\$latestVersion\lib\arm64;" +
"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\um\arm64;" +
"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\ucrt\arm64"
Add-Content $env:GITHUB_ENV "LIB=$env:LIB"
# Add INCLUDE paths
$env:INCLUDE = "C:\BuildTools\VC\Tools\MSVC\$latestVersion\include;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\um;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\shared"
Add-Content $env:GITHUB_ENV "INCLUDE=$env:INCLUDE"
shell: powershell
- name: Install Rust
run: |
Invoke-WebRequest https://win.rustup.rs/x86_64 -OutFile rustup-init.exe
.\rustup-init.exe -y --default-host aarch64-pc-windows-msvc --default-toolchain 1.83.0
shell: powershell
- name: Add Rust to PATH
run: |
Add-Content $env:GITHUB_PATH "$env:USERPROFILE\.cargo\bin"
shell: powershell
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install 7-Zip ARM
run: |
New-Item -Path 'C:\7zip' -ItemType Directory
Invoke-WebRequest https://7-zip.org/a/7z2408-arm64.exe -OutFile C:\7zip\7z-installer.exe
Start-Process -FilePath C:\7zip\7z-installer.exe -ArgumentList '/S' -Wait
shell: powershell
- name: Add 7-Zip to PATH
run: Add-Content $env:GITHUB_PATH "C:\Program Files\7-Zip"
shell: powershell
- name: Install Protoc v21.12
working-directory: C:\
run: |
if (Test-Path 'C:\protoc') {
Write-Host "Protoc directory exists, skipping installation"
return
}
New-Item -Path 'C:\protoc' -ItemType Directory
Set-Location C:\protoc
Invoke-WebRequest https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-win64.zip -OutFile C:\protoc\protoc.zip
& 'C:\Program Files\7-Zip\7z.exe' x protoc.zip
shell: powershell
- name: Add Protoc to PATH
run: Add-Content $env:GITHUB_PATH "C:\protoc\bin"
shell: powershell
- name: Run tests
run: |
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo test --target aarch64-pc-windows-msvc --features remote --locked
msrv:
# Check the minimum supported Rust version
name: MSRV Check - Rust v${{ matrix.msrv }}

View File

@@ -1,21 +1,27 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.8.4
rev: v0.9.9
hooks:
- id: ruff
- repo: local
hooks:
- id: local-biome-check
name: biome check
entry: npx @biomejs/biome@1.8.3 check --config-path nodejs/biome.json nodejs/
language: system
types: [text]
files: "nodejs/.*"
exclude: nodejs/lancedb/native.d.ts|nodejs/dist/.*|nodejs/examples/.*
- id: ruff
# - repo: https://github.com/RobertCraigie/pyright-python
# rev: v1.1.395
# hooks:
# - id: pyright
# args: ["--project", "python"]
# additional_dependencies: [pyarrow-stubs]
- repo: local
hooks:
- id: local-biome-check
name: biome check
entry: npx @biomejs/biome@1.8.3 check --config-path nodejs/biome.json nodejs/
language: system
types: [text]
files: "nodejs/.*"
exclude: nodejs/lancedb/native.d.ts|nodejs/dist/.*|nodejs/examples/.*

1685
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -21,32 +21,32 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.24.0", "features" = [
lance = { "version" = "=0.25.3", "features" = [
"dynamodb",
], git = "https://github.com/lancedb/lance.git", tag = "v0.24.0-beta.1" }
lance-io = { version = "=0.24.0", tag = "v0.24.0-beta.1", git = "https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.24.0", tag = "v0.24.0-beta.1", git = "https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.24.0", tag = "v0.24.0-beta.1", git = "https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.24.0", tag = "v0.24.0-beta.1", git = "https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.24.0", tag = "v0.24.0-beta.1", git = "https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.24.0", tag = "v0.24.0-beta.1", git = "https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.24.0", tag = "v0.24.0-beta.1", git = "https://github.com/lancedb/lance.git" }
], tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
lance-io = { version = "=0.25.3", tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
lance-index = { version = "=0.25.3", tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
lance-linalg = { version = "=0.25.3", tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
lance-table = { version = "=0.25.3", tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
lance-testing = { version = "=0.25.3", tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
lance-datafusion = { version = "=0.25.3", tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
lance-encoding = { version = "=0.25.3", tag = "v0.25.3-beta.4", git = "https://github.com/lancedb/lance" }
# Note that this one does not include pyarrow
arrow = { version = "53.2", optional = false }
arrow-array = "53.2"
arrow-data = "53.2"
arrow-ipc = "53.2"
arrow-ord = "53.2"
arrow-schema = "53.2"
arrow-arith = "53.2"
arrow-cast = "53.2"
arrow = { version = "54.1", optional = false }
arrow-array = "54.1"
arrow-data = "54.1"
arrow-ipc = "54.1"
arrow-ord = "54.1"
arrow-schema = "54.1"
arrow-arith = "54.1"
arrow-cast = "54.1"
async-trait = "0"
datafusion = { version = "44.0", default-features = false }
datafusion-catalog = "44.0"
datafusion-common = { version = "44.0", default-features = false }
datafusion-execution = "44.0"
datafusion-expr = "44.0"
datafusion-physical-plan = "44.0"
datafusion = { version = "46.0", default-features = false }
datafusion-catalog = "46.0"
datafusion-common = { version = "46.0", default-features = false }
datafusion-execution = "46.0"
datafusion-expr = "46.0"
datafusion-physical-plan = "46.0"
env_logger = "0.11"
half = { "version" = "=2.4.1", default-features = false, features = [
"num-traits",
@@ -72,3 +72,6 @@ base64ct = "=1.6.0"
# Workaround for: https://github.com/eira-fransham/crunchy/issues/13
crunchy = "=0.2.2"
# Workaround for: https://github.com/Lokathor/bytemuck/issues/306
bytemuck_derive = ">=1.8.1, <1.9.0"

View File

@@ -1,9 +1,17 @@
<a href="https://cloud.lancedb.com" target="_blank">
<img src="https://github.com/user-attachments/assets/92dad0a2-2a37-4ce1-b783-0d1b4f30a00c" alt="LanceDB Cloud Public Beta" width="100%" style="max-width: 100%;">
</a>
<div align="center">
<p align="center">
<img width="275" alt="LanceDB Logo" src="https://github.com/lancedb/lancedb/assets/5846846/37d7c7ad-c2fd-4f56-9f16-fffb0d17c73a">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/user-attachments/assets/ac270358-333e-4bea-a132-acefaa94040e">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/user-attachments/assets/b864d814-0d29-4784-8fd9-807297c758c0">
<img alt="LanceDB Logo" src="https://github.com/user-attachments/assets/b864d814-0d29-4784-8fd9-807297c758c0" width=300>
</picture>
**Developer-friendly, database for multimodal AI**
**Search More, Manage Less**
<a href='https://github.com/lancedb/vectordb-recipes/tree/main' target="_blank"><img alt='LanceDB' src='https://img.shields.io/badge/VectorDB_Recipes-100000?style=for-the-badge&logo=LanceDB&logoColor=white&labelColor=645cfb&color=645cfb'/></a>
<a href='https://lancedb.github.io/lancedb/' target="_blank"><img alt='lancdb' src='https://img.shields.io/badge/DOCS-100000?style=for-the-badge&logo=lancdb&logoColor=white&labelColor=645cfb&color=645cfb'/></a>

View File

@@ -1,21 +0,0 @@
#!/bin/bash
set -e
ARCH=${1:-x86_64}
# We pass down the current user so that when we later mount the local files
# into the container, the files are accessible by the current user.
pushd ci/manylinux_node
docker build \
-t lancedb-node-manylinux-$ARCH \
--build-arg="ARCH=$ARCH" \
--build-arg="DOCKER_USER=$(id -u)" \
--progress=plain \
.
popd
# We turn on memory swap to avoid OOM killer
docker run \
-v $(pwd):/io -w /io \
--memory-swap=-1 \
lancedb-node-manylinux-$ARCH \
bash ci/manylinux_node/build_lancedb.sh $ARCH

View File

@@ -1,34 +0,0 @@
# Builds the macOS artifacts (nodejs binaries).
# Usage: ./ci/build_macos_artifacts_nodejs.sh [target]
# Targets supported: x86_64-apple-darwin aarch64-apple-darwin
set -e
prebuild_rust() {
# Building here for the sake of easier debugging.
pushd rust/lancedb
echo "Building rust library for $1"
export RUST_BACKTRACE=1
cargo build --release --target $1
popd
}
build_node_binaries() {
pushd nodejs
echo "Building nodejs library for $1"
export RUST_TARGET=$1
npm run build-release
popd
}
if [ -n "$1" ]; then
targets=$1
else
targets="x86_64-apple-darwin aarch64-apple-darwin"
fi
echo "Building artifacts for targets: $targets"
for target in $targets
do
prebuild_rust $target
build_node_binaries $target
done

View File

@@ -1,5 +1,5 @@
# Many linux dockerfile with Rust, Node, and Lance dependencies installed.
# This container allows building the node modules native libraries in an
# This container allows building the node modules native libraries in an
# environment with a very old glibc, so that we are compatible with a wide
# range of linux distributions.
ARG ARCH=x86_64
@@ -9,10 +9,6 @@ FROM quay.io/pypa/manylinux_2_28_${ARCH}
ARG ARCH=x86_64
ARG DOCKER_USER=default_user
# Install static openssl
COPY install_openssl.sh install_openssl.sh
RUN ./install_openssl.sh ${ARCH} > /dev/null
# Protobuf is also installed as root.
COPY install_protobuf.sh install_protobuf.sh
RUN ./install_protobuf.sh ${ARCH}
@@ -21,7 +17,7 @@ ENV DOCKER_USER=${DOCKER_USER}
# Create a group and user, but only if it doesn't exist
RUN echo ${ARCH} && id -u ${DOCKER_USER} >/dev/null 2>&1 || adduser --user-group --create-home --uid ${DOCKER_USER} build_user
# We switch to the user to install Rust and Node, since those like to be
# We switch to the user to install Rust and Node, since those like to be
# installed at the user level.
USER ${DOCKER_USER}

View File

@@ -1,19 +0,0 @@
#!/bin/bash
# Builds the nodejs module for manylinux. Invoked by ci/build_linux_artifacts_nodejs.sh.
set -e
ARCH=${1:-x86_64}
if [ "$ARCH" = "x86_64" ]; then
export OPENSSL_LIB_DIR=/usr/local/lib64/
else
export OPENSSL_LIB_DIR=/usr/local/lib/
fi
export OPENSSL_STATIC=1
export OPENSSL_INCLUDE_DIR=/usr/local/include/openssl
#Alpine doesn't have .bashrc
FILE=$HOME/.bashrc && test -f $FILE && source $FILE
cd nodejs
npm ci
npm run build-release

View File

@@ -4,14 +4,6 @@ set -e
ARCH=${1:-x86_64}
TARGET_TRIPLE=${2:-x86_64-unknown-linux-gnu}
if [ "$ARCH" = "x86_64" ]; then
export OPENSSL_LIB_DIR=/usr/local/lib64/
else
export OPENSSL_LIB_DIR=/usr/local/lib/
fi
export OPENSSL_STATIC=1
export OPENSSL_INCLUDE_DIR=/usr/local/include/openssl
#Alpine doesn't have .bashrc
FILE=$HOME/.bashrc && test -f $FILE && source $FILE

View File

@@ -1,26 +0,0 @@
#!/bin/bash
# Builds openssl from source so we can statically link to it
# this is to avoid the error we get with the system installation:
# /usr/bin/ld: <library>: version node not found for symbol SSLeay@@OPENSSL_1.0.1
# /usr/bin/ld: failed to set dynamic section sizes: Bad value
set -e
git clone -b OpenSSL_1_1_1v \
--single-branch \
https://github.com/openssl/openssl.git
pushd openssl
if [[ $1 == x86_64* ]]; then
ARCH=linux-x86_64
else
# gnu target
ARCH=linux-aarch64
fi
./Configure no-shared $ARCH
make
make install

41
ci/parse_requirements.py Normal file
View File

@@ -0,0 +1,41 @@
import argparse
import toml
def parse_dependencies(pyproject_path, extras=None):
with open(pyproject_path, "r") as file:
pyproject = toml.load(file)
dependencies = pyproject.get("project", {}).get("dependencies", [])
for dependency in dependencies:
print(dependency)
optional_dependencies = pyproject.get("project", {}).get(
"optional-dependencies", {}
)
if extras:
for extra in extras.split(","):
for dep in optional_dependencies.get(extra, []):
print(dep)
def main():
parser = argparse.ArgumentParser(
description="Generate requirements.txt from pyproject.toml"
)
parser.add_argument("path", type=str, help="Path to pyproject.toml")
parser.add_argument(
"--extras",
type=str,
help="Comma-separated list of extras to include",
default="",
)
args = parser.parse_args()
parse_dependencies(args.path, args.extras)
if __name__ == "__main__":
main()

View File

@@ -124,6 +124,9 @@ nav:
- Overview: hybrid_search/hybrid_search.md
- Comparing Rerankers: hybrid_search/eval.md
- Airbnb financial data example: notebooks/hybrid_search.ipynb
- Late interaction with MultiVector search:
- Overview: guides/multi-vector.md
- Example: notebooks/Multivector_on_LanceDB.ipynb
- RAG:
- Vanilla RAG: rag/vanilla_rag.md
- Multi-head RAG: rag/multi_head_rag.md
@@ -233,13 +236,6 @@ nav:
- 👾 JavaScript (vectordb): javascript/modules.md
- 👾 JavaScript (lancedb): js/globals.md
- 🦀 Rust: https://docs.rs/lancedb/latest/lancedb/
- ☁️ LanceDB Cloud:
- Overview: cloud/index.md
- API reference:
- 🐍 Python: python/saas-python.md
- 👾 JavaScript: javascript/modules.md
- REST API: cloud/rest.md
- FAQs: cloud/cloud_faq.md
- Quick start: basic.md
- Concepts:
@@ -260,6 +256,9 @@ nav:
- Overview: hybrid_search/hybrid_search.md
- Comparing Rerankers: hybrid_search/eval.md
- Airbnb financial data example: notebooks/hybrid_search.ipynb
- Late interaction with MultiVector search:
- Overview: guides/multi-vector.md
- Document search Example: notebooks/Multivector_on_LanceDB.ipynb
- RAG:
- Vanilla RAG: rag/vanilla_rag.md
- Multi-head RAG: rag/multi_head_rag.md
@@ -363,13 +362,6 @@ nav:
- Javascript (vectordb): javascript/modules.md
- Javascript (lancedb): js/globals.md
- Rust: https://docs.rs/lancedb/latest/lancedb/index.html
- LanceDB Cloud:
- Overview: cloud/index.md
- API reference:
- 🐍 Python: python/saas-python.md
- 👾 JavaScript: javascript/modules.md
- REST API: cloud/rest.md
- FAQs: cloud/cloud_faq.md
extra_css:
- styles/global.css

View File

@@ -171,7 +171,7 @@ paths:
distance_type:
type: string
description: |
The distance metric to use for search. L2, Cosine, Dot and Hamming are supported. Default is L2.
The distance metric to use for search. l2, Cosine, Dot and Hamming are supported. Default is l2.
bypass_vector_index:
type: boolean
description: |
@@ -450,7 +450,7 @@ paths:
type: string
nullable: false
description: |
The metric type to use for the index. L2, Cosine, Dot are supported.
The metric type to use for the index. l2, Cosine, Dot are supported.
index_type:
type: string
responses:

View File

@@ -69,7 +69,7 @@ Lance supports `IVF_PQ` index type by default.
The following IVF_PQ paramters can be specified:
- **distance_type**: The distance metric to use. By default it uses euclidean distance "`L2`".
- **distance_type**: The distance metric to use. By default it uses euclidean distance "`l2`".
We also support "cosine" and "dot" distance as well.
- **num_partitions**: The number of partitions in the index. The default is the square root
of the number of rows.

View File

@@ -2,7 +2,7 @@
LanceDB Cloud is a SaaS (software-as-a-service) solution that runs serverless in the cloud, clearly separating storage from compute. It's designed to be highly scalable without breaking the bank. LanceDB Cloud is currently in private beta with general availability coming soon, but you can apply for early access with the private beta release by signing up below.
[Try out LanceDB Cloud](https://noteforms.com/forms/lancedb-mailing-list-cloud-kty1o5?notionforms=1&utm_source=notionforms){ .md-button .md-button--primary }
[Try out LanceDB Cloud (Public Beta)](https://cloud.lancedb.com){ .md-button .md-button--primary }
## Architecture

View File

@@ -59,7 +59,7 @@ Then the greedy search routine operates as follows:
There are three key parameters to set when constructing an HNSW index:
* `metric`: Use an `L2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `metric`: Use an `l2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `m`: The number of neighbors to select for each vector in the HNSW graph.
* `ef_construction`: The number of candidates to evaluate during the construction of the HNSW graph.

View File

@@ -47,7 +47,7 @@ We can combine the above concepts to understand how to build and query an IVF-PQ
There are three key parameters to set when constructing an IVF-PQ index:
* `metric`: Use an `L2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `metric`: Use an `l2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `num_partitions`: The number of partitions in the IVF portion of the index.
* `num_sub_vectors`: The number of sub-vectors that will be created during Product Quantization (PQ).
@@ -56,7 +56,7 @@ In Python, the index can be created as follows:
```python
# Create and train the index for a 1536-dimensional vector
# Make sure you have enough data in the table for an effective training step
tbl.create_index(metric="L2", num_partitions=256, num_sub_vectors=96)
tbl.create_index(metric="l2", num_partitions=256, num_sub_vectors=96)
```
!!! note
`num_partitions`=256 and `num_sub_vectors`=96 does not work for every dataset. Those values needs to be adjusted for your particular dataset.

View File

@@ -54,7 +54,7 @@ As mentioned, after creating embedding, each data point is represented as a vect
Points that are close to each other in vector space are considered similar (or appear in similar contexts), and points that are far away are considered dissimilar. To quantify this closeness, we use distance as a metric which can be measured in the following way -
1. **Euclidean Distance (L2)**: It calculates the straight-line distance between two points (vectors) in a multidimensional space.
1. **Euclidean Distance (l2)**: It calculates the straight-line distance between two points (vectors) in a multidimensional space.
2. **Cosine Similarity**: It measures the cosine of the angle between two vectors, providing a normalized measure of similarity based on their direction.
3. **Dot product**: It is calculated as the sum of the products of their corresponding components. To measure relatedness it considers both the magnitude and direction of the vectors.

View File

@@ -8,15 +8,5 @@ LanceDB provides language APIs, allowing you to embed a database in your languag
* 👾 [JavaScript](examples_js.md) examples
* 🦀 Rust examples (coming soon)
## Python Applications powered by LanceDB
| Project Name | Description |
| --- | --- |
| **Ultralytics Explorer 🚀**<br>[![Ultralytics](https://img.shields.io/badge/Ultralytics-Docs-green?labelColor=0f3bc4&style=flat-square&logo=https://cdn.prod.website-files.com/646dd1f1a3703e451ba81ecc/64994922cf2a6385a4bf4489_UltralyticsYOLO_mark_blue.svg&link=https://docs.ultralytics.com/datasets/explorer/)](https://docs.ultralytics.com/datasets/explorer/)<br>[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/docs/en/datasets/explorer/explorer.ipynb) | - 🔍 **Explore CV Datasets**: Semantic search, SQL queries, vector similarity, natural language.<br>- 🖥️ **GUI & Python API**: Seamless dataset interaction.<br>- ⚡ **Efficient & Scalable**: Leverages LanceDB for large datasets.<br>- 📊 **Detailed Analysis**: Easily analyze data patterns.<br>- 🌐 **Browser GUI Demo**: Create embeddings, search images, run queries. |
| **Website Chatbot🤖**<br>[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/lancedb/lancedb-vercel-chatbot)<br>[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Flancedb%2Flancedb-vercel-chatbot&amp;env=OPENAI_API_KEY&amp;envDescription=OpenAI%20API%20Key%20for%20chat%20completion.&amp;project-name=lancedb-vercel-chatbot&amp;repository-name=lancedb-vercel-chatbot&amp;demo-title=LanceDB%20Chatbot%20Demo&amp;demo-description=Demo%20website%20chatbot%20with%20LanceDB.&amp;demo-url=https%3A%2F%2Flancedb.vercel.app&amp;demo-image=https%3A%2F%2Fi.imgur.com%2FazVJtvr.png) | - 🌐 **Chatbot from Sitemap/Docs**: Create a chatbot using site or document context.<br>- 🚀 **Embed LanceDB in Next.js**: Lightweight, on-prem storage.<br>- 🧠 **AI-Powered Context Retrieval**: Efficiently access relevant data.<br>- 🔧 **Serverless & Native JS**: Seamless integration with Next.js.<br>- ⚡ **One-Click Deploy on Vercel**: Quick and easy setup.. |
## Nodejs Applications powered by LanceDB
| Project Name | Description |
| --- | --- |
| **Langchain Writing Assistant✍ **<br>[![Github](../assets/github.svg)](https://github.com/lancedb/vectordb-recipes/tree/main/applications/node/lanchain_writing_assistant) | - **📂 Data Source Integration**: Use your own data by specifying data source file, and the app instantly processes it to provide insights. <br>- **🧠 Intelligent Suggestions**: Powered by LangChain.js and LanceDB, it improves writing productivity and accuracy. <br>- **💡 Enhanced Writing Experience**: It delivers real-time contextual insights and factual suggestions while the user writes. |
!!! tip "Hosted LanceDB"
If you want S3 cost-efficiency and local performance via a simple serverless API, checkout **LanceDB Cloud**. For private deployments, high performance at extreme scale, or if you have strict security requirements, talk to us about **LanceDB Enterprise**. [Learn more](https://docs.lancedb.com/)

View File

@@ -0,0 +1,85 @@
# Late interaction & MultiVector embedding type
Late interaction is a technique used in retrieval that calculates the relevance of a query to a document by comparing their multi-vector representations. The key difference between late interaction and other popular methods:
![late interaction vs other methods](https://raw.githubusercontent.com/lancedb/assets/b035a0ceb2c237734e0d393054c146d289792339/docs/assets/integration/colbert-blog-interaction.svg)
[ Illustration from https://jina.ai/news/what-is-colbert-and-late-interaction-and-why-they-matter-in-search/]
<b>No interaction:</b> Refers to independently embedding the query and document, that are compared to calcualte similarity without any interaction between them. This is typically used in vector search operations.
<b>Partial interaction</b> Refers to a specific approach where the similarity computation happens primarily between query vectors and document vectors, without extensive interaction between individual components of each. An example of this is dual-encoder models like BERT.
<b>Early full interaction</b> Refers to techniques like cross-encoders that process query and docs in pairs with full interaction across various stages of encoding. This is a powerful, but relatively slower technique. Because it requires processing query and docs in pairs, doc embeddings can't be pre-computed for fast retrieval. This is why cross encoders are typically used as reranking models combined with vector search. Learn more about [LanceDB Reranking support](https://lancedb.github.io/lancedb/reranking/).
<b>Late interaction</b> Late interaction is a technique that calculates the doc and query similarity independently and then the interaction or evaluation happens during the retrieval process. This is typically used in retrieval models like ColBERT. Unlike early interaction, It allows speeding up the retrieval process without compromising the depth of semantic analysis.
## Internals of ColBERT
Let's take a look at the steps involved in performing late interaction based retrieval using ColBERT:
• ColBERT employs BERT-based encoders for both queries `(fQ)` and documents `(fD)`
• A single BERT model is shared between query and document encoders and special tokens distinguish input types: `[Q]` for queries and `[D]` for documents
**Query Encoder (fQ):**
• Query q is tokenized into WordPiece tokens: `q1, q2, ..., ql`. `[Q]` token is prepended right after BERT's `[CLS]` token
• If query length < Nq, it's padded with [MASK] tokens up to Nq.
The padded sequence goes through BERT's transformer architecture
Final embeddings are L2-normalized.
**Document Encoder (fD):**
Document d is tokenized into tokens `d1, d2, ..., dm`. `[D]` token is prepended after `[CLS]` token
Unlike queries, documents are NOT padded with `[MASK]` tokens
Document tokens are processed through BERT and the same linear layer
**Late Interaction:**
Late interaction estimates relevance score `S(q,d)` using embedding `Eq` and `Ed`. Late interaction happens after independent encoding
For each query embedding, maximum similarity is computed against all document embeddings
The similarity measure can be cosine similarity or squared L2 distance
**MaxSim Calculation:**
```
S(q,d) := Σ max(Eqi⋅EdjT)
i∈|Eq| j∈|Ed|
```
This finds the best matching document embedding for each query embedding
Captures relevance based on strongest local matches between contextual embeddings
## LanceDB MultiVector type
LanceDB supports multivector type, this is useful when you have multiple vectors for a single item (e.g. with ColBert and ColPali).
You can index on a column with multivector type and search on it, the query can be single vector or multiple vectors. For now, only cosine metric is supported for multivector search. The vector value type can be float16, float32 or float64. LanceDB integrateds [ConteXtualized Token Retriever(XTR)](https://arxiv.org/abs/2304.01982), which introduces a simple, yet novel, objective function that encourages the model to retrieve the most important document tokens first.
```python
import lancedb
import numpy as np
import pyarrow as pa
db = lancedb.connect("data/multivector_demo")
schema = pa.schema(
[
pa.field("id", pa.int64()),
# float16, float32, and float64 are supported
pa.field("vector", pa.list_(pa.list_(pa.float32(), 256))),
]
)
data = [
{
"id": i,
"vector": np.random.random(size=(2, 256)).tolist(),
}
for i in range(1024)
]
tbl = db.create_table("my_table", data=data, schema=schema)
# only cosine similarity is supported for multi-vectors
tbl.create_index(metric="cosine")
# query with single vector
query = np.random.random(256).astype(np.float16)
tbl.search(query).to_arrow()
# query with multiple vectors
query = np.random.random(size=(2, 256))
tbl.search(query).to_arrow()
```
Find more about vector search in LanceDB [here](https://lancedb.github.io/lancedb/search/#multivector-type).

View File

@@ -1001,9 +1001,11 @@ In LanceDB OSS, users can set the `read_consistency_interval` parameter on conne
There are three possible settings for `read_consistency_interval`:
1. **Unset (default)**: The database does not check for updates to tables made by other processes. This provides the best query performance, but means that clients may not see the most up-to-date data. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
2. **Zero seconds (Strong consistency)**: The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS.
3. **Custom interval (Eventual consistency)**: The database checks for updates at a custom interval, such as every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.
1. **Unset**: The database does not check for updates to tables made by other processes. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
2. **Zero seconds (Strong consistency)**: The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS. For best performance, combine this setting with the storage option `new_table_enable_v2_manifest_paths` set to `true`.
3. **Custom interval (Eventual consistency, the default)**: The database checks for updates at a custom interval. By default, this is every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.
You can always force a synchronization by calling `checkout_latest()` / `checkoutLatest()` on a table.
!!! tip "Consistency in LanceDB Cloud"
@@ -1041,7 +1043,21 @@ There are three possible settings for `read_consistency_interval`:
--8<-- "python/python/tests/docs/test_guide_tables.py:table_async_eventual_consistency"
```
By default, a `Table` will never check for updates from other writers. To manually check for updates you can use `checkout_latest`:
For no consistency, use `None`:
=== "Sync API"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:table_no_consistency"
```
=== "Async API"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:table_async_no_consistency"
```
To manually check for updates you can use `checkout_latest`:
=== "Sync API"
@@ -1059,15 +1075,25 @@ There are three possible settings for `read_consistency_interval`:
To set strong consistency, use `0`:
```ts
const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 0 });
const tbl = await db.openTable("my_table");
--8<-- "nodejs/examples/basic.test.ts:table_strong_consistency"
```
For eventual consistency, specify the update interval as seconds:
```ts
const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 5 });
const tbl = await db.openTable("my_table");
--8<-- "nodejs/examples/basic.test.ts:table_eventual_consistency"
```
For no consistency, use `null`:
```ts
--8<-- "nodejs/examples/basic.test.ts:table_no_consistency"
```
To manually check for updates you can use `checkoutLatest`:
```ts
--8<-- "nodejs/examples/basic.test.ts:table_checkout_latest"
```
<!-- Node doesn't yet support the version time travel: https://github.com/lancedb/lancedb/issues/1007

View File

@@ -4,6 +4,9 @@ LanceDB is an open-source vector database for AI that's designed to store, manag
Both the database and the underlying data format are designed from the ground up to be **easy-to-use**, **scalable** and **cost-effective**.
!!! tip "Hosted LanceDB"
If you want S3 cost-efficiency and local performance via a simple serverless API, checkout **LanceDB Cloud**. For private deployments, high performance at extreme scale, or if you have strict security requirements, talk to us about **LanceDB Enterprise**. [Learn more](https://docs.lancedb.com/)
![](assets/lancedb_and_lance.png)
## Truly multi-modal
@@ -20,7 +23,7 @@ LanceDB **OSS** is an **open-source**, batteries-included embedded vector databa
LanceDB **Cloud** is a SaaS (software-as-a-service) solution that runs serverless in the cloud, making the storage clearly separated from compute. It's designed to be cost-effective and highly scalable without breaking the bank. LanceDB Cloud is currently in private beta with general availability coming soon, but you can apply for early access with the private beta release by signing up below.
[Try out LanceDB Cloud](https://noteforms.com/forms/lancedb-mailing-list-cloud-kty1o5?notionforms=1&utm_source=notionforms){ .md-button .md-button--primary }
[Try out LanceDB Cloud (Public Beta) Now](https://cloud.lancedb.com){ .md-button .md-button--primary }
## Why use LanceDB?

View File

@@ -108,7 +108,7 @@ This method creates a scalar(for non-vector cols) or a vector index on a table.
|:---|:---|:---|:---|
|`vector_col`|`Optional[str]`| Provide if you want to create index on a vector column. |`None`|
|`col_name`|`Optional[str]`| Provide if you want to create index on a non-vector column. |`None`|
|`metric`|`Optional[str]` |Provide the metric to use for vector index. choice of metrics: 'L2', 'dot', 'cosine'. |`L2`|
|`metric`|`Optional[str]` |Provide the metric to use for vector index. choice of metrics: 'l2', 'dot', 'cosine'. |`l2`|
|`num_partitions`|`Optional[int]`|Number of partitions to use for the index.|`256`|
|`num_sub_vectors`|`Optional[int]` |Number of sub-vectors to use for the index.|`96`|
|`index_cache_size`|`Optional[int]` |Size of the index cache.|`None`|

View File

@@ -125,7 +125,7 @@ The exhaustive list of parameters for `LanceDBVectorStore` vector store are :
```
- **_table_exists(self, tbl_name: `Optional[str]` = `None`) -> `bool`** : Returns `True` if `tbl_name` exists in database.
- __create_index(
self, scalar: `Optional[bool]` = False, col_name: `Optional[str]` = None, num_partitions: `Optional[int]` = 256, num_sub_vectors: `Optional[int]` = 96, index_cache_size: `Optional[int]` = None, metric: `Optional[str]` = "L2",
self, scalar: `Optional[bool]` = False, col_name: `Optional[str]` = None, num_partitions: `Optional[int]` = 256, num_sub_vectors: `Optional[int]` = 96, index_cache_size: `Optional[int]` = None, metric: `Optional[str]` = "l2",
) -> `None`__ : Creates a scalar(for non-vector cols) or a vector index on a table.
Make sure your vector column has enough data before creating an index on it.

View File

@@ -10,7 +10,7 @@ Distance metrics type.
- [Cosine](MetricType.md#cosine)
- [Dot](MetricType.md#dot)
- [L2](MetricType.md#l2)
- [l2](MetricType.md#l2)
## Enumeration Members

View File

@@ -85,7 +85,7 @@ ___
`Optional` **metric\_type**: [`MetricType`](../enums/MetricType.md)
Metric type, L2 or Cosine
Metric type, l2 or Cosine
#### Defined in

View File

@@ -15,11 +15,9 @@ npm install @lancedb/lancedb
This will download the appropriate native library for your platform. We currently
support:
- Linux (x86_64 and aarch64)
- Linux (x86_64 and aarch64 on glibc and musl)
- MacOS (Intel and ARM/M1/M2)
- Windows (x86_64 only)
We do not yet support musl-based Linux (such as Alpine Linux) or aarch64 Windows.
- Windows (x86_64 and aarch64)
## Usage

View File

@@ -0,0 +1,75 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BoostQuery
# Class: BoostQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BoostQuery()
```ts
new BoostQuery(
positive,
negative,
negativeBoost): BoostQuery
```
Creates an instance of BoostQuery.
#### Parameters
* **positive**: [`FullTextQuery`](../interfaces/FullTextQuery.md)
The positive query that boosts the relevance score.
* **negative**: [`FullTextQuery`](../interfaces/FullTextQuery.md)
The negative query that reduces the relevance score.
* **negativeBoost**: `number`
The factor by which the negative query reduces the score.
#### Returns
[`BoostQuery`](BoostQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)

View File

@@ -126,6 +126,37 @@ the vectors.
***
### ivfFlat()
```ts
static ivfFlat(options?): Index
```
Create an IvfFlat index
This index groups vectors into partitions of similar vectors. Each partition keeps track of
a centroid which is the average value of all vectors in the group.
During a query the centroids are compared with the query vector to find the closest
partitions. The vectors in these partitions are then searched to find
the closest vectors.
The partitioning process is called IVF and the `num_partitions` parameter controls how
many groups to create.
Note that training an IVF FLAT index on a large dataset is a slow operation and
currently is also a memory intensive operation.
#### Parameters
* **options?**: `Partial`&lt;[`IvfFlatOptions`](../interfaces/IvfFlatOptions.md)&gt;
#### Returns
[`Index`](Index.md)
***
### ivfPq()
```ts

View File

@@ -0,0 +1,83 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MatchQuery
# Class: MatchQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new MatchQuery()
```ts
new MatchQuery(
query,
column,
boost,
fuzziness,
maxExpansions): MatchQuery
```
Creates an instance of MatchQuery.
#### Parameters
* **query**: `string`
The text query to search for.
* **column**: `string`
The name of the column to search within.
* **boost**: `number` = `1.0`
(Optional) The boost factor to influence the relevance score of this query. Default is `1.0`.
* **fuzziness**: `number` = `0`
(Optional) The allowed edit distance for fuzzy matching. Default is `0`.
* **maxExpansions**: `number` = `50`
(Optional) The maximum number of terms to consider for fuzzy matching. Default is `50`.
#### Returns
[`MatchQuery`](MatchQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)

View File

@@ -0,0 +1,77 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MultiMatchQuery
# Class: MultiMatchQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new MultiMatchQuery()
```ts
new MultiMatchQuery(
query,
columns,
boosts): MultiMatchQuery
```
Creates an instance of MultiMatchQuery.
#### Parameters
* **query**: `string`
The text query to search for across multiple columns.
* **columns**: `string`[]
An array of column names to search within.
* **boosts**: `number`[] = `...`
(Optional) An array of boost factors corresponding to each column. Default is an array of 1.0 for each column.
The `boosts` array should have the same length as `columns`. If not provided, all columns will have a default boost of 1.0.
If the length of `boosts` is less than `columns`, it will be padded with 1.0s.
#### Returns
[`MultiMatchQuery`](MultiMatchQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)

View File

@@ -0,0 +1,69 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / PhraseQuery
# Class: PhraseQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new PhraseQuery()
```ts
new PhraseQuery(query, column): PhraseQuery
```
Creates an instance of `PhraseQuery`.
#### Parameters
* **query**: `string`
The phrase to search for in the specified column.
* **column**: `string`
The name of the column to search within.
#### Returns
[`PhraseQuery`](PhraseQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)

View File

@@ -30,6 +30,53 @@ protected inner: Query | Promise<Query>;
## Methods
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
***
### execute()
```ts
@@ -159,7 +206,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;
@@ -262,7 +309,7 @@ nearestToText(query, columns?): Query
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **columns?**: `string`[]

View File

@@ -36,6 +36,49 @@ protected inner: NativeQueryType | Promise<NativeQueryType>;
## Methods
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
***
### execute()
```ts
@@ -149,7 +192,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;

View File

@@ -48,6 +48,53 @@ addQueryVector(vector): VectorQuery
***
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
***
### bypassVectorIndex()
```ts
@@ -300,7 +347,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;

View File

@@ -0,0 +1,46 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FullTextQueryType
# Enumeration: FullTextQueryType
Enum representing the types of full-text queries supported.
- `Match`: Performs a full-text search for terms in the query string.
- `MatchPhrase`: Searches for an exact phrase match in the text.
- `Boost`: Boosts the relevance score of specific terms in the query.
- `MultiMatch`: Searches across multiple fields for the query terms.
## Enumeration Members
### Boost
```ts
Boost: "boost";
```
***
### Match
```ts
Match: "match";
```
***
### MatchPhrase
```ts
MatchPhrase: "match_phrase";
```
***
### MultiMatch
```ts
MultiMatch: "multi_match";
```

View File

@@ -0,0 +1,19 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / packBits
# Function: packBits()
```ts
function packBits(data): number[]
```
## Parameters
* **data**: `number`[]
## Returns
`number`[]

View File

@@ -9,12 +9,20 @@
- [embedding](namespaces/embedding/README.md)
- [rerankers](namespaces/rerankers/README.md)
## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md)
## Classes
- [BoostQuery](classes/BoostQuery.md)
- [Connection](classes/Connection.md)
- [Index](classes/Index.md)
- [MakeArrowTableOptions](classes/MakeArrowTableOptions.md)
- [MatchQuery](classes/MatchQuery.md)
- [MergeInsertBuilder](classes/MergeInsertBuilder.md)
- [MultiMatchQuery](classes/MultiMatchQuery.md)
- [PhraseQuery](classes/PhraseQuery.md)
- [Query](classes/Query.md)
- [QueryBase](classes/QueryBase.md)
- [RecordBatchIterator](classes/RecordBatchIterator.md)
@@ -33,12 +41,14 @@
- [CreateTableOptions](interfaces/CreateTableOptions.md)
- [ExecutableQuery](interfaces/ExecutableQuery.md)
- [FtsOptions](interfaces/FtsOptions.md)
- [FullTextQuery](interfaces/FullTextQuery.md)
- [FullTextSearchOptions](interfaces/FullTextSearchOptions.md)
- [HnswPqOptions](interfaces/HnswPqOptions.md)
- [HnswSqOptions](interfaces/HnswSqOptions.md)
- [IndexConfig](interfaces/IndexConfig.md)
- [IndexOptions](interfaces/IndexOptions.md)
- [IndexStatistics](interfaces/IndexStatistics.md)
- [IvfFlatOptions](interfaces/IvfFlatOptions.md)
- [IvfPqOptions](interfaces/IvfPqOptions.md)
- [OpenTableOptions](interfaces/OpenTableOptions.md)
- [OptimizeOptions](interfaces/OptimizeOptions.md)
@@ -66,3 +76,4 @@
- [connect](functions/connect.md)
- [makeArrowTable](functions/makeArrowTable.md)
- [packBits](functions/packBits.md)

View File

@@ -16,7 +16,7 @@ must be provided.
### dataType?
```ts
optional dataType: string;
optional dataType: string | DataType<Type, any>;
```
A new data type for the column. If not provided then the data type will not be changed.

View File

@@ -44,7 +44,7 @@ for testing purposes.
### readConsistencyInterval?
```ts
optional readConsistencyInterval: number;
optional readConsistencyInterval: null | number;
```
(For LanceDB OSS only): The interval, in seconds, at which to check for

View File

@@ -0,0 +1,35 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FullTextQuery
# Interface: FullTextQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;

View File

@@ -24,18 +24,18 @@ The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. L2 distance has a range of [0, ∞).
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike L2, the cosine distance is not affected by the
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
L2 norm is 1), then dot distance is equivalent to the cosine distance.
l2 norm is 1), then dot distance is equivalent to the cosine distance.
***

View File

@@ -24,18 +24,18 @@ The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. L2 distance has a range of [0, ∞).
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike L2, the cosine distance is not affected by the
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
L2 norm is 1), then dot distance is equivalent to the cosine distance.
l2 norm is 1), then dot distance is equivalent to the cosine distance.
***

View File

@@ -30,6 +30,17 @@ The type of the index
***
### loss?
```ts
optional loss: number;
```
The KMeans loss value of the index,
it is only present for vector indices.
***
### numIndexedRows
```ts

View File

@@ -0,0 +1,112 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / IvfFlatOptions
# Interface: IvfFlatOptions
Options to create an `IVF_FLAT` index
## Properties
### distanceType?
```ts
optional distanceType: "l2" | "cosine" | "dot" | "hamming";
```
Distance type to use to build the index.
Default value is "l2".
This is used when training the index to calculate the IVF partitions
(vectors are grouped in partitions with similar vectors according to this
distance type).
The distance type used to train an index MUST match the distance type used
to search the index. Failure to do so will yield inaccurate results.
The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
Note: the cosine distance is undefined when one (or both) of the vectors
are all zeros (there is no direction). These vectors are invalid and may
never be returned from a vector search.
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
l2 norm is 1), then dot distance is equivalent to the cosine distance.
"hamming" - Hamming distance. Hamming distance is a distance metric
calculated from the number of bits that are different between two vectors.
Hamming distance has a range of [0, dimension]. Note that the hamming distance
is only valid for binary vectors.
***
### maxIterations?
```ts
optional maxIterations: number;
```
Max iteration to train IVF kmeans.
When training an IVF FLAT index we use kmeans to calculate the partitions. This parameter
controls how many iterations of kmeans to run.
Increasing this might improve the quality of the index but in most cases these extra
iterations have diminishing returns.
The default value is 50.
***
### numPartitions?
```ts
optional numPartitions: number;
```
The number of IVF partitions to create.
This value should generally scale with the number of rows in the dataset.
By default the number of partitions is the square root of the number of
rows.
If this value is too large then the first part of the search (picking the
right partition) will be slow. If this value is too small then the second
part of the search (searching within a partition) will be slow.
***
### sampleRate?
```ts
optional sampleRate: number;
```
The number of vectors, per partition, to sample when training IVF kmeans.
When an IVF FLAT index is trained, we need to calculate partitions. These are groups
of vectors that are similar to each other. To do this we use an algorithm called kmeans.
Running kmeans on a large dataset can be slow. To speed this up we run kmeans on a
random sample of the data. This parameter controls the size of the sample. The total
number of vectors used to train the index is `sample_rate * num_partitions`.
Increasing this value might improve the quality of the index but in most cases the
default should be sufficient.
The default value is 256.

View File

@@ -31,13 +31,13 @@ The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. L2 distance has a range of [0, ∞).
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike L2, the cosine distance is not affected by the
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
Note: the cosine distance is undefined when one (or both) of the vectors
@@ -46,7 +46,7 @@ never be returned from a vector search.
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
L2 norm is 1), then dot distance is equivalent to the cosine distance.
l2 norm is 1), then dot distance is equivalent to the cosine distance.
***

View File

@@ -20,3 +20,13 @@ The maximum number of rows to return in a single batch
Batches may have fewer rows if the underlying data is stored
in smaller chunks.
***
### timeoutMs?
```ts
optional timeoutMs: number;
```
Timeout for query execution in milliseconds

File diff suppressed because one or more lines are too long

View File

@@ -59,8 +59,6 @@ is also an [asynchronous API client](#connections-asynchronous).
::: lancedb.embeddings.open_clip.OpenClipEmbeddings
::: lancedb.embeddings.utils.with_embeddings
## Context
::: lancedb.context.contextualize

View File

@@ -15,7 +15,7 @@ Currently, LanceDB supports the following metrics:
| Metric | Description |
| --------- | --------------------------------------------------------------------------- |
| `l2` | [Euclidean / L2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) |
| `l2` | [Euclidean / l2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) |
| `cosine` | [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) |
| `dot` | [Dot Production](https://en.wikipedia.org/wiki/Dot_product) |
| `hamming` | [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) |
@@ -138,6 +138,19 @@ LanceDB supports binary vectors as a data type, and has the ability to search bi
--8<-- "python/python/tests/docs/test_binary_vector.py:async_binary_vector"
```
=== "TypeScript"
```ts
--8<-- "nodejs/examples/search.test.ts:import"
--8<-- "nodejs/examples/search.test.ts:import_bin_util"
--8<-- "nodejs/examples/search.test.ts:ingest_binary_data"
--8<-- "nodejs/examples/search.test.ts:search_binary_data"
```
## Multivector type
LanceDB supports multivector type, this is useful when you have multiple vectors for a single item (e.g. with ColBert and ColPali).

View File

@@ -7,7 +7,7 @@ performed on the top-k results returned by the vector search. However, pre-filte
option that performs the filter prior to vector search. This can be useful to narrow down
the search space of a very large dataset to reduce query latency.
Note that both pre-filtering and post-filtering can yield false positives. For pre-filtering, if the filter is too selective, it might eliminate relevant items that the vector search would have otherwise identified as a good match. In this case, increasing `nprobes` parameter will help reduce such false positives. It is recommended to set `use_index=false` if you know that the filter is highly selective.
Note that both pre-filtering and post-filtering can yield false positives. For pre-filtering, if the filter is too selective, it might eliminate relevant items that the vector search would have otherwise identified as a good match. In this case, increasing `nprobes` parameter will help reduce such false positives. It is recommended to call `bypass_vector_index()` if you know that the filter is highly selective.
Similarly, a highly selective post-filter can lead to false positives. Increasing both `nprobes` and `refine_factor` can mitigate this issue. When deciding between pre-filtering and post-filtering, pre-filtering is generally the safer choice if you're uncertain.

View File

@@ -8,6 +8,11 @@ For trouble shooting, the best place to ask is in our Discord, under the relevan
language channel. By asking in the language-specific channel, it makes it more
likely that someone who knows the answer will see your question.
## Common issues
* Multiprocessing with `fork` is not supported. You should use `spawn` instead.
* Data returned by queries may not reflect the most recent writes, depending on configuration. LanceDB uses eventual consistency by default. See [consistency](/docs/src/guides/tables.md#consistency) for more information.
## Enabling logging
To provide more information, especially for LanceDB Cloud related issues, enable
@@ -31,3 +36,9 @@ print the resolved query plan. You can use the `explain_plan` method to do this:
* Python Sync: [LanceQueryBuilder.explain_plan][lancedb.query.LanceQueryBuilder.explain_plan]
* Python Async: [AsyncQueryBase.explain_plan][lancedb.query.AsyncQueryBase.explain_plan]
* Node @lancedb/lancedb: [LanceQueryBuilder.explainPlan](/lancedb/js/classes/QueryBase/#explainplan)
To understand how a query was actually executed—including metrics like execution time, number of rows processed, I/O stats, and more—use the analyze_plan method. This executes the query and returns a physical execution plan annotated with runtime metrics, making it especially helpful for performance tuning and debugging.
* Python Sync: [LanceQueryBuilder.analyze_plan][lancedb.query.LanceQueryBuilder.analyze_plan]
* Python Async: [AsyncQueryBase.analyze_plan][lancedb.query.AsyncQueryBase.analyze_plan]
* Node @lancedb/lancedb: [LanceQueryBuilder.analyzePlan](/lancedb/js/classes/QueryBase/#analyzePlan)

3
java/.gitignore vendored Normal file
View File

@@ -0,0 +1,3 @@
*.iml
.java-version

View File

@@ -8,13 +8,16 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.18.0-beta.0</version>
<version>0.19.0-beta.5</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-core</artifactId>
<name>LanceDB Core</name>
<packaging>jar</packaging>
<properties>
<rust.release.build>false</rust.release.build>
</properties>
<dependencies>
<dependency>
@@ -68,7 +71,7 @@
</goals>
<configuration>
<path>lancedb-jni</path>
<release>true</release>
<release>${rust.release.build}</release>
<!-- Copy native libraries to target/classes for runtime access -->
<copyTo>${project.build.directory}/classes/nativelib</copyTo>
<copyWithPlatformDir>true</copyWithPlatformDir>

View File

@@ -1,16 +1,25 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import io.questdb.jar.jni.JarJniLoader;
import java.io.Closeable;
import java.util.List;
import java.util.Optional;
/**
* Represents LanceDB database.
*/
/** Represents LanceDB database. */
public class Connection implements Closeable {
static {
JarJniLoader.loadLib(Connection.class, "/nativelib", "lancedb_jni");
@@ -18,14 +27,11 @@ public class Connection implements Closeable {
private long nativeConnectionHandle;
/**
* Connect to a LanceDB instance.
*/
/** Connect to a LanceDB instance. */
public static native Connection connect(String uri);
/**
* Get the names of all tables in the database. The names are sorted in
* ascending order.
* Get the names of all tables in the database. The names are sorted in ascending order.
*
* @return the table names
*/
@@ -34,8 +40,7 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param limit The number of results to return.
* @return the table names
@@ -45,12 +50,11 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @return the table names
*/
public List<String> tableNames(String startAfter) {
@@ -58,12 +62,11 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
@@ -72,22 +75,19 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
public native List<String> tableNames(
Optional<String> startAfter, Optional<Integer> limit);
public native List<String> tableNames(Optional<String> startAfter, Optional<Integer> limit);
/**
* Closes this connection and releases any system resources associated with it. If
* the connection is
* already closed, then invoking this method has no effect.
* Closes this connection and releases any system resources associated with it. If the connection
* is already closed, then invoking this method has no effect.
*/
@Override
public void close() {
@@ -98,8 +98,7 @@ public class Connection implements Closeable {
}
/**
* Native method to release the Lance connection resources associated with the
* given handle.
* Native method to release the Lance connection resources associated with the given handle.
*
* @param handle The native handle to the connection resource.
*/

View File

@@ -1,27 +1,35 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.nio.file.Path;
import java.util.List;
import java.net.URL;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
import java.net.URL;
import java.nio.file.Path;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
public class ConnectionTest {
private static final String[] TABLE_NAMES = {
"dataset_version",
"new_empty_dataset",
"test",
"write_stream"
"dataset_version", "new_empty_dataset", "test", "write_stream"
};
@TempDir
static Path tempDir; // Temporary directory for the tests
@TempDir static Path tempDir; // Temporary directory for the tests
private static URL lanceDbURL;
@BeforeAll
@@ -53,18 +61,21 @@ public class ConnectionTest {
@Test
void tableNamesStartAfter() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
assertTableNamesStartAfter(conn, TABLE_NAMES[0], 3, TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(
conn, TABLE_NAMES[0], 3, TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[1], 2, TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[2], 1, TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[3], 0);
assertTableNamesStartAfter(conn, "a_dataset", 4, TABLE_NAMES[0], TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(
conn, "a_dataset", 4, TABLE_NAMES[0], TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "o_dataset", 2, TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "v_dataset", 1, TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "z_dataset", 0);
}
}
private void assertTableNamesStartAfter(Connection conn, String startAfter, int expectedSize, String... expectedNames) {
private void assertTableNamesStartAfter(
Connection conn, String startAfter, int expectedSize, String... expectedNames) {
List<String> tableNames = conn.tableNames(startAfter);
assertEquals(expectedSize, tableNames.size());
for (int i = 0; i < expectedNames.length; i++) {
@@ -74,7 +85,7 @@ public class ConnectionTest {
@Test
void tableNamesLimit() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
for (int i = 0; i <= TABLE_NAMES.length; i++) {
List<String> tableNames = conn.tableNames(i);
assertEquals(i, tableNames.size());

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.18.0-beta.0</version>
<version>0.19.0-beta.5</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>
@@ -29,6 +29,25 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
<spotless.delimiter>package</spotless.delimiter>
<spotless.license.header>
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
</spotless.license.header>
</properties>
<modules>
@@ -127,7 +146,8 @@
<configuration>
<configLocation>google_checks.xml</configLocation>
<consoleOutput>true</consoleOutput>
<failsOnError>true</failsOnError>
<failsOnError>false</failsOnError>
<failOnViolation>false</failOnViolation>
<violationSeverity>warning</violationSeverity>
<linkXRef>false</linkXRef>
</configuration>
@@ -141,6 +161,10 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
</plugin>
</plugins>
<pluginManagement>
<plugins>
@@ -166,7 +190,6 @@
<artifactId>maven-surefire-plugin</artifactId>
<version>3.2.5</version>
<configuration>
<argLine>--add-opens=java.base/java.nio=ALL-UNNAMED</argLine>
<forkNode
implementation="org.apache.maven.plugin.surefire.extensions.SurefireForkNodeFactory" />
<useSystemClassLoader>false</useSystemClassLoader>
@@ -180,6 +203,54 @@
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
<version>${spotless.version}</version>
<configuration>
<skip>${spotless.skip}</skip>
<upToDateChecking>
<enabled>true</enabled>
</upToDateChecking>
<java>
<includes>
<include>src/main/java/**/*.java</include>
<include>src/test/java/**/*.java</include>
</includes>
<googleJavaFormat>
<version>${spotless.java.googlejavaformat.version}</version>
<style>GOOGLE</style>
</googleJavaFormat>
<importOrder>
<order>com.lancedb.lance,,javax,java,\#</order>
</importOrder>
<removeUnusedImports />
</java>
<scala>
<includes>
<include>src/main/scala/**/*.scala</include>
<include>src/main/scala-*/**/*.scala</include>
<include>src/test/scala/**/*.scala</include>
<include>src/test/scala-*/**/*.scala</include>
</includes>
</scala>
<licenseHeader>
<content>${spotless.license.header}</content>
<delimiter>${spotless.delimiter}</delimiter>
</licenseHeader>
</configuration>
<executions>
<execution>
<id>spotless-check</id>
<phase>validate</phase>
<goals>
<goal>apply</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</pluginManagement>
</build>

93
node/package-lock.json generated
View File

@@ -1,12 +1,12 @@
{
"name": "vectordb",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "vectordb",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"cpu": [
"x64",
"arm64"
@@ -52,14 +52,11 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.18.0-beta.0",
"@lancedb/vectordb-darwin-x64": "0.18.0-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.18.0-beta.0",
"@lancedb/vectordb-linux-arm64-musl": "0.18.0-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.18.0-beta.0",
"@lancedb/vectordb-linux-x64-musl": "0.18.0-beta.0",
"@lancedb/vectordb-win32-arm64-msvc": "0.18.0-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.18.0-beta.0"
"@lancedb/vectordb-darwin-arm64": "0.19.0-beta.5",
"@lancedb/vectordb-darwin-x64": "0.19.0-beta.5",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.0-beta.5",
"@lancedb/vectordb-linux-x64-gnu": "0.19.0-beta.5",
"@lancedb/vectordb-win32-x64-msvc": "0.19.0-beta.5"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",
@@ -330,9 +327,9 @@
}
},
"node_modules/@lancedb/vectordb-darwin-arm64": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.18.0-beta.0.tgz",
"integrity": "sha512-dLLgMPllYJOiRfPqkqkmoQu48RIa7K4dOF/qFP8Aex3zqeHE/0sFm3DYjtSFc6SR/6yT8u6Y9iFo2cQp5rCFJA==",
"version": "0.19.0-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.19.0-beta.5.tgz",
"integrity": "sha512-NuJVGaV4b6XgH3dlkCEquvtGM1cY5sIJE5M/LgJ3HYYvAbco/seBQM5AHTV/7CULoPEY9eQeJZOj9fWP5oQLYQ==",
"cpu": [
"arm64"
],
@@ -343,9 +340,9 @@
]
},
"node_modules/@lancedb/vectordb-darwin-x64": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.18.0-beta.0.tgz",
"integrity": "sha512-la0eauU0rzHO5eeVjBt8o/5UW4VzRYAuRA7nqUFLX5T6SWP5+UWjqusVVbWGz3ski+8uEX6VhlaFZP5uIJKGIg==",
"version": "0.19.0-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.19.0-beta.5.tgz",
"integrity": "sha512-hbadwvQcUgKJfluUHhN+mx+XeFRwTuh9mD0L3Tf3t3BkDTxyHpEG5WNgOpWrh6e1RU6zW54CoCyQuSEaVqGgGw==",
"cpu": [
"x64"
],
@@ -356,22 +353,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-arm64-gnu": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.18.0-beta.0.tgz",
"integrity": "sha512-AkXI/lB3yu1Di2G1lhilf89V6qPTppb13aAt+/6gU5/PSfA94y9VXD67D4WyvRbuQghJjDvAavMlWMrJc2NuMw==",
"cpu": [
"arm64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@lancedb/vectordb-linux-arm64-musl": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-musl/-/vectordb-linux-arm64-musl-0.18.0-beta.0.tgz",
"integrity": "sha512-kTVcJ4LA8w/7egY4m0EXOt8c1DeFUquVtyvexO+VzIFeeHfBkkrMI0DkE0CpHmk+gctkG7EY39jzjgLnPvppnw==",
"version": "0.19.0-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.19.0-beta.5.tgz",
"integrity": "sha512-fu/EOYLr3mx76/SP4dEgbq0vSYHfuTf68lVl5/tL6eIb1Purz42l22+jNKLJ/S3Plase2SkXdxyY90K2Y/CvSg==",
"cpu": [
"arm64"
],
@@ -382,9 +366,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-x64-gnu": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.18.0-beta.0.tgz",
"integrity": "sha512-KbtIy5DkaWTsKENm5Q27hjovrR7FRuoHhl0wDJtO/2CUZYlrskjEIfcfkfA2CrEQesBug4s5jgsvNM4Wcp6zoA==",
"version": "0.19.0-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.19.0-beta.5.tgz",
"integrity": "sha512-pzb8fl5M8155sc/mEFnKmuh9rCfQohHBlb+j+5qNMe84AyygQ8Me1H3b1h9fOkUPu2Y168zYfuGkjNv4Bjm9eA==",
"cpu": [
"x64"
],
@@ -394,36 +378,10 @@
"linux"
]
},
"node_modules/@lancedb/vectordb-linux-x64-musl": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-musl/-/vectordb-linux-x64-musl-0.18.0-beta.0.tgz",
"integrity": "sha512-SF07gmoGVExcF5v+IE6kBbCbXJSDyTgC7QCt+MDS1NsgoQ9OH7IyH7r6HJu16tKflUOUKlUHnP0hQOPpv1fWpg==",
"cpu": [
"x64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@lancedb/vectordb-win32-arm64-msvc": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-arm64-msvc/-/vectordb-win32-arm64-msvc-0.18.0-beta.0.tgz",
"integrity": "sha512-YYBuSBGDlxJgSI5gHjDmQo9sl05lAXfzil6QiKfgmUMsBtb2sT+GoUCgG6qzsfe99sWiTf+pMeWDsQgfrj9vNw==",
"cpu": [
"arm64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"win32"
]
},
"node_modules/@lancedb/vectordb-win32-x64-msvc": {
"version": "0.18.0-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.18.0-beta.0.tgz",
"integrity": "sha512-t9TXeUnMU7YbP+/nUJpStm75aWwUydZj2AK+G2XwDtQrQo4Xg7/NETEbBeogmIOHuidNQYia8jEeQCUon5/+Dw==",
"version": "0.19.0-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.19.0-beta.5.tgz",
"integrity": "sha512-5z6BSfTuZYJdDL2wwRrEQlnfluahzaUH2U7vj3i4ik4zaAwvaYcrjmdYCTLRYhFscUqzxd2pVFHbfRYe+maYzA==",
"cpu": [
"x64"
],
@@ -1226,9 +1184,10 @@
}
},
"node_modules/axios": {
"version": "1.7.7",
"resolved": "https://registry.npmjs.org/axios/-/axios-1.7.7.tgz",
"integrity": "sha512-S4kL7XrjgBmvdGut0sN3yJxqYzrDOnivkBiN0OFs6hLiUam3UPvswUo0kqGyhqUZGEOytHyumEdXsAkgCOUf3Q==",
"version": "1.8.4",
"resolved": "https://registry.npmjs.org/axios/-/axios-1.8.4.tgz",
"integrity": "sha512-eBSYY4Y68NNlHbHBMdeDmKNtDgXWhQsJcGqzO3iLUM0GraQFSS9cVgPX5I9b3lbdFKyYoAEGAZF1DwhTaljNAw==",
"license": "MIT",
"dependencies": {
"follow-redirects": "^1.15.6",
"form-data": "^4.0.0",

View File

@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"description": " Serverless, low-latency vector database for AI applications",
"private": false,
"main": "dist/index.js",
@@ -85,20 +85,14 @@
"aarch64-apple-darwin": "@lancedb/vectordb-darwin-arm64",
"x86_64-unknown-linux-gnu": "@lancedb/vectordb-linux-x64-gnu",
"aarch64-unknown-linux-gnu": "@lancedb/vectordb-linux-arm64-gnu",
"x86_64-unknown-linux-musl": "@lancedb/vectordb-linux-x64-musl",
"aarch64-unknown-linux-musl": "@lancedb/vectordb-linux-arm64-musl",
"x86_64-pc-windows-msvc": "@lancedb/vectordb-win32-x64-msvc",
"aarch64-pc-windows-msvc": "@lancedb/vectordb-win32-arm64-msvc"
"x86_64-pc-windows-msvc": "@lancedb/vectordb-win32-x64-msvc"
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-x64": "0.18.0-beta.0",
"@lancedb/vectordb-darwin-arm64": "0.18.0-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.18.0-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.18.0-beta.0",
"@lancedb/vectordb-linux-x64-musl": "0.18.0-beta.0",
"@lancedb/vectordb-linux-arm64-musl": "0.18.0-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.18.0-beta.0",
"@lancedb/vectordb-win32-arm64-msvc": "0.18.0-beta.0"
"@lancedb/vectordb-darwin-x64": "0.19.0-beta.5",
"@lancedb/vectordb-darwin-arm64": "0.19.0-beta.5",
"@lancedb/vectordb-linux-x64-gnu": "0.19.0-beta.5",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.0-beta.5",
"@lancedb/vectordb-win32-x64-msvc": "0.19.0-beta.5"
}
}

View File

@@ -1299,7 +1299,7 @@ export interface IvfPQIndexConfig {
index_name?: string
/**
* Metric type, L2 or Cosine
* Metric type, l2 or Cosine
*/
metric_type?: MetricType

View File

@@ -110,7 +110,7 @@ describe('LanceDB Mirrored Store Integration test', function () {
fs.readdir(path.join(mirroredPath, 'data'), { withFileTypes: true }, (err, files) => {
if (err != null) throw err
assert.equal(files.length, 1)
assert.equal(files.length, 1, `Found files: ${files.map(f => f.name)}`)
assert.isTrue(files[0].name.endsWith('.lance'))
})

View File

@@ -22,3 +22,4 @@ build.rs
jest.config.js
tsconfig.json
typedoc.json
typedoc_post_process.js

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.18.0-beta.0"
version = "0.19.0-beta.5"
license.workspace = true
description.workspace = true
repository.workspace = true
@@ -18,7 +18,7 @@ arrow-array.workspace = true
arrow-schema.workspace = true
env_logger.workspace = true
futures.workspace = true
lancedb = { path = "../rust/lancedb", features = ["remote"] }
lancedb = { path = "../rust/lancedb" }
napi = { version = "2.16.8", default-features = false, features = [
"napi9",
"async"
@@ -30,3 +30,8 @@ log.workspace = true
[build-dependencies]
napi-build = "2.1"
[features]
default = ["remote"]
fp16kernels = ["lancedb/fp16kernels"]
remote = ["lancedb/remote"]

View File

@@ -11,11 +11,9 @@ npm install @lancedb/lancedb
This will download the appropriate native library for your platform. We currently
support:
- Linux (x86_64 and aarch64)
- Linux (x86_64 and aarch64 on glibc and musl)
- MacOS (Intel and ARM/M1/M2)
- Windows (x86_64 only)
We do not yet support musl-based Linux (such as Alpine Linux) or aarch64 Windows.
- Windows (x86_64 and aarch64)
## Usage

View File

@@ -17,7 +17,7 @@ describe("when connecting", () => {
it("should connect", async () => {
const db = await connect(tmpDir.name);
expect(db.display()).toBe(
`ListingDatabase(uri=${tmpDir.name}, read_consistency_interval=None)`,
`ListingDatabase(uri=${tmpDir.name}, read_consistency_interval=5s)`,
);
});

View File

@@ -175,6 +175,8 @@ maybeDescribe("storage_options", () => {
tableNames = await db.tableNames();
expect(tableNames).toEqual([]);
await db.dropAllTables();
});
it("can configure encryption at connection and table level", async () => {
@@ -210,6 +212,8 @@ maybeDescribe("storage_options", () => {
await table.add([{ a: 2, b: 3 }]);
await bucket.assertAllEncrypted("test/table2.lance", kmsKey.keyId);
await db.dropAllTables();
});
});
@@ -298,5 +302,32 @@ maybeDescribe("DynamoDB Lock", () => {
const rowCount = await table.countRows();
expect(rowCount).toBe(6);
await db.dropAllTables();
});
it("clears dynamodb state after dropping all tables", async () => {
const uri = `s3+ddb://${bucket.name}/test?ddbTableName=${commitTable.name}`;
const db = await connect(uri, {
storageOptions: CONFIG,
readConsistencyInterval: 0,
});
await db.createTable("foo", [{ a: 1, b: 2 }]);
await db.createTable("bar", [{ a: 1, b: 2 }]);
let tableNames = await db.tableNames();
expect(tableNames).toEqual(["bar", "foo"]);
await db.dropAllTables();
tableNames = await db.tableNames();
expect(tableNames).toEqual([]);
// We can create a new table with the same name as the one we dropped.
await db.createTable("foo", [{ a: 1, b: 2 }]);
tableNames = await db.tableNames();
expect(tableNames).toEqual(["foo"]);
await db.dropAllTables();
});
});

View File

@@ -21,9 +21,11 @@ import {
Int64,
List,
Schema,
Uint8,
Utf8,
makeArrowTable,
} from "../lancedb/arrow";
import * as arrow from "../lancedb/arrow";
import {
EmbeddingFunction,
LanceSchema,
@@ -56,7 +58,7 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
it("be displayable", async () => {
expect(table.display()).toMatch(
/NativeTable\(some_table, uri=.*, read_consistency_interval=None\)/,
/NativeTable\(some_table, uri=.*, read_consistency_interval=5s\)/,
);
table.close();
expect(table.display()).toBe("ClosedTable(some_table)");
@@ -278,6 +280,15 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(res.getChild("y")?.toJSON()).toEqual([2, null, null, null]);
expect(res.getChild("z")?.toJSON()).toEqual([null, null, 3n, 5n]);
});
it("should handle null vectors at end of data", async () => {
// https://github.com/lancedb/lancedb/issues/2240
const data = [{ vector: [1, 2, 3] }, { vector: null }];
const db = await connect("memory://");
const table = await db.createTable("my_table", data);
expect(await table.countRows()).toEqual(2);
});
},
);
@@ -460,6 +471,8 @@ describe("When creating an index", () => {
indexType: "IvfPq",
columns: ["vec"],
});
const stats = await tbl.indexStats("vec_idx");
expect(stats?.loss).toBeDefined();
// Search without specifying the column
let rst = await tbl
@@ -620,6 +633,23 @@ describe("When creating an index", () => {
expect(plan2).not.toMatch("LanceScan");
});
it("should be able to run analyze plan", async () => {
await tbl.createIndex("vec");
await tbl.add([
{
id: 300,
vec: Array(32)
.fill(1)
.map(() => Math.random()),
tags: [],
},
]);
const plan = await tbl.query().nearestTo(queryVec).analyzePlan();
expect(plan).toMatch("AnalyzeExec");
expect(plan).toMatch("metrics=");
});
it("should be able to query with row id", async () => {
const results = await tbl
.query()
@@ -720,6 +750,7 @@ describe("When creating an index", () => {
expect(stats?.distanceType).toBeUndefined();
expect(stats?.indexType).toEqual("BTREE");
expect(stats?.numIndices).toEqual(1);
expect(stats?.loss).toBeUndefined();
});
test("when getting stats on non-existent index", async () => {
@@ -727,6 +758,38 @@ describe("When creating an index", () => {
expect(stats).toBeUndefined();
});
test("create ivf_flat with binary vectors", async () => {
const db = await connect(tmpDir.name);
const binarySchema = new Schema([
new Field("id", new Int32(), true),
new Field("vec", new FixedSizeList(32, new Field("item", new Uint8()))),
]);
const tbl = await db.createTable(
"binary",
makeArrowTable(
Array(300)
.fill(1)
.map((_, i) => ({
id: i,
vec: Array(32)
.fill(1)
.map(() => Math.floor(Math.random() * 255)),
})),
{ schema: binarySchema },
),
);
await tbl.createIndex("vec", {
config: Index.ivfFlat({ numPartitions: 10, distanceType: "hamming" }),
});
// query with binary vectors
const queryVec = Array(32)
.fill(1)
.map(() => Math.floor(Math.random() * 255));
const rst = await tbl.query().limit(5).nearestTo(queryVec).toArrow();
expect(rst.numRows).toBe(5);
});
// TODO: Move this test to the query API test (making sure we can reject queries
// when the dimension is incorrect)
test("two columns with different dimensions", async () => {
@@ -804,6 +867,44 @@ describe("When creating an index", () => {
});
});
describe("When querying a table", () => {
let tmpDir: tmp.DirResult;
beforeEach(() => {
tmpDir = tmp.dirSync({ unsafeCleanup: true });
});
afterEach(() => tmpDir.removeCallback());
it("should throw an error when timeout is reached", async () => {
const db = await connect(tmpDir.name);
const data = makeArrowTable([
{ text: "a", vector: [0.1, 0.2] },
{ text: "b", vector: [0.3, 0.4] },
]);
const table = await db.createTable("test", data);
await table.createIndex("text", { config: Index.fts() });
await expect(
table.query().where("text != 'a'").toArray({ timeoutMs: 0 }),
).rejects.toThrow("Query timeout");
await expect(
table.query().nearestTo([0.0, 0.0]).toArrow({ timeoutMs: 0 }),
).rejects.toThrow("Query timeout");
await expect(
table.search("a", "fts").toArray({ timeoutMs: 0 }),
).rejects.toThrow("Query timeout");
await expect(
table
.query()
.nearestToText("a")
.nearestTo([0.0, 0.0])
.toArrow({ timeoutMs: 0 }),
).rejects.toThrow("Query timeout");
});
});
describe("Read consistency interval", () => {
let tmpDir: tmp.DirResult;
beforeEach(() => {
@@ -920,6 +1021,93 @@ describe("schema evolution", function () {
new Field("price", new Float64(), true),
]);
expect(await table.schema()).toEqual(expectedSchema2);
await table.alterColumns([
{
path: "vector",
dataType: new FixedSizeList(2, new Field("item", new Float64(), true)),
},
]);
const expectedSchema3 = new Schema([
new Field("new_id", new Int32(), true),
new Field(
"vector",
new FixedSizeList(2, new Field("item", new Float64(), true)),
true,
),
new Field("price", new Float64(), true),
]);
expect(await table.schema()).toEqual(expectedSchema3);
});
it("can cast to various types", async function () {
const con = await connect(tmpDir.name);
// integers
const intTypes = [
new arrow.Int8(),
new arrow.Int16(),
new arrow.Int32(),
new arrow.Int64(),
new arrow.Uint8(),
new arrow.Uint16(),
new arrow.Uint32(),
new arrow.Uint64(),
];
const tableInts = await con.createTable("ints", [{ id: 1n }], {
schema: new Schema([new Field("id", new Int64(), true)]),
});
for (const intType of intTypes) {
await tableInts.alterColumns([{ path: "id", dataType: intType }]);
const schema = new Schema([new Field("id", intType, true)]);
expect(await tableInts.schema()).toEqual(schema);
}
// floats
const floatTypes = [
new arrow.Float16(),
new arrow.Float32(),
new arrow.Float64(),
];
const tableFloats = await con.createTable("floats", [{ val: 2.1 }], {
schema: new Schema([new Field("val", new Float32(), true)]),
});
for (const floatType of floatTypes) {
await tableFloats.alterColumns([{ path: "val", dataType: floatType }]);
const schema = new Schema([new Field("val", floatType, true)]);
expect(await tableFloats.schema()).toEqual(schema);
}
// Lists of floats
const listTypes = [
new arrow.List(new arrow.Field("item", new arrow.Float32(), true)),
new arrow.FixedSizeList(
2,
new arrow.Field("item", new arrow.Float64(), true),
),
new arrow.FixedSizeList(
2,
new arrow.Field("item", new arrow.Float16(), true),
),
new arrow.FixedSizeList(
2,
new arrow.Field("item", new arrow.Float32(), true),
),
];
const tableLists = await con.createTable("lists", [{ val: [2.1, 3.2] }], {
schema: new Schema([
new Field(
"val",
new FixedSizeList(2, new arrow.Field("item", new Float32())),
true,
),
]),
});
for (const listType of listTypes) {
await tableLists.alterColumns([{ path: "val", dataType: listType }]);
const schema = new Schema([new Field("val", listType, true)]);
expect(await tableLists.schema()).toEqual(schema);
}
});
it("can drop a column from the schema", async function () {
@@ -1116,6 +1304,27 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(results[0].text).toBe(data[0].text);
});
test("full text index on list", async () => {
const db = await connect(tmpDir.name);
const data = [
{ text: ["lance database", "the", "search"], vector: [0.1, 0.2, 0.3] },
{ text: ["lance database"], vector: [0.4, 0.5, 0.6] },
{ text: ["lance", "search"], vector: [0.7, 0.8, 0.9] },
{ text: ["database", "search"], vector: [1.0, 1.1, 1.2] },
{ text: ["unrelated", "doc"], vector: [1.3, 1.4, 1.5] },
];
const table = await db.createTable("test", data);
await table.createIndex("text", {
config: Index.fts(),
});
const results = await table.search("lance").toArray();
expect(results.length).toBe(3);
const results2 = await table.search('"lance database"').toArray();
expect(results2.length).toBe(2);
});
test("full text search without positions", async () => {
const db = await connect(tmpDir.name);
const data = [
@@ -1213,6 +1422,30 @@ describe("when calling explainPlan", () => {
});
});
describe("when calling analyzePlan", () => {
let tmpDir: tmp.DirResult;
let table: Table;
let queryVec: number[];
beforeEach(async () => {
tmpDir = tmp.dirSync({ unsafeCleanup: true });
const con = await connect(tmpDir.name);
table = await con.createTable("vectors", [{ id: 1, vector: [1.1, 0.9] }]);
});
afterEach(() => {
tmpDir.removeCallback();
});
it("retrieves runtime metrics", async () => {
queryVec = Array(2)
.fill(1)
.map(() => Math.random());
const plan = await table.query().nearestTo(queryVec).analyzePlan();
console.log("Query Plan:\n", plan); // <--- Print the plan
expect(plan).toMatch("AnalyzeExec");
});
});
describe("column name options", () => {
let tmpDir: tmp.DirResult;
let table: Table;

View File

@@ -132,6 +132,17 @@ test("basic table examples", async () => {
},
]);
// --8<-- [end:alter_columns]
// --8<-- [start:alter_columns_vector]
await tbl.alterColumns([
{
path: "vector",
dataType: new arrow.FixedSizeList(
2,
new arrow.Field("item", new arrow.Float16(), false),
),
},
]);
// --8<-- [end:alter_columns_vector]
// --8<-- [start:drop_columns]
await tbl.dropColumns(["dbl_price"]);
// --8<-- [end:drop_columns]
@@ -191,5 +202,35 @@ test("basic table examples", async () => {
// --8<-- [end:create_f16_table]
await db.dropTable("f16_tbl");
}
const uri = databaseDir;
await db.createTable("my_table", [{ id: 1 }, { id: 2 }]);
{
// --8<-- [start:table_strong_consistency]
const db = await lancedb.connect({ uri, readConsistencyInterval: 0 });
const tbl = await db.openTable("my_table");
// --8<-- [end:table_strong_consistency]
}
{
// --8<-- [start:table_eventual_consistency]
const db = await lancedb.connect({ uri, readConsistencyInterval: 5 });
const tbl = await db.openTable("my_table");
// --8<-- [end:table_eventual_consistency]
}
{
// --8<-- [start:table_no_consistency]
const db = await lancedb.connect({ uri, readConsistencyInterval: null });
const tbl = await db.openTable("my_table");
// --8<-- [end:table_no_consistency]
}
{
// --8<-- [start:table_checkout_latest]
const tbl = await db.openTable("my_table");
// (Other writes happen to test_table_async from another process)
// Check for updates
tbl.checkoutLatest();
// --8<-- [end:table_checkout_latest]
}
});
});

View File

@@ -4,9 +4,12 @@ import { expect, test } from "@jest/globals";
// --8<-- [start:import]
import * as lancedb from "@lancedb/lancedb";
// --8<-- [end:import]
// --8<-- [start:import_bin_util]
import { Field, FixedSizeList, Int32, Schema, Uint8 } from "apache-arrow";
// --8<-- [end:import_bin_util]
import { withTempDirectory } from "./util.ts";
test("full text search", async () => {
test("vector search", async () => {
await withTempDirectory(async (databaseDir) => {
{
const db = await lancedb.connect(databaseDir);
@@ -14,8 +17,6 @@ test("full text search", async () => {
const data = Array.from({ length: 10_000 }, (_, i) => ({
vector: Array(128).fill(i),
id: `${i}`,
content: "",
longId: `${i}`,
}));
await db.createTable("my_vectors", data);
@@ -52,5 +53,41 @@ test("full text search", async () => {
expect(r.distance).toBeGreaterThanOrEqual(0.1);
expect(r.distance).toBeLessThan(0.2);
}
{
// --8<-- [start:ingest_binary_data]
const schema = new Schema([
new Field("id", new Int32(), true),
new Field("vec", new FixedSizeList(32, new Field("item", new Uint8()))),
]);
const data = lancedb.makeArrowTable(
Array(1_000)
.fill(0)
.map((_, i) => ({
// the 256 bits would be store in 32 bytes,
// if your data is already in this format, you can skip the packBits step
id: i,
vec: lancedb.packBits(Array(256).fill(i % 2)),
})),
{ schema: schema },
);
const tbl = await db.createTable("binary_table", data);
await tbl.createIndex("vec", {
config: lancedb.Index.ivfFlat({
numPartitions: 10,
distanceType: "hamming",
}),
});
// --8<-- [end:ingest_binary_data]
// --8<-- [start:search_binary_data]
const query = Array(32)
.fill(1)
.map(() => Math.floor(Math.random() * 255));
const results = await tbl.query().nearestTo(query).limit(10).toArrow();
// --8<-- [end:search_binary_data
expect(results.numRows).toBe(10);
}
});
});

View File

@@ -8,7 +8,11 @@ import {
Bool,
BufferType,
DataType,
DateUnit,
Date_,
Decimal,
Dictionary,
Duration,
Field,
FixedSizeBinary,
FixedSizeList,
@@ -21,19 +25,22 @@ import {
LargeBinary,
List,
Null,
Precision,
RecordBatch,
RecordBatchFileReader,
RecordBatchFileWriter,
RecordBatchStreamWriter,
Schema,
Struct,
Timestamp,
Type,
Utf8,
Vector,
makeVector as arrowMakeVector,
vectorFromArray as badVectorFromArray,
makeBuilder,
makeData,
makeTable,
vectorFromArray,
} from "apache-arrow";
import { Buffers } from "apache-arrow/data";
import { type EmbeddingFunction } from "./embedding/embedding_function";
@@ -179,6 +186,21 @@ export class VectorColumnOptions {
}
}
// biome-ignore lint/suspicious/noExplicitAny: skip
function vectorFromArray(data: any, type?: DataType) {
// Workaround for: https://github.com/apache/arrow/issues/45862
// If FSL type with float
if (DataType.isFixedSizeList(type) && DataType.isFloat(type.valueType)) {
const extendedData = [...data, new Array(type.listSize).fill(0.0)];
const array = badVectorFromArray(extendedData, type);
return array.slice(0, data.length);
} else if (type === undefined) {
return badVectorFromArray(data);
} else {
return badVectorFromArray(data, type);
}
}
/** Options to control the makeArrowTable call. */
export class MakeArrowTableOptions {
/*
@@ -1170,3 +1192,137 @@ function validateSchemaEmbeddings(
return new Schema(fields, schema.metadata);
}
interface JsonDataType {
type: string;
fields?: JsonField[];
length?: number;
}
interface JsonField {
name: string;
type: JsonDataType;
nullable: boolean;
metadata: Map<string, string>;
}
// Matches format of https://github.com/lancedb/lance/blob/main/rust/lance/src/arrow/json.rs
export function dataTypeToJson(dataType: DataType): JsonDataType {
switch (dataType.typeId) {
// For primitives, matches https://github.com/lancedb/lance/blob/e12bb9eff2a52f753668d4b62c52e4d72b10d294/rust/lance-core/src/datatypes.rs#L185
case Type.Null:
return { type: "null" };
case Type.Bool:
return { type: "bool" };
case Type.Int8:
return { type: "int8" };
case Type.Int16:
return { type: "int16" };
case Type.Int32:
return { type: "int32" };
case Type.Int64:
return { type: "int64" };
case Type.Uint8:
return { type: "uint8" };
case Type.Uint16:
return { type: "uint16" };
case Type.Uint32:
return { type: "uint32" };
case Type.Uint64:
return { type: "uint64" };
case Type.Int: {
const bitWidth = (dataType as Int).bitWidth;
const signed = (dataType as Int).isSigned;
const prefix = signed ? "" : "u";
return { type: `${prefix}int${bitWidth}` };
}
case Type.Float: {
switch ((dataType as Float).precision) {
case Precision.HALF:
return { type: "halffloat" };
case Precision.SINGLE:
return { type: "float" };
case Precision.DOUBLE:
return { type: "double" };
}
throw Error("Unsupported float precision");
}
case Type.Float16:
return { type: "halffloat" };
case Type.Float32:
return { type: "float" };
case Type.Float64:
return { type: "double" };
case Type.Utf8:
return { type: "string" };
case Type.Binary:
return { type: "binary" };
case Type.LargeUtf8:
return { type: "large_string" };
case Type.LargeBinary:
return { type: "large_binary" };
case Type.List:
return {
type: "list",
fields: [fieldToJson((dataType as List).children[0])],
};
case Type.FixedSizeList: {
const fixedSizeList = dataType as FixedSizeList;
return {
type: "fixed_size_list",
fields: [fieldToJson(fixedSizeList.children[0])],
length: fixedSizeList.listSize,
};
}
case Type.Struct:
return {
type: "struct",
fields: (dataType as Struct).children.map(fieldToJson),
};
case Type.Date: {
const unit = (dataType as Date_).unit;
return {
type: unit === DateUnit.DAY ? "date32:day" : "date64:ms",
};
}
case Type.Timestamp: {
const timestamp = dataType as Timestamp;
const timezone = timestamp.timezone || "-";
return {
type: `timestamp:${timestamp.unit}:${timezone}`,
};
}
case Type.Decimal: {
const decimal = dataType as Decimal;
return {
type: `decimal:${decimal.bitWidth}:${decimal.precision}:${decimal.scale}`,
};
}
case Type.Duration: {
const duration = dataType as Duration;
return { type: `duration:${duration.unit}` };
}
case Type.FixedSizeBinary: {
const byteWidth = (dataType as FixedSizeBinary).byteWidth;
return { type: `fixed_size_binary:${byteWidth}` };
}
case Type.Dictionary: {
const dict = dataType as Dictionary;
const indexType = dataTypeToJson(dict.indices);
const valueType = dataTypeToJson(dict.valueType);
return {
type: `dict:${valueType.type}:${indexType.type}:false`,
};
}
}
throw new Error("Unsupported data type");
}
function fieldToJson(field: Field): JsonField {
return {
name: field.name,
type: dataTypeToJson(field.type),
nullable: field.nullable,
metadata: field.metadata,
};
}

View File

@@ -14,7 +14,6 @@ import {
export {
AddColumnsSql,
ColumnAlteration,
ConnectionOptions,
IndexStatistics,
IndexConfig,
@@ -48,12 +47,19 @@ export {
QueryExecutionOptions,
FullTextSearchOptions,
RecordBatchIterator,
FullTextQuery,
MatchQuery,
PhraseQuery,
BoostQuery,
MultiMatchQuery,
FullTextQueryType,
} from "./query";
export {
Index,
IndexOptions,
IvfPqOptions,
IvfFlatOptions,
HnswPqOptions,
HnswSqOptions,
FtsOptions,
@@ -65,6 +71,7 @@ export {
UpdateOptions,
OptimizeOptions,
Version,
ColumnAlteration,
} from "./table";
export { MergeInsertBuilder } from "./merge";
@@ -79,7 +86,7 @@ export {
DataLike,
IntoVector,
} from "./arrow";
export { IntoSql } from "./util";
export { IntoSql, packBits } from "./util";
/**
* Connect to a LanceDB instance at the given URI.

View File

@@ -62,13 +62,13 @@ export interface IvfPqOptions {
*
* "l2" - Euclidean distance. This is a very common distance metric that
* accounts for both magnitude and direction when determining the distance
* between vectors. L2 distance has a range of [0, ∞).
* between vectors. l2 distance has a range of [0, ∞).
*
* "cosine" - Cosine distance. Cosine distance is a distance metric
* calculated from the cosine similarity between two vectors. Cosine
* similarity is a measure of similarity between two non-zero vectors of an
* inner product space. It is defined to equal the cosine of the angle
* between them. Unlike L2, the cosine distance is not affected by the
* between them. Unlike l2, the cosine distance is not affected by the
* magnitude of the vectors. Cosine distance has a range of [0, 2].
*
* Note: the cosine distance is undefined when one (or both) of the vectors
@@ -77,7 +77,7 @@ export interface IvfPqOptions {
*
* "dot" - Dot product. Dot distance is the dot product of two vectors. Dot
* distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
* L2 norm is 1), then dot distance is equivalent to the cosine distance.
* l2 norm is 1), then dot distance is equivalent to the cosine distance.
*/
distanceType?: "l2" | "cosine" | "dot";
@@ -125,18 +125,18 @@ export interface HnswPqOptions {
*
* "l2" - Euclidean distance. This is a very common distance metric that
* accounts for both magnitude and direction when determining the distance
* between vectors. L2 distance has a range of [0, ∞).
* between vectors. l2 distance has a range of [0, ∞).
*
* "cosine" - Cosine distance. Cosine distance is a distance metric
* calculated from the cosine similarity between two vectors. Cosine
* similarity is a measure of similarity between two non-zero vectors of an
* inner product space. It is defined to equal the cosine of the angle
* between them. Unlike L2, the cosine distance is not affected by the
* between them. Unlike l2, the cosine distance is not affected by the
* magnitude of the vectors. Cosine distance has a range of [0, 2].
*
* "dot" - Dot product. Dot distance is the dot product of two vectors. Dot
* distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
* L2 norm is 1), then dot distance is equivalent to the cosine distance.
* l2 norm is 1), then dot distance is equivalent to the cosine distance.
*/
distanceType?: "l2" | "cosine" | "dot";
@@ -241,18 +241,18 @@ export interface HnswSqOptions {
*
* "l2" - Euclidean distance. This is a very common distance metric that
* accounts for both magnitude and direction when determining the distance
* between vectors. L2 distance has a range of [0, ∞).
* between vectors. l2 distance has a range of [0, ∞).
*
* "cosine" - Cosine distance. Cosine distance is a distance metric
* calculated from the cosine similarity between two vectors. Cosine
* similarity is a measure of similarity between two non-zero vectors of an
* inner product space. It is defined to equal the cosine of the angle
* between them. Unlike L2, the cosine distance is not affected by the
* between them. Unlike l2, the cosine distance is not affected by the
* magnitude of the vectors. Cosine distance has a range of [0, 2].
*
* "dot" - Dot product. Dot distance is the dot product of two vectors. Dot
* distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
* L2 norm is 1), then dot distance is equivalent to the cosine distance.
* l2 norm is 1), then dot distance is equivalent to the cosine distance.
*/
distanceType?: "l2" | "cosine" | "dot";
@@ -327,6 +327,94 @@ export interface HnswSqOptions {
efConstruction?: number;
}
/**
* Options to create an `IVF_FLAT` index
*/
export interface IvfFlatOptions {
/**
* The number of IVF partitions to create.
*
* This value should generally scale with the number of rows in the dataset.
* By default the number of partitions is the square root of the number of
* rows.
*
* If this value is too large then the first part of the search (picking the
* right partition) will be slow. If this value is too small then the second
* part of the search (searching within a partition) will be slow.
*/
numPartitions?: number;
/**
* Distance type to use to build the index.
*
* Default value is "l2".
*
* This is used when training the index to calculate the IVF partitions
* (vectors are grouped in partitions with similar vectors according to this
* distance type).
*
* The distance type used to train an index MUST match the distance type used
* to search the index. Failure to do so will yield inaccurate results.
*
* The following distance types are available:
*
* "l2" - Euclidean distance. This is a very common distance metric that
* accounts for both magnitude and direction when determining the distance
* between vectors. l2 distance has a range of [0, ∞).
*
* "cosine" - Cosine distance. Cosine distance is a distance metric
* calculated from the cosine similarity between two vectors. Cosine
* similarity is a measure of similarity between two non-zero vectors of an
* inner product space. It is defined to equal the cosine of the angle
* between them. Unlike l2, the cosine distance is not affected by the
* magnitude of the vectors. Cosine distance has a range of [0, 2].
*
* Note: the cosine distance is undefined when one (or both) of the vectors
* are all zeros (there is no direction). These vectors are invalid and may
* never be returned from a vector search.
*
* "dot" - Dot product. Dot distance is the dot product of two vectors. Dot
* distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
* l2 norm is 1), then dot distance is equivalent to the cosine distance.
*
* "hamming" - Hamming distance. Hamming distance is a distance metric
* calculated from the number of bits that are different between two vectors.
* Hamming distance has a range of [0, dimension]. Note that the hamming distance
* is only valid for binary vectors.
*/
distanceType?: "l2" | "cosine" | "dot" | "hamming";
/**
* Max iteration to train IVF kmeans.
*
* When training an IVF FLAT index we use kmeans to calculate the partitions. This parameter
* controls how many iterations of kmeans to run.
*
* Increasing this might improve the quality of the index but in most cases these extra
* iterations have diminishing returns.
*
* The default value is 50.
*/
maxIterations?: number;
/**
* The number of vectors, per partition, to sample when training IVF kmeans.
*
* When an IVF FLAT index is trained, we need to calculate partitions. These are groups
* of vectors that are similar to each other. To do this we use an algorithm called kmeans.
*
* Running kmeans on a large dataset can be slow. To speed this up we run kmeans on a
* random sample of the data. This parameter controls the size of the sample. The total
* number of vectors used to train the index is `sample_rate * num_partitions`.
*
* Increasing this value might improve the quality of the index but in most cases the
* default should be sufficient.
*
* The default value is 256.
*/
sampleRate?: number;
}
/**
* Options to create a full text search index
*/
@@ -426,6 +514,33 @@ export class Index {
);
}
/**
* Create an IvfFlat index
*
* This index groups vectors into partitions of similar vectors. Each partition keeps track of
* a centroid which is the average value of all vectors in the group.
*
* During a query the centroids are compared with the query vector to find the closest
* partitions. The vectors in these partitions are then searched to find
* the closest vectors.
*
* The partitioning process is called IVF and the `num_partitions` parameter controls how
* many groups to create.
*
* Note that training an IVF FLAT index on a large dataset is a slow operation and
* currently is also a memory intensive operation.
*/
static ivfFlat(options?: Partial<IvfFlatOptions>) {
return new Index(
LanceDbIndex.ivfFlat(
options?.distanceType,
options?.numPartitions,
options?.maxIterations,
options?.sampleRate,
),
);
}
/**
* Create a btree index
*

View File

@@ -17,6 +17,7 @@ import {
VectorQuery as NativeVectorQuery,
} from "./native";
import { Reranker } from "./rerankers";
export class RecordBatchIterator implements AsyncIterator<RecordBatch> {
private promisedInner?: Promise<NativeBatchIterator>;
private inner?: NativeBatchIterator;
@@ -62,7 +63,7 @@ class RecordBatchIterable<
// biome-ignore lint/suspicious/noExplicitAny: skip
[Symbol.asyncIterator](): AsyncIterator<RecordBatch<any>, any, undefined> {
return new RecordBatchIterator(
this.inner.execute(this.options?.maxBatchLength),
this.inner.execute(this.options?.maxBatchLength, this.options?.timeoutMs),
);
}
}
@@ -78,6 +79,11 @@ export interface QueryExecutionOptions {
* in smaller chunks.
*/
maxBatchLength?: number;
/**
* Timeout for query execution in milliseconds
*/
timeoutMs?: number;
}
/**
@@ -152,7 +158,7 @@ export class QueryBase<NativeQueryType extends NativeQuery | NativeVectorQuery>
}
fullTextSearch(
query: string,
query: string | FullTextQuery,
options?: Partial<FullTextSearchOptions>,
): this {
let columns: string[] | null = null;
@@ -164,9 +170,18 @@ export class QueryBase<NativeQueryType extends NativeQuery | NativeVectorQuery>
}
}
this.doCall((inner: NativeQueryType) =>
inner.fullTextSearch(query, columns),
);
this.doCall((inner: NativeQueryType) => {
if (typeof query === "string") {
inner.fullTextSearch({
query: query,
columns: columns,
});
} else {
// If query is a FullTextQuery object, convert it to a dict
const queryObj = query.toDict();
inner.fullTextSearch(queryObj);
}
});
return this;
}
@@ -273,9 +288,11 @@ export class QueryBase<NativeQueryType extends NativeQuery | NativeVectorQuery>
options?: Partial<QueryExecutionOptions>,
): Promise<NativeBatchIterator> {
if (this.inner instanceof Promise) {
return this.inner.then((inner) => inner.execute(options?.maxBatchLength));
return this.inner.then((inner) =>
inner.execute(options?.maxBatchLength, options?.timeoutMs),
);
} else {
return this.inner.execute(options?.maxBatchLength);
return this.inner.execute(options?.maxBatchLength, options?.timeoutMs);
}
}
@@ -348,6 +365,43 @@ export class QueryBase<NativeQueryType extends NativeQuery | NativeVectorQuery>
return this.inner.explainPlan(verbose);
}
}
/**
* Executes the query and returns the physical query plan annotated with runtime metrics.
*
* This is useful for debugging and performance analysis, as it shows how the query was executed
* and includes metrics such as elapsed time, rows processed, and I/O statistics.
*
* @example
* import * as lancedb from "@lancedb/lancedb"
*
* const db = await lancedb.connect("./.lancedb");
* const table = await db.createTable("my_table", [
* { vector: [1.1, 0.9], id: "1" },
* ]);
*
* const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
*
* Example output (with runtime metrics inlined):
* AnalyzeExec verbose=true, metrics=[]
* ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
* Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
* CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
* GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
* FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
* SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
* KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
* LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
*
* @returns A query execution plan with runtime metrics for each step.
*/
async analyzePlan(): Promise<string> {
if (this.inner instanceof Promise) {
return this.inner.then((inner) => inner.analyzePlan());
} else {
return this.inner.analyzePlan();
}
}
}
/**
@@ -681,8 +735,167 @@ export class Query extends QueryBase<NativeQuery> {
}
}
nearestToText(query: string, columns?: string[]): Query {
this.doCall((inner) => inner.fullTextSearch(query, columns));
nearestToText(query: string | FullTextQuery, columns?: string[]): Query {
this.doCall((inner) => {
if (typeof query === "string") {
inner.fullTextSearch({
query: query,
columns: columns,
});
} else {
const queryObj = query.toDict();
inner.fullTextSearch(queryObj);
}
});
return this;
}
}
/**
* Enum representing the types of full-text queries supported.
*
* - `Match`: Performs a full-text search for terms in the query string.
* - `MatchPhrase`: Searches for an exact phrase match in the text.
* - `Boost`: Boosts the relevance score of specific terms in the query.
* - `MultiMatch`: Searches across multiple fields for the query terms.
*/
export enum FullTextQueryType {
Match = "match",
MatchPhrase = "match_phrase",
Boost = "boost",
MultiMatch = "multi_match",
}
/**
* Represents a full-text query interface.
* This interface defines the structure and behavior for full-text queries,
* including methods to retrieve the query type and convert the query to a dictionary format.
*/
export interface FullTextQuery {
queryType(): FullTextQueryType;
toDict(): Record<string, unknown>;
}
export class MatchQuery implements FullTextQuery {
/**
* Creates an instance of MatchQuery.
*
* @param query - The text query to search for.
* @param column - The name of the column to search within.
* @param boost - (Optional) The boost factor to influence the relevance score of this query. Default is `1.0`.
* @param fuzziness - (Optional) The allowed edit distance for fuzzy matching. Default is `0`.
* @param maxExpansions - (Optional) The maximum number of terms to consider for fuzzy matching. Default is `50`.
*/
constructor(
private query: string,
private column: string,
private boost: number = 1.0,
private fuzziness: number = 0,
private maxExpansions: number = 50,
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.Match;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
[this.column]: {
query: this.query,
boost: this.boost,
fuzziness: this.fuzziness,
// biome-ignore lint/style/useNamingConvention: use underscore for consistency with the other APIs
max_expansions: this.maxExpansions,
},
},
};
}
}
export class PhraseQuery implements FullTextQuery {
/**
* Creates an instance of `PhraseQuery`.
*
* @param query - The phrase to search for in the specified column.
* @param column - The name of the column to search within.
*/
constructor(
private query: string,
private column: string,
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.MatchPhrase;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
[this.column]: this.query,
},
};
}
}
export class BoostQuery implements FullTextQuery {
/**
* Creates an instance of BoostQuery.
*
* @param positive - The positive query that boosts the relevance score.
* @param negative - The negative query that reduces the relevance score.
* @param negativeBoost - The factor by which the negative query reduces the score.
*/
constructor(
private positive: FullTextQuery,
private negative: FullTextQuery,
private negativeBoost: number,
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.Boost;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
positive: this.positive.toDict(),
negative: this.negative.toDict(),
// biome-ignore lint/style/useNamingConvention: use underscore for consistency with the other APIs
negative_boost: this.negativeBoost,
},
};
}
}
export class MultiMatchQuery implements FullTextQuery {
/**
* Creates an instance of MultiMatchQuery.
*
* @param query - The text query to search for across multiple columns.
* @param columns - An array of column names to search within.
* @param boosts - (Optional) An array of boost factors corresponding to each column. Default is an array of 1.0 for each column.
*
* The `boosts` array should have the same length as `columns`. If not provided, all columns will have a default boost of 1.0.
* If the length of `boosts` is less than `columns`, it will be padded with 1.0s.
*/
constructor(
private query: string,
private columns: string[],
private boosts: number[] = columns.map(() => 1.0),
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.MultiMatch;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
query: this.query,
columns: this.columns,
boost: this.boosts,
},
};
}
}

View File

@@ -4,8 +4,10 @@
import {
Table as ArrowTable,
Data,
DataType,
IntoVector,
Schema,
dataTypeToJson,
fromDataToBuffer,
tableFromIPC,
} from "./arrow";
@@ -15,13 +17,13 @@ import { IndexOptions } from "./indices";
import { MergeInsertBuilder } from "./merge";
import {
AddColumnsSql,
ColumnAlteration,
IndexConfig,
IndexStatistics,
OptimizeStats,
Table as _NativeTable,
} from "./native";
import { Query, VectorQuery } from "./query";
import { sanitizeType } from "./sanitize";
import { IntoSql, toSQL } from "./util";
export { IndexConfig } from "./native";
@@ -618,7 +620,27 @@ export class LocalTable extends Table {
}
async alterColumns(columnAlterations: ColumnAlteration[]): Promise<void> {
await this.inner.alterColumns(columnAlterations);
const processedAlterations = columnAlterations.map((alteration) => {
if (typeof alteration.dataType === "string") {
return {
...alteration,
dataType: JSON.stringify({ type: alteration.dataType }),
};
} else if (alteration.dataType === undefined) {
return {
...alteration,
dataType: undefined,
};
} else {
const dataType = sanitizeType(alteration.dataType);
return {
...alteration,
dataType: JSON.stringify(dataTypeToJson(dataType)),
};
}
});
await this.inner.alterColumns(processedAlterations);
}
async dropColumns(columnNames: string[]): Promise<void> {
@@ -711,3 +733,38 @@ export class LocalTable extends Table {
await this.inner.migrateManifestPathsV2();
}
}
/**
* A definition of a column alteration. The alteration changes the column at
* `path` to have the new name `name`, to be nullable if `nullable` is true,
* and to have the data type `data_type`. At least one of `rename` or `nullable`
* must be provided.
*/
export interface ColumnAlteration {
/**
* The path to the column to alter. This is a dot-separated path to the column.
* If it is a top-level column then it is just the name of the column. If it is
* a nested column then it is the path to the column, e.g. "a.b.c" for a column
* `c` nested inside a column `b` nested inside a column `a`.
*/
path: string;
/**
* The new name of the column. If not provided then the name will not be changed.
* This must be distinct from the names of all other columns in the table.
*/
rename?: string;
/**
* A new data type for the column. If not provided then the data type will not be changed.
* Changing data types is limited to casting to the same general type. For example, these
* changes are valid:
* * `int32` -> `int64` (integers)
* * `double` -> `float` (floats)
* * `string` -> `large_string` (strings)
* But these changes are not:
* * `int32` -> `double` (mix integers and floats)
* * `string` -> `int32` (mix strings and integers)
*/
dataType?: string | DataType;
/** Set the new nullability. Note that a nullable column cannot be made non-nullable. */
nullable?: boolean;
}

View File

@@ -35,6 +35,16 @@ export function toSQL(value: IntoSql): string {
}
}
export function packBits(data: Array<number>): Array<number> {
const packed = Array(data.length >> 3).fill(0);
for (let i = 0; i < data.length; i++) {
const byte = i >> 3;
const bit = i & 7;
packed[byte] |= data[i] << bit;
}
return packed;
}
export class TTLCache {
// biome-ignore lint/suspicious/noExplicitAny: <explanation>
private readonly cache: Map<string, { value: any; expires: number }>;

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

252
nodejs/package-lock.json generated
View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"cpu": [
"x64",
"arm64"
@@ -2304,89 +2304,20 @@
}
},
"node_modules/@babel/code-frame": {
"version": "7.23.5",
"resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.23.5.tgz",
"integrity": "sha512-CgH3s1a96LipHCmSUmYFPwY7MNx8C3avkq7i4Wl3cfa662ldtUe4VM1TPXX70pfmrlWTb6jLqTYrZyT2ZTJBgA==",
"version": "7.26.2",
"resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.26.2.tgz",
"integrity": "sha512-RJlIHRueQgwWitWgF8OdFYGZX328Ax5BCemNGlqHfplnRT9ESi8JkFlvaVYbS+UubVY6dpv87Fs2u5M29iNFVQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/highlight": "^7.23.4",
"chalk": "^2.4.2"
"@babel/helper-validator-identifier": "^7.25.9",
"js-tokens": "^4.0.0",
"picocolors": "^1.0.0"
},
"engines": {
"node": ">=6.9.0"
}
},
"node_modules/@babel/code-frame/node_modules/ansi-styles": {
"version": "3.2.1",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-3.2.1.tgz",
"integrity": "sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==",
"dev": true,
"dependencies": {
"color-convert": "^1.9.0"
},
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/code-frame/node_modules/chalk": {
"version": "2.4.2",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz",
"integrity": "sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
"dev": true,
"dependencies": {
"ansi-styles": "^3.2.1",
"escape-string-regexp": "^1.0.5",
"supports-color": "^5.3.0"
},
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/code-frame/node_modules/color-convert": {
"version": "1.9.3",
"resolved": "https://registry.npmjs.org/color-convert/-/color-convert-1.9.3.tgz",
"integrity": "sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==",
"dev": true,
"dependencies": {
"color-name": "1.1.3"
}
},
"node_modules/@babel/code-frame/node_modules/color-name": {
"version": "1.1.3",
"resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.3.tgz",
"integrity": "sha512-72fSenhMw2HZMTVHeCA9KCmpEIbzWiQsjN+BHcBbS9vr1mtt+vJjPdksIBNUmKAW8TFUDPJK5SUU3QhE9NEXDw==",
"dev": true
},
"node_modules/@babel/code-frame/node_modules/escape-string-regexp": {
"version": "1.0.5",
"resolved": "https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-1.0.5.tgz",
"integrity": "sha512-vbRorB5FUQWvla16U8R/qgaFIya2qGzwDrNmCZuYKrbdSUMG6I1ZCGQRefkRVhuOkIGVne7BQ35DSfo1qvJqFg==",
"dev": true,
"engines": {
"node": ">=0.8.0"
}
},
"node_modules/@babel/code-frame/node_modules/has-flag": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/has-flag/-/has-flag-3.0.0.tgz",
"integrity": "sha512-sKJf1+ceQBr4SMkvQnBDNDtf4TXpVhVGateu0t918bl30FnbE2m4vNLX+VWe/dpjlb+HugGYzW7uQXH98HPEYw==",
"dev": true,
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/code-frame/node_modules/supports-color": {
"version": "5.5.0",
"resolved": "https://registry.npmjs.org/supports-color/-/supports-color-5.5.0.tgz",
"integrity": "sha512-QjVjwdXIt408MIiAqCX4oUKsgU2EqAGzs2Ppkm4aQYbjm+ZEWEcW4SfFNTr4uMNZma0ey4f5lgLrkB0aX0QMow==",
"dev": true,
"dependencies": {
"has-flag": "^3.0.0"
},
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/compat-data": {
"version": "7.23.5",
"resolved": "https://registry.npmjs.org/@babel/compat-data/-/compat-data-7.23.5.tgz",
@@ -2589,19 +2520,21 @@
}
},
"node_modules/@babel/helper-string-parser": {
"version": "7.23.4",
"resolved": "https://registry.npmjs.org/@babel/helper-string-parser/-/helper-string-parser-7.23.4.tgz",
"integrity": "sha512-803gmbQdqwdf4olxrX4AJyFBV/RTr3rSmOj0rKwesmzlfhYNDEs+/iOcznzpNWlJlIlTJC2QfPFcHB6DlzdVLQ==",
"version": "7.25.9",
"resolved": "https://registry.npmjs.org/@babel/helper-string-parser/-/helper-string-parser-7.25.9.tgz",
"integrity": "sha512-4A/SCr/2KLd5jrtOMFzaKjVtAei3+2r/NChoBNoZ3EyP/+GlhoaEGoWOZUmFmoITP7zOJyHIMm+DYRd8o3PvHA==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6.9.0"
}
},
"node_modules/@babel/helper-validator-identifier": {
"version": "7.22.20",
"resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.22.20.tgz",
"integrity": "sha512-Y4OZ+ytlatR8AI+8KZfKuL5urKp7qey08ha31L8b3BwewJAoJamTzyvxPR/5D+KkdJCGPq/+8TukHBlY10FX9A==",
"version": "7.25.9",
"resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.25.9.tgz",
"integrity": "sha512-Ed61U6XJc3CVRfkERJWDz4dJwKe7iLmmJsbOGu9wSloNSFttHV0I8g6UAgb7qnK5ly5bGLPd4oXZlxCdANBOWQ==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6.9.0"
}
@@ -2616,109 +2549,28 @@
}
},
"node_modules/@babel/helpers": {
"version": "7.23.8",
"resolved": "https://registry.npmjs.org/@babel/helpers/-/helpers-7.23.8.tgz",
"integrity": "sha512-KDqYz4PiOWvDFrdHLPhKtCThtIcKVy6avWD2oG4GEvyQ+XDZwHD4YQd+H2vNMnq2rkdxsDkU82T+Vk8U/WXHRQ==",
"version": "7.27.0",
"resolved": "https://registry.npmjs.org/@babel/helpers/-/helpers-7.27.0.tgz",
"integrity": "sha512-U5eyP/CTFPuNE3qk+WZMxFkp/4zUzdceQlfzf7DdGdhp+Fezd7HD+i8Y24ZuTMKX3wQBld449jijbGq6OdGNQg==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/template": "^7.22.15",
"@babel/traverse": "^7.23.7",
"@babel/types": "^7.23.6"
"@babel/template": "^7.27.0",
"@babel/types": "^7.27.0"
},
"engines": {
"node": ">=6.9.0"
}
},
"node_modules/@babel/highlight": {
"version": "7.23.4",
"resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.23.4.tgz",
"integrity": "sha512-acGdbYSfp2WheJoJm/EBBBLh/ID8KDc64ISZ9DYtBmC8/Q204PZJLHyzeB5qMzJ5trcOkybd78M4x2KWsUq++A==",
"dev": true,
"dependencies": {
"@babel/helper-validator-identifier": "^7.22.20",
"chalk": "^2.4.2",
"js-tokens": "^4.0.0"
},
"engines": {
"node": ">=6.9.0"
}
},
"node_modules/@babel/highlight/node_modules/ansi-styles": {
"version": "3.2.1",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-3.2.1.tgz",
"integrity": "sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==",
"dev": true,
"dependencies": {
"color-convert": "^1.9.0"
},
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/highlight/node_modules/chalk": {
"version": "2.4.2",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz",
"integrity": "sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
"dev": true,
"dependencies": {
"ansi-styles": "^3.2.1",
"escape-string-regexp": "^1.0.5",
"supports-color": "^5.3.0"
},
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/highlight/node_modules/color-convert": {
"version": "1.9.3",
"resolved": "https://registry.npmjs.org/color-convert/-/color-convert-1.9.3.tgz",
"integrity": "sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==",
"dev": true,
"dependencies": {
"color-name": "1.1.3"
}
},
"node_modules/@babel/highlight/node_modules/color-name": {
"version": "1.1.3",
"resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.3.tgz",
"integrity": "sha512-72fSenhMw2HZMTVHeCA9KCmpEIbzWiQsjN+BHcBbS9vr1mtt+vJjPdksIBNUmKAW8TFUDPJK5SUU3QhE9NEXDw==",
"dev": true
},
"node_modules/@babel/highlight/node_modules/escape-string-regexp": {
"version": "1.0.5",
"resolved": "https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-1.0.5.tgz",
"integrity": "sha512-vbRorB5FUQWvla16U8R/qgaFIya2qGzwDrNmCZuYKrbdSUMG6I1ZCGQRefkRVhuOkIGVne7BQ35DSfo1qvJqFg==",
"dev": true,
"engines": {
"node": ">=0.8.0"
}
},
"node_modules/@babel/highlight/node_modules/has-flag": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/has-flag/-/has-flag-3.0.0.tgz",
"integrity": "sha512-sKJf1+ceQBr4SMkvQnBDNDtf4TXpVhVGateu0t918bl30FnbE2m4vNLX+VWe/dpjlb+HugGYzW7uQXH98HPEYw==",
"dev": true,
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/highlight/node_modules/supports-color": {
"version": "5.5.0",
"resolved": "https://registry.npmjs.org/supports-color/-/supports-color-5.5.0.tgz",
"integrity": "sha512-QjVjwdXIt408MIiAqCX4oUKsgU2EqAGzs2Ppkm4aQYbjm+ZEWEcW4SfFNTr4uMNZma0ey4f5lgLrkB0aX0QMow==",
"dev": true,
"dependencies": {
"has-flag": "^3.0.0"
},
"engines": {
"node": ">=4"
}
},
"node_modules/@babel/parser": {
"version": "7.23.6",
"resolved": "https://registry.npmjs.org/@babel/parser/-/parser-7.23.6.tgz",
"integrity": "sha512-Z2uID7YJ7oNvAI20O9X0bblw7Qqs8Q2hFy0R9tAfnfLkp5MW0UH9eUvnDSnFwKZ0AvgS1ucqR4KzvVHgnke1VQ==",
"version": "7.27.0",
"resolved": "https://registry.npmjs.org/@babel/parser/-/parser-7.27.0.tgz",
"integrity": "sha512-iaepho73/2Pz7w2eMS0Q5f83+0RKI7i4xmiYeBmDzfRVbQtTOG7Ts0S4HzJVsTMGI9keU8rNfuZr8DKfSt7Yyg==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/types": "^7.27.0"
},
"bin": {
"parser": "bin/babel-parser.js"
},
@@ -2904,14 +2756,15 @@
}
},
"node_modules/@babel/template": {
"version": "7.22.15",
"resolved": "https://registry.npmjs.org/@babel/template/-/template-7.22.15.tgz",
"integrity": "sha512-QPErUVm4uyJa60rkI73qneDacvdvzxshT3kksGqlGWYdOTIUOwJ7RDUL8sGqslY1uXWSL6xMFKEXDS3ox2uF0w==",
"version": "7.27.0",
"resolved": "https://registry.npmjs.org/@babel/template/-/template-7.27.0.tgz",
"integrity": "sha512-2ncevenBqXI6qRMukPlXwHKHchC7RyMuu4xv5JBXRfOGVcTy1mXCD12qrp7Jsoxll1EV3+9sE4GugBVRjT2jFA==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/code-frame": "^7.22.13",
"@babel/parser": "^7.22.15",
"@babel/types": "^7.22.15"
"@babel/code-frame": "^7.26.2",
"@babel/parser": "^7.27.0",
"@babel/types": "^7.27.0"
},
"engines": {
"node": ">=6.9.0"
@@ -2948,14 +2801,14 @@
}
},
"node_modules/@babel/types": {
"version": "7.23.6",
"resolved": "https://registry.npmjs.org/@babel/types/-/types-7.23.6.tgz",
"integrity": "sha512-+uarb83brBzPKN38NX1MkB6vb6+mwvR6amUulqAE7ccQw1pEl+bCia9TbdG1lsnFP7lZySvUn37CHyXQdfTwzg==",
"version": "7.27.0",
"resolved": "https://registry.npmjs.org/@babel/types/-/types-7.27.0.tgz",
"integrity": "sha512-H45s8fVLYjbhFH62dIJ3WtmJ6RSPt/3DRO0ZcT2SUiYiQyz3BLVb9ADEnLl91m74aQPS3AzzeajZHYOalWe3bg==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/helper-string-parser": "^7.23.4",
"@babel/helper-validator-identifier": "^7.22.20",
"to-fast-properties": "^2.0.0"
"@babel/helper-string-parser": "^7.25.9",
"@babel/helper-validator-identifier": "^7.25.9"
},
"engines": {
"node": ">=6.9.0"
@@ -5550,10 +5403,11 @@
"devOptional": true
},
"node_modules/axios": {
"version": "1.7.7",
"resolved": "https://registry.npmjs.org/axios/-/axios-1.7.7.tgz",
"integrity": "sha512-S4kL7XrjgBmvdGut0sN3yJxqYzrDOnivkBiN0OFs6hLiUam3UPvswUo0kqGyhqUZGEOytHyumEdXsAkgCOUf3Q==",
"version": "1.8.4",
"resolved": "https://registry.npmjs.org/axios/-/axios-1.8.4.tgz",
"integrity": "sha512-eBSYY4Y68NNlHbHBMdeDmKNtDgXWhQsJcGqzO3iLUM0GraQFSS9cVgPX5I9b3lbdFKyYoAEGAZF1DwhTaljNAw==",
"dev": true,
"license": "MIT",
"dependencies": {
"follow-redirects": "^1.15.6",
"form-data": "^4.0.0",
@@ -7869,7 +7723,8 @@
"version": "4.0.0",
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
"integrity": "sha512-RdJUflcE3cUzKiMqQgsCu06FPu9UdIJO0beYbPhHN4k6apgJtifcoCtT9bcxOpYBtpD2kCM6Sbzg4CausW/PKQ==",
"dev": true
"dev": true,
"license": "MIT"
},
"node_modules/js-yaml": {
"version": "3.14.1",
@@ -9360,15 +9215,6 @@
"integrity": "sha512-3f0uOEAQwIqGuWW2MVzYg8fV/QNnc/IpuJNG837rLuczAaLVHslWHZQj4IGiEl5Hs3kkbhwL9Ab7Hrsmuj+Smw==",
"dev": true
},
"node_modules/to-fast-properties": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/to-fast-properties/-/to-fast-properties-2.0.0.tgz",
"integrity": "sha512-/OaKK0xYrs3DmxRYqL/yDc+FxFUVYhDlXMhRmv3z915w2HF1tnN1omB354j8VUGO/hbRzyD6Y3sA7v7GS/ceog==",
"dev": true,
"engines": {
"node": ">=4"
}
},
"node_modules/to-regex-range": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/to-regex-range/-/to-regex-range-5.0.1.tgz",

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.18.0-beta.0",
"version": "0.19.0-beta.5",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",
@@ -74,8 +74,10 @@
"artifacts": "napi artifacts",
"build:debug": "napi build --platform --no-const-enum --dts ../lancedb/native.d.ts --js ../lancedb/native.js lancedb",
"build:release": "napi build --platform --no-const-enum --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js dist/",
"build": "npm run build:debug && tsc -b && shx cp lancedb/native.d.ts dist/native.d.ts && shx cp lancedb/*.node dist/",
"build-release": "npm run build:release && tsc -b && shx cp lancedb/native.d.ts dist/native.d.ts",
"build": "npm run build:debug && npm run tsc && shx cp lancedb/*.node dist/",
"build-release": "npm run build:release && npm run tsc",
"tsc": "tsc -b",
"posttsc": "shx cp lancedb/native.d.ts dist/native.d.ts",
"lint-ci": "biome ci .",
"docs": "typedoc --plugin typedoc-plugin-markdown --treatWarningsAsErrors --out ../docs/src/js lancedb/index.ts",
"postdocs": "node typedoc_post_process.js",

View File

@@ -48,8 +48,16 @@ impl Connection {
pub async fn new(uri: String, options: ConnectionOptions) -> napi::Result<Self> {
let mut builder = ConnectBuilder::new(&uri);
if let Some(interval) = options.read_consistency_interval {
builder =
builder.read_consistency_interval(std::time::Duration::from_secs_f64(interval));
match interval {
Either::A(seconds) => {
builder = builder.read_consistency_interval(Some(
std::time::Duration::from_secs_f64(seconds),
));
}
Either::B(_) => {
builder = builder.read_consistency_interval(None);
}
}
}
if let Some(storage_options) = options.storage_options {
for (key, value) in storage_options {

View File

@@ -4,7 +4,9 @@
use std::sync::Mutex;
use lancedb::index::scalar::{BTreeIndexBuilder, FtsIndexBuilder};
use lancedb::index::vector::{IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder};
use lancedb::index::vector::{
IvfFlatIndexBuilder, IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder,
};
use lancedb::index::Index as LanceDbIndex;
use napi_derive::napi;
@@ -63,6 +65,32 @@ impl Index {
})
}
#[napi(factory)]
pub fn ivf_flat(
distance_type: Option<String>,
num_partitions: Option<u32>,
max_iterations: Option<u32>,
sample_rate: Option<u32>,
) -> napi::Result<Self> {
let mut ivf_flat_builder = IvfFlatIndexBuilder::default();
if let Some(distance_type) = distance_type {
let distance_type = parse_distance_type(distance_type)?;
ivf_flat_builder = ivf_flat_builder.distance_type(distance_type);
}
if let Some(num_partitions) = num_partitions {
ivf_flat_builder = ivf_flat_builder.num_partitions(num_partitions);
}
if let Some(max_iterations) = max_iterations {
ivf_flat_builder = ivf_flat_builder.max_iterations(max_iterations);
}
if let Some(sample_rate) = sample_rate {
ivf_flat_builder = ivf_flat_builder.sample_rate(sample_rate);
}
Ok(Self {
inner: Mutex::new(Some(LanceDbIndex::IvfFlat(ivf_flat_builder))),
})
}
#[napi(factory)]
pub fn btree() -> Self {
Self {

View File

@@ -4,6 +4,7 @@
use std::collections::HashMap;
use env_logger::Env;
use napi::{bindgen_prelude::Null, Either};
use napi_derive::*;
mod connection;
@@ -18,7 +19,6 @@ mod table;
mod util;
#[napi(object)]
#[derive(Debug)]
pub struct ConnectionOptions {
/// (For LanceDB OSS only): The interval, in seconds, at which to check for
/// updates to the table from other processes. If None, then consistency is not
@@ -29,7 +29,7 @@ pub struct ConnectionOptions {
/// has passed since the last check, then the table will be checked for updates.
/// Note: this consistency only applies to read operations. Write operations are
/// always consistent.
pub read_consistency_interval: Option<f64>,
pub read_consistency_interval: Option<Either<f64, Null>>,
/// (For LanceDB OSS only): configuration for object storage.
///
/// The available options are described at https://lancedb.github.io/lancedb/guides/storage/

View File

@@ -3,7 +3,7 @@
use std::sync::Arc;
use lancedb::index::scalar::FullTextSearchQuery;
use lancedb::index::scalar::{FtsQuery, FullTextSearchQuery, MatchQuery, PhraseQuery};
use lancedb::query::ExecutableQuery;
use lancedb::query::Query as LanceDbQuery;
use lancedb::query::QueryBase;
@@ -18,7 +18,7 @@ use crate::error::NapiErrorExt;
use crate::iterator::RecordBatchIterator;
use crate::rerankers::Reranker;
use crate::rerankers::RerankerCallbacks;
use crate::util::parse_distance_type;
use crate::util::{parse_distance_type, parse_fts_query};
#[napi]
pub struct Query {
@@ -38,9 +38,53 @@ impl Query {
}
#[napi]
pub fn full_text_search(&mut self, query: String, columns: Option<Vec<String>>) {
let query = FullTextSearchQuery::new(query).columns(columns);
pub fn full_text_search(&mut self, query: napi::JsUnknown) -> napi::Result<()> {
let query = unsafe { query.cast::<napi::JsObject>() };
let query = if let Some(query_text) = query.get::<_, String>("query").transpose() {
let mut query_text = query_text?;
let columns = query.get::<_, Option<Vec<String>>>("columns")?.flatten();
let is_phrase =
query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"');
let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false);
if is_phrase {
// Remove the surrounding quotes for phrase queries
query_text = query_text[1..query_text.len() - 1].to_string();
}
let query: FtsQuery = match (is_phrase, is_multi_match) {
(false, _) => MatchQuery::new(query_text).into(),
(true, false) => PhraseQuery::new(query_text).into(),
(true, true) => {
return Err(napi::Error::from_reason(
"Phrase queries cannot be used with multiple columns.",
));
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!(
"Failed to set full text search columns: {}",
e
))
})?;
}
}
query
} else if let Some(query) = query.get::<_, napi::JsObject>("query")? {
let query = parse_fts_query(&query)?;
FullTextSearchQuery::new_query(query)
} else {
return Err(napi::Error::from_reason(
"Invalid full text search query object".to_string(),
));
};
self.inner = self.inner.clone().full_text_search(query);
Ok(())
}
#[napi]
@@ -87,11 +131,15 @@ impl Query {
pub async fn execute(
&self,
max_batch_length: Option<u32>,
timeout_ms: Option<u32>,
) -> napi::Result<RecordBatchIterator> {
let mut execution_opts = QueryExecutionOptions::default();
if let Some(max_batch_length) = max_batch_length {
execution_opts.max_batch_length = max_batch_length;
}
if let Some(timeout_ms) = timeout_ms {
execution_opts.timeout = Some(std::time::Duration::from_millis(timeout_ms as u64))
}
let inner_stream = self
.inner
.execute_with_options(execution_opts)
@@ -114,6 +162,16 @@ impl Query {
))
})
}
#[napi(catch_unwind)]
pub async fn analyze_plan(&self) -> napi::Result<String> {
self.inner.analyze_plan().await.map_err(|e| {
napi::Error::from_reason(format!(
"Failed to execute analyze plan: {}",
convert_error(&e)
))
})
}
}
#[napi]
@@ -185,9 +243,53 @@ impl VectorQuery {
}
#[napi]
pub fn full_text_search(&mut self, query: String, columns: Option<Vec<String>>) {
let query = FullTextSearchQuery::new(query).columns(columns);
pub fn full_text_search(&mut self, query: napi::JsUnknown) -> napi::Result<()> {
let query = unsafe { query.cast::<napi::JsObject>() };
let query = if let Some(query_text) = query.get::<_, String>("query").transpose() {
let mut query_text = query_text?;
let columns = query.get::<_, Option<Vec<String>>>("columns")?.flatten();
let is_phrase =
query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"');
let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false);
if is_phrase {
// Remove the surrounding quotes for phrase queries
query_text = query_text[1..query_text.len() - 1].to_string();
}
let query: FtsQuery = match (is_phrase, is_multi_match) {
(false, _) => MatchQuery::new(query_text).into(),
(true, false) => PhraseQuery::new(query_text).into(),
(true, true) => {
return Err(napi::Error::from_reason(
"Phrase queries cannot be used with multiple columns.",
));
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!(
"Failed to set full text search columns: {}",
e
))
})?;
}
}
query
} else if let Some(query) = query.get::<_, napi::JsObject>("query")? {
let query = parse_fts_query(&query)?;
FullTextSearchQuery::new_query(query)
} else {
return Err(napi::Error::from_reason(
"Invalid full text search query object".to_string(),
));
};
self.inner = self.inner.clone().full_text_search(query);
Ok(())
}
#[napi]
@@ -232,11 +334,15 @@ impl VectorQuery {
pub async fn execute(
&self,
max_batch_length: Option<u32>,
timeout_ms: Option<u32>,
) -> napi::Result<RecordBatchIterator> {
let mut execution_opts = QueryExecutionOptions::default();
if let Some(max_batch_length) = max_batch_length {
execution_opts.max_batch_length = max_batch_length;
}
if let Some(timeout_ms) = timeout_ms {
execution_opts.timeout = Some(std::time::Duration::from_millis(timeout_ms as u64))
}
let inner_stream = self
.inner
.execute_with_options(execution_opts)
@@ -259,4 +365,14 @@ impl VectorQuery {
))
})
}
#[napi(catch_unwind)]
pub async fn analyze_plan(&self) -> napi::Result<String> {
self.inner.analyze_plan().await.map_err(|e| {
napi::Error::from_reason(format!(
"Failed to execute analyze plan: {}",
convert_error(&e)
))
})
}
}

View File

@@ -498,6 +498,9 @@ pub struct IndexStatistics {
pub distance_type: Option<String>,
/// The number of parts this index is split into.
pub num_indices: Option<u32>,
/// The KMeans loss value of the index,
/// it is only present for vector indices.
pub loss: Option<f64>,
}
impl From<lancedb::index::IndexStatistics> for IndexStatistics {
fn from(value: lancedb::index::IndexStatistics) -> Self {
@@ -507,6 +510,7 @@ impl From<lancedb::index::IndexStatistics> for IndexStatistics {
index_type: value.index_type.to_string(),
distance_type: value.distance_type.map(|d| d.to_string()),
num_indices: value.num_indices,
loss: value.loss,
}
}
}

View File

@@ -1,6 +1,7 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use lancedb::index::scalar::{BoostQuery, FtsQuery, MatchQuery, MultiMatchQuery, PhraseQuery};
use lancedb::DistanceType;
pub fn parse_distance_type(distance_type: impl AsRef<str>) -> napi::Result<DistanceType> {
@@ -15,3 +16,144 @@ pub fn parse_distance_type(distance_type: impl AsRef<str>) -> napi::Result<Dista
))),
}
}
pub fn parse_fts_query(query: &napi::JsObject) -> napi::Result<FtsQuery> {
let query_type = query
.get_property_names()?
.get_element::<napi::JsString>(0)?;
let query_type = query_type.into_utf8()?.into_owned()?;
let query_value =
query
.get::<_, napi::JsObject>(&query_type)?
.ok_or(napi::Error::from_reason(format!(
"query value {} not found",
query_type
)))?;
match query_type.as_str() {
"match" => {
let column = query_value
.get_property_names()?
.get_element::<napi::JsString>(0)?
.into_utf8()?
.into_owned()?;
let params =
query_value
.get::<_, napi::JsObject>(&column)?
.ok_or(napi::Error::from_reason(format!(
"column {} not found",
column
)))?;
let query = params
.get::<_, napi::JsString>("query")?
.ok_or(napi::Error::from_reason("query not found"))?
.into_utf8()?
.into_owned()?;
let boost = params
.get::<_, napi::JsNumber>("boost")?
.ok_or(napi::Error::from_reason("boost not found"))?
.get_double()? as f32;
let fuzziness = params
.get::<_, napi::JsNumber>("fuzziness")?
.map(|f| f.get_uint32())
.transpose()?;
let max_expansions = params
.get::<_, napi::JsNumber>("max_expansions")?
.ok_or(napi::Error::from_reason("max_expansions not found"))?
.get_uint32()? as usize;
let query = MatchQuery::new(query)
.with_column(Some(column))
.with_boost(boost)
.with_fuzziness(fuzziness)
.with_max_expansions(max_expansions);
Ok(query.into())
}
"match_phrase" => {
let column = query_value
.get_property_names()?
.get_element::<napi::JsString>(0)?
.into_utf8()?
.into_owned()?;
let query = query_value
.get::<_, napi::JsString>(&column)?
.ok_or(napi::Error::from_reason(format!(
"column {} not found",
column
)))?
.into_utf8()?
.into_owned()?;
let query = PhraseQuery::new(query).with_column(Some(column));
Ok(query.into())
}
"boost" => {
let positive = query_value
.get::<_, napi::JsObject>("positive")?
.ok_or(napi::Error::from_reason("positive not found"))?;
let negative = query_value
.get::<_, napi::JsObject>("negative")?
.ok_or(napi::Error::from_reason("negative not found"))?;
let negative_boost = query_value
.get::<_, napi::JsNumber>("negative_boost")?
.ok_or(napi::Error::from_reason("negative_boost not found"))?
.get_double()? as f32;
let positive = parse_fts_query(&positive)?;
let negative = parse_fts_query(&negative)?;
let query = BoostQuery::new(positive, negative, Some(negative_boost));
Ok(query.into())
}
"multi_match" => {
let query = query_value
.get::<_, napi::JsString>("query")?
.ok_or(napi::Error::from_reason("query not found"))?
.into_utf8()?
.into_owned()?;
let columns_array = query_value
.get::<_, napi::JsTypedArray>("columns")?
.ok_or(napi::Error::from_reason("columns not found"))?;
let columns_num = columns_array.get_array_length()?;
let mut columns = Vec::with_capacity(columns_num as usize);
for i in 0..columns_num {
let column = columns_array
.get_element::<napi::JsString>(i)?
.into_utf8()?
.into_owned()?;
columns.push(column);
}
let boost_array = query_value
.get::<_, napi::JsTypedArray>("boost")?
.ok_or(napi::Error::from_reason("boost not found"))?;
if boost_array.get_array_length()? != columns_num {
return Err(napi::Error::from_reason(format!(
"boost array length ({}) does not match columns length ({})",
boost_array.get_array_length()?,
columns_num
)));
}
let mut boost = Vec::with_capacity(columns_num as usize);
for i in 0..columns_num {
let b = boost_array.get_element::<napi::JsNumber>(i)?.get_double()? as f32;
boost.push(b);
}
let query =
MultiMatchQuery::try_new_with_boosts(query, columns, boost).map_err(|e| {
napi::Error::from_reason(format!("Error creating MultiMatchQuery: {}", e))
})?;
Ok(query.into())
}
_ => Err(napi::Error::from_reason(format!(
"Unsupported query type: {}",
query_type
))),
}
}

56
pyright_report.csv Normal file
View File

@@ -0,0 +1,56 @@
file,errors,warnings,total_issues
python/python/lancedb/arrow.py,0,0,0
python/python/lancedb/background_loop.py,0,0,0
python/python/lancedb/embeddings/__init__.py,0,0,0
python/python/lancedb/exceptions.py,0,0,0
python/python/lancedb/index.py,0,0,0
python/python/lancedb/integrations/__init__.py,0,0,0
python/python/lancedb/remote/__init__.py,0,0,0
python/python/lancedb/remote/errors.py,0,0,0
python/python/lancedb/rerankers/__init__.py,0,0,0
python/python/lancedb/rerankers/answerdotai.py,0,0,0
python/python/lancedb/rerankers/cohere.py,0,0,0
python/python/lancedb/rerankers/colbert.py,0,0,0
python/python/lancedb/rerankers/cross_encoder.py,0,0,0
python/python/lancedb/rerankers/openai.py,0,0,0
python/python/lancedb/rerankers/util.py,0,0,0
python/python/lancedb/rerankers/voyageai.py,0,0,0
python/python/lancedb/schema.py,0,0,0
python/python/lancedb/types.py,0,0,0
python/python/lancedb/__init__.py,0,1,1
python/python/lancedb/conftest.py,1,0,1
python/python/lancedb/embeddings/bedrock.py,1,0,1
python/python/lancedb/merge.py,1,0,1
python/python/lancedb/rerankers/base.py,1,0,1
python/python/lancedb/rerankers/jinaai.py,0,1,1
python/python/lancedb/rerankers/linear_combination.py,1,0,1
python/python/lancedb/embeddings/instructor.py,2,0,2
python/python/lancedb/embeddings/openai.py,2,0,2
python/python/lancedb/embeddings/watsonx.py,2,0,2
python/python/lancedb/embeddings/registry.py,3,0,3
python/python/lancedb/embeddings/sentence_transformers.py,3,0,3
python/python/lancedb/integrations/pyarrow.py,3,0,3
python/python/lancedb/rerankers/rrf.py,3,0,3
python/python/lancedb/dependencies.py,4,0,4
python/python/lancedb/embeddings/gemini_text.py,4,0,4
python/python/lancedb/embeddings/gte.py,4,0,4
python/python/lancedb/embeddings/gte_mlx_model.py,4,0,4
python/python/lancedb/embeddings/ollama.py,4,0,4
python/python/lancedb/embeddings/transformers.py,4,0,4
python/python/lancedb/remote/db.py,5,0,5
python/python/lancedb/context.py,6,0,6
python/python/lancedb/embeddings/cohere.py,6,0,6
python/python/lancedb/fts.py,6,0,6
python/python/lancedb/db.py,9,0,9
python/python/lancedb/embeddings/utils.py,9,0,9
python/python/lancedb/common.py,11,0,11
python/python/lancedb/util.py,13,0,13
python/python/lancedb/embeddings/imagebind.py,14,0,14
python/python/lancedb/embeddings/voyageai.py,15,0,15
python/python/lancedb/embeddings/open_clip.py,16,0,16
python/python/lancedb/pydantic.py,16,0,16
python/python/lancedb/embeddings/base.py,17,0,17
python/python/lancedb/embeddings/jinaai.py,18,1,19
python/python/lancedb/remote/table.py,23,0,23
python/python/lancedb/query.py,47,1,48
python/python/lancedb/table.py,61,0,61
1 file errors warnings total_issues
2 python/python/lancedb/arrow.py 0 0 0
3 python/python/lancedb/background_loop.py 0 0 0
4 python/python/lancedb/embeddings/__init__.py 0 0 0
5 python/python/lancedb/exceptions.py 0 0 0
6 python/python/lancedb/index.py 0 0 0
7 python/python/lancedb/integrations/__init__.py 0 0 0
8 python/python/lancedb/remote/__init__.py 0 0 0
9 python/python/lancedb/remote/errors.py 0 0 0
10 python/python/lancedb/rerankers/__init__.py 0 0 0
11 python/python/lancedb/rerankers/answerdotai.py 0 0 0
12 python/python/lancedb/rerankers/cohere.py 0 0 0
13 python/python/lancedb/rerankers/colbert.py 0 0 0
14 python/python/lancedb/rerankers/cross_encoder.py 0 0 0
15 python/python/lancedb/rerankers/openai.py 0 0 0
16 python/python/lancedb/rerankers/util.py 0 0 0
17 python/python/lancedb/rerankers/voyageai.py 0 0 0
18 python/python/lancedb/schema.py 0 0 0
19 python/python/lancedb/types.py 0 0 0
20 python/python/lancedb/__init__.py 0 1 1
21 python/python/lancedb/conftest.py 1 0 1
22 python/python/lancedb/embeddings/bedrock.py 1 0 1
23 python/python/lancedb/merge.py 1 0 1
24 python/python/lancedb/rerankers/base.py 1 0 1
25 python/python/lancedb/rerankers/jinaai.py 0 1 1
26 python/python/lancedb/rerankers/linear_combination.py 1 0 1
27 python/python/lancedb/embeddings/instructor.py 2 0 2
28 python/python/lancedb/embeddings/openai.py 2 0 2
29 python/python/lancedb/embeddings/watsonx.py 2 0 2
30 python/python/lancedb/embeddings/registry.py 3 0 3
31 python/python/lancedb/embeddings/sentence_transformers.py 3 0 3
32 python/python/lancedb/integrations/pyarrow.py 3 0 3
33 python/python/lancedb/rerankers/rrf.py 3 0 3
34 python/python/lancedb/dependencies.py 4 0 4
35 python/python/lancedb/embeddings/gemini_text.py 4 0 4
36 python/python/lancedb/embeddings/gte.py 4 0 4
37 python/python/lancedb/embeddings/gte_mlx_model.py 4 0 4
38 python/python/lancedb/embeddings/ollama.py 4 0 4
39 python/python/lancedb/embeddings/transformers.py 4 0 4
40 python/python/lancedb/remote/db.py 5 0 5
41 python/python/lancedb/context.py 6 0 6
42 python/python/lancedb/embeddings/cohere.py 6 0 6
43 python/python/lancedb/fts.py 6 0 6
44 python/python/lancedb/db.py 9 0 9
45 python/python/lancedb/embeddings/utils.py 9 0 9
46 python/python/lancedb/common.py 11 0 11
47 python/python/lancedb/util.py 13 0 13
48 python/python/lancedb/embeddings/imagebind.py 14 0 14
49 python/python/lancedb/embeddings/voyageai.py 15 0 15
50 python/python/lancedb/embeddings/open_clip.py 16 0 16
51 python/python/lancedb/pydantic.py 16 0 16
52 python/python/lancedb/embeddings/base.py 17 0 17
53 python/python/lancedb/embeddings/jinaai.py 18 1 19
54 python/python/lancedb/remote/table.py 23 0 23
55 python/python/lancedb/query.py 47 1 48
56 python/python/lancedb/table.py 61 0 61

Some files were not shown because too many files have changed in this diff Show More