Compare commits

...

294 Commits

Author SHA1 Message Date
Lance Release
e08d45e090 Bump version: 0.23.1-beta.0 → 0.23.1-beta.1 2025-06-17 23:22:00 +00:00
Will Jones
2e3ddb8382 ci: fix lockfile failure for vectordb node (#2443)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated release workflow to set a specific Git user name and email for
automated commits during the package publishing process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-17 15:14:11 -07:00
Wyatt Alt
627ca4c810 chore: update lance to v0.29.1-beta.2 (#2442)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies to use a newer version of the Lance
library.
- **New Features**
- Added support for a new query occurrence type labeled "MUST NOT" in
search filters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-17 14:02:13 -07:00
Lance Release
f8dae4ffe9 Bump version: 0.20.0 → 0.20.1-beta.0 2025-06-16 16:30:14 +00:00
Lance Release
9eb6119468 Bump version: 0.23.0 → 0.23.1-beta.0 2025-06-16 16:29:22 +00:00
Weston Pace
59b57e30ed feat: add maximum and minimum nprobes properties (#2430)
This exposes the maximum_nprobes and minimum_nprobes feature that was
added in https://github.com/lancedb/lance/pull/3903

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for specifying minimum and maximum probe counts in
vector search queries, allowing finer control over search behavior.
- Users can now independently set minimum and maximum probes for vector
and hybrid queries via new methods and parameters in Python, Node.js,
and Rust APIs.

- **Bug Fixes**
- Improved parameter validation to ensure correct usage of minimum and
maximum probe values.

- **Tests**
- Expanded test coverage to validate correct handling, serialization,
and error cases for the new probe parameters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-13 15:18:29 -07:00
BubbleCal
fec8d58f06 feat: support a bunch or FTS features in JS SDK (#2431)
- operator for match query
- slop for phrase query
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced support for boolean full-text search queries with AND/OR
logic and occurrence conditions.
- Added operator options for match and multi-match queries to control
term combination logic.
- Enabled phrase queries to specify proximity (slop) for flexible phrase
matching.
- Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery`
class for enhanced query expressiveness.

- **Bug Fixes**
- Improved validation and error handling for invalid operator and
occurrence inputs in full-text queries.

- **Tests**
- Expanded test coverage with new cases for boolean queries and
operator-based full-text searches.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-12 17:04:19 +08:00
BubbleCal
84ded9d678 feat: support new FTS features in python SDK (#2411)
- AND operator
- phrase query slop param
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for combining full-text search queries using AND/OR
operators, enabling more flexible query composition.
- Introduced new query types and parameters, including boolean queries,
operator selection, occurrence constraints, and phrase slop for advanced
search scenarios.
- Enhanced asynchronous search to accept rich full-text query objects
directly.

- **Bug Fixes**
- Improved handling and validation of full-text search queries in both
synchronous and asynchronous search operations.

- **Tests**
- Updated and expanded tests to cover new full-text query types and
their usage in search functions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-06 14:33:46 +08:00
Wyatt Alt
65696d9713 chore: update lance in lancedb (#2424)
This updates lance to v0.29.1-beta.1.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated workspace dependencies for improved consistency and
reliability. No changes to user-facing functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-04 19:06:51 -07:00
Will Jones
e2f2ea32e4 ci: fix vectordb release (#2422)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated the release workflow to include an additional step for
improved process reliability. No changes to user-facing functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-04 17:06:02 -07:00
Lance Release
d5f2eca754 Bump version: 0.20.0-beta.3 → 0.20.0 2025-06-04 21:08:31 +00:00
Lance Release
7fa455a8a5 Bump version: 0.20.0-beta.2 → 0.20.0-beta.3 2025-06-04 21:07:59 +00:00
Lance Release
8f42b5874e Bump version: 0.23.0-beta.3 → 0.23.0 2025-06-04 21:07:39 +00:00
Lance Release
274f19f560 Bump version: 0.23.0-beta.2 → 0.23.0-beta.3 2025-06-04 21:07:38 +00:00
Will Jones
fbcbc75b5b feat: upgrade lance to stable version (#2420)
Adds a script to change the lance dependency easily. To make this
change, I just had to run:

```bash
python ci/set_lance_version.py stable
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added a script to automate updating the Lance package version in
project dependencies.
- **Chores**
- Updated workflows to improve lockfile management and automate updates
during releases and publishing.
- Switched Lance dependencies from git-based references to fixed version
numbers for improved stability.
- Enhanced lockfile update script with an option to amend commits and
quieter output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-06-04 13:34:30 -07:00
Will Jones
008f389bd0 ci: commit updated Cargo.lock (#2418)
Follow up to #2416

Forgot to do `git add`.
Also need to delete old actions updating package lock.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
  - Removed legacy workflows related to updating package lock files.
- Improved the update lockfiles script to ensure updated lockfiles are
always included in amended commits.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-04 08:40:38 -07:00
Lance Release
91af6518d9 Updating package-lock.json 2025-06-04 07:15:07 +00:00
Lance Release
af6819762c Updating package-lock.json 2025-06-04 07:14:50 +00:00
Lance Release
7acece493d Bump version: 0.20.0-beta.1 → 0.20.0-beta.2 2025-06-04 07:14:39 +00:00
Lance Release
20e017fedc Bump version: 0.23.0-beta.1 → 0.23.0-beta.2 2025-06-04 07:13:44 +00:00
Jack Ye
74e578b3c8 feat: upgrade lance to v0.29.0-beta.2 (#2419)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated various internal dependencies to newer versions for improved
stability and compatibility.
  - Increased the version number for the Python package.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-03 15:16:26 -07:00
Lance Release
d92d9eb3d2 Updating package-lock.json 2025-06-03 16:28:18 +00:00
Lance Release
b6cdce7bc9 Updating package-lock.json 2025-06-03 16:28:02 +00:00
Lance Release
316b406265 Bump version: 0.20.0-beta.0 → 0.20.0-beta.1 2025-06-03 16:27:53 +00:00
Lance Release
8825c7c1dd Bump version: 0.23.0-beta.0 → 0.23.0-beta.1 2025-06-03 16:26:58 +00:00
David Myriel
81c85ff702 docs: announcement for Documentation (#2410)
Just letting people know where to look starting June 1st. 

Both docsites should be pointing to lancedb.github.io/documentation.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Added a notification banner to the documentation site informing users
about a new URL for accessing the latest documentation starting June
1st, 2025. The message includes a clickable link that opens in a new
tab.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-06-03 08:55:02 -07:00
Will Jones
570f2154d5 ci: automatically update Cargo.lock (#2416)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated workflow to ignore changes in the `Cargo.lock` file during
documentation checks, reducing unnecessary workflow failures.
- Enhanced release process by adding automated lockfile updates for
Node.js and Rust components.
- Removed an obsolete package-lock update job from the publishing
workflow to streamline releases.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-03 07:49:21 -07:00
Lance Release
0525c055fc Updating package-lock.json 2025-05-31 04:29:20 +00:00
Lance Release
38d11291da Updating package-lock.json 2025-05-31 03:48:11 +00:00
Lance Release
258e682574 Updating package-lock.json 2025-05-31 03:47:55 +00:00
Lance Release
d7afa600b8 Bump version: 0.19.2-beta.0 → 0.20.0-beta.0 2025-05-31 03:47:37 +00:00
Lance Release
5c7303ab2e Bump version: 0.22.2-beta.0 → 0.23.0-beta.0 2025-05-31 03:47:13 +00:00
Will Jones
5895ef4039 ci: revert unnecessary version bump (#2415)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Downgraded version numbers for the Node.js, Python, and Rust packages.
No other user-facing changes were made.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-30 16:51:14 -07:00
Jack Ye
0528cd858a fix: avoid failing list_indices for any unknown index (#2413)
Closes #2412 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Improved the reliability of listing indices by logging warnings for
errors and skipping problematic entries, ensuring successful results are
returned.
- Internal indices used for optimization are now excluded from the
visible list of indices.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-30 14:43:12 -07:00
Jack Ye
6582f43422 feat: upgrade lance to v0.29.0-beta.1 (#2414)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies for improved stability and
compatibility. No user-facing changes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-30 13:47:41 -07:00
BubbleCal
5c7f63388d feat!: upgrade lance to v0.28.0 (#2404)
this introduces some breaking changes in terms of rust API of creating
FTS index, and the default index params changed

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Updated default settings for full-text search (FTS) index creation:
stemming, stop word removal, and ASCII folding are now enabled by
default, while token position storage is disabled by default.

- **Refactor**
- Simplified and streamlined the configuration and handling of FTS index
parameters for improved maintainability and consistency across
interfaces.
- Enhanced serialization and request construction for FTS index
parameters to reduce manual handling and improve code clarity.
- Improved test coverage by explicitly enabling positional indexing in
FTS tests to support phrase queries.

- **Chores**
- Upgraded all internal dependencies related to FTS indexing to the
latest version for enhanced compatibility and performance.
- Updated package versions for Node.js, Python, and Rust components to
the latest beta releases.
- Improved CI workflows by adding Rust toolchain setup with formatting
and linting tools.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2025-05-29 15:19:24 -07:00
Renato Marroquin
d0bc671cac docs: add example for querying a lance table with SQL (#2389)
Adds example for querying a dataset with SQL

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Added new guides on querying LanceDB tables using SQL with DuckDB and
Apache Datafusion.
- Included detailed instructions for integrating LanceDB with Datafusion
in Python.
- Updated navigation to include Datafusion and SQL querying
documentation.
- Improved formatting in TypeScript and vectordb update examples for
consistency.

- **Tests**
- Added a new test demonstrating SQL querying on Lance tables via
DataFusion integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-05-29 06:14:38 -07:00
David Myriel
d37e17593d [doc] Add New Readme Page (#2405)
Added a new readme for better navigation, updated language and more
detail

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Updated the README with a modernized header, improved structure, and
clearer descriptions of features and architecture.
- Expanded and reorganized key features and product offerings for better
clarity.
- Simplified installation instructions and added a table of SDK
interfaces with documentation links.
- Enhanced community and contributor sections with new visuals and links
to social and support channels.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-27 17:45:17 +02:00
Lance Release
cb726d370e Updating package-lock.json 2025-05-23 22:36:54 +00:00
Lance Release
23ee132546 Updating package-lock.json 2025-05-23 21:58:58 +00:00
Lance Release
7fa090d330 Updating package-lock.json 2025-05-23 21:58:43 +00:00
Lance Release
07bc1c5397 Bump version: 0.19.1 → 0.19.2-beta.0 2025-05-23 21:58:31 +00:00
Lance Release
d7a9dbb9fc Bump version: 0.22.1 → 0.22.2-beta.0 2025-05-23 21:58:17 +00:00
Jack Ye
00487afc7d feat: upgrade lance to v0.27.3-beta.2 (#2403)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies for improved compatibility and
stability. No changes to user-facing features.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-23 14:53:13 -07:00
BubbleCal
1902d65aad docs: update the num_partitions recommendation (#2401) 2025-05-23 23:45:37 +08:00
Lance Release
c4fbb65b8e Updating package-lock.json 2025-05-22 07:06:03 +00:00
Lance Release
875ed7ae6f Updating package-lock.json 2025-05-22 05:58:59 +00:00
Lance Release
95a46a57ba Updating package-lock.json 2025-05-22 05:58:43 +00:00
Lance Release
51561e31a0 Bump version: 0.19.1-beta.6 → 0.19.1 2025-05-22 05:58:05 +00:00
Lance Release
7b19120578 Bump version: 0.19.1-beta.5 → 0.19.1-beta.6 2025-05-22 05:58:00 +00:00
Lance Release
745c34a6a9 Bump version: 0.22.1-beta.6 → 0.22.1 2025-05-22 05:57:20 +00:00
Lance Release
db8fa2454d Bump version: 0.22.1-beta.5 → 0.22.1-beta.6 2025-05-22 05:57:20 +00:00
Lei Xu
a67a7b4b42 chore: use stable lance (#2398)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated workspace dependencies to use a stable release version for
improved consistency and reliability. No changes to application features
or functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-21 22:34:29 -07:00
Lei Xu
496846e532 chore: bump lance version (#2397)
- Bump lance version and prepare a new release.
- Bump rust toolchain to 1.86, because GHA ubuntu does not have 1.83
`cargo-fmt` anymore
2025-05-21 14:15:55 -07:00
Ayush Chaurasia
dadcfebf8e docs: add logos in genkit docs page (#2391)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Added an integration banner image to the beginning of the
Genkitx-LanceDB documentation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-20 01:40:12 +05:30
Lance Release
67033dbd7f Updating package-lock.json 2025-05-16 00:25:41 +00:00
Lance Release
05a85cfc2a Updating package-lock.json 2025-05-15 23:44:27 +00:00
Lance Release
40c5d3d72b Updating package-lock.json 2025-05-15 23:44:10 +00:00
Lance Release
198f0f80c6 Bump version: 0.19.1-beta.4 → 0.19.1-beta.5 2025-05-15 23:43:32 +00:00
Lance Release
e3f2fd3892 Bump version: 0.22.1-beta.4 → 0.22.1-beta.5 2025-05-15 23:42:46 +00:00
Wyatt Alt
f401ccc599 chore: update lance to 0.27.1-beta.1 (#2388)
This is for fe14671f1

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies to newer versions for improved stability
and performance. No changes to features or functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-15 16:09:01 -07:00
Ayush Chaurasia
81b59139f8 docs: add genkit integration docs (#2383)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Added a comprehensive guide for integrating LanceDB with Genkit,
including installation instructions, setup, indexing, retrieval, and
building a custom RAG pipeline with example code and screenshots.
- Updated the documentation navigation to include the new Genkit
integration, making it accessible from the site menu.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-12 18:18:07 +05:30
ayush chaurasia
1026781ab6 Revert "update"
This reverts commit 9c699b8cd9.
2025-05-11 21:04:59 +05:30
ayush chaurasia
9c699b8cd9 update 2025-05-11 21:01:53 +05:30
Lance Release
34bec59bc3 Updating package-lock.json 2025-05-08 21:34:37 +00:00
Lance Release
a5fbbf0d66 Updating package-lock.json 2025-05-08 20:20:18 +00:00
Lance Release
b42721167b Updating package-lock.json 2025-05-08 20:20:00 +00:00
Lance Release
543dec9ff0 Bump version: 0.19.1-beta.3 → 0.19.1-beta.4 2025-05-08 20:19:17 +00:00
Lance Release
04f962f6b0 Bump version: 0.22.1-beta.3 → 0.22.1-beta.4 2025-05-08 20:18:40 +00:00
LuQQiu
19e896ff69 chore: add default for result structs (#2377)
add default for result structs, when values are not provided, will go
with the default values

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Improved internal handling of table operation results to support
default values. No changes to user-facing features or functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-08 13:09:11 -07:00
Will Jones
272e4103b2 feat: provide timeout parameter for merge_insert (#2378)
Provides the ability to set a timeout for merge insert. The default
underlying timeout is however long the first attempt takes, or if there
are multiple attempts, 30 seconds. This has two use cases:

1. Make the timeout shorter, when you want to fail if it takes too long.
2. Allow taking more time to do retries.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for specifying a timeout when performing merge insert
operations in Python, Node.js, and Rust APIs.
- Introduced a new option to control the maximum allowed execution time
for merge inserts, including retry timeout handling.

- **Documentation**
- Updated and added documentation to describe the new timeout option and
its usage in APIs.

- **Tests**
- Added and updated tests to verify correct timeout behavior during
merge insert operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-08 13:07:05 -07:00
Wyatt Alt
75c257ebb6 fix: return IndexNotExist on remote drop index 404 (#2380)
Prior to this commit, attempting to drop an index that did not exist
would return a TableNotFound error stating that the target table does
not exist -- even when it did exist. Instead, we now return an
IndexNotFound error.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved error handling when attempting to drop a non-existent index,
providing a more accurate error message.
- **Tests**
- Added a test to verify correct error reporting when dropping an index
that does not exist.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-07 17:24:05 -07:00
Wyatt Alt
9ee152eb42 fix: support __len__ on remote table (#2379)
This moves the __len__ method from LanceTable and RemoteTable to Table
so that child classes don't need to implement their own. In the process,
it fixes the implementation of RemoteTable's length method, which was
previously missing a return statement.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Centralized the table length functionality in the base table class,
simplifying subclass behavior.
- Removed redundant or non-functional length methods from specific table
classes.

- **Tests**
- Added a new test to verify correct table length reporting for remote
tables.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-07 17:23:39 -07:00
LuQQiu
c9ae1b1737 fix: add restore with tag in python and nodejs API (#2374)
add restore with tag API in python and nodejs API and add tests to guard
them

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- The restore functionality now supports using version tags in addition
to numeric version identifiers, allowing you to revert tables to a state
marked by a tag.
- **Bug Fixes**
  - Restoring with an unknown tag now properly raises an error.
- **Documentation**
- Updated documentation and examples to clarify that restore accepts
both version numbers and tags.
- **Tests**
- Added new tests to verify restore behavior with version tags and error
handling for unknown tags.
  - Added tests for checkout and restore operations involving tags.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-06 16:12:58 -07:00
Lance Release
89dc80c42a Updating package-lock.json 2025-05-06 03:53:49 +00:00
Wyatt Alt
7b020ac799 chore: run cargo update (#2376) 2025-05-05 20:26:42 -07:00
Lance Release
529e774bbb Updating package-lock.json 2025-05-06 02:45:45 +00:00
Lance Release
7c12239305 Updating package-lock.json 2025-05-06 02:45:29 +00:00
Lance Release
d83424d6b4 Bump version: 0.19.1-beta.2 → 0.19.1-beta.3 2025-05-06 02:45:06 +00:00
Lance Release
8bf89f887c Bump version: 0.22.1-beta.2 → 0.22.1-beta.3 2025-05-06 02:44:39 +00:00
LuQQiu
b2160b2304 fix: fix backward compatibility with the add API (#2375)
add API originally returns a struct with request_id, add backward
compatibility for that

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Improved handling of empty server responses for various data
operations to ensure consistent behavior across server versions.
- Added default values to version and numeric fields to prevent errors
when response data is incomplete.

- **Tests**
- Expanded tests to cover multiple server response scenarios, validating
correct version handling in data operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 19:26:27 -07:00
Lance Release
1bb82597be Updating package-lock.json 2025-05-06 01:21:13 +00:00
Lance Release
e4eee38b3c Updating package-lock.json 2025-05-06 00:09:39 +00:00
Lance Release
64fc2be503 Updating package-lock.json 2025-05-06 00:09:19 +00:00
Lance Release
dc8054e90d Bump version: 0.19.1-beta.1 → 0.19.1-beta.2 2025-05-06 00:08:55 +00:00
Lance Release
1684940946 Bump version: 0.22.1-beta.1 → 0.22.1-beta.2 2025-05-06 00:08:29 +00:00
LuQQiu
695813463c chore: reduce unneeded API calls for return version for write operations and improve test (#2373)
Reduce the duplicate code for remote write operation testing.
Avoid double call to remote to get version info, just return 0 instead
of suddenly adding extra API calls for end users when they are using old
servers.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added version tracking to table operation results, allowing users to
see the commit version associated with add, update, delete, merge, and
column modification operations.
- **Bug Fixes**
- Improved compatibility with legacy servers by standardizing version
information as zero when the server does not return a version.
- **Documentation**
- Clarified the meaning of the version field in operation results,
especially for cases involving legacy server responses.
- **Tests**
- Enhanced test coverage to verify correct behavior with both legacy and
modern server responses.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 16:47:19 -07:00
LuQQiu
ed594b0f76 feat: return version for all write operations (#2368)
return version info for all write operations (add, update, merge_insert
and column modification operations)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Table modification operations (add, update, delete, merge,
add/alter/drop columns) now return detailed result objects including
version numbers and operation statistics.
- Result objects provide clearer feedback such as rows affected and new
table version after each operation.

- **Documentation**
- Updated documentation to describe new result objects and their fields
for all relevant table operations.
- Added documentation for new result interfaces and updated method
return types in Node.js and Python APIs.

- **Tests**
- Enhanced test coverage to assert correctness of returned versioning
and operation metadata after table modifications.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 14:25:34 -07:00
Will Jones
cee2b5ea42 chore: upgrade pyarrow pin (#2192)
Closes #2191


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated the required version of the pyarrow package to version 16 or
higher.
- Adjusted automated testing workflows to install pyarrow version 16 for
compatibility checks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 11:23:13 -07:00
Alex Pilon
f315f9665a feat: implement bindings to return merge stats (#2367)
Based on this comment:
https://github.com/lancedb/lancedb/issues/2228#issuecomment-2730463075
and https://github.com/lancedb/lance/pull/2357

Here is my attempt at implementing bindings for returning merge stats
from a `merge_insert.execute` call for lancedb.

Note: I have almost no idea what I am doing in Rust but tried to follow
existing code patterns and pay attention to compiler hints.
- The change in nodejs binding appeared to be necessary to get
compilation to work, presumably this could actual work properly by
returning some kind of NAPI JS object of the stats data?
- I am unsure of what to do with the remote/table.rs changes -
necessarily for compilation to work; I assume this is related to LanceDB
cloud, but unsure the best way to handle that at this point.

Proof of function:

```python
import pandas as pd
import lancedb


db = lancedb.connect("/tmp/test.db")

test_data = pd.DataFrame(
    {
        "title": ["Hello", "Test Document", "Example", "Data Sample", "Last One"],
        "id": [1, 2, 3, 4, 5],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
        ],
    }
)

table = db.create_table("documents", data=test_data, exist_ok=True, mode="overwrite")

update_data = pd.DataFrame(
    {
        "title": [
            "Hello, World",
            "Test Document, it's good",
            "Example",
            "Data Sample",
            "Last One",
            "New One",
        ],
        "id": [1, 2, 3, 4, 5, 6],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
            "New content",
        ],
    }
)

stats = (
    table.merge_insert(on="id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(update_data)
)

print(stats)
```

returns

```
{'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0}
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Merge-insert operations now return detailed statistics, including
counts of inserted, updated, and deleted rows.
- **Bug Fixes**
- Tests updated to validate returned merge-insert statistics for
accuracy.
- **Documentation**
- Method documentation improved to reflect new return values and clarify
merge operation results.
- Added documentation for the new `MergeStats` interface detailing
operation statistics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-05-01 10:00:20 -07:00
Andrew C. Oliver
5deb26bc8b fix: prevent embedded objects from returning null in all of their fields (#2355)
metadata{filename=xyz} filename would be there structurally, but ALWAYS
null.

I didn't include this as a file but it may be useful for understanding
the problem for people searching on this issue so I'm including it here
as documentation. Before this patch any field that is more than 1 deep
is accepted but returns null values for subfields when queried.

```js
const lancedb = require('@lancedb/lancedb');

// Debug logger
function debug(message, data) {
  console.log(`[TEST] ${message}`, data !== undefined ? data : '');
}

// Log when our unwrapArrowObject is called
const kParent = Symbol.for("parent");
const kRowIndex = Symbol.for("rowIndex");

// Override console.log for our test
const originalConsoleLog = console.log;
console.log = function() {
  // Filter out noisy logs
  if (arguments[0] && typeof arguments[0] === 'string' && arguments[0].includes('[INFO] [LanceDB]')) {
    originalConsoleLog.apply(console, arguments);
  }
  originalConsoleLog.apply(console, arguments);
};

async function main() {
  debug('Starting test...');
  
  // Connect to the database
  debug('Connecting to database...');
  const db = await lancedb.connect('./.lancedb');
  
  // Try to open an existing table, or create a new one if it doesn't exist
  let table;
  try {
    table = await db.openTable('test_nested_fields');
    debug('Opened existing table');
  } catch (e) {
    debug('Creating new table...');
    
    // Create test data with nested metadata structure
    const data = [
      {
        id: 'test1',
        vector: [1, 2, 3],
        metadata: {
          filePath: "/path/to/file1.ts",
          startLine: 10,
          endLine: 20,
          text: "function test() { return true; }"
        }
      },
      {
        id: 'test2',
        vector: [4, 5, 6],
        metadata: {
          filePath: "/path/to/file2.ts",
          startLine: 30,
          endLine: 40,
          text: "function test2() { return false; }"
        }
      }
    ];
    
    debug('Data to be inserted:', JSON.stringify(data, null, 2));
    
    // Create the table
    table = await db.createTable('test_nested_fields', data);
    debug('Table created successfully');
  }
  
  // Query the table and get results
  debug('Querying table...');
  const results = await table.search([1, 2, 3]).limit(10).toArray();
  
  // Log the results
  debug('Number of results:', results.length);
  
  if (results.length > 0) {
    const firstResult = results[0];
    debug('First result properties:', Object.keys(firstResult));
    
    // Check if metadata is accessible and what properties it has
    if (firstResult.metadata) {
      debug('Metadata properties:', Object.keys(firstResult.metadata));
      debug('Metadata filePath:', firstResult.metadata.filePath);
      debug('Metadata startLine:', firstResult.metadata.startLine);
      
      // Destructure to see if that helps
      const { filePath, startLine, endLine, text } = firstResult.metadata;
      debug('Destructured values:', { filePath, startLine, endLine, text });
      
      // Check if it's a proxy object
      debug('Result is proxy?', Object.getPrototypeOf(firstResult) === Object.prototype ? false : true);
      debug('Metadata is proxy?', Object.getPrototypeOf(firstResult.metadata) === Object.prototype ? false : true);
    } else {
      debug('Metadata is not accessible!');
    }
  }
  
  // Close the database
  await db.close();
}

main().catch(e => {
  console.error('Error:', e);
}); 
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **Bug Fixes**
- Improved handling of nested struct fields to ensure accurate
preservation of values during serialization and deserialization.
- Enhanced robustness when accessing nested object properties, reducing
errors with missing or null values.

- **Tests**
- Added tests to verify correct handling of nested struct fields through
serialization and deserialization.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-05-01 09:38:55 -07:00
Lance Release
3cc670ac38 Updating package-lock.json 2025-04-29 23:21:19 +00:00
Lance Release
4ade3e31e2 Updating package-lock.json 2025-04-29 22:19:46 +00:00
Lance Release
a222d2cd91 Updating package-lock.json 2025-04-29 22:19:30 +00:00
Lance Release
508e621f3d Bump version: 0.19.1-beta.0 → 0.19.1-beta.1 2025-04-29 22:19:14 +00:00
Lance Release
a1a0472f3f Bump version: 0.22.1-beta.0 → 0.22.1-beta.1 2025-04-29 22:18:53 +00:00
Wyatt Alt
3425a6d339 feat: upgrade lance to v0.27.0-beta.2 (#2364)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated dependencies for related components to use the latest version
from a specific repository source. No changes to features or public
functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-29 14:59:56 -07:00
Ryan Green
af54e0ce06 feat: add table stats API (#2363)
* Add a new "table stats" API to expose basic table and fragment
statistics with local and remote table implementations

### Questions
* This is using `calculate_data_stats` to determine total bytes in the
table. This seems like a potentially expensive operation - are there any
concerns about performance for large datasets?

### Notes
* bytes_on_disk seems to be stored at the column level but there does
not seem to be a way to easily calculate total bytes per fragment. This
may need to be added in lance before we can support fragment size
(bytes) statistics.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added a method to retrieve comprehensive table statistics, including
total rows, index counts, storage size, and detailed fragment size
metrics such as minimum, maximum, mean, and percentiles.
- Enabled fetching of table statistics from remote sources through
asynchronous requests.
- Extended table interfaces across Python, Rust, and Node.js to support
synchronous and asynchronous retrieval of table statistics.
- **Tests**
- Introduced tests to verify the accuracy of the new table statistics
feature for both populated and empty tables.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-29 15:19:08 -02:30
Lance Release
089905fe8f Updating package-lock.json 2025-04-28 19:13:36 +00:00
Lance Release
554939e5d2 Updating package-lock.json 2025-04-28 17:20:58 +00:00
Lance Release
7a13814922 Updating package-lock.json 2025-04-28 17:20:42 +00:00
Lance Release
e9f25f6a12 Bump version: 0.19.0 → 0.19.1-beta.0 2025-04-28 17:20:26 +00:00
Lance Release
419a433244 Bump version: 0.22.0 → 0.22.1-beta.0 2025-04-28 17:20:10 +00:00
LuQQiu
a9311c4dc0 feat: add list/create/delete/update/checkout tag API (#2353)
add the tag related API to list existing tags, attach tag to a version,
update the tag version, delete tag, get the version of the tag, and
checkout the version that the tag bounded to.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced table version tagging, allowing users to create, update,
delete, and list human-readable tags for specific table versions.
  - Enabled checking out a table by either version number or tag name.
- Added new interfaces for tag management in both Python and Node.js
APIs, supporting synchronous and asynchronous workflows.

- **Bug Fixes**
  - None.

- **Documentation**
- Updated documentation to describe the new tagging features, including
usage examples.

- **Tests**
- Added comprehensive tests for tag creation, updating, deletion,
listing, and version checkout by tag in both Python and Node.js
environments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-28 10:04:46 -07:00
LuQQiu
178bcf9c90 fix: hybrid search explain plan analyze plan (#2360)
Fix hybrid search explain plan analyze plan API

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added options to view the execution plan and analyze the runtime
performance of hybrid queries.
- **Refactor**
- Improved internal handling of query setup for better modularity and
maintainability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-27 18:39:43 -07:00
Lance Release
b9be092cb1 Updating package-lock.json 2025-04-25 22:05:57 +00:00
Lance Release
e8c0c52315 Updating package-lock.json 2025-04-25 21:17:03 +00:00
Lance Release
a60fa0d3b7 Updating package-lock.json 2025-04-25 21:16:48 +00:00
Lance Release
726d629b9b Bump version: 0.19.0-beta.12 → 0.19.0 2025-04-25 21:16:30 +00:00
Lance Release
b493f56dee Bump version: 0.19.0-beta.11 → 0.19.0-beta.12 2025-04-25 21:16:25 +00:00
Lance Release
a8b5ad7e74 Bump version: 0.22.0-beta.12 → 0.22.0 2025-04-25 21:16:07 +00:00
Lance Release
f8f6264883 Bump version: 0.22.0-beta.11 → 0.22.0-beta.12 2025-04-25 21:16:07 +00:00
Will Jones
d8517117f1 feat: upgrade Lance to v0.26.0 (#2359)
Upstream changelog:
https://github.com/lancedb/lance/releases/tag/v0.26.0

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated dependency management to use published crate versions for
improved reliability and maintainability.
- Added a temporary workaround for build issues by pinning a specific
version of a dependency.
- **Refactor**
- Improved resource management and concurrency by updating internal
ownership models for object storage components.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-25 13:59:12 -07:00
Lance Release
ab66dd5ed2 Updating package-lock.json 2025-04-25 06:04:06 +00:00
Lance Release
cbb9a7877c Updating package-lock.json 2025-04-25 05:02:47 +00:00
Lance Release
b7fc223535 Updating package-lock.json 2025-04-25 05:02:32 +00:00
Lance Release
1fdaf7a1a4 Bump version: 0.19.0-beta.10 → 0.19.0-beta.11 2025-04-25 05:02:16 +00:00
Lance Release
d11819c90c Bump version: 0.22.0-beta.10 → 0.22.0-beta.11 2025-04-25 05:01:57 +00:00
BubbleCal
9b902272f1 fix: sync hybrid search ignores the distance range params (#2356)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for distance range filtering in hybrid vector queries,
allowing users to specify lower and upper bounds for search results.

- **Tests**
- Introduced new tests to validate distance range filtering and
reranking in both synchronous and asynchronous hybrid query scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-25 13:01:22 +08:00
Will Jones
8c0622fa2c fix: remote limit to avoid "Limit must be non-negative" (#2354)
To workaround this issue: https://github.com/lancedb/lancedb/issues/2211

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved handling of large query parameters to prevent potential
overflow issues when using the "k" parameter in queries.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-24 15:04:06 -07:00
Philip Meier
2191f948c3 fix: add missing pydantic model config compat (#2316)
Fixes #2315.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Enhanced query processing to maintain smooth functionality across
different dependency versions, ensuring improved stability and
performance.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-22 14:46:10 -07:00
Will Jones
acc3b03004 ci: fix docs deploy (#2351)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Improved CI workflow for documentation builds by optimizing Rust build
settings and updating the runner environment.
  - Fixed a typo in a workflow step name.
- Streamlined caching steps to reduce redundancy and improve efficiency.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-22 13:55:34 -07:00
Lance Release
7f091b8c8e Updating package-lock.json 2025-04-22 19:16:43 +00:00
Lance Release
c19bdd9a24 Updating package-lock.json 2025-04-22 18:24:16 +00:00
Lance Release
dad0ff5cd2 Updating package-lock.json 2025-04-22 18:23:59 +00:00
Lance Release
a705621067 Bump version: 0.19.0-beta.9 → 0.19.0-beta.10 2025-04-22 18:23:39 +00:00
Lance Release
39614fdb7d Bump version: 0.22.0-beta.9 → 0.22.0-beta.10 2025-04-22 18:23:17 +00:00
Ryan Green
96d534d4bc feat: add retries to remote client for requests with stream bodies (#2349)
Closes https://github.com/lancedb/lancedb/issues/2307
* Adds retries to remote operations with stream bodies (add,
merge_insert)
* Change default retryable status codes to 409, 429, 500, 502, 503, 504
* Don't retry add or merge_insert operations on 5xx responses

Notes:
* Supporting retries on stream bodies means we have to buffer the body
into memory so it can be cloned on retry. This will impact memory use
patterns for the remote client. This buffering can be disabled by
disabling retries (i.e. setting retries to 0 in RetryConfig)
* It does not seem that retry config can be specified by env vars as the
documentation suggests. I added a follow-up issue
[here](https://github.com/lancedb/lancedb/issues/2350)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Enhanced retry support for remote requests with configurable limits
and exponential backoff with jitter.
- Added robust retry logic for streaming data uploads, enabling retries
with buffered data to ensure reliability.

- **Bug Fixes**
- Improved error handling and retry behavior for HTTP status codes 409
and 504.

- **Refactor**
- Centralized and modularized HTTP request sending and retry logic
across remote database and table operations.
  - Streamlined request ID management for improved traceability.
- Simplified error message construction in index waiting functionality.

- **Tests**
  - Added a test verifying merge-insert retries on HTTP 409 responses.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-22 15:40:44 -02:30
Lance Release
5051d30d09 Updating package-lock.json 2025-04-21 23:55:43 +00:00
Lance Release
db853c4041 Updating package-lock.json 2025-04-21 22:50:56 +00:00
Lance Release
76d1d22bdc Updating package-lock.json 2025-04-21 22:50:40 +00:00
Lance Release
d8746c61c6 Bump version: 0.19.0-beta.8 → 0.19.0-beta.9 2025-04-21 22:50:20 +00:00
Lance Release
1a66df2627 Bump version: 0.22.0-beta.8 → 0.22.0-beta.9 2025-04-21 22:49:59 +00:00
Will Jones
44670076c1 fix: move timeout to avoid retries (#2347)
I added a timeout to query execution options in
https://github.com/lancedb/lancedb/pull/2288. However, this was send to
the request timeout, but the retry implementation is unaware of this
timeout. So once the query timed out, a retry would be triggered.
Instead, this PR changes it so the timeout happens outside the retry
loop.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Improved query timeout handling to provide clearer error messages and
more reliable cancellation if a query takes too long to complete.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-21 14:27:04 -07:00
Will Jones
92f0b16e46 fix(python): make sure pandas is optional (#2346)
Fixes #2344


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Tests**
- Updated tests to use PyArrow Tables instead of pandas DataFrames where
possible, reducing reliance on pandas.
- Tests that require pandas are now automatically skipped if pandas is
not installed.
- **Chores**
- Improved workflow to uninstall both pylance and pandas in a specific
test step.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-21 13:42:13 -07:00
Eileen Noonan
1620ba3508 docs: make table.update() nodejs guide consistent with API documentation (#2334)
The docs in the Guide here do not match the [API reference]
(https://lancedb.github.io/lancedb/js/classes/Table/#updateopts) for the
nodejs client.

I am writing an Elixir wrapper over the typescript library (Rust
forthcoming!) and confirmed in testing that the API reference is correct
vs the Guide.

Following the Guide docs, the error I got was:

"lance error: Invalid user input: Schema error: No field named bar.
Valid fields are foo. For a query of:

await table.update({foo: "buzz"}, { where: "foo = 'bar'"});
Over a table with a schema of just {foo: Utf8}.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Reformatted a code snippet in the guide to enhance readability by
splitting it into multiple lines for improved clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-21 08:38:16 -07:00
Ryan Green
3ae90dde80 feat: add new table API to wait for async indexing (#2338)
* Add new wait_for_index() table operation that polls until indices are
created/fully indexed
* Add an optional wait timeout parameter to all create_index operations
* Python and NodeJS interfaces

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Added optional waiting for index creation completion with configurable
timeout.
- Introduced methods to poll and wait for indices to be fully built
across sync and async tables.
  - Extended index creation APIs to accept a wait timeout parameter.
- **Bug Fixes**
- Added a new timeout error variant for improved error reporting on
index operations.
- **Tests**
- Added tests covering successful index readiness waiting, timeout
scenarios, and missing index cases.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-21 08:41:21 -02:30
Magnus
4f07fea6df feat: add ColPali embedding support with MultiVector type (#2170)
This PR adds ColPali support with ColPaliEmbeddings class (tagged
"colpali") using ColQwen2.5 for multi-vector text/image embeddings. Also
added MultiVector Pydantic type to handle the vector lists.

I've added some integration test for the embedding model and some unit
test for the new Pydantic type. Could be a template for other ColPali
variants as well. or until transformers🤗 starts supporting it.


Still `TODO`:

- [ ] Documentation
- [ ] Add an example

_Could also allow Image as query, but didn't work well when testing it._

[ColPali-Engine](https://github.com/illuin-tech/colpali) version:
0.3.9.dev17+g3faee24

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced support for ColPali-based multimodal multi-vector
embeddings for both text and images.
- Added a new embedding class for generating multi-vector embeddings,
configurable for various model and processing options.
- Added a new Pydantic type for multi-vector embeddings, supporting
validation and schema generation for lists of fixed-dimension vectors.

- **Bug Fixes**
- Ensured proper asynchronous index creation in query tests for improved
reliability.

- **Tests**
- Added integration tests for ColPali embeddings, including
text-to-image search and validation of multi-vector fields.
- Added comprehensive tests for the new multi-vector Pydantic type,
covering schema, validation, and default value behavior.

- **Chores**
  - Updated optional dependencies to include the ColPali engine.
  - Added utility to check for availability of flash attention support.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-21 11:47:37 +08:00
Lance Release
3d7d82cf86 Updating package-lock.json 2025-04-17 23:13:37 +00:00
Lance Release
edc4e40a7b Updating package-lock.json 2025-04-17 22:16:36 +00:00
Lance Release
ca3806a02f Updating package-lock.json 2025-04-17 22:16:20 +00:00
Lance Release
35cff12e31 Bump version: 0.19.0-beta.7 → 0.19.0-beta.8 2025-04-17 22:16:02 +00:00
Lance Release
c6c20cb2bd Bump version: 0.22.0-beta.7 → 0.22.0-beta.8 2025-04-17 22:15:46 +00:00
Weston Pace
26080ee4c1 feat: add prewarm_index function (#2342)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added the ability to prewarm (load into memory) table indexes via new
methods in Python, Node.js, and Rust APIs, potentially reducing
cold-start query latency.
- **Bug Fixes**
- Ensured prewarming an index does not interfere with subsequent search
operations.
- **Tests**
- Introduced new test cases to verify full-text search index creation,
prewarming, and search functionalities in both Python and Node.js.
- **Chores**
  - Updated dependencies for improved compatibility and performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Lu Qiu <luqiujob@gmail.com>
2025-04-17 15:14:36 -07:00
Guspan Tanadi
ef3a2b5357 docs: intended path relative links (#2321)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Updated the link in the documentation to correctly reference the
workflow file, ensuring accurate navigation from the current context.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com>
2025-04-16 13:12:09 -07:00
Adam Azzam
c42a201389 docs: remove trailing commas from AWS IAM Policies (#2324)
Before:

<img width="1173" alt="Screenshot 2025-04-08 at 10 58 50 AM"
src="https://github.com/user-attachments/assets/e5c69c45-ab68-488f-9c7f-e12f7ecbfaab"
/>

After:
<img width="1136" alt="Screenshot 2025-04-08 at 10 58 58 AM"
src="https://github.com/user-attachments/assets/108c11ea-09b3-49b5-9a50-b880e72a0270"
/>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Updated JSON policy examples in the storage guides to correct
formatting issues and enhance syntax clarity for readers.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-16 13:09:21 -07:00
Lance Release
24e42ccd4d Updating package-lock.json 2025-04-15 05:29:37 +00:00
Lance Release
8a50944061 Updating package-lock.json 2025-04-15 04:11:16 +00:00
Lance Release
40e066bc7c Updating package-lock.json 2025-04-15 04:11:00 +00:00
Lance Release
b3ad105fa0 Bump version: 0.19.0-beta.6 → 0.19.0-beta.7 2025-04-15 04:10:43 +00:00
Lance Release
6e701d3e1b Bump version: 0.22.0-beta.6 → 0.22.0-beta.7 2025-04-15 04:10:26 +00:00
BubbleCal
2248aa9508 fix: bugs for new FTS APIs (#2314)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced full-text search capabilities with support for phrase
queries, fuzzy matching, boosting, and multi-column matching.
- Search methods now accept full-text query objects directly, improving
query flexibility and precision.
- Python and JavaScript SDKs updated to handle full-text queries
seamlessly, including async search support.

- **Tests**
- Added comprehensive tests covering fuzzy search, phrase search, and
boosted queries to ensure robust full-text search functionality.

- **Documentation**
- Updated query class documentation to reflect new constructor options
and removal of deprecated methods for clarity and simplicity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-15 11:51:35 +08:00
PhorstenkampFuzzy
a6fa69ab89 fix(python): add pylance as its own optional dependency (#2336)
This change allows to centrally manage the plance depndency without
everybody needing to monitor for compatibility manually.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced an optional dependency that enhances development support.
Users can now benefit from improved static analysis capabilities when
installing the recommended version (0.23.2 or later).

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-14 09:28:16 -07:00
Will Jones
b3a4efd587 fix: revert change default read_consistency_interval=5s (#2327)
This reverts commit a547c523c2 or #2281

The current implementation can cause panics and performance degradation.
I will bring this back with more testing in
https://github.com/lancedb/lancedb/pull/2311

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Enhanced clarity on read consistency settings with updated
descriptions and default behavior.
- Removed outdated warnings about eventual consistency from the
troubleshooting guide.

- **Refactor**
- Streamlined the handling of the read consistency interval across
integrations, now defaulting to "None" for improved performance.
  - Simplified internal logic to offer a more consistent experience.

- **Tests**
- Updated test expectations to reflect the new default representation
for the read consistency interval.
- Removed redundant tests related to "no consistency" settings for
streamlined testing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-04-14 08:48:15 -07:00
Lei Xu
4708b60bb1 chore: cargo update on main (#2331)
Fix test failures on main
2025-04-12 09:00:47 -05:00
Lei Xu
080ea2f9a4 chore: fix 1.86 warnings (#2312)
Fix rust 1.86 warnings
2025-04-12 08:29:10 -05:00
Ayush Chaurasia
32fdde23f8 fix: robust handling of empty result when reranking (#2313)
I found some edge cases while running experiments that - depending on
the base reranking libraries, some of them don't handle empty lists
well. This PR manually checks if the result set to be reranked is empty

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Enhanced search result processing by ensuring that reordering only
occurs when valid, non-empty results are available, thereby preventing
unnecessary operations and potential errors.

- **Tests**
- Added automated tests to verify that empty search result sets are
handled correctly, ensuring consistent behavior across various
rerankers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-09 16:26:05 +05:30
Lance Release
c44e5c046c Updating package-lock.json 2025-04-08 07:01:33 +00:00
Lance Release
f23aa0a793 Updating package-lock.json 2025-04-08 06:17:03 +00:00
Lance Release
83fc2b1851 Updating package-lock.json 2025-04-08 06:16:48 +00:00
Lance Release
56aa133ee6 Bump version: 0.19.0-beta.5 → 0.19.0-beta.6 2025-04-08 06:16:30 +00:00
Lance Release
27d9e5c596 Bump version: 0.22.0-beta.5 → 0.22.0-beta.6 2025-04-08 06:16:14 +00:00
BubbleCal
ec8271931f feat: support to create FTS index on list of strings (#2317)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal library dependencies to the latest beta version for
improved system stability.
- **Tests**
- Added automated tests to validate full-text search functionality on
list-based text fields.
- **Refactor**
- Enhanced the search processing logic to provide robust support for
list-type text data, ensuring more reliable results.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-08 14:12:35 +08:00
Lance Release
6c6966600c Updating package-lock.json 2025-04-04 22:56:57 +00:00
Lance Release
2e170c3c7b Updating package-lock.json 2025-04-04 21:50:28 +00:00
Lance Release
fd92e651d1 Updating package-lock.json 2025-04-04 21:50:12 +00:00
Lance Release
c298482ee1 Bump version: 0.19.0-beta.4 → 0.19.0-beta.5 2025-04-04 21:49:53 +00:00
Lance Release
d59f64b5a3 Bump version: 0.22.0-beta.4 → 0.22.0-beta.5 2025-04-04 21:49:34 +00:00
fzowl
30ed8c4c43 fix: voyageai regression multimodal supercedes text models (#2268)
fix #2160
2025-04-04 14:45:56 -07:00
Will Jones
4a2cdbf299 ci: provide token for deprecate call (#2309)
This should prevent the failures we are seeing in Node release.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chore**
- Enhanced the package deprecation process with improved security
measures, ensuring smoother and more reliable updates during package
deprecation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 14:44:58 -07:00
Will Jones
657843d9e9 perf: remove redundant checkout latest (#2310)
This bug was introduced in https://github.com/lancedb/lancedb/pull/2281

Likely introduced during a rebase when fixing merge conflicts.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Updated the refresh process so that reloading now uses the existing
dataset version instead of automatically updating to the latest version.
This change may affect workflows that rely on immediate data updates
during refresh.
  
- **New Features**
- Introduced a new module for tracking I/O statistics in object store
operations, enhancing monitoring capabilities.
- Added a new test module to validate the functionality of the dataset
operations.

- **Bug Fixes**
- Reintroduced the `write_options` method in the `CreateTableBuilder`,
ensuring consistent functionality across different builder variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 12:56:02 -07:00
Will Jones
1cd76b8498 feat: add timeout to query execution options (#2288)
Closes #2287


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added configurable timeout support for query executions. Users can now
specify maximum wait times for queries, enhancing control over
long-running operations across various integrations.
- **Tests**
- Expanded test coverage to validate timeout behavior in both
synchronous and asynchronous query flows, ensuring timely error
responses when query execution exceeds the specified limit.
- Introduced a new test suite to verify query operations when a timeout
is reached, checking for appropriate error handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 12:34:41 -07:00
Lei Xu
a38f784081 chore: add numpy as dependency (#2308) 2025-04-04 10:33:39 -07:00
Will Jones
647dee4e94 ci: check release builds when we change dependencies (#2299)
The issue we fixed in https://github.com/lancedb/lancedb/pull/2296 was
caused by an upgrade in dependencies. This could have been caught if we
had run these CI jobs when we did the dependency change.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated our automated pipeline to trigger additional stability checks
when dependency configurations change, ensuring smoother build and
release processes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-03 16:19:00 -07:00
Lance Release
0844c2dd64 Updating package-lock.json 2025-04-02 21:23:50 +00:00
Lance Release
fd2692295c Updating package-lock.json 2025-04-02 21:23:34 +00:00
Lance Release
d4ea50fba1 Bump version: 0.19.0-beta.3 → 0.19.0-beta.4 2025-04-02 21:23:19 +00:00
Lance Release
0d42297cf8 Bump version: 0.22.0-beta.3 → 0.22.0-beta.4 2025-04-02 21:23:02 +00:00
Weston Pace
a6d4125cbf feat: upgrade lance to 0.25.3b2 (#2304)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
	- Updated core dependency versions to v0.25.3-beta.2.
	- Enabled additional functionality with a new "dynamodb" feature.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-02 14:22:30 -07:00
Lance Release
5c32a99e61 Updating package-lock.json 2025-04-02 09:28:46 +00:00
Lance Release
cefaa75b24 Updating package-lock.json 2025-04-02 09:28:30 +00:00
Lance Release
bd62c2384f Bump version: 0.19.0-beta.2 → 0.19.0-beta.3 2025-04-02 09:28:14 +00:00
Lance Release
f0bc08c0d7 Bump version: 0.22.0-beta.2 → 0.22.0-beta.3 2025-04-02 09:27:55 +00:00
BubbleCal
e52ac79c69 fix: can't do structured FTS in python (#2300)
missed to support it in `search()` API and there were some pydantic
errors

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced full-text search capabilities by incorporating additional
parameters, enabling more flexible query definitions.
- Extended table search functionality to support full-text queries
alongside existing search types.

- **Tests**
- Introduced new tests that validate both structured and conditional
full-text search behaviors.
- Expanded test coverage for various query types, including MatchQuery,
BoostQuery, MultiMatchQuery, and PhraseQuery.

- **Bug Fixes**
- Fixed a logic issue in query processing to ensure correct handling of
full-text search queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-02 17:27:15 +08:00
Will Jones
f091f57594 ci: fix lancedb musl builds (#2296)
Fixes #2255


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Enhanced the build process to improve performance and reliability
across Linux platforms.
  - Updated environment settings for more accurate compiler integration.
- Activated previously inactive build configurations to support advanced
feature support.
- Added support for the x86_64 architecture on Linux systems utilizing
the musl C library.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-01 14:44:27 -07:00
Lance Release
a997fd4108 Updating package-lock.json 2025-04-01 17:28:57 +00:00
Lance Release
1486514ccc Updating package-lock.json 2025-04-01 17:28:40 +00:00
Lance Release
a505bc3965 Bump version: 0.19.0-beta.1 → 0.19.0-beta.2 2025-04-01 17:28:21 +00:00
Lance Release
c1738250a3 Bump version: 0.22.0-beta.1 → 0.22.0-beta.2 2025-04-01 17:27:57 +00:00
Weston Pace
1ee63984f5 feat: allow FSB to be used for btree indices (#2297)
We recently allowed this for lance but there was a check in lancedb as
well that was preventing it

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added support for indexing fixed-size binary data using B-tree
structures for efficient data storage and retrieval.
- **Tests**
- Implemented automated tests to ensure the new binary indexing works
correctly and meets the expected configuration.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-01 10:27:22 -07:00
Lance Release
2eb2c8862a Updating package-lock.json 2025-04-01 14:27:26 +00:00
Lance Release
4ea8e178d3 Updating package-lock.json 2025-04-01 14:27:07 +00:00
Lance Release
e4485a630e Bump version: 0.19.0-beta.0 → 0.19.0-beta.1 2025-04-01 14:26:47 +00:00
Lance Release
fb95f9b3bd Bump version: 0.22.0-beta.0 → 0.22.0-beta.1 2025-04-01 14:26:28 +00:00
Weston Pace
625bab3f21 feat: update to lance 0.25.3b1 (#2294)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated dependency versions for improved performance and
compatibility.

- **New Features**
- Added support for structured full-text search with expanded query
types (e.g., match, phrase, boost, multi-match) and flexible input
formats.
- Introduced a new method to check server support for structural
full-text search features.
- Enhanced the query system with new classes and interfaces for handling
various full-text queries.
- Expanded the functionality of existing methods to accept more complex
query structures, including updates to method signatures.

- **Bug Fixes**
  - Improved error handling and reporting for full-text search queries.

- **Refactor**
- Enhanced query processing with streamlined input handling and improved
error reporting, ensuring more robust and consistent search results
across platforms.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>
2025-04-01 06:36:42 -07:00
Will Jones
e59f9382a0 ci: deprecate vectordb each release (#2292)
I released each time we published, the new package was no longer
deprecated. This re-deprecated the package after a new publish.
2025-03-31 12:03:04 -07:00
Lance Release
fdee7ba477 Updating package-lock.json 2025-03-30 19:09:17 +00:00
Lance Release
c44fa3abc4 Updating package-lock.json 2025-03-30 18:05:07 +00:00
Lance Release
fc43aac0ed Updating package-lock.json 2025-03-30 18:04:51 +00:00
Lance Release
e67cd0baf9 Bump version: 0.18.3-beta.0 → 0.19.0-beta.0 2025-03-30 18:04:32 +00:00
Lance Release
26dab93f2a Bump version: 0.21.3-beta.0 → 0.22.0-beta.0 2025-03-30 18:04:14 +00:00
LuQQiu
b9bdb8d937 fix: fix remote restore api to always checkout latest version (#2291)
Fix restore to always checkout latest version, following local restore
api implementation

a1d1833a40/rust/lancedb/src/table.rs (L1910)
Otherwise
table.create_table -> version 1
table.add_table -> version 2
table.checkout(1), table.restore() -> the version remains at 1 (should
checkout_latest inside restore method to update version to latest
version and allow write operation)
table.checkout_latest() -> version is 3
can do write operations
2025-03-29 22:46:57 -07:00
LuQQiu
a1d1833a40 feat: add analyze_plan api (#2280)
add analyze plan api to allow executing the queries and see runtime
metrics.
Which help identify the query IO overhead and help identify query
slowness
2025-03-28 14:28:52 -07:00
Will Jones
a547c523c2 feat!: change default read_consistency_interval=5s (#2281)
Previously, when we loaded the next version of the table, we would block
all reads with a write lock. Now, we only do that if
`read_consistency_interval=0`. Otherwise, we load the next version
asynchronously in the background. This should mean that
`read_consistency_interval > 0` won't have a meaningful impact on
latency.

Along with this change, I felt it was safe to change the default
consistency interval to 5 seconds. The current default is `None`, which
means we will **never** check for a new version by default. I think that
default is contrary to most users expectations.
2025-03-28 11:04:31 -07:00
Lance Release
dc8b75feab Updating package-lock.json 2025-03-28 17:15:17 +00:00
Lance Release
c1600cdc06 Updating package-lock.json 2025-03-28 16:04:01 +00:00
Lance Release
f5dee46970 Updating package-lock.json 2025-03-28 16:03:46 +00:00
Lance Release
346cbf8bf7 Bump version: 0.18.2-beta.0 → 0.18.3-beta.0 2025-03-28 16:03:31 +00:00
Lance Release
3c7dfe9f28 Bump version: 0.21.2-beta.0 → 0.21.3-beta.0 2025-03-28 16:03:17 +00:00
Lei Xu
f52d05d3fa feat: add columns using pyarrow schema (#2284) 2025-03-28 08:51:50 -07:00
vinoyang
c321cccc12 chore(java): make rust release to be a switch option (#2277) 2025-03-28 11:26:24 +08:00
LuQQiu
cba14a5743 feat: add restore remote api (#2282) 2025-03-27 16:33:52 -07:00
vinoyang
72057b743d chore(java): introduce spotless plugin (#2278) 2025-03-27 10:38:39 +08:00
LuQQiu
698f329598 feat: add explain plan remote api (#2263)
Add explain plan remote api
2025-03-26 11:22:40 -07:00
BubbleCal
79fa745130 feat: upgrade lance to v0.25.1-beta.3 (#2276)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-26 23:14:27 +08:00
vinoyang
2ad71bdeca fix(java): make test work for jdk8 (#2269) 2025-03-25 10:57:49 -07:00
vinoyang
7c13615096 fix(java): add .gitignore file (#2270) 2025-03-25 10:56:08 -07:00
Wyatt Alt
f882f5b69a fix: update Query pydoc (#2273)
Removes reference of nonexistent method.
2025-03-25 08:50:23 -07:00
Benjamin Clavié
a68311a893 fix: answerdotai rerankers argument passing (#2117)
This fixes an issue for people wishing to use different kinds of
rerankers in lancedb via AnswerDotAI rerankers. Currently, the arguments
are passed sequentially, but they don't match the[Reranker class
implementation](d604a8c47d/rerankers/reranker.py (L179)):
the second argument is expected to be an optional "lang" for default
models, while model_type should be passed explicitly.

The one line changes in this PR fixes it and enables the use of other
methods (eg LLMs-as-rerankers)
2025-03-24 12:31:59 +05:30
Ayush Chaurasia
846a5cea33 fix: handle light and dark mode logo (#2265) 2025-03-22 10:21:05 -07:00
QianZhu
e3dec647b5 docs: replace banner as an image (#2262) 2025-03-21 18:35:35 -07:00
QianZhu
c58104cecc docs: add banner for LanceDB Cloud in public beta (#2261) 2025-03-21 17:54:34 -07:00
QianZhu
b3b5362632 docs: replace Lancedb Cloud link (#2259)
* direct users to cloud.lancedb.com since LanceDB Cloud is in public
beta
* removed the `cast vector dimension` from alter columns as we don't
support it
2025-03-21 17:43:00 -07:00
Will Jones
abe06fee3d feat(python): warn on fork (#2258)
Closes #768
2025-03-21 17:18:10 -07:00
Will Jones
93a82fd371 ci: allow dry run on PR to Python release (#2245)
This just makes it easier to test in the future.
2025-03-21 16:14:32 -07:00
Will Jones
0d379e6ffa ci(node): setup URL so auth token is picked up (#2257)
Should fix failure seen here:
https://github.com/lancedb/lancedb/actions/runs/13999958170/job/39207039825
2025-03-21 16:14:24 -07:00
Lance Release
e1388bdfdd Updating package-lock.json 2025-03-21 20:46:53 +00:00
Lance Release
315a24c2bc Updating package-lock.json 2025-03-21 20:03:43 +00:00
Lance Release
6dd4cf6038 Updating package-lock.json 2025-03-21 20:03:27 +00:00
Lance Release
f97e751b3c Bump version: 0.18.1 → 0.18.2-beta.0 2025-03-21 20:02:59 +00:00
Lance Release
e803a626a1 Bump version: 0.21.1 → 0.21.2-beta.0 2025-03-21 20:02:25 +00:00
Weston Pace
9403254442 feat: add to_query_object method (#2239)
This PR adds a `to_query_object` method to the various query builders
(except not hybrid queries yet). This makes it possible to inspect the
query that is built.

In addition this PR does some normalization between the sync and async
query paths. A few custom defaults were removed in favor of None (with
the default getting set once, in rust).

Also, the synchronous to_batches method will now actually stream results

Also, the remote API now defaults to prefiltering
2025-03-21 13:01:51 -07:00
Will Jones
b2a38ac366 fix: make pylance optional again (#2209)
The two remaining blockers were:

* A method `with_embeddings` that was deprecated a year ago
* A typecheck for `LanceDataset`
2025-03-21 11:26:32 -07:00
BubbleCal
bdb6c09c3b feat: support binary vector and IVF_FLAT in TypeScript (#2221)
resolve #2218

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-21 10:57:08 -07:00
Will Jones
2bfdef2624 ci: refactor node releases (#2223)
This PR fixes build issues associated with `aws-lc-rs`, while
simplifying the build process. Previously, we used custom scripts for
the musl and Windows ARM builds. These were complicated and prone to
breaking. This PR switches to a setup that mirrors
https://github.com/napi-rs/package-template/blob/main/.github/workflows/CI.yml.

* linux glibc and musl builds now use the Docker images provided by the
napi project
* Windows ARM build now just cross compiles from Windows x64, which
turns out to work quite well.
2025-03-21 10:56:29 -07:00
Samuel Colvin
7982d5c082 fix: correct rust install docs (#2253)
I'm pretty sure you mean `cargo add lancedb` here, `cargo install
lancedb` fails right now.
2025-03-21 10:12:53 -07:00
BubbleCal
7ff6ec7fe3 feat: upgrade to lance v0.25.0-beta.5 (#2248)
- adds `loss` into the index stats for vector index
- now `optimize` can retrain the vector index

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-21 10:12:23 -07:00
Ayush Chaurasia
ba1ded933a fix: add better check for empty results in hybrid search (#2252)
fixes: https://github.com/lancedb/lancedb/issues/2249
2025-03-21 13:05:05 +05:30
Will Jones
b595d8a579 fix(nodejs): workaround for apache-arrow null vector issue (#2244)
Fixes #2240
2025-03-20 08:07:10 -07:00
Will Jones
2a1d6d8abf ci: simplify windows builds (#2243)
We soon won't rely on cross compiling from Linux to windows, so can
remove this check. Instead, check that we can cross compile from Windows
between architectures.
2025-03-20 08:06:56 -07:00
Will Jones
440a466a13 ci: remove OpenSSL as dependency in favor of rustls (#2242)
`object_store` already hard codes `rustls` as the TLS implementation, so
we have been shipping a mix of `rustls` and `openssl`. For simplicity of
builds, we should consolidate to one, and that has to be `rustls`.
2025-03-20 08:06:45 -07:00
Ayush Chaurasia
b9afd9c860 docs: add late interaction, multi-vector guide & link example (#2231)
1/2 docs update for this week. Addesses issues from this docs epic -
https://github.com/lancedb/lancedb/issues/1476
2025-03-20 20:29:32 +05:30
Will Jones
a6b6f6a806 ci: drop vectordb support for musl, windows ARM (#2241)
vectordb is deprecated, and these platforms are particularly difficult
to maintain. Removing now to prevent further headaches.

We will keep these platforms supported on `@lancedb/lancedb`.
2025-03-19 12:23:46 -07:00
Ayush Chaurasia
ae1548b507 docs: add cloud & enterprise cta (#2235)
2/2 docs update this week
- Add cloud & enterprise CTA
- remove outdated projects/examples from landing page
2025-03-19 10:55:05 -07:00
Weston Pace
4e03ee82bc refactor: rework catalog/database options (#2213)
The `ConnectRequest` has a set of properties that only make sense for
listing databases / catalogs and a set of properties that only make
sense for remote databases.

This PR reduces all options to a single `HashMap<String, String>`. This
makes it easier to add new database / catalog implementations and makes
it clearer to users which options are applicable in which situations.

I don't believe there are any breaking changes here. The closest thing
is that I placed the `ConnectBuilder` methods `api_key`, `region`, and
`host_override` behind a `remote` feature gate. This is not strictly
needed and I could remove the feature gate but it seemed appropriate.
Since using these methods without the remote feature would have been
meaningless I don't feel this counts as a breaking change.

We could look at removing these methods entirely from the
`ConnectBuilder` (and encouraging users to use `RemoteDatabaseOptions`
instead) but I'm not sure how I feel about that.

Another approach we could take is to move these methods into a
`RemoteConnectBuilderExt` trait (and there could be a similar
`ListingConnectBuilderExt` trait to add methods for the listing database
/ catalog).

For now though my main goal is to simplify `ConnectRequest` as much as
possible (I see this being part of the key public API for database /
catalog integrations, similar to the `BaseTable`, `Catalog`, and
`Database` traits and I'd like it to be simple).
2025-03-18 10:13:59 -07:00
Weston Pace
46a6846d07 refactor: remove dataset reference from base table (#2226) 2025-03-17 06:27:33 -07:00
Will Jones
a207213358 fix: insert structs in non-alphabetical order (#2222)
Closes #2114

Starting in #1965, we no longer pass the table schema into
`pa.Table.from_pylist()`. This means PyArrow is choosing the order of
the struct subfields, and apparently it does them in alphabetical order.
This is fine in theory, since in Lance we support providing fields in
any order. However, before we pass it to Lance, we call
`pa.Table.cast()` to align column types to the table types.
`pa.Table.cast()` is strict about field order, so we need to create a
cast target schema that aligns with the input data. We were doing this
at the top-level fields, but weren't doing this in nested fields. This
PR adds support to do this for nested ones.
2025-03-13 14:46:05 -07:00
BubbleCal
6c321c694a feat: upgrade lance to 0.25.0-beta2 (#2220)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-13 14:12:54 -07:00
Bob Liu
5c00b2904c feat: add get dataset method on NativeTable (#2021)
I want to public the dataset method from native table, then I can use
more lance method like order_by which is not exposed in the lancedb
crate.
2025-03-13 11:15:28 -07:00
Gagan Bhullar
14677d7c18 fix: metric type inconsistency (#2122)
PR fixes #2113

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-03-12 10:28:37 -07:00
Martin Schorfmann
dd22a379b2 fix: use Self return type annotation for abstract query builder (#2127)
Hello LanceDB team,

while developing using `lancedb` as a library I encountered a typing
problem affecting IDE hints and completions during development.

---

## Current Situation

Currently, the abstract base class `lancedb.query:LanceQueryBuilder`
uses method chaining to build up the search parameters, where the
methods have `LanceQueryBuilder` as a return type hint.

This leads to two issues:
1. Implementing subclasses of `LanceQueryBuilder` need to override
methods to modify the return type hint, even when they don't need to
change its implementation, just to ensure adequate IDE hints and
completions.
2. When using method chaining the first method directly inherited from
the abstract `LanceQueryBuilder` causes the inferred type to switch back
to `LanceQueryBuilder`. So even when the type starts from
`lancdb.table:LanceTable.search(query_type="vector", ...)` and therefor
correctly is inferred as `LanceVectorQueryBuilder`, after calling e.g.
`LanceVectorQueryBuilder.limit(...)` it is seen as the abstract
`LanceQueryBuilder` from that point on.

### Example of current situation


![image](https://github.com/user-attachments/assets/09678727-8722-43bd-a8a2-67d9b5fc0db5)

## Proposed changes

I propose to change the return type hints of the corresponding methods
(including classmethod `create()`) in the abstract base class
`LanceQueryBuilder` from `LanceQueryBuilder` to `Self`.
`Self` is already imported in the module:

```py
    if sys.version_info >= (3, 11):
        from typing import Self
    else:
        from typing_extensions import Self
```

### Further possible changes

Additionally, the implementing subclasses could also change the return
type hints to `Self` to potentially allow for further inheritance
easily.
> [!NOTE]
> **However this is not part of this pull request as of writing.**

### Example after proposed changes


![image](https://github.com/user-attachments/assets/a9aea636-e426-477a-86ee-2dad3af2876f)

---

Best regards
Martin
2025-03-12 10:08:25 -07:00
Will Jones
7747c9bcbf feat(node): parse arrow types in alterColumns() (#2208)
Previously, users could only specify new data types in `alterColumns` as
strings:

```ts
await tbl.alterColumns([
  path: "price",
  dataType: "float"
]);
```

But this has some problems:

1. It wasn't clear what were valid types
2. It was impossible to specify nested types, like lists and vector
columns.

This PR changes it to take an Arrow data type, similar to how the Python
API works. This allows casting vector types:

```ts
await tbl.alterColumns([
  {
    path: "vector",
    dataType: new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float16(), false),
    ),
  },
]);
```

Closes #2185
2025-03-12 09:57:36 -07:00
QianZhu
c9d6fc43a6 docs: use bypass_vector_index() instead of use_index=false (#2115) 2025-03-12 09:31:09 -07:00
Martin Schorfmann
581bcfbb88 docs: fix docstring of EmbeddingFunction (#2118)
Hello LanceDB team,

---

I have fixed a discrepancy in the class docstring of
`lancedb.embeddings.base:EmbeddingFunction` and made consistency
alignments to that docstring.

### Changes made

1. The docstring referred to the abstract method
`get_source_embeddings()`.
  This method does not exist in the repository at the current state.
I have changed the mention to refer to the actual abstract method
`compute_source_embeddings()`.
2. Also, I aligned the consistency within the ordered list which is
describing the methods to be implemented by concrete embedding
functions.

---

Thank you for developing this useful library. 👍

Best regards
Martin
2025-03-12 09:30:01 -07:00
vinoyang
3750639b5f feat(rust): add connect_catalog method to support connect catalog via url (#2177) 2025-03-12 05:19:03 -07:00
Lance Release
e744d54460 Updating package-lock.json 2025-03-11 14:00:55 +00:00
Lance Release
9d1ce4b5a5 Updating package-lock.json 2025-03-11 13:15:18 +00:00
Lance Release
729ce5e542 Updating package-lock.json 2025-03-11 13:15:03 +00:00
Lance Release
de6739e7ec Bump version: 0.18.1-beta.0 → 0.18.1 2025-03-11 13:14:49 +00:00
Lance Release
495216efdb Bump version: 0.18.0 → 0.18.1-beta.0 2025-03-11 13:14:44 +00:00
Lance Release
a3b45a4d00 Bump version: 0.21.1-beta.0 → 0.21.1 2025-03-11 13:14:30 +00:00
Lance Release
c316c2f532 Bump version: 0.21.0 → 0.21.1-beta.0 2025-03-11 13:14:29 +00:00
Weston Pace
3966b16b63 fix: restore pylance as mandatory dependency (#2204)
We attempted to make pylance optional in
https://github.com/lancedb/lancedb/pull/2156 but it appears this did not
quite work. Users are unable to use lancedb from a fresh install. This
reverts the optional-ness so we can get back in a working state while we
fix the issue.
2025-03-11 06:13:52 -07:00
Lance Release
5661cc15ac Updating package-lock.json 2025-03-10 23:53:56 +00:00
Lance Release
4e7220400f Updating package-lock.json 2025-03-10 23:13:52 +00:00
Lance Release
ae4928fe77 Updating package-lock.json 2025-03-10 23:13:36 +00:00
Lance Release
e80a405dee Bump version: 0.18.0-beta.1 → 0.18.0 2025-03-10 23:13:18 +00:00
Lance Release
a53e19e386 Bump version: 0.18.0-beta.0 → 0.18.0-beta.1 2025-03-10 23:13:13 +00:00
Lance Release
c0097c5f0a Bump version: 0.21.0-beta.2 → 0.21.0 2025-03-10 23:12:56 +00:00
Lance Release
c199708e64 Bump version: 0.21.0-beta.1 → 0.21.0-beta.2 2025-03-10 23:12:56 +00:00
Weston Pace
4a47150ae7 feat: upgrade to lance 0.24.1 (#2199) 2025-03-10 15:18:37 -07:00
Wyatt Alt
f86b20a564 fix: delete tables from DDB on drop_all_tables (#2194)
Prior to this commit, issuing drop_all_tables on a listing database with
an external manifest store would delete physical tables but leave
references behind in the manifest store. The table drop would succeed,
but subsequent creation of a table with the same name would fail with a
conflict.

With this patch, the external manifest store is updated to account for
the dropped tables so that dropped table names can be reused.
2025-03-10 15:00:53 -07:00
msu-reevo
cc81f3e1a5 fix(python): typing (#2167)
@wjones127 is there a standard way you guys setup your virtualenv? I can
either relist all the dependencies in the pyright precommit section, or
specify a venv, or the user has to be in the virtual environment when
they run git commit. If the venv location was standardized or a python
manager like `uv` was used it would be easier to avoid duplicating the
pyright dependency list.

Per your suggestion, in `pyproject.toml` I added in all the passing
files to the `includes` section.

For ruff I upgraded the version and removed "TCH" which doesn't exist as
an option.

I added a `pyright_report.csv` which contains a list of all files sorted
by pyright errors ascending as a todo list to work on.

I fixed about 30 issues in `table.py` stemming from str's being passed
into methods that required a string within a set of string Literals by
extracting them into `types.py`

Can you verify in the rust bridge that the schema should be a property
and not a method here? If it's a method, then there's another place in
the code where `inner.schema` should be `inner.schema()`
``` python
class RecordBatchStream:
    @property
    def schema(self) -> pa.Schema: ...
```

Also unless the `_lancedb.pyi` file is wrong, then there is no
`__anext__` here for `__inner` when it's not an `AsyncGenerator` and
only `next` is defined:
``` python
    async def __anext__(self) -> pa.RecordBatch:
        return await self._inner.__anext__()
        if isinstance(self._inner, AsyncGenerator):
            batch = await self._inner.__anext__()
        else:
            batch = await self._inner.next()
        if batch is None:
            raise StopAsyncIteration
        return batch
```
in the else statement, `_inner` is a `RecordBatchStream`
```python
class RecordBatchStream:
    @property
    def schema(self) -> pa.Schema: ...
    async def next(self) -> Optional[pa.RecordBatch]: ...
```

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-03-10 09:01:23 -07:00
Weston Pace
bc49c4db82 feat: respect datafusion's batch size when running as a table provider (#2187)
Datafusion makes the batch size available as part of the `SessionState`.
We should use that to set the `max_batch_length` property in the
`QueryExecutionOptions`.
2025-03-07 05:53:36 -08:00
Weston Pace
d2eec46f17 feat: add support for streaming input to create_table (#2175)
This PR makes it possible to create a table using an asynchronous stream
of input data. Currently only a synchronous iterator is supported. There
are a number of follow-ups not yet tackled:

* Support for embedding functions (the embedding functions wrapper needs
to be re-written to be async, should be an easy lift)
* Support for async input into the remote table (the make_ipc_batch
needs to change to accept async input, leaving undone for now because I
think we want to support actual streaming uploads into the remote table
soon)
* Support for async input into the add function (pretty essential, but
it is a fairly distinct code path, so saving for a different PR)
2025-03-06 11:55:00 -08:00
Lance Release
51437bc228 Bump version: 0.21.0-beta.0 → 0.21.0-beta.1 2025-03-06 19:23:06 +00:00
Bert
fa53cfcfd2 feat: support modifying field metadata in lancedb python (#2178) 2025-03-04 16:58:46 -05:00
vinoyang
374fe0ad95 feat(rust): introduce Catalog trait and implement ListingCatalog (#2148)
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-03-03 20:22:24 -08:00
BubbleCal
35e5b84ba9 chore: upgrade lance to 0.24.0-beta.1 (#2171)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-03 12:32:12 +08:00
Lei Xu
7c12d497b0 ci: bump python to 3.12 in GHA (#2169) 2025-03-01 17:24:02 -08:00
ayao227
dfe4ba8dad chore: add reo integration (#2149)
This PR adds reo integration to the lancedb documentation website.
2025-02-28 07:51:34 -08:00
Weston Pace
fa1b9ad5bd fix: don't use with_schema to remove schema metadata (#2162)
It seems that `RecordBatch::with_schema` is unable to remove schema
metadata from a batch. It fails with the error `target schema is not
superset of current schema`.

I'm not sure how the `test_metadata_erased` test is passing. Strangely,
the metadata was not present by the time the batch arrived at the
metadata eraser. I think maybe the schema metadata is only present in
the batch if there is a filter.

I've created a new unit test that makes sure the metadata is erased if
we have a filter also
2025-02-27 10:24:00 -08:00
BubbleCal
8877eb020d feat: record the server version for remote table (#2147)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-02-27 15:55:59 +08:00
Will Jones
01e4291d21 feat(python): drop hard dependency on pylance (#2156)
Closes #1793
2025-02-26 15:53:45 -08:00
Lance Release
ab3ea76ad1 Updating package-lock.json 2025-02-26 21:23:39 +00:00
Lance Release
728ef8657d Updating package-lock.json 2025-02-26 20:11:37 +00:00
Lance Release
0b13901a16 Updating package-lock.json 2025-02-26 20:11:22 +00:00
Lance Release
84b110e0ef Bump version: 0.17.0 → 0.18.0-beta.0 2025-02-26 20:11:07 +00:00
Lance Release
e1836e54e3 Bump version: 0.20.0 → 0.21.0-beta.0 2025-02-26 20:10:54 +00:00
Weston Pace
4ba5326880 feat: reapply upgrade lance to v0.23.3-beta.1 (#2157)
This reverts commit 2f0c5baea2.

---------

Co-authored-by: Lu Qiu <luqiujob@gmail.com>
2025-02-26 11:44:11 -08:00
Lance Release
b036a69300 Updating package-lock.json 2025-02-26 19:32:22 +00:00
Will Jones
5b12a47119 feat!: revert query limit to be unbounded for scans (#2151)
In earlier PRs (#1886, #1191) we made the default limit 10 regardless of
the query type. This was confusing for users and in many cases a
breaking change. Users would have queries that used to return all
results, but instead only returned the first 10, causing silent bugs.

Part of the cause was consistency: the Python sync API seems to have
always had a limit of 10, while newer APIs (Python async and Nodejs)
didn't.

This PR sets the default limit only for searches (vector search, FTS),
while letting scans (even with filters) be unbounded. It does this
consistently for all SDKs.

Fixes #1983
Fixes #1852
Fixes #2141
2025-02-26 10:32:14 -08:00
Lance Release
769d483e50 Updating package-lock.json 2025-02-26 18:16:59 +00:00
Lance Release
9ecb11fe5a Updating package-lock.json 2025-02-26 18:16:42 +00:00
221 changed files with 18178 additions and 4475 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.17.0"
current_version = "0.20.1-beta.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.
@@ -87,26 +87,11 @@ glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-x64-gnu\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-x64-gnu\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-arm64-musl\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-arm64-musl\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-x64-musl\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-x64-musl\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-win32-x64-msvc\": \"{new_version}\""
search = "\"@lancedb/vectordb-win32-x64-msvc\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-win32-arm64-msvc\": \"{new_version}\""
search = "\"@lancedb/vectordb-win32-arm64-msvc\": \"{current_version}\""
# Cargo files
# ------------
[[tool.bumpversion.files]]

View File

@@ -34,6 +34,10 @@ rustflags = ["-C", "target-cpu=haswell", "-C", "target-feature=+avx2,+fma,+f16c"
[target.x86_64-unknown-linux-musl]
rustflags = ["-C", "target-cpu=haswell", "-C", "target-feature=-crt-static,+avx2,+fma,+f16c"]
[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-musl-gcc"
rustflags = ["-C", "target-feature=-crt-static"]
[target.aarch64-apple-darwin]
rustflags = ["-C", "target-cpu=apple-m1", "-C", "target-feature=+neon,+fp16,+fhm,+dotprod"]
@@ -44,4 +48,4 @@ rustflags = ["-Ctarget-feature=+crt-static"]
# Experimental target for Arm64 Windows
[target.aarch64-pc-windows-msvc]
rustflags = ["-Ctarget-feature=+crt-static"]
rustflags = ["-Ctarget-feature=+crt-static"]

View File

@@ -36,8 +36,7 @@ runs:
args: ${{ inputs.args }}
before-script-linux: |
set -e
yum install -y openssl-devel \
&& curl -L https://github.com/protocolbuffers/protobuf/releases/download/v24.4/protoc-24.4-linux-$(uname -m).zip > /tmp/protoc.zip \
curl -L https://github.com/protocolbuffers/protobuf/releases/download/v24.4/protoc-24.4-linux-$(uname -m).zip > /tmp/protoc.zip \
&& unzip /tmp/protoc.zip -d /usr/local \
&& rm /tmp/protoc.zip
- name: Build Arm Manylinux Wheel
@@ -52,7 +51,7 @@ runs:
args: ${{ inputs.args }}
before-script-linux: |
set -e
yum install -y openssl-devel clang \
yum install -y clang \
&& curl -L https://github.com/protocolbuffers/protobuf/releases/download/v24.4/protoc-24.4-linux-aarch_64.zip > /tmp/protoc.zip \
&& unzip /tmp/protoc.zip -d /usr/local \
&& rm /tmp/protoc.zip

View File

@@ -18,17 +18,24 @@ concurrency:
group: "pages"
cancel-in-progress: true
env:
# This reduces the disk space needed for the build
RUSTFLAGS: "-C debuginfo=0"
# according to: https://matklad.github.io/2021/09/04/fast-rust-builds.html
# CI builds are faster with incremental disabled.
CARGO_INCREMENTAL: "0"
jobs:
# Single deploy job since we're just deploying
build:
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: buildjet-8vcpu-ubuntu-2204
runs-on: ubuntu-24.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install dependecies needed for ubuntu
- name: Install dependencies needed for ubuntu
run: |
sudo apt install -y protobuf-compiler libssl-dev
rustup update && rustup default
@@ -38,6 +45,7 @@ jobs:
python-version: "3.10"
cache: "pip"
cache-dependency-path: "docs/requirements.txt"
- uses: Swatinem/rust-cache@v2
- name: Build Python
working-directory: python
run: |
@@ -49,7 +57,6 @@ jobs:
node-version: 20
cache: 'npm'
cache-dependency-path: node/package-lock.json
- uses: Swatinem/rust-cache@v2
- name: Install node dependencies
working-directory: node
run: |

View File

@@ -43,7 +43,7 @@ jobs:
- uses: Swatinem/rust-cache@v2
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: "1.79.0"
toolchain: "1.81.0"
cache-workspaces: "./java/core/lancedb-jni"
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
@@ -97,7 +97,7 @@ jobs:
- name: Dry run
if: github.event_name == 'pull_request'
run: |
mvn --batch-mode -DskipTests package
mvn --batch-mode -DskipTests -Drust.release.build=true package
- name: Set github
run: |
git config --global user.email "LanceDB Github Runner"
@@ -108,7 +108,7 @@ jobs:
echo "use-agent" >> ~/.gnupg/gpg.conf
echo "pinentry-mode loopback" >> ~/.gnupg/gpg.conf
export GPG_TTY=$(tty)
mvn --batch-mode -DskipTests -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -P deploy-to-ossrh
mvn --batch-mode -DskipTests -Drust.release.build=true -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -P deploy-to-ossrh
env:
SONATYPE_USER: ${{ secrets.SONATYPE_USER }}
SONATYPE_TOKEN: ${{ secrets.SONATYPE_TOKEN }}

View File

@@ -35,6 +35,9 @@ jobs:
- uses: Swatinem/rust-cache@v2
with:
workspaces: java/core/lancedb-jni
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt
- name: Run cargo fmt
run: cargo fmt --check
working-directory: ./java/core/lancedb-jni
@@ -68,6 +71,9 @@ jobs:
- uses: Swatinem/rust-cache@v2
with:
workspaces: java/core/lancedb-jni
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt
- name: Run cargo fmt
run: cargo fmt --check
working-directory: ./java/core/lancedb-jni
@@ -110,4 +116,3 @@ jobs:
-Djdk.reflect.useDirectMethodHandle=false \
-Dio.netty.tryReflectionSetAccessible=true"
JAVA_HOME=$JAVA_17 mvn clean test

View File

@@ -84,6 +84,7 @@ jobs:
run: |
pip install bump-my-version PyGithub packaging
bash ci/bump_version.sh ${{ inputs.type }} ${{ inputs.bump-minor }} v $COMMIT_BEFORE_BUMP
bash ci/update_lockfiles.sh --amend
- name: Push new version tag
if: ${{ !inputs.dry_run }}
uses: ad-m/github-push-action@master
@@ -92,11 +93,3 @@ jobs:
github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}
branch: ${{ github.ref }}
tags: true
- uses: ./.github/workflows/update_package_lock
if: ${{ !inputs.dry_run && inputs.other }}
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
- uses: ./.github/workflows/update_package_lock_nodejs
if: ${{ !inputs.dry_run && inputs.other }}
with:
github_token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -47,6 +47,9 @@ jobs:
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
- name: Lint
run: |
cargo fmt --all -- --check
@@ -113,7 +116,7 @@ jobs:
set -e
npm ci
npm run docs
if ! git diff --exit-code; then
if ! git diff --exit-code -- . ':(exclude)Cargo.lock'; then
echo "Docs need to be updated"
echo "Run 'npm run docs', fix any warnings, and commit the changes."
exit 1

File diff suppressed because it is too large Load Diff

View File

@@ -4,6 +4,11 @@ on:
push:
tags:
- 'python-v*'
pull_request:
# This should trigger a dry run (we skip the final publish step)
paths:
- .github/workflows/pypi-publish.yml
- Cargo.toml # Change in dependency frequently breaks builds
jobs:
linux:
@@ -46,6 +51,7 @@ jobs:
arm-build: ${{ matrix.config.platform == 'aarch64' }}
manylinux: ${{ matrix.config.manylinux }}
- uses: ./.github/workflows/upload_wheel
if: startsWith(github.ref, 'refs/tags/python-v')
with:
pypi_token: ${{ secrets.LANCEDB_PYPI_API_TOKEN }}
fury_token: ${{ secrets.FURY_TOKEN }}
@@ -75,6 +81,7 @@ jobs:
python-minor-version: 8
args: "--release --strip --target ${{ matrix.config.target }} --features fp16kernels"
- uses: ./.github/workflows/upload_wheel
if: startsWith(github.ref, 'refs/tags/python-v')
with:
pypi_token: ${{ secrets.LANCEDB_PYPI_API_TOKEN }}
fury_token: ${{ secrets.FURY_TOKEN }}
@@ -96,10 +103,12 @@ jobs:
args: "--release --strip"
vcpkg_token: ${{ secrets.VCPKG_GITHUB_PACKAGES }}
- uses: ./.github/workflows/upload_wheel
if: startsWith(github.ref, 'refs/tags/python-v')
with:
pypi_token: ${{ secrets.LANCEDB_PYPI_API_TOKEN }}
fury_token: ${{ secrets.FURY_TOKEN }}
gh-release:
if: startsWith(github.ref, 'refs/tags/python-v')
runs-on: ubuntu-latest
permissions:
contents: write

View File

@@ -13,6 +13,11 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
# Color output for pytest is off by default.
PYTEST_ADDOPTS: "--color=yes"
FORCE_COLOR: "1"
jobs:
lint:
name: "Lint"
@@ -33,13 +38,14 @@ jobs:
python-version: "3.12"
- name: Install ruff
run: |
pip install ruff==0.8.4
pip install ruff==0.9.9
- name: Format check
run: ruff format --check .
- name: Lint
run: ruff check .
doctest:
name: "Doctest"
type-check:
name: "Type Check"
timeout-minutes: 30
runs-on: "ubuntu-22.04"
defaults:
@@ -54,7 +60,36 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.12"
- name: Install protobuf compiler
run: |
sudo apt update
sudo apt install -y protobuf-compiler
pip install toml
- name: Install dependencies
run: |
python ../ci/parse_requirements.py pyproject.toml --extras dev,tests,embeddings > requirements.txt
pip install -r requirements.txt
- name: Run pyright
run: pyright
doctest:
name: "Doctest"
timeout-minutes: 30
runs-on: "ubuntu-24.04"
defaults:
run:
shell: bash
working-directory: python
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
lfs: true
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
- name: Install protobuf
run: |
@@ -75,8 +110,8 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
python-minor-version: ["9", "11"]
runs-on: "ubuntu-22.04"
python-minor-version: ["9", "12"]
runs-on: "ubuntu-24.04"
defaults:
run:
shell: bash
@@ -101,6 +136,10 @@ jobs:
- uses: ./.github/workflows/run_tests
with:
integration: true
- name: Test without pylance or pandas
run: |
pip uninstall -y pylance pandas
pytest -vv python/tests/test_table.py
# Make sure wheels are not included in the Rust cache
- name: Delete wheels
run: rm -rf target/wheels
@@ -127,7 +166,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.12"
- uses: Swatinem/rust-cache@v2
with:
workspaces: python
@@ -157,7 +196,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.12"
- uses: Swatinem/rust-cache@v2
with:
workspaces: python
@@ -168,7 +207,7 @@ jobs:
run: rm -rf target/wheels
pydantic1x:
timeout-minutes: 30
runs-on: "ubuntu-22.04"
runs-on: "ubuntu-24.04"
defaults:
run:
shell: bash
@@ -189,6 +228,7 @@ jobs:
- name: Install lancedb
run: |
pip install "pydantic<2"
pip install pyarrow==16
pip install --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests]
pip install tantivy
- name: Run tests

View File

@@ -24,8 +24,8 @@ runs:
- name: pytest (with integration)
shell: bash
if: ${{ inputs.integration == 'true' }}
run: pytest -m "not slow" -x -v --durations=30 python/python/tests
run: pytest -m "not slow" -vv --durations=30 python/python/tests
- name: pytest (no integration tests)
shell: bash
if: ${{ inputs.integration != 'true' }}
run: pytest -m "not slow and not s3_test" -x -v --durations=30 python/python/tests
run: pytest -m "not slow and not s3_test" -vv --durations=30 python/python/tests

View File

@@ -40,6 +40,9 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
@@ -157,151 +160,33 @@ jobs:
windows:
runs-on: windows-2022
strategy:
matrix:
target:
- x86_64-pc-windows-msvc
- aarch64-pc-windows-msvc
defaults:
run:
working-directory: rust/lancedb
steps:
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install Protoc v21.12
working-directory: C:\
run: choco install --no-progress protoc
- name: Build
run: |
New-Item -Path 'C:\protoc' -ItemType Directory
Set-Location C:\protoc
Invoke-WebRequest https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-win64.zip -OutFile C:\protoc\protoc.zip
7z x protoc.zip
Add-Content $env:GITHUB_PATH "C:\protoc\bin"
shell: powershell
rustup target add ${{ matrix.target }}
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo build --features remote --tests --locked --target ${{ matrix.target }}
- name: Run tests
# Can only run tests when target matches host
if: ${{ matrix.target == 'x86_64-pc-windows-msvc' }}
run: |
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo test --features remote --locked
windows-arm64-cross:
# We cross compile in Node releases, so we want to make sure
# this can run successfully.
runs-on: ubuntu-latest
container: alpine:edge
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install dependencies
run: |
set -e
apk add protobuf-dev curl clang lld llvm19 grep npm bash msitools sed
curl --proto '=https' --tlsv1.3 -sSf https://raw.githubusercontent.com/rust-lang/rustup/refs/heads/master/rustup-init.sh | sh -s -- -y
source $HOME/.cargo/env
rustup target add aarch64-pc-windows-msvc
mkdir -p sysroot
cd sysroot
sh ../ci/sysroot-aarch64-pc-windows-msvc.sh
- name: Check
env:
CC: clang
AR: llvm-ar
C_INCLUDE_PATH: /usr/aarch64-pc-windows-msvc/usr/include
CARGO_BUILD_TARGET: aarch64-pc-windows-msvc
RUSTFLAGS: -Ctarget-feature=+crt-static,+neon,+fp16,+fhm,+dotprod -Clinker=lld -Clink-arg=/LIBPATH:/usr/aarch64-pc-windows-msvc/usr/lib -Clink-arg=arm64rt.lib
run: |
source $HOME/.cargo/env
cargo check --features remote --locked
windows-arm64:
runs-on: windows-4x-arm
steps:
- name: Install Git
run: |
Invoke-WebRequest -Uri "https://github.com/git-for-windows/git/releases/download/v2.44.0.windows.1/Git-2.44.0-64-bit.exe" -OutFile "git-installer.exe"
Start-Process -FilePath "git-installer.exe" -ArgumentList "/VERYSILENT", "/NORESTART" -Wait
shell: powershell
- name: Add Git to PATH
run: |
Add-Content $env:GITHUB_PATH "C:\Program Files\Git\bin"
$env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")
shell: powershell
- name: Configure Git symlinks
run: git config --global core.symlinks true
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.13"
- name: Install Visual Studio Build Tools
run: |
Invoke-WebRequest -Uri "https://aka.ms/vs/17/release/vs_buildtools.exe" -OutFile "vs_buildtools.exe"
Start-Process -FilePath "vs_buildtools.exe" -ArgumentList "--quiet", "--wait", "--norestart", "--nocache", `
"--installPath", "C:\BuildTools", `
"--add", "Microsoft.VisualStudio.Component.VC.Tools.ARM64", `
"--add", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64", `
"--add", "Microsoft.VisualStudio.Component.Windows11SDK.22621", `
"--add", "Microsoft.VisualStudio.Component.VC.ATL", `
"--add", "Microsoft.VisualStudio.Component.VC.ATLMFC", `
"--add", "Microsoft.VisualStudio.Component.VC.Llvm.Clang" -Wait
shell: powershell
- name: Add Visual Studio Build Tools to PATH
run: |
$vsPath = "C:\BuildTools\VC\Tools\MSVC"
$latestVersion = (Get-ChildItem $vsPath | Sort-Object {[version]$_.Name} -Descending)[0].Name
Add-Content $env:GITHUB_PATH "C:\BuildTools\VC\Tools\MSVC\$latestVersion\bin\Hostx64\arm64"
Add-Content $env:GITHUB_PATH "C:\BuildTools\VC\Tools\MSVC\$latestVersion\bin\Hostx64\x64"
Add-Content $env:GITHUB_PATH "C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\arm64"
Add-Content $env:GITHUB_PATH "C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64"
Add-Content $env:GITHUB_PATH "C:\BuildTools\VC\Tools\Llvm\x64\bin"
# Add MSVC runtime libraries to LIB
$env:LIB = "C:\BuildTools\VC\Tools\MSVC\$latestVersion\lib\arm64;" +
"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\um\arm64;" +
"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\ucrt\arm64"
Add-Content $env:GITHUB_ENV "LIB=$env:LIB"
# Add INCLUDE paths
$env:INCLUDE = "C:\BuildTools\VC\Tools\MSVC\$latestVersion\include;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\um;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\shared"
Add-Content $env:GITHUB_ENV "INCLUDE=$env:INCLUDE"
shell: powershell
- name: Install Rust
run: |
Invoke-WebRequest https://win.rustup.rs/x86_64 -OutFile rustup-init.exe
.\rustup-init.exe -y --default-host aarch64-pc-windows-msvc
shell: powershell
- name: Add Rust to PATH
run: |
Add-Content $env:GITHUB_PATH "$env:USERPROFILE\.cargo\bin"
shell: powershell
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install 7-Zip ARM
run: |
New-Item -Path 'C:\7zip' -ItemType Directory
Invoke-WebRequest https://7-zip.org/a/7z2408-arm64.exe -OutFile C:\7zip\7z-installer.exe
Start-Process -FilePath C:\7zip\7z-installer.exe -ArgumentList '/S' -Wait
shell: powershell
- name: Add 7-Zip to PATH
run: Add-Content $env:GITHUB_PATH "C:\Program Files\7-Zip"
shell: powershell
- name: Install Protoc v21.12
working-directory: C:\
run: |
if (Test-Path 'C:\protoc') {
Write-Host "Protoc directory exists, skipping installation"
return
}
New-Item -Path 'C:\protoc' -ItemType Directory
Set-Location C:\protoc
Invoke-WebRequest https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-win64.zip -OutFile C:\protoc\protoc.zip
& 'C:\Program Files\7-Zip\7z.exe' x protoc.zip
shell: powershell
- name: Add Protoc to PATH
run: Add-Content $env:GITHUB_PATH "C:\protoc\bin"
shell: powershell
- name: Run tests
run: |
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo test --target aarch64-pc-windows-msvc --features remote --locked
msrv:
# Check the minimum supported Rust version
name: MSRV Check - Rust v${{ matrix.msrv }}

View File

@@ -1,33 +0,0 @@
name: update_package_lock
description: "Update node's package.lock"
inputs:
github_token:
required: true
description: "github token for the repo"
runs:
using: "composite"
steps:
- uses: actions/setup-node@v3
with:
node-version: 20
- name: Set git configs
shell: bash
run: |
git config user.name 'Lance Release'
git config user.email 'lance-dev@lancedb.com'
- name: Update package-lock.json file
working-directory: ./node
run: |
npm install
git add package-lock.json
git commit -m "Updating package-lock.json"
shell: bash
- name: Push changes
if: ${{ inputs.dry_run }} == "false"
uses: ad-m/github-push-action@master
with:
github_token: ${{ inputs.github_token }}
branch: main
tags: true

View File

@@ -1,33 +0,0 @@
name: update_package_lock_nodejs
description: "Update nodejs's package.lock"
inputs:
github_token:
required: true
description: "github token for the repo"
runs:
using: "composite"
steps:
- uses: actions/setup-node@v3
with:
node-version: 20
- name: Set git configs
shell: bash
run: |
git config user.name 'Lance Release'
git config user.email 'lance-dev@lancedb.com'
- name: Update package-lock.json file
working-directory: ./nodejs
run: |
npm install
git add package-lock.json
git commit -m "Updating package-lock.json"
shell: bash
- name: Push changes
if: ${{ inputs.dry_run }} == "false"
uses: ad-m/github-push-action@master
with:
github_token: ${{ inputs.github_token }}
branch: main
tags: true

View File

@@ -1,21 +1,27 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.8.4
rev: v0.9.9
hooks:
- id: ruff
- repo: local
hooks:
- id: local-biome-check
name: biome check
entry: npx @biomejs/biome@1.8.3 check --config-path nodejs/biome.json nodejs/
language: system
types: [text]
files: "nodejs/.*"
exclude: nodejs/lancedb/native.d.ts|nodejs/dist/.*|nodejs/examples/.*
- id: ruff
# - repo: https://github.com/RobertCraigie/pyright-python
# rev: v1.1.395
# hooks:
# - id: pyright
# args: ["--project", "python"]
# additional_dependencies: [pyarrow-stubs]
- repo: local
hooks:
- id: local-biome-check
name: biome check
entry: npx @biomejs/biome@1.8.3 check --config-path nodejs/biome.json nodejs/
language: system
types: [text]
files: "nodejs/.*"
exclude: nodejs/lancedb/native.d.ts|nodejs/dist/.*|nodejs/examples/.*

2799
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -21,32 +21,32 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.23.2", "features" = ["dynamodb"] }
lance-io = { version = "=0.23.2" }
lance-index = { version = "=0.23.2" }
lance-linalg = { version = "=0.23.2" }
lance-table = { version = "=0.23.2" }
lance-testing = { version = "=0.23.2" }
lance-datafusion = { version = "=0.23.2" }
lance-encoding = { version = "=0.23.2" }
lance = { "version" = "=0.29.1", "features" = ["dynamodb"], tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
# Note that this one does not include pyarrow
arrow = { version = "53.2", optional = false }
arrow-array = "53.2"
arrow-data = "53.2"
arrow-ipc = "53.2"
arrow-ord = "53.2"
arrow-schema = "53.2"
arrow-arith = "53.2"
arrow-cast = "53.2"
arrow = { version = "55.1", optional = false }
arrow-array = "55.1"
arrow-data = "55.1"
arrow-ipc = "55.1"
arrow-ord = "55.1"
arrow-schema = "55.1"
arrow-arith = "55.1"
arrow-cast = "55.1"
async-trait = "0"
datafusion = { version = "44.0", default-features = false }
datafusion-catalog = "44.0"
datafusion-common = { version = "44.0", default-features = false }
datafusion-execution = "44.0"
datafusion-expr = "44.0"
datafusion-physical-plan = "44.0"
datafusion = { version = "47.0", default-features = false }
datafusion-catalog = "47.0"
datafusion-common = { version = "47.0", default-features = false }
datafusion-execution = "47.0"
datafusion-expr = "47.0"
datafusion-physical-plan = "47.0"
env_logger = "0.11"
half = { "version" = "=2.4.1", default-features = false, features = [
half = { "version" = "=2.5.0", default-features = false, features = [
"num-traits",
] }
futures = "0"
@@ -57,15 +57,16 @@ pin-project = "1.0.7"
snafu = "0.8"
url = "2"
num-traits = "0.2"
rand = "0.8"
rand = "0.9"
regex = "1.10"
lazy_static = "1"
semver = "1.0.25"
# Temporary pins to work around downstream issues
# https://github.com/apache/arrow-rs/commit/2fddf85afcd20110ce783ed5b4cdeb82293da30b
chrono = "=0.4.39"
chrono = "=0.4.41"
# https://github.com/RustCrypto/formats/issues/1684
base64ct = "=1.6.0"
# Workaround for: https://github.com/eira-fransham/crunchy/issues/13
crunchy = "=0.2.2"
# Workaround for: https://github.com/Lokathor/bytemuck/issues/306
bytemuck_derive = ">=1.8.1, <1.9.0"

127
README.md
View File

@@ -1,86 +1,97 @@
<a href="https://cloud.lancedb.com" target="_blank">
<img src="https://github.com/user-attachments/assets/92dad0a2-2a37-4ce1-b783-0d1b4f30a00c" alt="LanceDB Cloud Public Beta" width="100%" style="max-width: 100%;">
</a>
<div align="center">
<p align="center">
<img width="275" alt="LanceDB Logo" src="https://github.com/lancedb/lancedb/assets/5846846/37d7c7ad-c2fd-4f56-9f16-fffb0d17c73a">
[![LanceDB](docs/src/assets/hero-header.png)](https://lancedb.com)
[![Website](https://img.shields.io/badge/-Website-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://lancedb.com/)
[![Blog](https://img.shields.io/badge/Blog-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://blog.lancedb.com/)
[![Discord](https://img.shields.io/badge/-Discord-100000?style=for-the-badge&logo=discord&logoColor=white&labelColor=645cfb&color=645cfb)](https://discord.gg/zMM32dvNtd)
[![Twitter](https://img.shields.io/badge/-Twitter-100000?style=for-the-badge&logo=x&logoColor=white&labelColor=645cfb&color=645cfb)](https://twitter.com/lancedb)
[![LinkedIn](https://img.shields.io/badge/-LinkedIn-100000?style=for-the-badge&logo=linkedin&logoColor=white&labelColor=645cfb&color=645cfb)](https://www.linkedin.com/company/lancedb/)
**Developer-friendly, database for multimodal AI**
<a href='https://github.com/lancedb/vectordb-recipes/tree/main' target="_blank"><img alt='LanceDB' src='https://img.shields.io/badge/VectorDB_Recipes-100000?style=for-the-badge&logo=LanceDB&logoColor=white&labelColor=645cfb&color=645cfb'/></a>
<a href='https://lancedb.github.io/lancedb/' target="_blank"><img alt='lancdb' src='https://img.shields.io/badge/DOCS-100000?style=for-the-badge&logo=lancdb&logoColor=white&labelColor=645cfb&color=645cfb'/></a>
[![Blog](https://img.shields.io/badge/Blog-12100E?style=for-the-badge&logoColor=white)](https://blog.lancedb.com/)
[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/zMM32dvNtd)
[![Twitter](https://img.shields.io/badge/Twitter-%231DA1F2.svg?style=for-the-badge&logo=Twitter&logoColor=white)](https://twitter.com/lancedb)
[![Gurubase](https://img.shields.io/badge/Gurubase-Ask%20LanceDB%20Guru-006BFF?style=for-the-badge)](https://gurubase.io/g/lancedb)
<img src="docs/src/assets/lancedb.png" alt="LanceDB" width="50%">
</p>
# **The Multimodal AI Lakehouse**
<img max-width="750px" alt="LanceDB Multimodal Search" src="https://github.com/lancedb/lancedb/assets/917119/09c5afc5-7816-4687-bae4-f2ca194426ec">
[**How to Install** ](#how-to-install) ✦ [**Detailed Documentation**](https://lancedb.github.io/lancedb/) ✦ [**Tutorials and Recipes**](https://github.com/lancedb/vectordb-recipes/tree/main) ✦ [**Contributors**](#contributors)
**The ultimate multimodal data platform for AI/ML applications.**
LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease.
LanceDB is a central location where developers can build, train and analyze their AI workloads.
</p>
</div>
<hr />
<br>
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
## **Demo: Multimodal Search by Keyword, Vector or with SQL**
<img max-width="750px" alt="LanceDB Multimodal Search" src="https://github.com/lancedb/lancedb/assets/917119/09c5afc5-7816-4687-bae4-f2ca194426ec">
The key features of LanceDB include:
## **Star LanceDB to get updates!**
* Production-scale vector search with no servers to manage.
<details>
<summary>⭐ Click here ⭐ to see how fast we're growing!</summary>
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=lancedb/lancedb&theme=dark&type=Date">
<img width="100%" src="https://api.star-history.com/svg?repos=lancedb/lancedb&theme=dark&type=Date">
</picture>
</details>
* Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
## **Key Features**:
* Support for vector similarity search, full-text search and SQL.
- **Fast Vector Search**: Search billions of vectors in milliseconds with state-of-the-art indexing.
- **Comprehensive Search**: Support for vector similarity search, full-text search and SQL.
- **Multimodal Support**: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
- **Advanced Features**: Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure. GPU support in building vector index.
* Native Python and Javascript/Typescript support.
### **Products**:
- **Open Source & Local**: 100% open source, runs locally or in your cloud. No vendor lock-in.
- **Cloud and Enterprise**: Production-scale vector search with no servers to manage. Complete data sovereignty and security.
* Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure.
### **Ecosystem**:
- **Columnar Storage**: Built on the Lance columnar format for efficient storage and analytics.
- **Seamless Integration**: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and Javascript/Typescript support.
- **Rich Ecosystem**: Integrations with [**LangChain** 🦜️🔗](https://python.langchain.com/docs/integrations/vectorstores/lancedb/), [**LlamaIndex** 🦙](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html), Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
* GPU support in building vector index(*).
## **How to Install**:
* Ecosystem integrations with [LangChain 🦜️🔗](https://python.langchain.com/docs/integrations/vectorstores/lancedb/), [LlamaIndex 🦙](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html), Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
Follow the [Quickstart](https://lancedb.github.io/lancedb/basic/) doc to set up LanceDB locally.
LanceDB's core is written in Rust 🦀 and is built using <a href="https://github.com/lancedb/lance">Lance</a>, an open-source columnar format designed for performant ML workloads.
**API & SDK:** We also support Python, Typescript and Rust SDKs
## Quick Start
| Interface | Documentation |
|-----------|---------------|
| Python SDK | https://lancedb.github.io/lancedb/python/python/ |
| Typescript SDK | https://lancedb.github.io/lancedb/js/globals/ |
| Rust SDK | https://docs.rs/lancedb/latest/lancedb/index.html |
| REST API | https://docs.lancedb.com/api-reference/introduction |
**Javascript**
```shell
npm install @lancedb/lancedb
```
## **Join Us and Contribute**
```javascript
import * as lancedb from "@lancedb/lancedb";
We welcome contributions from everyone! Whether you're a developer, researcher, or just someone who wants to help out.
const db = await lancedb.connect("data/sample-lancedb");
const table = await db.createTable("vectors", [
{ id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
{ id: 2, vector: [1.1, 1.2], item: "bar", price: 50 },
], {mode: 'overwrite'});
If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our [**Discord**](https://discord.gg/G5DcmnZWKB) server.
[**Check out the GitHub Issues**](https://github.com/lancedb/lancedb/issues) if you would like to work on the features that are planned for the future. If you have any suggestions or feature requests, please feel free to open an issue on GitHub.
## **Contributors**
<a href="https://github.com/lancedb/lancedb/graphs/contributors">
<img src="https://contrib.rocks/image?repo=lancedb/lancedb" />
</a>
const query = table.vectorSearch([0.1, 0.3]).limit(2);
const results = await query.toArray();
## **Stay in Touch With Us**
<div align="center">
// You can also search for rows by specific criteria without involving a vector search.
const rowsByCriteria = await table.query().where("price >= 10").toArray();
```
</br>
**Python**
```shell
pip install lancedb
```
[![Website](https://img.shields.io/badge/-Website-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://lancedb.com/)
[![Blog](https://img.shields.io/badge/Blog-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://blog.lancedb.com/)
[![Discord](https://img.shields.io/badge/-Discord-100000?style=for-the-badge&logo=discord&logoColor=white&labelColor=645cfb&color=645cfb)](https://discord.gg/zMM32dvNtd)
[![Twitter](https://img.shields.io/badge/-Twitter-100000?style=for-the-badge&logo=x&logoColor=white&labelColor=645cfb&color=645cfb)](https://twitter.com/lancedb)
[![LinkedIn](https://img.shields.io/badge/-LinkedIn-100000?style=for-the-badge&logo=linkedin&logoColor=white&labelColor=645cfb&color=645cfb)](https://www.linkedin.com/company/lancedb/)
```python
import lancedb
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_pandas()
```
## Blogs, Tutorials & Videos
* 📈 <a href="https://blog.lancedb.com/benchmarking-random-access-in-lance/">2000x better performance with Lance over Parquet</a>
* 🤖 <a href="https://github.com/lancedb/vectordb-recipes/tree/main/examples/Youtube-Search-QA-Bot">Build a question and answer bot with LanceDB</a>
</div>

View File

@@ -1,21 +0,0 @@
#!/bin/bash
set -e
ARCH=${1:-x86_64}
# We pass down the current user so that when we later mount the local files
# into the container, the files are accessible by the current user.
pushd ci/manylinux_node
docker build \
-t lancedb-node-manylinux-$ARCH \
--build-arg="ARCH=$ARCH" \
--build-arg="DOCKER_USER=$(id -u)" \
--progress=plain \
.
popd
# We turn on memory swap to avoid OOM killer
docker run \
-v $(pwd):/io -w /io \
--memory-swap=-1 \
lancedb-node-manylinux-$ARCH \
bash ci/manylinux_node/build_lancedb.sh $ARCH

View File

@@ -1,34 +0,0 @@
# Builds the macOS artifacts (nodejs binaries).
# Usage: ./ci/build_macos_artifacts_nodejs.sh [target]
# Targets supported: x86_64-apple-darwin aarch64-apple-darwin
set -e
prebuild_rust() {
# Building here for the sake of easier debugging.
pushd rust/lancedb
echo "Building rust library for $1"
export RUST_BACKTRACE=1
cargo build --release --target $1
popd
}
build_node_binaries() {
pushd nodejs
echo "Building nodejs library for $1"
export RUST_TARGET=$1
npm run build-release
popd
}
if [ -n "$1" ]; then
targets=$1
else
targets="x86_64-apple-darwin aarch64-apple-darwin"
fi
echo "Building artifacts for targets: $targets"
for target in $targets
do
prebuild_rust $target
build_node_binaries $target
done

View File

@@ -1,5 +1,5 @@
# Many linux dockerfile with Rust, Node, and Lance dependencies installed.
# This container allows building the node modules native libraries in an
# This container allows building the node modules native libraries in an
# environment with a very old glibc, so that we are compatible with a wide
# range of linux distributions.
ARG ARCH=x86_64
@@ -9,10 +9,6 @@ FROM quay.io/pypa/manylinux_2_28_${ARCH}
ARG ARCH=x86_64
ARG DOCKER_USER=default_user
# Install static openssl
COPY install_openssl.sh install_openssl.sh
RUN ./install_openssl.sh ${ARCH} > /dev/null
# Protobuf is also installed as root.
COPY install_protobuf.sh install_protobuf.sh
RUN ./install_protobuf.sh ${ARCH}
@@ -21,7 +17,7 @@ ENV DOCKER_USER=${DOCKER_USER}
# Create a group and user, but only if it doesn't exist
RUN echo ${ARCH} && id -u ${DOCKER_USER} >/dev/null 2>&1 || adduser --user-group --create-home --uid ${DOCKER_USER} build_user
# We switch to the user to install Rust and Node, since those like to be
# We switch to the user to install Rust and Node, since those like to be
# installed at the user level.
USER ${DOCKER_USER}

View File

@@ -1,19 +0,0 @@
#!/bin/bash
# Builds the nodejs module for manylinux. Invoked by ci/build_linux_artifacts_nodejs.sh.
set -e
ARCH=${1:-x86_64}
if [ "$ARCH" = "x86_64" ]; then
export OPENSSL_LIB_DIR=/usr/local/lib64/
else
export OPENSSL_LIB_DIR=/usr/local/lib/
fi
export OPENSSL_STATIC=1
export OPENSSL_INCLUDE_DIR=/usr/local/include/openssl
#Alpine doesn't have .bashrc
FILE=$HOME/.bashrc && test -f $FILE && source $FILE
cd nodejs
npm ci
npm run build-release

View File

@@ -4,14 +4,6 @@ set -e
ARCH=${1:-x86_64}
TARGET_TRIPLE=${2:-x86_64-unknown-linux-gnu}
if [ "$ARCH" = "x86_64" ]; then
export OPENSSL_LIB_DIR=/usr/local/lib64/
else
export OPENSSL_LIB_DIR=/usr/local/lib/
fi
export OPENSSL_STATIC=1
export OPENSSL_INCLUDE_DIR=/usr/local/include/openssl
#Alpine doesn't have .bashrc
FILE=$HOME/.bashrc && test -f $FILE && source $FILE

View File

@@ -1,26 +0,0 @@
#!/bin/bash
# Builds openssl from source so we can statically link to it
# this is to avoid the error we get with the system installation:
# /usr/bin/ld: <library>: version node not found for symbol SSLeay@@OPENSSL_1.0.1
# /usr/bin/ld: failed to set dynamic section sizes: Bad value
set -e
git clone -b OpenSSL_1_1_1v \
--single-branch \
https://github.com/openssl/openssl.git
pushd openssl
if [[ $1 == x86_64* ]]; then
ARCH=linux-x86_64
else
# gnu target
ARCH=linux-aarch64
fi
./Configure no-shared $ARCH
make
make install

41
ci/parse_requirements.py Normal file
View File

@@ -0,0 +1,41 @@
import argparse
import toml
def parse_dependencies(pyproject_path, extras=None):
with open(pyproject_path, "r") as file:
pyproject = toml.load(file)
dependencies = pyproject.get("project", {}).get("dependencies", [])
for dependency in dependencies:
print(dependency)
optional_dependencies = pyproject.get("project", {}).get(
"optional-dependencies", {}
)
if extras:
for extra in extras.split(","):
for dep in optional_dependencies.get(extra, []):
print(dep)
def main():
parser = argparse.ArgumentParser(
description="Generate requirements.txt from pyproject.toml"
)
parser.add_argument("path", type=str, help="Path to pyproject.toml")
parser.add_argument(
"--extras",
type=str,
help="Comma-separated list of extras to include",
default="",
)
args = parser.parse_args()
parse_dependencies(args.path, args.extras)
if __name__ == "__main__":
main()

174
ci/set_lance_version.py Normal file
View File

@@ -0,0 +1,174 @@
import argparse
import sys
import json
def run_command(command: str) -> str:
"""
Run a shell command and return stdout as a string.
If exit code is not 0, raise an exception with the stderr output.
"""
import subprocess
result = subprocess.run(command, shell=True, capture_output=True, text=True)
if result.returncode != 0:
raise Exception(f"Command failed with error: {result.stderr.strip()}")
return result.stdout.strip()
def get_latest_stable_version() -> str:
version_line = run_command("cargo info lance | grep '^version:'")
version = version_line.split(" ")[1].strip()
return version
def get_latest_preview_version() -> str:
lance_tags = run_command(
"git ls-remote --tags https://github.com/lancedb/lance.git | grep 'refs/tags/v[0-9beta.-]\\+$'"
).splitlines()
lance_tags = (
tag.split("refs/tags/")[1]
for tag in lance_tags
if "refs/tags/" in tag and "beta" in tag
)
from packaging.version import Version
latest = max(
(tag[1:] for tag in lance_tags if tag.startswith("v")), key=lambda t: Version(t)
)
return str(latest)
def extract_features(line: str) -> list:
"""
Extracts the features from a line in Cargo.toml.
Example: 'lance = { "version" = "=0.29.0", "features" = ["dynamodb"] }'
Returns: ['dynamodb']
"""
import re
match = re.search(r'"features"\s*=\s*\[(.*?)\]', line)
if match:
features_str = match.group(1)
return [f.strip('"') for f in features_str.split(",")]
return []
def update_cargo_toml(line_updater):
"""
Updates the Cargo.toml file by applying the line_updater function to each line.
The line_updater function should take a line as input and return the updated line.
"""
with open("Cargo.toml", "r") as f:
lines = f.readlines()
new_lines = []
for line in lines:
if line.startswith("lance"):
# Update the line using the provided function
new_lines.append(line_updater(line))
else:
# Keep the line unchanged
new_lines.append(line)
with open("Cargo.toml", "w") as f:
f.writelines(new_lines)
def set_stable_version(version: str):
"""
Sets lines to
lance = { "version" = "=0.29.0", "features" = ["dynamodb"] }
lance-io = "=0.29.0"
...
"""
def line_updater(line: str) -> str:
package_name = line.split("=", maxsplit=1)[0].strip()
features = extract_features(line)
if features:
return f'{package_name} = {{ "version" = "={version}", "features" = {json.dumps(features)} }}\n'
else:
return f'{package_name} = "={version}"\n'
update_cargo_toml(line_updater)
def set_preview_version(version: str):
"""
Sets lines to
lance = { "version" = "=0.29.0", "features" = ["dynamodb"], tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
...
"""
def line_updater(line: str) -> str:
package_name = line.split("=", maxsplit=1)[0].strip()
features = extract_features(line)
base_version = version.split("-")[0] # Get the base version without beta suffix
if features:
return f'{package_name} = {{ "version" = "={base_version}", "features" = {json.dumps(features)}, "tag" = "v{version}", "git" = "https://github.com/lancedb/lance.git" }}\n'
else:
return f'{package_name} = {{ "version" = "={base_version}", "tag" = "v{version}", "git" = "https://github.com/lancedb/lance.git" }}\n'
update_cargo_toml(line_updater)
def set_local_version():
"""
Sets lines to
lance = { path = "../lance/rust/lance", features = ["dynamodb"] }
lance-io = { path = "../lance/rust/lance-io" }
...
"""
def line_updater(line: str) -> str:
package_name = line.split("=", maxsplit=1)[0].strip()
features = extract_features(line)
if features:
return f'{package_name} = {{ "path" = "../lance/rust/{package_name}", "features" = {json.dumps(features)} }}\n'
else:
return f'{package_name} = {{ "path" = "../lance/rust/{package_name}" }}\n'
update_cargo_toml(line_updater)
parser = argparse.ArgumentParser(description="Set the version of the Lance package.")
parser.add_argument(
"version",
type=str,
help="The version to set for the Lance package. Use 'stable' for the latest stable version, 'preview' for latest preview version, or a specific version number (e.g., '0.1.0'). You can also specify 'local' to use a local path.",
)
args = parser.parse_args()
if args.version == "stable":
latest_stable_version = get_latest_stable_version()
print(
f"Found latest stable version: \033[1mv{latest_stable_version}\033[0m",
file=sys.stderr,
)
set_stable_version(latest_stable_version)
elif args.version == "preview":
latest_preview_version = get_latest_preview_version()
print(
f"Found latest preview version: \033[1mv{latest_preview_version}\033[0m",
file=sys.stderr,
)
set_preview_version(latest_preview_version)
elif args.version == "local":
set_local_version()
else:
# Parse the version number.
version = args.version
# Ignore initial v if present.
if version.startswith("v"):
version = version[1:]
if "beta" in version:
set_preview_version(version)
else:
set_stable_version(version)
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)

30
ci/update_lockfiles.sh Executable file
View File

@@ -0,0 +1,30 @@
#!/usr/bin/env bash
set -euo pipefail
AMEND=false
for arg in "$@"; do
if [[ "$arg" == "--amend" ]]; then
AMEND=true
fi
done
# This updates the lockfile without building
cargo metadata --quiet > /dev/null
pushd nodejs || exit 1
npm install --package-lock-only --silent
popd
pushd node || exit 1
npm install --package-lock-only --silent
popd
if git diff --quiet --exit-code; then
echo "No lockfile changes to commit; skipping amend."
elif $AMEND; then
git add Cargo.lock nodejs/package-lock.json node/package-lock.json
git commit --amend --no-edit
else
git add Cargo.lock nodejs/package-lock.json node/package-lock.json
git commit -m "Update lockfiles"
fi

View File

@@ -2,7 +2,7 @@
LanceDB docs are deployed to https://lancedb.github.io/lancedb/.
Docs is built and deployed automatically by [Github Actions](.github/workflows/docs.yml)
Docs is built and deployed automatically by [Github Actions](../.github/workflows/docs.yml)
whenever a commit is pushed to the `main` branch. So it is possible for the docs to show
unreleased features.

View File

@@ -124,6 +124,9 @@ nav:
- Overview: hybrid_search/hybrid_search.md
- Comparing Rerankers: hybrid_search/eval.md
- Airbnb financial data example: notebooks/hybrid_search.ipynb
- Late interaction with MultiVector search:
- Overview: guides/multi-vector.md
- Example: notebooks/Multivector_on_LanceDB.ipynb
- RAG:
- Vanilla RAG: rag/vanilla_rag.md
- Multi-head RAG: rag/multi_head_rag.md
@@ -190,6 +193,7 @@ nav:
- Pandas and PyArrow: python/pandas_and_pyarrow.md
- Polars: python/polars_arrow.md
- DuckDB: python/duckdb.md
- Datafusion: python/datafusion.md
- LangChain:
- LangChain 🔗: integrations/langchain.md
- LangChain demo: notebooks/langchain_demo.ipynb
@@ -202,6 +206,7 @@ nav:
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- Genkit: integrations/genkit.md
- 🎯 Examples:
- Overview: examples/index.md
- 🐍 Python:
@@ -233,13 +238,6 @@ nav:
- 👾 JavaScript (vectordb): javascript/modules.md
- 👾 JavaScript (lancedb): js/globals.md
- 🦀 Rust: https://docs.rs/lancedb/latest/lancedb/
- ☁️ LanceDB Cloud:
- Overview: cloud/index.md
- API reference:
- 🐍 Python: python/saas-python.md
- 👾 JavaScript: javascript/modules.md
- REST API: cloud/rest.md
- FAQs: cloud/cloud_faq.md
- Quick start: basic.md
- Concepts:
@@ -251,6 +249,7 @@ nav:
- Data management: concepts/data_management.md
- Guides:
- Working with tables: guides/tables.md
- Working with SQL: guides/sql_querying.md
- Building an ANN index: ann_indexes.md
- Vector Search: search.md
- Full-text search (native): fts.md
@@ -260,6 +259,9 @@ nav:
- Overview: hybrid_search/hybrid_search.md
- Comparing Rerankers: hybrid_search/eval.md
- Airbnb financial data example: notebooks/hybrid_search.ipynb
- Late interaction with MultiVector search:
- Overview: guides/multi-vector.md
- Document search Example: notebooks/Multivector_on_LanceDB.ipynb
- RAG:
- Vanilla RAG: rag/vanilla_rag.md
- Multi-head RAG: rag/multi_head_rag.md
@@ -324,6 +326,7 @@ nav:
- Pandas and PyArrow: python/pandas_and_pyarrow.md
- Polars: python/polars_arrow.md
- DuckDB: python/duckdb.md
- Datafusion: python/datafusion.md
- LangChain 🦜️🔗↗: integrations/langchain.md
- LangChain.js 🦜️🔗↗: https://js.langchain.com/docs/integrations/vectorstores/lancedb
- LlamaIndex 🦙↗: integrations/llamaIndex.md
@@ -332,6 +335,7 @@ nav:
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- Genkit: integrations/genkit.md
- Examples:
- examples/index.md
- 🐍 Python:
@@ -363,13 +367,6 @@ nav:
- Javascript (vectordb): javascript/modules.md
- Javascript (lancedb): js/globals.md
- Rust: https://docs.rs/lancedb/latest/lancedb/index.html
- LanceDB Cloud:
- Overview: cloud/index.md
- API reference:
- 🐍 Python: python/saas-python.md
- 👾 JavaScript: javascript/modules.md
- REST API: cloud/rest.md
- FAQs: cloud/cloud_faq.md
extra_css:
- styles/global.css
@@ -377,6 +374,7 @@ extra_css:
extra_javascript:
- "extra_js/init_ask_ai_widget.js"
- "extra_js/reo.js"
extra:
analytics:

View File

@@ -171,7 +171,7 @@ paths:
distance_type:
type: string
description: |
The distance metric to use for search. L2, Cosine, Dot and Hamming are supported. Default is L2.
The distance metric to use for search. l2, Cosine, Dot and Hamming are supported. Default is l2.
bypass_vector_index:
type: boolean
description: |
@@ -450,7 +450,7 @@ paths:
type: string
nullable: false
description: |
The metric type to use for the index. L2, Cosine, Dot are supported.
The metric type to use for the index. l2, Cosine, Dot are supported.
index_type:
type: string
responses:

View File

@@ -0,0 +1,5 @@
{% extends "base.html" %}
{% block announce %}
📚 Starting June 1st, 2025, please use <a href="https://lancedb.github.io/documentation" target="_blank" rel="noopener noreferrer">lancedb.github.io/documentation</a> for the latest docs.
{% endblock %}

View File

@@ -69,7 +69,7 @@ Lance supports `IVF_PQ` index type by default.
The following IVF_PQ paramters can be specified:
- **distance_type**: The distance metric to use. By default it uses euclidean distance "`L2`".
- **distance_type**: The distance metric to use. By default it uses euclidean distance "`l2`".
We also support "cosine" and "dot" distance as well.
- **num_partitions**: The number of partitions in the index. The default is the square root
of the number of rows.
@@ -291,7 +291,7 @@ Product quantization can lead to approximately `16 * sizeof(float32) / 1 = 64` t
`num_partitions` is used to decide how many partitions the first level `IVF` index uses.
Higher number of partitions could lead to more efficient I/O during queries and better accuracy, but it takes much more time to train.
On `SIFT-1M` dataset, our benchmark shows that keeping each partition 1K-4K rows lead to a good latency / recall.
On `SIFT-1M` dataset, our benchmark shows that keeping each partition 4K-8K rows lead to a good latency / recall.
`num_sub_vectors` specifies how many Product Quantization (PQ) short codes to generate on each vector. The number should be a factor of the vector dimension. Because
PQ is a lossy compression of the original vector, a higher `num_sub_vectors` usually results in

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.7 MiB

BIN
docs/src/assets/lancedb.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 40 KiB

View File

@@ -2,7 +2,7 @@
LanceDB Cloud is a SaaS (software-as-a-service) solution that runs serverless in the cloud, clearly separating storage from compute. It's designed to be highly scalable without breaking the bank. LanceDB Cloud is currently in private beta with general availability coming soon, but you can apply for early access with the private beta release by signing up below.
[Try out LanceDB Cloud](https://noteforms.com/forms/lancedb-mailing-list-cloud-kty1o5?notionforms=1&utm_source=notionforms){ .md-button .md-button--primary }
[Try out LanceDB Cloud (Public Beta)](https://cloud.lancedb.com){ .md-button .md-button--primary }
## Architecture

View File

@@ -59,7 +59,7 @@ Then the greedy search routine operates as follows:
There are three key parameters to set when constructing an HNSW index:
* `metric`: Use an `L2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `metric`: Use an `l2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `m`: The number of neighbors to select for each vector in the HNSW graph.
* `ef_construction`: The number of candidates to evaluate during the construction of the HNSW graph.

View File

@@ -47,7 +47,7 @@ We can combine the above concepts to understand how to build and query an IVF-PQ
There are three key parameters to set when constructing an IVF-PQ index:
* `metric`: Use an `L2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `metric`: Use an `l2` euclidean distance metric. We also support `dot` and `cosine` distance.
* `num_partitions`: The number of partitions in the IVF portion of the index.
* `num_sub_vectors`: The number of sub-vectors that will be created during Product Quantization (PQ).
@@ -56,7 +56,7 @@ In Python, the index can be created as follows:
```python
# Create and train the index for a 1536-dimensional vector
# Make sure you have enough data in the table for an effective training step
tbl.create_index(metric="L2", num_partitions=256, num_sub_vectors=96)
tbl.create_index(metric="l2", num_partitions=256, num_sub_vectors=96)
```
!!! note
`num_partitions`=256 and `num_sub_vectors`=96 does not work for every dataset. Those values needs to be adjusted for your particular dataset.

View File

@@ -54,7 +54,7 @@ As mentioned, after creating embedding, each data point is represented as a vect
Points that are close to each other in vector space are considered similar (or appear in similar contexts), and points that are far away are considered dissimilar. To quantify this closeness, we use distance as a metric which can be measured in the following way -
1. **Euclidean Distance (L2)**: It calculates the straight-line distance between two points (vectors) in a multidimensional space.
1. **Euclidean Distance (l2)**: It calculates the straight-line distance between two points (vectors) in a multidimensional space.
2. **Cosine Similarity**: It measures the cosine of the angle between two vectors, providing a normalized measure of similarity based on their direction.
3. **Dot product**: It is calculated as the sum of the products of their corresponding components. To measure relatedness it considers both the magnitude and direction of the vectors.

View File

@@ -8,15 +8,5 @@ LanceDB provides language APIs, allowing you to embed a database in your languag
* 👾 [JavaScript](examples_js.md) examples
* 🦀 Rust examples (coming soon)
## Python Applications powered by LanceDB
| Project Name | Description |
| --- | --- |
| **Ultralytics Explorer 🚀**<br>[![Ultralytics](https://img.shields.io/badge/Ultralytics-Docs-green?labelColor=0f3bc4&style=flat-square&logo=https://cdn.prod.website-files.com/646dd1f1a3703e451ba81ecc/64994922cf2a6385a4bf4489_UltralyticsYOLO_mark_blue.svg&link=https://docs.ultralytics.com/datasets/explorer/)](https://docs.ultralytics.com/datasets/explorer/)<br>[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/docs/en/datasets/explorer/explorer.ipynb) | - 🔍 **Explore CV Datasets**: Semantic search, SQL queries, vector similarity, natural language.<br>- 🖥️ **GUI & Python API**: Seamless dataset interaction.<br>- ⚡ **Efficient & Scalable**: Leverages LanceDB for large datasets.<br>- 📊 **Detailed Analysis**: Easily analyze data patterns.<br>- 🌐 **Browser GUI Demo**: Create embeddings, search images, run queries. |
| **Website Chatbot🤖**<br>[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/lancedb/lancedb-vercel-chatbot)<br>[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Flancedb%2Flancedb-vercel-chatbot&amp;env=OPENAI_API_KEY&amp;envDescription=OpenAI%20API%20Key%20for%20chat%20completion.&amp;project-name=lancedb-vercel-chatbot&amp;repository-name=lancedb-vercel-chatbot&amp;demo-title=LanceDB%20Chatbot%20Demo&amp;demo-description=Demo%20website%20chatbot%20with%20LanceDB.&amp;demo-url=https%3A%2F%2Flancedb.vercel.app&amp;demo-image=https%3A%2F%2Fi.imgur.com%2FazVJtvr.png) | - 🌐 **Chatbot from Sitemap/Docs**: Create a chatbot using site or document context.<br>- 🚀 **Embed LanceDB in Next.js**: Lightweight, on-prem storage.<br>- 🧠 **AI-Powered Context Retrieval**: Efficiently access relevant data.<br>- 🔧 **Serverless & Native JS**: Seamless integration with Next.js.<br>- ⚡ **One-Click Deploy on Vercel**: Quick and easy setup.. |
## Nodejs Applications powered by LanceDB
| Project Name | Description |
| --- | --- |
| **Langchain Writing Assistant✍ **<br>[![Github](../assets/github.svg)](https://github.com/lancedb/vectordb-recipes/tree/main/applications/node/lanchain_writing_assistant) | - **📂 Data Source Integration**: Use your own data by specifying data source file, and the app instantly processes it to provide insights. <br>- **🧠 Intelligent Suggestions**: Powered by LangChain.js and LanceDB, it improves writing productivity and accuracy. <br>- **💡 Enhanced Writing Experience**: It delivers real-time contextual insights and factual suggestions while the user writes. |
!!! tip "Hosted LanceDB"
If you want S3 cost-efficiency and local performance via a simple serverless API, checkout **LanceDB Cloud**. For private deployments, high performance at extreme scale, or if you have strict security requirements, talk to us about **LanceDB Enterprise**. [Learn more](https://docs.lancedb.com/)

1
docs/src/extra_js/reo.js Normal file
View File

@@ -0,0 +1 @@
!function(){var e,t,n;e="9627b71b382d201",t=function(){Reo.init({clientID:"9627b71b382d201"})},(n=document.createElement("script")).src="https://static.reo.dev/"+e+"/reo.js",n.defer=!0,n.onload=t,document.head.appendChild(n)}();

View File

@@ -0,0 +1,85 @@
# Late interaction & MultiVector embedding type
Late interaction is a technique used in retrieval that calculates the relevance of a query to a document by comparing their multi-vector representations. The key difference between late interaction and other popular methods:
![late interaction vs other methods](https://raw.githubusercontent.com/lancedb/assets/b035a0ceb2c237734e0d393054c146d289792339/docs/assets/integration/colbert-blog-interaction.svg)
[ Illustration from https://jina.ai/news/what-is-colbert-and-late-interaction-and-why-they-matter-in-search/]
<b>No interaction:</b> Refers to independently embedding the query and document, that are compared to calcualte similarity without any interaction between them. This is typically used in vector search operations.
<b>Partial interaction</b> Refers to a specific approach where the similarity computation happens primarily between query vectors and document vectors, without extensive interaction between individual components of each. An example of this is dual-encoder models like BERT.
<b>Early full interaction</b> Refers to techniques like cross-encoders that process query and docs in pairs with full interaction across various stages of encoding. This is a powerful, but relatively slower technique. Because it requires processing query and docs in pairs, doc embeddings can't be pre-computed for fast retrieval. This is why cross encoders are typically used as reranking models combined with vector search. Learn more about [LanceDB Reranking support](https://lancedb.github.io/lancedb/reranking/).
<b>Late interaction</b> Late interaction is a technique that calculates the doc and query similarity independently and then the interaction or evaluation happens during the retrieval process. This is typically used in retrieval models like ColBERT. Unlike early interaction, It allows speeding up the retrieval process without compromising the depth of semantic analysis.
## Internals of ColBERT
Let's take a look at the steps involved in performing late interaction based retrieval using ColBERT:
• ColBERT employs BERT-based encoders for both queries `(fQ)` and documents `(fD)`
• A single BERT model is shared between query and document encoders and special tokens distinguish input types: `[Q]` for queries and `[D]` for documents
**Query Encoder (fQ):**
• Query q is tokenized into WordPiece tokens: `q1, q2, ..., ql`. `[Q]` token is prepended right after BERT's `[CLS]` token
• If query length < Nq, it's padded with [MASK] tokens up to Nq.
The padded sequence goes through BERT's transformer architecture
Final embeddings are L2-normalized.
**Document Encoder (fD):**
Document d is tokenized into tokens `d1, d2, ..., dm`. `[D]` token is prepended after `[CLS]` token
Unlike queries, documents are NOT padded with `[MASK]` tokens
Document tokens are processed through BERT and the same linear layer
**Late Interaction:**
Late interaction estimates relevance score `S(q,d)` using embedding `Eq` and `Ed`. Late interaction happens after independent encoding
For each query embedding, maximum similarity is computed against all document embeddings
The similarity measure can be cosine similarity or squared L2 distance
**MaxSim Calculation:**
```
S(q,d) := Σ max(Eqi⋅EdjT)
i∈|Eq| j∈|Ed|
```
This finds the best matching document embedding for each query embedding
Captures relevance based on strongest local matches between contextual embeddings
## LanceDB MultiVector type
LanceDB supports multivector type, this is useful when you have multiple vectors for a single item (e.g. with ColBert and ColPali).
You can index on a column with multivector type and search on it, the query can be single vector or multiple vectors. For now, only cosine metric is supported for multivector search. The vector value type can be float16, float32 or float64. LanceDB integrateds [ConteXtualized Token Retriever(XTR)](https://arxiv.org/abs/2304.01982), which introduces a simple, yet novel, objective function that encourages the model to retrieve the most important document tokens first.
```python
import lancedb
import numpy as np
import pyarrow as pa
db = lancedb.connect("data/multivector_demo")
schema = pa.schema(
[
pa.field("id", pa.int64()),
# float16, float32, and float64 are supported
pa.field("vector", pa.list_(pa.list_(pa.float32(), 256))),
]
)
data = [
{
"id": i,
"vector": np.random.random(size=(2, 256)).tolist(),
}
for i in range(1024)
]
tbl = db.create_table("my_table", data=data, schema=schema)
# only cosine similarity is supported for multi-vectors
tbl.create_index(metric="cosine")
# query with single vector
query = np.random.random(256).astype(np.float16)
tbl.search(query).to_arrow()
# query with multiple vectors
query = np.random.random(size=(2, 256))
tbl.search(query).to_arrow()
```
Find more about vector search in LanceDB [here](https://lancedb.github.io/lancedb/search/#multivector-type).

View File

@@ -0,0 +1,68 @@
You can use DuckDB and Apache Datafusion to query your LanceDB tables using SQL.
This guide will show how to query Lance tables them using both.
We will re-use the dataset [created previously](./pandas_and_pyarrow.md):
```python
import lancedb
db = lancedb.connect("data/sample-lancedb")
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
table = db.create_table("pd_table", data=data)
```
## Querying a LanceDB Table with DuckDb
The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to DuckDB through the Arrow compatibility layer.
To query the resulting Lance dataset in DuckDB, all you need to do is reference the dataset by the same name in your SQL query.
```python
import duckdb
arrow_table = table.to_lance()
duckdb.query("SELECT * FROM arrow_table")
```
```
┌─────────────┬─────────┬────────┐
│ vector │ item │ price │
│ float[] │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1] │ foo │ 10.0 │
│ [5.9, 26.5] │ bar │ 20.0 │
└─────────────┴─────────┴────────┘
```
## Querying a LanceDB Table with Apache Datafusion
Have the required imports before doing any querying.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:import-lancedb"
--8<-- "python/python/tests/docs/test_guide_tables.py:import-session-context"
--8<-- "python/python/tests/docs/test_guide_tables.py:import-ffi-dataset"
```
Register the table created with the Datafusion session context.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:lance_sql_basic"
```
```
┌─────────────┬─────────┬────────┐
│ vector │ item │ price │
│ float[] │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1] │ foo │ 10.0 │
│ [5.9, 26.5] │ bar │ 20.0 │
└─────────────┴─────────┴────────┘
```

View File

@@ -342,7 +342,7 @@ For **read and write access**, LanceDB will need a policy such as:
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
},
@@ -374,7 +374,7 @@ For **read-only access**, LanceDB will need a policy such as:
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
},

View File

@@ -765,7 +765,10 @@ This can be used to update zero to all rows depending on how many rows match the
];
const tbl = await db.createTable("my_table", data)
await tbl.update({vector: [10, 10]}, { where: "x = 2"})
await tbl.update({
values: { vector: [10, 10] },
where: "x = 2"
});
```
=== "vectordb (deprecated)"
@@ -784,7 +787,10 @@ This can be used to update zero to all rows depending on how many rows match the
];
const tbl = await db.createTable("my_table", data)
await tbl.update({ where: "x = 2", values: {vector: [10, 10]} })
await tbl.update({
where: "x = 2",
values: { vector: [10, 10] }
});
```
#### Updating using a sql query

View File

@@ -4,6 +4,9 @@ LanceDB is an open-source vector database for AI that's designed to store, manag
Both the database and the underlying data format are designed from the ground up to be **easy-to-use**, **scalable** and **cost-effective**.
!!! tip "Hosted LanceDB"
If you want S3 cost-efficiency and local performance via a simple serverless API, checkout **LanceDB Cloud**. For private deployments, high performance at extreme scale, or if you have strict security requirements, talk to us about **LanceDB Enterprise**. [Learn more](https://docs.lancedb.com/)
![](assets/lancedb_and_lance.png)
## Truly multi-modal
@@ -20,7 +23,7 @@ LanceDB **OSS** is an **open-source**, batteries-included embedded vector databa
LanceDB **Cloud** is a SaaS (software-as-a-service) solution that runs serverless in the cloud, making the storage clearly separated from compute. It's designed to be cost-effective and highly scalable without breaking the bank. LanceDB Cloud is currently in private beta with general availability coming soon, but you can apply for early access with the private beta release by signing up below.
[Try out LanceDB Cloud](https://noteforms.com/forms/lancedb-mailing-list-cloud-kty1o5?notionforms=1&utm_source=notionforms){ .md-button .md-button--primary }
[Try out LanceDB Cloud (Public Beta) Now](https://cloud.lancedb.com){ .md-button .md-button--primary }
## Why use LanceDB?

View File

@@ -0,0 +1,183 @@
### genkitx-lancedb
This is a lancedb plugin for genkit framework. It allows you to use LanceDB for ingesting and rereiving data using genkit framework.
![integration-banner-genkit](https://github.com/user-attachments/assets/a6cc28af-98e9-4425-b87c-7ab139bd7893)
### Installation
```bash
pnpm install genkitx-lancedb
```
### Usage
Adding LanceDB plugin to your genkit instance.
```ts
import { lancedbIndexerRef, lancedb, lancedbRetrieverRef, WriteMode } from 'genkitx-lancedb';
import { textEmbedding004, vertexAI } from '@genkit-ai/vertexai';
import { gemini } from '@genkit-ai/vertexai';
import { z, genkit } from 'genkit';
import { Document } from 'genkit/retriever';
import { chunk } from 'llm-chunk';
import { readFile } from 'fs/promises';
import path from 'path';
import pdf from 'pdf-parse/lib/pdf-parse';
const ai = genkit({
plugins: [
// vertexAI provides the textEmbedding004 embedder
vertexAI(),
// the local vector store requires an embedder to translate from text to vector
lancedb([
{
dbUri: '.db', // optional lancedb uri, default to .db
tableName: 'table', // optional table name, default to table
embedder: textEmbedding004,
},
]),
],
});
```
You can run this app with the following command:
```bash
genkit start -- tsx --watch src/index.ts
```
This'll add LanceDB as a retriever and indexer to the genkit instance. You can see it in the GUI view
<img width="1710" alt="Screenshot 2025-05-11 at 7 21 05PM" src="https://github.com/user-attachments/assets/e752f7f4-785b-4797-a11e-72ab06a531b7" />
**Testing retrieval on a sample table**
Let's see the raw retrieval results
<img width="1710" alt="Screenshot 2025-05-11 at 7 21 05PM" src="https://github.com/user-attachments/assets/b8d356ed-8421-4790-8fc0-d6af563b9657" />
On running this query, you'll 5 results fetched from the lancedb table, where each result looks something like this:
<img width="1417" alt="Screenshot 2025-05-11 at 7 21 18PM" src="https://github.com/user-attachments/assets/77429525-36e2-4da6-a694-e58c1cf9eb83" />
## Creating a custom RAG flow
Now that we've seen how you can use LanceDB for in a genkit pipeline, let's refine the flow and create a RAG. A RAG flow will consist of an index and a retreiver with its outputs postprocessed an fed into an LLM for final response
### Creating custom indexer flows
You can also create custom indexer flows, utilizing more options and features provided by LanceDB.
```ts
export const menuPdfIndexer = lancedbIndexerRef({
// Using all defaults, for dbUri, tableName, and embedder, etc
});
const chunkingConfig = {
minLength: 1000,
maxLength: 2000,
splitter: 'sentence',
overlap: 100,
delimiters: '',
} as any;
async function extractTextFromPdf(filePath: string) {
const pdfFile = path.resolve(filePath);
const dataBuffer = await readFile(pdfFile);
const data = await pdf(dataBuffer);
return data.text;
}
export const indexMenu = ai.defineFlow(
{
name: 'indexMenu',
inputSchema: z.string().describe('PDF file path'),
outputSchema: z.void(),
},
async (filePath: string) => {
filePath = path.resolve(filePath);
// Read the pdf.
const pdfTxt = await ai.run('extract-text', () =>
extractTextFromPdf(filePath)
);
// Divide the pdf text into segments.
const chunks = await ai.run('chunk-it', async () =>
chunk(pdfTxt, chunkingConfig)
);
// Convert chunks of text into documents to store in the index.
const documents = chunks.map((text) => {
return Document.fromText(text, { filePath });
});
// Add documents to the index.
await ai.index({
indexer: menuPdfIndexer,
documents,
options: {
writeMode: WriteMode.Overwrite,
} as any
});
}
);
```
<img width="1316" alt="Screenshot 2025-05-11 at 8 35 56PM" src="https://github.com/user-attachments/assets/e2a20ce4-d1d0-4fa2-9a84-f2cc26e3a29f" />
In your console, you can see the logs
<img width="511" alt="Screenshot 2025-05-11 at 7 19 14PM" src="https://github.com/user-attachments/assets/243f26c5-ed38-40b6-b661-002f40f0423a" />
### Creating custom retriever flows
You can also create custom retriever flows, utilizing more options and features provided by LanceDB.
```ts
export const menuRetriever = lancedbRetrieverRef({
tableName: "table", // Use the same table name as the indexer.
displayName: "Menu", // Use a custom display name.
export const menuQAFlow = ai.defineFlow(
{ name: "Menu", inputSchema: z.string(), outputSchema: z.string() },
async (input: string) => {
// retrieve relevant documents
const docs = await ai.retrieve({
retriever: menuRetriever,
query: input,
options: {
k: 3,
},
});
const extractedContent = docs.map(doc => {
if (doc.content && Array.isArray(doc.content) && doc.content.length > 0) {
if (doc.content[0].media && doc.content[0].media.url) {
return doc.content[0].media.url;
}
}
return "No content found";
});
console.log("Extracted content:", extractedContent);
const { text } = await ai.generate({
model: gemini('gemini-2.0-flash'),
prompt: `
You are acting as a helpful AI assistant that can answer
questions about the food available on the menu at Genkit Grub Pub.
Use only the context provided to answer the question.
If you don't know, do not make up an answer.
Do not add or change items on the menu.
Context:
${extractedContent.join('\n\n')}
Question: ${input}`,
docs,
});
return text;
}
);
```
Now using our retrieval flow, we can ask question about the ingsted PDF
<img width="1306" alt="Screenshot 2025-05-11 at 7 18 45PM" src="https://github.com/user-attachments/assets/86c66b13-7c12-4d5f-9d81-ae36bfb1c346" />

View File

@@ -108,7 +108,7 @@ This method creates a scalar(for non-vector cols) or a vector index on a table.
|:---|:---|:---|:---|
|`vector_col`|`Optional[str]`| Provide if you want to create index on a vector column. |`None`|
|`col_name`|`Optional[str]`| Provide if you want to create index on a non-vector column. |`None`|
|`metric`|`Optional[str]` |Provide the metric to use for vector index. choice of metrics: 'L2', 'dot', 'cosine'. |`L2`|
|`metric`|`Optional[str]` |Provide the metric to use for vector index. choice of metrics: 'l2', 'dot', 'cosine'. |`l2`|
|`num_partitions`|`Optional[int]`|Number of partitions to use for the index.|`256`|
|`num_sub_vectors`|`Optional[int]` |Number of sub-vectors to use for the index.|`96`|
|`index_cache_size`|`Optional[int]` |Size of the index cache.|`None`|

View File

@@ -125,7 +125,7 @@ The exhaustive list of parameters for `LanceDBVectorStore` vector store are :
```
- **_table_exists(self, tbl_name: `Optional[str]` = `None`) -> `bool`** : Returns `True` if `tbl_name` exists in database.
- __create_index(
self, scalar: `Optional[bool]` = False, col_name: `Optional[str]` = None, num_partitions: `Optional[int]` = 256, num_sub_vectors: `Optional[int]` = 96, index_cache_size: `Optional[int]` = None, metric: `Optional[str]` = "L2",
self, scalar: `Optional[bool]` = False, col_name: `Optional[str]` = None, num_partitions: `Optional[int]` = 256, num_sub_vectors: `Optional[int]` = 96, index_cache_size: `Optional[int]` = None, metric: `Optional[str]` = "l2",
) -> `None`__ : Creates a scalar(for non-vector cols) or a vector index on a table.
Make sure your vector column has enough data before creating an index on it.

View File

@@ -10,7 +10,7 @@ Distance metrics type.
- [Cosine](MetricType.md#cosine)
- [Dot](MetricType.md#dot)
- [L2](MetricType.md#l2)
- [l2](MetricType.md#l2)
## Enumeration Members

View File

@@ -85,7 +85,7 @@ ___
`Optional` **metric\_type**: [`MetricType`](../enums/MetricType.md)
Metric type, L2 or Cosine
Metric type, l2 or Cosine
#### Defined in

View File

@@ -15,11 +15,9 @@ npm install @lancedb/lancedb
This will download the appropriate native library for your platform. We currently
support:
- Linux (x86_64 and aarch64)
- Linux (x86_64 and aarch64 on glibc and musl)
- MacOS (Intel and ARM/M1/M2)
- Windows (x86_64 only)
We do not yet support musl-based Linux (such as Alpine Linux) or aarch64 Windows.
- Windows (x86_64 and aarch64)
## Usage

View File

@@ -0,0 +1,53 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BooleanQuery
# Class: BooleanQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BooleanQuery()
```ts
new BooleanQuery(queries): BooleanQuery
```
Creates an instance of BooleanQuery.
#### Parameters
* **queries**: [[`Occur`](../enumerations/Occur.md), [`FullTextQuery`](../interfaces/FullTextQuery.md)][]
An array of (Occur, FullTextQuery objects) to combine.
Occur specifies whether the query must match, or should match.
#### Returns
[`BooleanQuery`](BooleanQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)

View File

@@ -0,0 +1,67 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BoostQuery
# Class: BoostQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BoostQuery()
```ts
new BoostQuery(
positive,
negative,
options?): BoostQuery
```
Creates an instance of BoostQuery.
The boost returns documents that match the positive query,
but penalizes those that match the negative query.
the penalty is controlled by the `negativeBoost` parameter.
#### Parameters
* **positive**: [`FullTextQuery`](../interfaces/FullTextQuery.md)
The positive query that boosts the relevance score.
* **negative**: [`FullTextQuery`](../interfaces/FullTextQuery.md)
The negative query that reduces the relevance score.
* **options?**
Optional parameters for the boost query.
- `negativeBoost`: The boost factor for the negative query (default is 0.0).
* **options.negativeBoost?**: `number`
#### Returns
[`BoostQuery`](BoostQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)

View File

@@ -126,6 +126,37 @@ the vectors.
***
### ivfFlat()
```ts
static ivfFlat(options?): Index
```
Create an IvfFlat index
This index groups vectors into partitions of similar vectors. Each partition keeps track of
a centroid which is the average value of all vectors in the group.
During a query the centroids are compared with the query vector to find the closest
partitions. The vectors in these partitions are then searched to find
the closest vectors.
The partitioning process is called IVF and the `num_partitions` parameter controls how
many groups to create.
Note that training an IVF FLAT index on a large dataset is a slow operation and
currently is also a memory intensive operation.
#### Parameters
* **options?**: `Partial`&lt;[`IvfFlatOptions`](../interfaces/IvfFlatOptions.md)&gt;
#### Returns
[`Index`](Index.md)
***
### ivfPq()
```ts

View File

@@ -0,0 +1,73 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MatchQuery
# Class: MatchQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new MatchQuery()
```ts
new MatchQuery(
query,
column,
options?): MatchQuery
```
Creates an instance of MatchQuery.
#### Parameters
* **query**: `string`
The text query to search for.
* **column**: `string`
The name of the column to search within.
* **options?**
Optional parameters for the match query.
- `boost`: The boost factor for the query (default is 1.0).
- `fuzziness`: The fuzziness level for the query (default is 0).
- `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
* **options.boost?**: `number`
* **options.fuzziness?**: `number`
* **options.maxExpansions?**: `number`
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MatchQuery`](MatchQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)

View File

@@ -33,20 +33,22 @@ Construct a MergeInsertBuilder. __Internal use only.__
### execute()
```ts
execute(data): Promise<void>
execute(data, execOptions?): Promise<MergeResult>
```
Executes the merge insert operation
Nothing is returned but the `Table` is updated
#### Parameters
* **data**: [`Data`](../type-aliases/Data.md)
* **execOptions?**: `Partial`&lt;[`WriteExecutionOptions`](../interfaces/WriteExecutionOptions.md)&gt;
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`MergeResult`](../interfaces/MergeResult.md)&gt;
the merge result
***

View File

@@ -0,0 +1,67 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MultiMatchQuery
# Class: MultiMatchQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new MultiMatchQuery()
```ts
new MultiMatchQuery(
query,
columns,
options?): MultiMatchQuery
```
Creates an instance of MultiMatchQuery.
#### Parameters
* **query**: `string`
The text query to search for across multiple columns.
* **columns**: `string`[]
An array of column names to search within.
* **options?**
Optional parameters for the multi-match query.
- `boosts`: An array of boost factors for each column (default is 1.0 for all).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
* **options.boosts?**: `number`[]
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MultiMatchQuery`](MultiMatchQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)

View File

@@ -0,0 +1,64 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / PhraseQuery
# Class: PhraseQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new PhraseQuery()
```ts
new PhraseQuery(
query,
column,
options?): PhraseQuery
```
Creates an instance of `PhraseQuery`.
#### Parameters
* **query**: `string`
The phrase to search for in the specified column.
* **column**: `string`
The name of the column to search within.
* **options?**
Optional parameters for the phrase query.
- `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
* **options.slop?**: `number`
#### Returns
[`PhraseQuery`](PhraseQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)

View File

@@ -30,6 +30,53 @@ protected inner: Query | Promise<Query>;
## Methods
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
***
### execute()
```ts
@@ -159,7 +206,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;
@@ -262,7 +309,7 @@ nearestToText(query, columns?): Query
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **columns?**: `string`[]

View File

@@ -36,6 +36,49 @@ protected inner: NativeQueryType | Promise<NativeQueryType>;
## Methods
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
***
### execute()
```ts
@@ -149,7 +192,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;

View File

@@ -40,7 +40,7 @@ Returns the name of the table
### add()
```ts
abstract add(data, options?): Promise<void>
abstract add(data, options?): Promise<AddResult>
```
Insert records into this Table.
@@ -54,14 +54,17 @@ Insert records into this Table.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`AddResult`](../interfaces/AddResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table
***
### addColumns()
```ts
abstract addColumns(newColumnTransforms): Promise<void>
abstract addColumns(newColumnTransforms): Promise<AddColumnsResult>
```
Add new columns with defined values.
@@ -76,14 +79,17 @@ Add new columns with defined values.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`AddColumnsResult`](../interfaces/AddColumnsResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table after adding the columns.
***
### alterColumns()
```ts
abstract alterColumns(columnAlterations): Promise<void>
abstract alterColumns(columnAlterations): Promise<AlterColumnsResult>
```
Alter the name or nullability of columns.
@@ -96,7 +102,10 @@ Alter the name or nullability of columns.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`AlterColumnsResult`](../interfaces/AlterColumnsResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table after altering the columns.
***
@@ -117,8 +126,8 @@ wish to return to standard mode, call `checkoutLatest`.
#### Parameters
* **version**: `number`
The version to checkout
* **version**: `string` \| `number`
The version to checkout, could be version number or tag
#### Returns
@@ -252,7 +261,7 @@ await table.createIndex("my_float_col");
### delete()
```ts
abstract delete(predicate): Promise<void>
abstract delete(predicate): Promise<DeleteResult>
```
Delete the rows that satisfy the predicate.
@@ -263,7 +272,10 @@ Delete the rows that satisfy the predicate.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`DeleteResult`](../interfaces/DeleteResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table
***
@@ -284,7 +296,7 @@ Return a brief description of the table
### dropColumns()
```ts
abstract dropColumns(columnNames): Promise<void>
abstract dropColumns(columnNames): Promise<DropColumnsResult>
```
Drop one or more columns from the dataset
@@ -303,7 +315,10 @@ then call ``cleanup_files`` to remove the old files.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`DropColumnsResult`](../interfaces/DropColumnsResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table after dropping the columns.
***
@@ -454,6 +469,28 @@ Modeled after ``VACUUM`` in PostgreSQL.
***
### prewarmIndex()
```ts
abstract prewarmIndex(name): Promise<void>
```
Prewarm an index in the table.
#### Parameters
* **name**: `string`
The name of the index.
This will load the index into memory. This may reduce the cold-start time for
future queries. If the index does not fit in the cache then this call may be
wasteful.
#### Returns
`Promise`&lt;`void`&gt;
***
### query()
```ts
@@ -575,7 +612,7 @@ of the given query
#### Parameters
* **query**: `string` \| [`IntoVector`](../type-aliases/IntoVector.md)
* **query**: `string` \| [`IntoVector`](../type-aliases/IntoVector.md) \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
the query, a vector or string
* **queryType?**: `string`
@@ -593,6 +630,50 @@ of the given query
***
### stats()
```ts
abstract stats(): Promise<TableStatistics>
```
Returns table and fragment statistics
#### Returns
`Promise`&lt;[`TableStatistics`](../interfaces/TableStatistics.md)&gt;
The table and fragment statistics
***
### tags()
```ts
abstract tags(): Promise<Tags>
```
Get a tags manager for this table.
Tags allow you to label specific versions of a table with a human-readable name.
The returned tags manager can be used to list, create, update, or delete tags.
#### Returns
`Promise`&lt;[`Tags`](Tags.md)&gt;
A tags manager for this table
#### Example
```typescript
const tagsManager = await table.tags();
await tagsManager.create("v1", 1);
const tags = await tagsManager.list();
console.log(tags); // { "v1": { version: 1, manifestSize: ... } }
```
***
### toArrow()
```ts
@@ -612,7 +693,7 @@ Return the table as an arrow table
#### update(opts)
```ts
abstract update(opts): Promise<void>
abstract update(opts): Promise<UpdateResult>
```
Update existing records in the Table
@@ -623,7 +704,10 @@ Update existing records in the Table
##### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`UpdateResult`](../interfaces/UpdateResult.md)&gt;
A promise that resolves to an object containing
the number of rows updated and the new version number
##### Example
@@ -634,7 +718,7 @@ table.update({where:"x = 2", values:{"vector": [10, 10]}})
#### update(opts)
```ts
abstract update(opts): Promise<void>
abstract update(opts): Promise<UpdateResult>
```
Update existing records in the Table
@@ -645,7 +729,10 @@ Update existing records in the Table
##### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`UpdateResult`](../interfaces/UpdateResult.md)&gt;
A promise that resolves to an object containing
the number of rows updated and the new version number
##### Example
@@ -656,7 +743,7 @@ table.update({where:"x = 2", valuesSql:{"x": "x + 1"}})
#### update(updates, options)
```ts
abstract update(updates, options?): Promise<void>
abstract update(updates, options?): Promise<UpdateResult>
```
Update existing records in the Table
@@ -679,10 +766,6 @@ repeatedly calilng this method.
* **updates**: `Record`&lt;`string`, `string`&gt; \| `Map`&lt;`string`, `string`&gt;
the
columns to update
Keys in the map should specify the name of the column to update.
Values in the map provide the new value of the column. These can
be SQL literal strings (e.g. "7" or "'foo'") or they can be expressions
based on the row being updated (e.g. "my_col + 1")
* **options?**: `Partial`&lt;[`UpdateOptions`](../interfaces/UpdateOptions.md)&gt;
additional options to control
@@ -690,7 +773,15 @@ repeatedly calilng this method.
##### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`UpdateResult`](../interfaces/UpdateResult.md)&gt;
A promise that resolves to an object
containing the number of rows updated and the new version number
Keys in the map should specify the name of the column to update.
Values in the map provide the new value of the column. These can
be SQL literal strings (e.g. "7" or "'foo'") or they can be expressions
based on the row being updated (e.g. "my_col + 1")
***
@@ -731,3 +822,26 @@ Retrieve the version of the table
#### Returns
`Promise`&lt;`number`&gt;
***
### waitForIndex()
```ts
abstract waitForIndex(indexNames, timeoutSeconds): Promise<void>
```
Waits for asynchronous indexing to complete on the table.
#### Parameters
* **indexNames**: `string`[]
The name of the indices to wait for
* **timeoutSeconds**: `number`
The number of seconds to wait before timing out
This will raise an error if the indices are not created and fully indexed within the timeout.
#### Returns
`Promise`&lt;`void`&gt;

View File

@@ -0,0 +1,35 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / TagContents
# Class: TagContents
## Constructors
### new TagContents()
```ts
new TagContents(): TagContents
```
#### Returns
[`TagContents`](TagContents.md)
## Properties
### manifestSize
```ts
manifestSize: number;
```
***
### version
```ts
version: number;
```

View File

@@ -0,0 +1,99 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Tags
# Class: Tags
## Constructors
### new Tags()
```ts
new Tags(): Tags
```
#### Returns
[`Tags`](Tags.md)
## Methods
### create()
```ts
create(tag, version): Promise<void>
```
#### Parameters
* **tag**: `string`
* **version**: `number`
#### Returns
`Promise`&lt;`void`&gt;
***
### delete()
```ts
delete(tag): Promise<void>
```
#### Parameters
* **tag**: `string`
#### Returns
`Promise`&lt;`void`&gt;
***
### getVersion()
```ts
getVersion(tag): Promise<number>
```
#### Parameters
* **tag**: `string`
#### Returns
`Promise`&lt;`number`&gt;
***
### list()
```ts
list(): Promise<Record<string, TagContents>>
```
#### Returns
`Promise`&lt;`Record`&lt;`string`, [`TagContents`](TagContents.md)&gt;&gt;
***
### update()
```ts
update(tag, version): Promise<void>
```
#### Parameters
* **tag**: `string`
* **version**: `number`
#### Returns
`Promise`&lt;`void`&gt;

View File

@@ -48,6 +48,53 @@ addQueryVector(vector): VectorQuery
***
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
***
### bypassVectorIndex()
```ts
@@ -300,7 +347,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;

View File

@@ -0,0 +1,54 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FullTextQueryType
# Enumeration: FullTextQueryType
Enum representing the types of full-text queries supported.
- `Match`: Performs a full-text search for terms in the query string.
- `MatchPhrase`: Searches for an exact phrase match in the text.
- `Boost`: Boosts the relevance score of specific terms in the query.
- `MultiMatch`: Searches across multiple fields for the query terms.
## Enumeration Members
### Boolean
```ts
Boolean: "boolean";
```
***
### Boost
```ts
Boost: "boost";
```
***
### Match
```ts
Match: "match";
```
***
### MatchPhrase
```ts
MatchPhrase: "match_phrase";
```
***
### MultiMatch
```ts
MultiMatch: "multi_match";
```

View File

@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Occur
# Enumeration: Occur
Enum representing the occurrence of terms in full-text queries.
- `Must`: The term must be present in the document.
- `Should`: The term should contribute to the document score, but is not required.
## Enumeration Members
### Must
```ts
Must: "MUST";
```
***
### Should
```ts
Should: "SHOULD";
```

View File

@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Operator
# Enumeration: Operator
Enum representing the logical operators used in full-text queries.
- `And`: All terms must match.
- `Or`: At least one term must match.
## Enumeration Members
### And
```ts
And: "AND";
```
***
### Or
```ts
Or: "OR";
```

View File

@@ -0,0 +1,19 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / packBits
# Function: packBits()
```ts
function packBits(data): number[]
```
## Parameters
* **data**: `number`[]
## Returns
`number`[]

View File

@@ -9,37 +9,60 @@
- [embedding](namespaces/embedding/README.md)
- [rerankers](namespaces/rerankers/README.md)
## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md)
- [Occur](enumerations/Occur.md)
- [Operator](enumerations/Operator.md)
## Classes
- [BooleanQuery](classes/BooleanQuery.md)
- [BoostQuery](classes/BoostQuery.md)
- [Connection](classes/Connection.md)
- [Index](classes/Index.md)
- [MakeArrowTableOptions](classes/MakeArrowTableOptions.md)
- [MatchQuery](classes/MatchQuery.md)
- [MergeInsertBuilder](classes/MergeInsertBuilder.md)
- [MultiMatchQuery](classes/MultiMatchQuery.md)
- [PhraseQuery](classes/PhraseQuery.md)
- [Query](classes/Query.md)
- [QueryBase](classes/QueryBase.md)
- [RecordBatchIterator](classes/RecordBatchIterator.md)
- [Table](classes/Table.md)
- [TagContents](classes/TagContents.md)
- [Tags](classes/Tags.md)
- [VectorColumnOptions](classes/VectorColumnOptions.md)
- [VectorQuery](classes/VectorQuery.md)
## Interfaces
- [AddColumnsResult](interfaces/AddColumnsResult.md)
- [AddColumnsSql](interfaces/AddColumnsSql.md)
- [AddDataOptions](interfaces/AddDataOptions.md)
- [AddResult](interfaces/AddResult.md)
- [AlterColumnsResult](interfaces/AlterColumnsResult.md)
- [ClientConfig](interfaces/ClientConfig.md)
- [ColumnAlteration](interfaces/ColumnAlteration.md)
- [CompactionStats](interfaces/CompactionStats.md)
- [ConnectionOptions](interfaces/ConnectionOptions.md)
- [CreateTableOptions](interfaces/CreateTableOptions.md)
- [DeleteResult](interfaces/DeleteResult.md)
- [DropColumnsResult](interfaces/DropColumnsResult.md)
- [ExecutableQuery](interfaces/ExecutableQuery.md)
- [FragmentStatistics](interfaces/FragmentStatistics.md)
- [FragmentSummaryStats](interfaces/FragmentSummaryStats.md)
- [FtsOptions](interfaces/FtsOptions.md)
- [FullTextQuery](interfaces/FullTextQuery.md)
- [FullTextSearchOptions](interfaces/FullTextSearchOptions.md)
- [HnswPqOptions](interfaces/HnswPqOptions.md)
- [HnswSqOptions](interfaces/HnswSqOptions.md)
- [IndexConfig](interfaces/IndexConfig.md)
- [IndexOptions](interfaces/IndexOptions.md)
- [IndexStatistics](interfaces/IndexStatistics.md)
- [IvfFlatOptions](interfaces/IvfFlatOptions.md)
- [IvfPqOptions](interfaces/IvfPqOptions.md)
- [MergeResult](interfaces/MergeResult.md)
- [OpenTableOptions](interfaces/OpenTableOptions.md)
- [OptimizeOptions](interfaces/OptimizeOptions.md)
- [OptimizeStats](interfaces/OptimizeStats.md)
@@ -47,9 +70,12 @@
- [RemovalStats](interfaces/RemovalStats.md)
- [RetryConfig](interfaces/RetryConfig.md)
- [TableNamesOptions](interfaces/TableNamesOptions.md)
- [TableStatistics](interfaces/TableStatistics.md)
- [TimeoutConfig](interfaces/TimeoutConfig.md)
- [UpdateOptions](interfaces/UpdateOptions.md)
- [UpdateResult](interfaces/UpdateResult.md)
- [Version](interfaces/Version.md)
- [WriteExecutionOptions](interfaces/WriteExecutionOptions.md)
## Type Aliases
@@ -66,3 +92,4 @@
- [connect](functions/connect.md)
- [makeArrowTable](functions/makeArrowTable.md)
- [packBits](functions/packBits.md)

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / AddColumnsResult
# Interface: AddColumnsResult
## Properties
### version
```ts
version: number;
```

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / AddResult
# Interface: AddResult
## Properties
### version
```ts
version: number;
```

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / AlterColumnsResult
# Interface: AlterColumnsResult
## Properties
### version
```ts
version: number;
```

View File

@@ -16,7 +16,7 @@ must be provided.
### dataType?
```ts
optional dataType: string;
optional dataType: string | DataType<Type, any>;
```
A new data type for the column. If not provided then the data type will not be changed.

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DeleteResult
# Interface: DeleteResult
## Properties
### version
```ts
version: number;
```

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DropColumnsResult
# Interface: DropColumnsResult
## Properties
### version
```ts
version: number;
```

View File

@@ -0,0 +1,37 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FragmentStatistics
# Interface: FragmentStatistics
## Properties
### lengths
```ts
lengths: FragmentSummaryStats;
```
Statistics on the number of rows in the table fragments
***
### numFragments
```ts
numFragments: number;
```
The number of fragments in the table
***
### numSmallFragments
```ts
numSmallFragments: number;
```
The number of uncompacted fragments in the table

View File

@@ -0,0 +1,77 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FragmentSummaryStats
# Interface: FragmentSummaryStats
## Properties
### max
```ts
max: number;
```
The number of rows in the fragment with the most rows
***
### mean
```ts
mean: number;
```
The mean number of rows in the fragments
***
### min
```ts
min: number;
```
The number of rows in the fragment with the fewest rows
***
### p25
```ts
p25: number;
```
The 25th percentile of number of rows in the fragments
***
### p50
```ts
p50: number;
```
The 50th percentile of number of rows in the fragments
***
### p75
```ts
p75: number;
```
The 75th percentile of number of rows in the fragments
***
### p99
```ts
p99: number;
```
The 99th percentile of number of rows in the fragments

View File

@@ -0,0 +1,25 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FullTextQuery
# Interface: FullTextQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)

View File

@@ -24,18 +24,18 @@ The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. L2 distance has a range of [0, ∞).
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike L2, the cosine distance is not affected by the
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
L2 norm is 1), then dot distance is equivalent to the cosine distance.
l2 norm is 1), then dot distance is equivalent to the cosine distance.
***

View File

@@ -24,18 +24,18 @@ The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. L2 distance has a range of [0, ∞).
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike L2, the cosine distance is not affected by the
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
L2 norm is 1), then dot distance is equivalent to the cosine distance.
l2 norm is 1), then dot distance is equivalent to the cosine distance.
***

View File

@@ -39,3 +39,11 @@ and the same name, then an error will be returned. This is true even if
that index is out of date.
The default is true
***
### waitTimeoutSeconds?
```ts
optional waitTimeoutSeconds: number;
```

View File

@@ -30,6 +30,17 @@ The type of the index
***
### loss?
```ts
optional loss: number;
```
The KMeans loss value of the index,
it is only present for vector indices.
***
### numIndexedRows
```ts

View File

@@ -0,0 +1,112 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / IvfFlatOptions
# Interface: IvfFlatOptions
Options to create an `IVF_FLAT` index
## Properties
### distanceType?
```ts
optional distanceType: "l2" | "cosine" | "dot" | "hamming";
```
Distance type to use to build the index.
Default value is "l2".
This is used when training the index to calculate the IVF partitions
(vectors are grouped in partitions with similar vectors according to this
distance type).
The distance type used to train an index MUST match the distance type used
to search the index. Failure to do so will yield inaccurate results.
The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
Note: the cosine distance is undefined when one (or both) of the vectors
are all zeros (there is no direction). These vectors are invalid and may
never be returned from a vector search.
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
l2 norm is 1), then dot distance is equivalent to the cosine distance.
"hamming" - Hamming distance. Hamming distance is a distance metric
calculated from the number of bits that are different between two vectors.
Hamming distance has a range of [0, dimension]. Note that the hamming distance
is only valid for binary vectors.
***
### maxIterations?
```ts
optional maxIterations: number;
```
Max iteration to train IVF kmeans.
When training an IVF FLAT index we use kmeans to calculate the partitions. This parameter
controls how many iterations of kmeans to run.
Increasing this might improve the quality of the index but in most cases these extra
iterations have diminishing returns.
The default value is 50.
***
### numPartitions?
```ts
optional numPartitions: number;
```
The number of IVF partitions to create.
This value should generally scale with the number of rows in the dataset.
By default the number of partitions is the square root of the number of
rows.
If this value is too large then the first part of the search (picking the
right partition) will be slow. If this value is too small then the second
part of the search (searching within a partition) will be slow.
***
### sampleRate?
```ts
optional sampleRate: number;
```
The number of vectors, per partition, to sample when training IVF kmeans.
When an IVF FLAT index is trained, we need to calculate partitions. These are groups
of vectors that are similar to each other. To do this we use an algorithm called kmeans.
Running kmeans on a large dataset can be slow. To speed this up we run kmeans on a
random sample of the data. This parameter controls the size of the sample. The total
number of vectors used to train the index is `sample_rate * num_partitions`.
Increasing this value might improve the quality of the index but in most cases the
default should be sufficient.
The default value is 256.

View File

@@ -31,13 +31,13 @@ The following distance types are available:
"l2" - Euclidean distance. This is a very common distance metric that
accounts for both magnitude and direction when determining the distance
between vectors. L2 distance has a range of [0, ∞).
between vectors. l2 distance has a range of [0, ∞).
"cosine" - Cosine distance. Cosine distance is a distance metric
calculated from the cosine similarity between two vectors. Cosine
similarity is a measure of similarity between two non-zero vectors of an
inner product space. It is defined to equal the cosine of the angle
between them. Unlike L2, the cosine distance is not affected by the
between them. Unlike l2, the cosine distance is not affected by the
magnitude of the vectors. Cosine distance has a range of [0, 2].
Note: the cosine distance is undefined when one (or both) of the vectors
@@ -46,7 +46,7 @@ never be returned from a vector search.
"dot" - Dot product. Dot distance is the dot product of two vectors. Dot
distance has a range of (-∞, ∞). If the vectors are normalized (i.e. their
L2 norm is 1), then dot distance is equivalent to the cosine distance.
l2 norm is 1), then dot distance is equivalent to the cosine distance.
***

View File

@@ -0,0 +1,39 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MergeResult
# Interface: MergeResult
## Properties
### numDeletedRows
```ts
numDeletedRows: number;
```
***
### numInsertedRows
```ts
numInsertedRows: number;
```
***
### numUpdatedRows
```ts
numUpdatedRows: number;
```
***
### version
```ts
version: number;
```

View File

@@ -20,3 +20,13 @@ The maximum number of rows to return in a single batch
Batches may have fewer rows if the underlying data is stored
in smaller chunks.
***
### timeoutMs?
```ts
optional timeoutMs: number;
```
Timeout for query execution in milliseconds

View File

@@ -0,0 +1,47 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / TableStatistics
# Interface: TableStatistics
## Properties
### fragmentStats
```ts
fragmentStats: FragmentStatistics;
```
Statistics on table fragments
***
### numIndices
```ts
numIndices: number;
```
The number of indices in the table
***
### numRows
```ts
numRows: number;
```
The number of rows in the table
***
### totalBytes
```ts
totalBytes: number;
```
The total number of bytes in the table

View File

@@ -0,0 +1,23 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / UpdateResult
# Interface: UpdateResult
## Properties
### rowsUpdated
```ts
rowsUpdated: number;
```
***
### version
```ts
version: number;
```

View File

@@ -0,0 +1,26 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / WriteExecutionOptions
# Interface: WriteExecutionOptions
## Properties
### timeoutMs?
```ts
optional timeoutMs: number;
```
Maximum time to run the operation before cancelling it.
By default, there is a 30-second timeout that is only enforced after the
first attempt. This is to prevent spending too long retrying to resolve
conflicts. For example, if a write attempt takes 20 seconds and fails,
the second attempt will be cancelled after 10 seconds, hitting the
30-second timeout. However, a write that takes one hour and succeeds on the
first attempt will not be cancelled.
When this is set, the timeout is enforced on all attempts, including the first.

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,53 @@
# Apache Datafusion
In Python, LanceDB tables can also be queried with [Apache Datafusion](https://datafusion.apache.org/), an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. This means you can write complex SQL queries to analyze your data in LanceDB.
This integration is done via [Datafusion FFI](https://docs.rs/datafusion-ffi/latest/datafusion_ffi/), which provides a native integration between LanceDB and Datafusion.
The Datafusion FFI allows to pass down column selections and basic filters to LanceDB, reducing the amount of scanned data when executing your query. Additionally, the integration allows streaming data from LanceDB tables which allows to do aggregation larger-than-memory.
We can demonstrate this by first installing `datafusion` and `lancedb`.
```shell
pip install datafusion lancedb
```
We will re-use the dataset [created previously](./pandas_and_pyarrow.md):
```python
import lancedb
from datafusion import SessionContext
from lance import FFILanceTableProvider
db = lancedb.connect("data/sample-lancedb")
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
lance_table = db.create_table("lance_table", data)
ctx = SessionContext()
ffi_lance_table = FFILanceTableProvider(
lance_table.to_lance(), with_row_id=True, with_row_addr=True
)
ctx.register_table_provider("ffi_lance_table", ffi_lance_table)
```
The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to Datafusion through the Datafusion FFI integration layer.
To query the resulting Lance dataset in Datafusion, you first need to register the dataset with Datafusion and then just reference it by the same name in your SQL query.
```python
ctx.table("ffi_lance_table")
ctx.sql("SELECT * FROM ffi_lance_table")
```
```
┌─────────────┬─────────┬────────┬─────────────────┬─────────────────┐
│ vector │ item │ price │ _rowid │ _rowaddr │
│ float[] │ varchar │ double │ bigint unsigned │ bigint unsigned │
├─────────────┼─────────┼────────┼─────────────────┼─────────────────┤
│ [3.1, 4.1] │ foo │ 10.0 │ 0 │ 0 │
│ [5.9, 26.5] │ bar │ 20.0 │ 1 │ 1 │
└─────────────┴─────────┴────────┴─────────────────┴─────────────────┘
```

View File

@@ -59,8 +59,6 @@ is also an [asynchronous API client](#connections-asynchronous).
::: lancedb.embeddings.open_clip.OpenClipEmbeddings
::: lancedb.embeddings.utils.with_embeddings
## Context
::: lancedb.context.contextualize

View File

@@ -15,7 +15,7 @@ Currently, LanceDB supports the following metrics:
| Metric | Description |
| --------- | --------------------------------------------------------------------------- |
| `l2` | [Euclidean / L2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) |
| `l2` | [Euclidean / l2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) |
| `cosine` | [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) |
| `dot` | [Dot Production](https://en.wikipedia.org/wiki/Dot_product) |
| `hamming` | [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) |
@@ -138,6 +138,19 @@ LanceDB supports binary vectors as a data type, and has the ability to search bi
--8<-- "python/python/tests/docs/test_binary_vector.py:async_binary_vector"
```
=== "TypeScript"
```ts
--8<-- "nodejs/examples/search.test.ts:import"
--8<-- "nodejs/examples/search.test.ts:import_bin_util"
--8<-- "nodejs/examples/search.test.ts:ingest_binary_data"
--8<-- "nodejs/examples/search.test.ts:search_binary_data"
```
## Multivector type
LanceDB supports multivector type, this is useful when you have multiple vectors for a single item (e.g. with ColBert and ColPali).

View File

@@ -7,7 +7,7 @@ performed on the top-k results returned by the vector search. However, pre-filte
option that performs the filter prior to vector search. This can be useful to narrow down
the search space of a very large dataset to reduce query latency.
Note that both pre-filtering and post-filtering can yield false positives. For pre-filtering, if the filter is too selective, it might eliminate relevant items that the vector search would have otherwise identified as a good match. In this case, increasing `nprobes` parameter will help reduce such false positives. It is recommended to set `use_index=false` if you know that the filter is highly selective.
Note that both pre-filtering and post-filtering can yield false positives. For pre-filtering, if the filter is too selective, it might eliminate relevant items that the vector search would have otherwise identified as a good match. In this case, increasing `nprobes` parameter will help reduce such false positives. It is recommended to call `bypass_vector_index()` if you know that the filter is highly selective.
Similarly, a highly selective post-filter can lead to false positives. Increasing both `nprobes` and `refine_factor` can mitigate this issue. When deciding between pre-filtering and post-filtering, pre-filtering is generally the safer choice if you're uncertain.

View File

@@ -8,6 +8,10 @@ For trouble shooting, the best place to ask is in our Discord, under the relevan
language channel. By asking in the language-specific channel, it makes it more
likely that someone who knows the answer will see your question.
## Common issues
* Multiprocessing with `fork` is not supported. You should use `spawn` instead.
## Enabling logging
To provide more information, especially for LanceDB Cloud related issues, enable
@@ -31,3 +35,9 @@ print the resolved query plan. You can use the `explain_plan` method to do this:
* Python Sync: [LanceQueryBuilder.explain_plan][lancedb.query.LanceQueryBuilder.explain_plan]
* Python Async: [AsyncQueryBase.explain_plan][lancedb.query.AsyncQueryBase.explain_plan]
* Node @lancedb/lancedb: [LanceQueryBuilder.explainPlan](/lancedb/js/classes/QueryBase/#explainplan)
To understand how a query was actually executed—including metrics like execution time, number of rows processed, I/O stats, and more—use the analyze_plan method. This executes the query and returns a physical execution plan annotated with runtime metrics, making it especially helpful for performance tuning and debugging.
* Python Sync: [LanceQueryBuilder.analyze_plan][lancedb.query.LanceQueryBuilder.analyze_plan]
* Python Async: [AsyncQueryBase.analyze_plan][lancedb.query.AsyncQueryBase.analyze_plan]
* Node @lancedb/lancedb: [LanceQueryBuilder.analyzePlan](/lancedb/js/classes/QueryBase/#analyzePlan)

View File

@@ -7,3 +7,4 @@ tantivy==0.20.1
--extra-index-url https://download.pytorch.org/whl/cpu
torch
polars>=0.19, <=1.3.0
datafusion

3
java/.gitignore vendored Normal file
View File

@@ -0,0 +1,3 @@
*.iml
.java-version

View File

@@ -8,13 +8,16 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.17.0-final.0</version>
<version>0.20.1-beta.0</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-core</artifactId>
<name>LanceDB Core</name>
<packaging>jar</packaging>
<properties>
<rust.release.build>false</rust.release.build>
</properties>
<dependencies>
<dependency>
@@ -68,7 +71,7 @@
</goals>
<configuration>
<path>lancedb-jni</path>
<release>true</release>
<release>${rust.release.build}</release>
<!-- Copy native libraries to target/classes for runtime access -->
<copyTo>${project.build.directory}/classes/nativelib</copyTo>
<copyWithPlatformDir>true</copyWithPlatformDir>

View File

@@ -1,16 +1,25 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import io.questdb.jar.jni.JarJniLoader;
import java.io.Closeable;
import java.util.List;
import java.util.Optional;
/**
* Represents LanceDB database.
*/
/** Represents LanceDB database. */
public class Connection implements Closeable {
static {
JarJniLoader.loadLib(Connection.class, "/nativelib", "lancedb_jni");
@@ -18,14 +27,11 @@ public class Connection implements Closeable {
private long nativeConnectionHandle;
/**
* Connect to a LanceDB instance.
*/
/** Connect to a LanceDB instance. */
public static native Connection connect(String uri);
/**
* Get the names of all tables in the database. The names are sorted in
* ascending order.
* Get the names of all tables in the database. The names are sorted in ascending order.
*
* @return the table names
*/
@@ -34,8 +40,7 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param limit The number of results to return.
* @return the table names
@@ -45,12 +50,11 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @return the table names
*/
public List<String> tableNames(String startAfter) {
@@ -58,12 +62,11 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
@@ -72,22 +75,19 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
public native List<String> tableNames(
Optional<String> startAfter, Optional<Integer> limit);
public native List<String> tableNames(Optional<String> startAfter, Optional<Integer> limit);
/**
* Closes this connection and releases any system resources associated with it. If
* the connection is
* already closed, then invoking this method has no effect.
* Closes this connection and releases any system resources associated with it. If the connection
* is already closed, then invoking this method has no effect.
*/
@Override
public void close() {
@@ -98,8 +98,7 @@ public class Connection implements Closeable {
}
/**
* Native method to release the Lance connection resources associated with the
* given handle.
* Native method to release the Lance connection resources associated with the given handle.
*
* @param handle The native handle to the connection resource.
*/

Some files were not shown because too many files have changed in this diff Show More