Compare commits

..

205 Commits

Author SHA1 Message Date
Lance Release
f744b785f8 Bump version: 0.25.0-beta.2 → 0.25.0 2025-09-04 08:32:44 +00:00
Lance Release
2e3f745820 Bump version: 0.25.0-beta.1 → 0.25.0-beta.2 2025-09-04 08:32:43 +00:00
Jack Ye
683aaed716 chore: upgrade lance to 0.35.0 (#2625) 2025-09-04 01:31:13 -07:00
Lance Release
48f7b20daa Bump version: 0.22.0-beta.0 → 0.22.0-beta.1 2025-09-03 17:51:36 +00:00
Lance Release
4dd399ca29 Bump version: 0.25.0-beta.0 → 0.25.0-beta.1 2025-09-03 17:50:41 +00:00
Jack Ye
e6f1da31dc chore: upgrade lance to 0.34.0-beta.4 (#2621) 2025-09-02 21:33:55 -07:00
Wyatt Alt
a9ea785b15 fix: remote python sdk namespace typing (#2620)
This changes the default values for some namespace parameters in the
remote python SDK from None to [], to match the underlying code it
calls.

Prior to this commit, failing to supply "namespace" with the remote SDK
would cause an error because the underlying code it dispatches to does
not consider None to be valid input.
2025-09-02 16:32:32 -07:00
Colin Patrick McCabe
cc38453391 fix!: fix doctest in query.py (#2622)
Fix doctest in query.py to include cumulative_cpu, now that lance
includes that.
2025-09-02 15:47:32 -07:00
Lance Release
47747287b6 Bump version: 0.21.4-beta.1 → 0.22.0-beta.0 2025-08-29 21:20:57 +00:00
Lance Release
0847e666a0 Bump version: 0.24.4-beta.1 → 0.25.0-beta.0 2025-08-29 21:19:51 +00:00
Wyatt Alt
981f8427e6 chore: update lance (#2610)
Adds storage_options to object_store wrap() to adhere to upstream lance
change.
2025-08-29 13:41:02 -07:00
Will Jones
f6846004ca feat: add name parameter to remaining Python create index calls (#2617)
## Summary
This PR adds the missing `name` parameter to `create_scalar_index` and
`create_fts_index` methods in the Python SDK, which was inadvertently
omitted when it was added to `create_index` in PR #2586.

## Changes
- Add `name: Optional[str] = None` parameter to abstract
`Table.create_scalar_index` and `Table.create_fts_index` methods
- Update `LanceTable` implementation to accept and pass the `name`
parameter to the underlying Rust layer
- Update `RemoteTable` implementation to accept and pass the `name`
parameter
- Enhanced tests to verify custom index names work correctly for both
scalar and FTS indices
- When `name` is not provided, default names are generated (e.g.,
`{column}_idx`)

## Test plan
- [x] Added test cases for custom names in scalar index creation
- [x] Added test cases for custom names in FTS index creation  
- [x] Verified existing tests continue to pass
- [x] Code formatting and linting checks pass

This ensures API consistency across all index creation methods in the
LanceDB Python SDK.

Fixes #2616

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-27 14:02:48 -07:00
Jack Ye
faf8973624 feat!: support multi-level namespace (#2603)
This PR adds support of multi-level namespace in a LanceDB database,
according to the Lance Namespace spec.

This allows users to create namespace inside a database connection,
perform create, drop, list, list_tables in a namespace. (other
operations like update, describe will be in a follow-up PR)

The 3 types of database connections behave like the following:
1 Local database connections will continue to have just a flat list of
tables for backwards compatibility.
2. Remote database connections will make REST API calls according to the
APIs in the Lance Namespace spec.
3. Lance Namespace connections will invoke the corresponding operations
against the specific namespace implementation which could have different
behaviors regarding these APIs.

All the table APIs now take identifier instead of name, for example
`/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is
directly in the root namespace, the API call is identical. If the table
is in a namespace, then the full table ID should be used, with `$` as
the default delimiter (`.` is a special character and creates issues
with URL parsing so `$` is used), for example
`/v1/table/ns1$table1/create`. If a different parameter needs to be
passed in, user can configure the `id_delimiter` in client config and
that becomes a query parameter, for example
`/v1/table/ns1__table1/create?delimiter=__`

The Python and Typescript APIs are kept backwards compatible, but the
following Rust APIs are not:
1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>`
is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace:
&[String]) -> Result<()>`
2. `Connection::drop_all_tables(&self) -> Result<()>` is now
`Connection::drop_all_tables(&self, name: impl AsRef<str>) ->
Result<()>`
2025-08-27 12:07:55 -07:00
Weston Pace
fabe37274f feat: add __getitems__ method impl for torch integration (#2596)
This allows a lancedb Table to act as a torch dataset.
2025-08-25 13:23:22 -07:00
Lance Release
6839ac3509 Bump version: 0.21.4-beta.0 → 0.21.4-beta.1 2025-08-22 03:55:22 +00:00
Lance Release
b88422e515 Bump version: 0.24.4-beta.0 → 0.24.4-beta.1 2025-08-22 03:54:34 +00:00
BubbleCal
8d60685ede chore: upgrade lance to 0.33.0-beta.4 (#2604)
detials:
https://github.com/lancedb/lance/releases/tag/untagged-5191abd48c1fbe76f746

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-08-21 21:18:48 +08:00
Jack Ye
04285a4a4e feat(python): integrate with lance namespace (#2599)
This PR integrates `lancedb` with `lance-namespace` so that users can
use LanceDB client to access Lance tables in any catalog services. In
general, we expect most of the logic to be delegated to the existing
`LanceDBConnection` and `LanceTable`, but the namespace implemenation
will control how table is created, dropped, and describe where the table
is stored with any related storage options like access credentials.

The implementation currently only supports a 1 level namespace that
directly contains tables. We will introduce nested namespace support in
a separated PR.

Users are expected to use it in the following way:

```python
>>> import lancedb
>>> import pyarrow as pa
>>> # Connect using GlueNamespace
>>> db = lancedb.connect_namespace("glue", {"catalog_id": "123456789012"})
>>> # Create a table with schema
>>> schema = pa.schema([
...     pa.field("id", pa.int64()),
...     pa.field("vector", pa.list_(pa.float32(), 2))
... ])
>>> table = db.create_table("my_table", schema=schema)
>>> # List tables
>>> db.table_names()
['my_table']
```
2025-08-20 15:46:16 -07:00
Lance Release
d4a41b5663 Bump version: 0.21.3 → 0.21.4-beta.0 2025-08-19 22:56:52 +00:00
Lance Release
adc3daa462 Bump version: 0.24.3 → 0.24.4-beta.0 2025-08-19 22:56:05 +00:00
Will Jones
acbfa6c012 feat: upgrade lance to 0.33.0-beta.3 (#2598)
Change logs:
*
[v0.33.0-beta.3](https://github.com/lancedb/lance/releases/tag/v0.33.0-beta.3)
*
[v0.33.0-beta.2](https://github.com/lancedb/lance/releases/tag/v0.33.0-beta.2)
*
[v0.33.0-beta.1](https://github.com/lancedb/lance/releases/tag/v0.33.0-beta.1)

Important changes:

* Row-level conflict resolution for delete operations
* Fixes #2593
* Fix for keeping tombstones fields around, preventing cleanup of
dropped columns.
2025-08-19 13:45:15 -07:00
Vitali Lovich
d602e9f98c fix: make cloud features optional (#2567) (#2568)
This shrinks the size of a local embedded build that can disable all the
default features. When combined with
https://github.com/lancedb/lance/pull/4362 and the dependencies are
updated to point to the fix, this resolves #2567 fully.

Verified by patching the workspace to redirect to my clone of lance with
the PR applied.
```
cargo tree -p lancedb -e no-build -e no-dev --no-default-features -i aws-config | less
```

The reason that lance itself needs to change too is that many
dependencies within that project depend on lance-io/default and lancedb
depends on them which transitively ends up enabling the cloud
regardless. The PR in lance removes the dependency on lance-io/default
from all sibling crates.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-08-15 16:46:52 -07:00
Will Jones
ad09234d59 feat: allow setting train=False and name on indices (#2586)
Enables two new parameters when building indices:

* `name`: Allows explicitly setting a name on the index. Default is
`{col_name}_idx`.
* `train` (default `True`): When set to `False`, an empty index will be
immediately created.

The upgrade of Lance means there are also additional behaviors from
cd76a993b8:

* When a scalar index is created on a Table, it will be kept around even
if all rows are deleted or updated.
* Scalar indices can be created on empty tables. They will default to
`train=False` if the table is empty.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-08-15 14:00:26 -07:00
Lance Release
0c34ffb252 Bump version: 0.21.3-beta.0 → 0.21.3 2025-08-15 18:03:26 +00:00
Lance Release
d9f333d828 Bump version: 0.21.2 → 0.21.3-beta.0 2025-08-15 18:02:43 +00:00
Lance Release
bb809abd4b Bump version: 0.24.3-beta.0 → 0.24.3 2025-08-15 18:02:04 +00:00
Lance Release
c87530f7a3 Bump version: 0.24.2 → 0.24.3-beta.0 2025-08-15 18:02:04 +00:00
Will Jones
1eb1beecd6 ci: remove more mentions of node (#2595)
I promise this time I tested it locally :)
2025-08-15 11:01:02 -07:00
Yuval Lifshitz
ce550e6c45 feat: add missing rust examples (#2583)
all 3 example are running now with:
```
cargo run --example simple
cargo run --example full_text_search
cargo run --example ivf_pq
```

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-08-15 10:38:58 -07:00
Will Jones
d3bae1f3a3 ci: drop old node mention (#2594)
This broke release here:
https://github.com/lancedb/lancedb/actions/runs/16993824504/job/48179542912
2025-08-15 09:51:19 -07:00
Will Jones
dcf53c4506 fix: limit and offset support paginating through FTS and vector search results (#2592)
Adds tests to ensure that users can paginate through simple scan, FTS,
and vector search results using `limit` and `offset`.

Tests upstream work: https://github.com/lancedb/lance/pull/4318

Closes #2459
2025-08-15 08:55:12 -07:00
Ryan Green
941eada703 docs: update indexing and compaction docs (#2362)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Clarified and expanded explanations of data management concepts in
LanceDB.
- Added notes on automatic background fragment compaction and
incremental reindexing support in LanceDB Cloud/Enterprise.
- Updated details on disabling interim exhaustive kNN search during
background reindexing.
  - Improved formatting and removed outdated FTS reindexing subsection.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-08-15 12:41:47 -02:30
Weston Pace
ed640a76d9 feat: add take_offsets and take_row_ids (#2584)
These operations have existed in lance for a long while and many users
need to drop down to lance for this capability. This PR adds the API and
implements it using filters (e.g. `_rowid IN (...)`) so that in doesn't
currently add any load to `BaseTable`. I'm not sure that is sustainable
as base table implementations may want to specialize how they handle
this method. However, I figure it is a good starting point.

In addition, unlike Lance, this API does not currently guarantee
anything about the order of the take results. This is necessary for the
fallback filter approach to work (SQL filters cannot guarantee result
order)
2025-08-15 06:48:24 -07:00
Will Jones
296205ef96 feat: upgrade lance to v0.33.0 (#2591)
https://github.com/lancedb/lance/releases/tag/v0.33.0
2025-08-14 12:11:19 -07:00
Weston Pace
16beaaa656 ci: fix broken CI checks (#2585) 2025-08-13 10:05:57 -07:00
Tomoko Uchida
4ff87b1f4a feat: add hybrid search example in Rust (#2579)
Hello!

I'm new to lancedb and interested in the Rust SDK.
I couldn't find a good hybrid search example in Rust, so I created one.

## Usage

```bash
$ cargo run --quiet --example hybrid_search --features=sentence-transformers
Result: Python is a popular programming language.
Result: Mount Everest is the highest mountain in the world.
Result: The first computer programmer was Ada Lovelace.
Result: Coffee is one of the most popular beverages in the world.
Result: Basketball is a sport played with a ball and a hoop.
```
2025-08-12 08:22:19 -07:00
Shawn
0532ef2358 chore(deps): update crunchy to 0.2.4 (#2581)
Hi,

I'm try to build goose (rely on lancedb) for android/termux.
Found out some depsendencies need to update. 

https://github.com/block/goose/pull/3890

0.2.4 update
- nmathewson Fix cross-compilation between windows and non-windows.

https://github.com/shawn111/lancedb/actions/runs/16871317860
windows and linux build passed

https://github.com/shawn111/lancedb/actions/runs/16871859398

Signed-off-by: Shawn Wang <shawn111@gmail.com>
2025-08-11 18:00:00 -07:00
BubbleCal
dcf7334c1f chore: upgrade lance to v0.32.2-beta.1 (#2580)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-08-08 17:00:54 +08:00
Will Jones
8ffe992a6f fix: always uses slashes in table uris (#2575)
Closes #2574
2025-08-05 12:12:57 -07:00
Will Jones
9d683e4f0b feat: infer vector columns when name contains 'vector' or 'embedding' (#2547)
## Summary

- Enhanced vector column detection to use substring matching instead of
exact matching
- Now detects columns with names containing "vector" or "embedding"
(case-insensitive)
- Added integer vector support to Node.js implementation (matching
Python)
- Comprehensive test coverage for both float and integer vector types

## Changes

### Python (`python/python/lancedb/table.py`)
- Updated `_infer_target_schema()` to use substring matching with helper
function `_is_vector_column()`
- Preserved original field names instead of forcing "vector"
- Consolidated duplicate logic for better maintainability

### Node.js (`nodejs/lancedb/arrow.ts`)
- Enhanced type inference with `nameSuggestsVectorColumn()` helper
function
- Added `isAllIntegers()` function with performance optimization (checks
first 10 elements)
- Implemented integer vector support using `Uint8` type (matching
Python)
- Improved type safety by removing `any` usage

### Tests
- **Python**: Added
`test_infer_target_schema_with_vector_embedding_names()` in
`test_util.py`
- **Node.js**: Added comprehensive test case in `arrow.test.ts`
- Both test suites cover various naming patterns and integer/float
vector types

## Examples of newly supported column names:
- `user_vector`, `text_embedding`, `doc_embeddings`
- `my_vector_field`, `embedding_model`
- `VECTOR_COL`, `Vector_Mixed` (case-insensitive)
- Both float and integer arrays are properly converted to fixed-size
lists

## Test plan
- [x] All existing tests pass (backward compatibility maintained)
- [x] New tests pass for both Python and Node.js implementations
- [x] Integer vector detection works correctly in Node.js
- [x] Code passes linting and formatting checks
- [x] Performance optimized for large vector arrays

Fixes #2546

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-04 15:36:49 -07:00
Will Jones
0a1ea1858d chore: remove vectordb package (#2564)
```shell
git rm -r rust/ffi
git rm -r node
git rm ci/build_windows_artifacts.ps1
git rm ci/build_windows_artifacts_nodejs.ps1
git rm ci/build_linux_artifacts.sh
git rm ci/build_macos_artifacts.sh
git rm -r ci/manylinux_node
git rm .github/workflows/node.yml
```
2025-08-04 14:14:33 -07:00
Poornachandra.A.N
7d0127b376 feat(embeddings): add siglip embedding support to lancedb (#2499)
###  Summary

This PR adds **SigLIP** (Sigmoid Loss Image Pretraining) as a new
embedding model in the LanceDB embedding registry. SigLIP improves
image-text alignment performance using sigmoid-based contrastive loss
and offers robust zero-shot generalization.

Fixes #2498 

### What’s Implemented

#### 1. `SigLIP` Embedding Class

* Added `SigLIP` support under `python/lancedb/embeddings/siglip.py`
* Implements:

  * `compute_source_embeddings`
  * `_batch_generate_embeddings`
  * Normalization logic
  * Batch-wise progress logging for image embedding

#### 2. Registry Integration

* Registered `SigLIP` in `embeddings/__init__.py`
* `SigLIP` now usable via `connect(..., embedding="siglip")`

#### 3. Evaluation Benchmark Support

* Added SigLIP to `test_embeddings_slow.py` for side-by-side
benchmarking with OpenCLIP and ImageBind


###  New Test Methods

####  `test_siglip`

* End-to-end test to verify embeddings table creation and vector shape
for SigLIP
![WhatsApp Image 2025-07-10 at 18 00
27_a3368163](https://github.com/user-attachments/assets/e5582ee1-80a3-43d7-a7a1-26ceecce9f4d)


####  `test_siglip_vs_openclip_vs_imagebind_benchmark_full`

* Benchmarks:

  * **Recall\@1 / 5 / 10**
  * **mAP (Mean Average Precision)**
  * **Embedding & Search Latency**
  * Dimensionality reporting
![WhatsApp Image 2025-07-10 at 18 12
13_22c67a84](https://github.com/user-attachments/assets/455bf30f-62b7-4684-a3f3-ad52e2a1ffe5)


###  Notes

* SigLIP outputs 768D embeddings (vs 512D for OpenCLIP)
* Benchmark shows competitive performance despite higher dimensionality
* I'm still new to contributing to open-source and learning as I go.
Please feel free to suggest any improvements — I'm happy to make
changes!
2025-08-04 11:42:39 -07:00
Will Jones
02595dc475 feat: add overall timeout parameter to remote client (#2550)
## Summary
- Adds an overall `timeout` parameter to `TimeoutConfig` that limits the
total time for the entire request
- Can be set via config or `LANCE_CLIENT_TIMEOUT` environment variable
- Exposed in Python and Node.js bindings
- Includes comprehensive tests

## Test plan
- [x] Unit tests for Rust TimeoutConfig
- [x] Integration tests for Python bindings  
- [x] Integration tests for Node.js bindings
- [x] All existing tests pass

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-04 10:06:55 -07:00
Reed Loden
f23327af79 fix: use SPDX-compliant license name for nodejs packages (#2558)
Update license field from `Apache 2.0` to be `Apache-2.0` for all
Node.js packages.

This was causing GitHub's Dependency Review license check to fail with:
> The validity of the licenses of the dependencies below could not be
determined. Ensure that they are valid SPDX licenses
2025-08-04 09:54:53 -07:00
Wyatt Alt
c7afa724dd chore: update npm lockfile (#2563) 2025-07-30 18:28:06 -07:00
BubbleCal
c359cec504 chore: upgrade lance to 0.32.1-beta.2 (#2562)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-30 14:31:04 -07:00
Mark McCaskey
fe76496a59 fix: .nprobes method in python bindings, improve error messages (#2556)
`nprobes` with a value greater than 20 fails with the minimum error:

```
self = <lancedb.query.AsyncVectorQuery object at 0x10b749720>, minimum_nprobes = 30

    def minimum_nprobes(self, minimum_nprobes: int) -> Self:
        """Set the minimum number of probes to use.

        See `nprobes` for more details.

        These partitions will be searched on every indexed vector query and will
        increase recall at the expense of latency.
        """
>       self._inner.minimum_nprobes(minimum_nprobes)
E       ValueError: Invalid input, minimum_nprobes must be less than or equal to maximum_nprobes

python/lancedb/query.py:2744: ValueError
```

Putting the max set before the min seems reasonable but it causes this
reasonable case to fail:
```
def test_nprobes_min_max_works_sync(table):
    LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(2).maximum_nprobes(4).to_list()
```

with

```
self = <lancedb.query.AsyncVectorQuery object at 0x1203f1c90>, maximum_nprobes = 4

    def maximum_nprobes(self, maximum_nprobes: int) -> Self:
        """Set the maximum number of probes to use.

        See `nprobes` for more details.

        If this value is greater than `minimum_nprobes` then the excess partitions
        will be searched only if we have not found enough results.

        This can be useful when there is a narrow filter to allow these queries to
        spend more time searching and avoid potential false negatives.

        If this value is 0 then no limit will be applied and all partitions could be
        searched if needed to satisfy the limit.
        """
>       self._inner.maximum_nprobes(maximum_nprobes)
E       ValueError: Invalid input, maximum_nprobes must be greater than or equal to minimum_nprobes

python/lancedb/query.py:2761: ValueError
```.

The case I care about is where min == max, but this solution handles it
even if they're not. If both min and max exist, we set both to the
minimum and then set the max. This isn't 100% the same as the minimum
setter checks for 0 on the min and `.nprobes` does not do any sanity
checking at all. But I figured this was the most reasonable and general
solution without touching more of this code.

As part of this I noticed the error messages were a bit ambiguous so I
made them symmetric and clarified them while I was here.
2025-07-30 09:23:25 -07:00
Weston Pace
67ec1fe75c feat: don't repartition for the sake of the metadata eraser (#2559)
The `MetadataEraserExec` is super lightweight and doesn't really justify
partitioning. I had a plan recently that was partitioning just for this
node and that seems wasteful.
2025-07-29 19:26:30 -07:00
Lance Release
70d9b04ba5 Bump version: 0.21.2-beta.2 → 0.21.2 2025-07-25 20:32:41 +00:00
Lance Release
b0d4a79c35 Bump version: 0.21.2-beta.1 → 0.21.2-beta.2 2025-07-25 20:31:50 +00:00
Lance Release
f79295c697 Bump version: 0.24.2-beta.2 → 0.24.2 2025-07-25 20:31:15 +00:00
Lance Release
381fad9b65 Bump version: 0.24.2-beta.1 → 0.24.2-beta.2 2025-07-25 20:31:15 +00:00
Tristan Zajonc
055bf91d3e fix: handle empty list with schema in table creation (#2548)
## Summary
Fixes IndexError when creating tables with empty list data and a
provided schema. Previously, `_into_pyarrow_reader()` would attempt to
access `data[0]` on empty lists, causing an IndexError. Now properly
handles empty lists by using the provided schema.

Also adds regression tests for GitHub issues #1968 and #303 to prevent
future regressions with empty table scenarios.

## Changes
- Fix IndexError in `_into_pyarrow_reader()` for empty list + schema
case
- Add Optional[pa.Schema] parameter to handle empty data gracefully  
- Add `test_create_table_empty_list_with_schema` for the IndexError fix
- Add `test_create_empty_then_add_data` for issue #1968
- Add `test_search_empty_table` for issue #303

## Test plan
- [x] All new regression tests pass
- [x] Existing tests continue to pass
- [x] Code formatted with `make format`
2025-07-25 10:23:43 +08:00
Will Jones
050f0086b8 feat: upgrade Lance to v0.32.0 (#2543)
Changelog: https://github.com/lancedb/lance/releases/tag/v0.32.0

Fixes #2521
2025-07-24 19:22:53 -07:00
Tristan Zajonc
10fa23e0d6 fix(python): expose register function in embeddings module (#2544)
## Summary
Fixes #2541

**Problem**: The `register` function was not accessible via `from
lancedb.embeddings import register` as documented, causing ImportError
for users trying to create custom embedding functions.

**Solution**: Added `register` to the exports in
`python/lancedb/embeddings/__init__.py` to match the documented API and
follow the same pattern as other registry functions (`get_registry`,
`EmbeddingFunctionRegistry`).

**Root Cause**: The function existed in `lancedb.embeddings.registry`
but wasn't exposed through the main embeddings module interface.

## Changes
- Add `register` to imports in
`/python/python/lancedb/embeddings/__init__.py`

## Test Plan
- [x] Verified `from lancedb.embeddings import register` works as
documented
- [x] Confirmed existing embedding tests pass
- [x] Checked that the fix follows existing patterns (same as
`get_registry`)
- [x] Validated linting and formatting passes

## References
Fixes #2541
2025-07-24 15:30:06 -07:00
yihong
43d9fc28b0 fix: can not build on python3.9 for dev (#2477)
This patch fix can not build on python3.9 dev

the reason is that for ibm-watsonx-ai the min version is py3.10

more can check on `pyoven` https://pyoven.org/package/ibm-watsonx-ai/

also fix tiny md lint

---------

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-07-24 12:39:04 -07:00
aniaan
f45f0d0431 fix(python): correct type annotations in EmbeddingFunctionRegistry (#2478)
- Fix register() method's alias parameter type from 'str = None' to
'Optional[str] = None'
- Add return type annotation 'Type[EmbeddingFunction]' to get() method
- Import Type from typing module for proper type hints
2025-07-24 12:31:49 -07:00
Tristan Zajonc
b9e3c36d82 fix: replace broken documentation URLs in error messages (#2533)
Replaces broken 404 URL and unhelpful documentation links in type error
messages with working URL and inline list of supported data types.

**Before**: Points to
https://lancedb.github.io/lance/read_and_write.html (404 error)
**After**: Lists supported types inline and points to
https://lancedb.github.io/lancedb/guides/tables/
2025-07-24 12:30:27 -07:00
Chen Chongchen
3cd7dd3375 fix: to_pydantic typing (#2517)
currently, to_pydantic will always return LanceModel. If type checking
is enabled in my project. I have to use `cast(data,
List[RealModelType])` to solve type error. This PR uses generic to solve
this problem.
2025-07-24 12:30:15 -07:00
Tristan Zajonc
12d4ce4cfe fix: resolve flaky Node.js integration test for mirrored store (#2539)
## Summary
- Fixed flaky Node.js integration test for mirrored store functionality
- Converted callback-based `fs.readdir()` to `fs.promises.readdir()`
with proper async/await
- Used unique temporary directories to prevent test isolation issues
- Updated test expectations to match current IVF-PQ index file structure

## Problem
The mirrored store integration test was experiencing random failures in
CI with errors like:
- `expected 2 to equal 1` at various assertion points
- `done() called multiple times`

## Root Causes Identified
1. **Race conditions**: Mixing callback-based filesystem operations with
async functions created timing issues where assertions ran before
filesystem operations completed
2. **Test isolation**: Multiple tests shared the same temp directory
(`tmpdir()`), causing one test to see files from another
3. **Outdated expectations**: IVF-PQ indexes now create 2 files
(`auxiliary.idx` + `index.idx`) instead of 1, but the test expected only
1

## Solution
- Replace all `fs.readdir()` callbacks with `fs.promises.readdir()` and
`await`
- Use `fs.promises.mkdtemp()` to create unique temporary directories for
each test run
- Update index file count expectations from 1 to 2 files to match
current Lance behavior
- Add descriptive assertion labels for easier debugging

## Analysis
The mirroring implementation in `MirroringObjectStore::put_opts` is
synchronous - it awaits writes to both secondary (local) and primary
(S3) stores before returning. The test failures were due to
callback/async pattern mismatch and test isolation issues, not actual
async mirroring behavior.

## Test plan
- [x] Local tests are running without timing-based failures
- [x] Integration tests with AWS credentials pass in CI

This resolves the flaky failures including 'expected 2 to equal 1'
assertions and 'done() called multiple times' errors seen in CI runs.
2025-07-24 12:07:05 -07:00
Will Jones
3d1f102087 feat: allow Python and Typescript users to create Sessions (#2530)
## Summary
- Exposes `Session` in Python and Typescript so users can set the
`index_cache_size_bytes` and `metadata_cache_size_bytes`
* The `Session` is attached to the `Connection`, and thus shared across
all tables in that connection.
- Adds deprecation warnings for table-level cache configuration


🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-24 12:06:29 -07:00
Tristan Zajonc
81afd8a42f fix: use local random state in FTS test fixtures to prevent flaky failures (#2532)
## Summary
Fixes intermittent CI failures in `test_search_fts[False]` where boolean
FTS queries were returning fewer results than expected due to
non-deterministic test data generation.

## Problem
The test was using global `random` and `np.random` without seeding,
causing the boolean query `MatchQuery("puppy", "text") &
MatchQuery("runs", "text")` to sometimes return only 3 results instead
of the expected 5, leading to `AssertionError: assert 3 == 5`.

## Solution
- Replace global random calls with local `random.Random(42)` and
`np.random.RandomState(42)` objects in test fixtures
- Ensures deterministic test data while maintaining test isolation
- No impact on other tests since random state is scoped to fixtures only

## Test Results
-  `test_search_fts[False]` now passes consistently
-  All other FTS tests continue to pass 
-  No regression in other test suites (verified with `test_basic`)
-  Maintains existing test behavior and coverage
2025-07-24 11:30:02 -07:00
Tristan Zajonc
c2aa03615a fix: correct grammar in LanceDB cloud connection error message (#2537)
## Summary

Fixed a minor grammar error in the error message for missing API key
when connecting to LanceDB cloud.

## Changes

- Changed 'api_key is required to connected LanceDB cloud' to 'api_key
is required to connect to LanceDB cloud'
- Location: `python/python/lancedb/__init__.py:95`

## Test plan

- Error message formatting is correct and grammatical
- No functional changes to existing behavior
2025-07-24 09:56:06 -07:00
Tristan Zajonc
d2c6759e7f fix: use import stubs to prevent MLX doctest collection failures (#2536)
## Summary
- Add `create_import_stub()` helper to `embeddings/utils.py` for
handling optional dependencies
- Fix MLX doctest collection failures by using import stubs in
`gte_mlx_model.py`
- Module now imports successfully for doctest collection even when MLX
is not installed

## Changes
- **New utility function**: `create_import_stub()` creates placeholder
objects that allow class inheritance but raise helpful errors when used
- **Updated MLX model**: Uses import stubs instead of direct imports
that fail immediately
- **Graceful degradation**: Clear error messages when MLX functionality
is accessed without MLX installed

## Test Results
-  `pytest --doctest-modules python/lancedb` now passes (with and
without MLX installed)
-  All existing tests continue to pass
-  MLX functionality works normally when MLX is installed
-  Helpful error messages when MLX functionality is used without MLX
installed

Fixes #2538

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-07-23 16:25:33 -07:00
Weston Pace
94fb9f364a feat: update lance version to 0.32.0-b2 (#2525) 2025-07-23 12:23:10 -07:00
Will Jones
fbff244ed8 chore: add claude md files (#2531)
Gives basic context to Claude about how to do common tasks in the repo.
2025-07-23 12:20:36 -07:00
Xuanwo
7e7466d224 ci: enable trust publishing for rust crates (#2529) 2025-07-23 14:53:52 +08:00
Lance Release
cceaf27d79 Bump version: 0.21.2-beta.0 → 0.21.2-beta.1 2025-07-22 15:41:13 +00:00
Lance Release
7a15337e03 Bump version: 0.24.2-beta.0 → 0.24.2-beta.1 2025-07-22 15:40:17 +00:00
BubbleCal
96c66fd087 feat: support multivector for JS SDK (#2527)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-22 21:19:34 +08:00
Will Jones
0579303602 feat: allow setting custom Session on ListingDatabase (#2526)
## Summary

Add support for providing a custom `Session` when connecting to a
`ListingDatabase`. This allows users to configure object store
registries, caching, and other session-related settings while
maintaining full backward compatibility.

## Usage Example

```rust
use std::sync::Arc;
use lancedb::connect;

let custom_session = Arc::new(lance::session::Session::default());

let db = connect("/path/to/database")
    .session(custom_session)
    .execute()
    .await?;
```

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-21 16:28:39 -07:00
Jack Ye
75edb8756c feat(java): integrate lance-namespace to lancedb Java SDK (#2524) 2025-07-21 14:21:21 -07:00
Will Jones
88283110f4 fix: handle input with missing columns when using embedding functions (#2516)
## Summary

Fixes #2515 by implementing comprehensive support for missing columns in
Arrow table inputs when using embedding functions.

### Problem
Previously, when an Arrow table was passed to `fromDataToBuffer` with
missing columns and a schema containing embedding functions, the system
would fail because `applyEmbeddingsFromMetadata` expected all columns to
be present in the table.

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-18 15:54:25 -07:00
Lance Release
b3a637fdeb Bump version: 0.21.1 → 0.21.2-beta.0 2025-07-18 16:03:28 +00:00
Lance Release
ce24457531 Bump version: 0.24.1 → 0.24.2-beta.0 2025-07-18 16:02:37 +00:00
BubbleCal
087fe6343d test: fix random data may break test case (#2514)
this test adds a new vector and then performs vector search with
distance range.
this may fail if the new vector becomes the closest one to the query
vector

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-18 16:15:06 +08:00
Wyatt Alt
ab8cbe62dd fix: excessive object storage handle creation in create_table (#2505)
This fixes two bugs with create_table storage handle reuse. First issue
is, the database object did not previously carry a session that
create_table operations could reuse for create_table operations.

Second issue is, the inheritance logic for create_table and open_table
was causing empty storage options (i.e Some({})) to get sent, instead of
None. Lance handles these differently:

* When None is set, the object store held in the session's storage
registry that was created at "connect" is used. This value stays in the
cache long-term (probably as long as the db reference is held).
* When Some({}) is sent, LanceDB will create a new connection and cache
it for an empty key. However, that cached value will remain valid only
as long as the client holds a reference to the table. After that, the
cache is poisoned and the next create_table with the same key, will
create a new connection. This confounds reuse if e.g python gc's the
table object before another table is created.

My feeling is that the second path, if intentional, is probably meant to
serve cases where tables are overriding settings and the cached
connection is assumed not to be generally applicable. The bug is we were
engaging that mechanism for all tables.
2025-07-17 16:27:23 -07:00
Ayush Chaurasia
f076bb41f4 feat: add support for returning all scores with rerankers (#2509)
Previously `return_score="all"` was supported only for the default
reranker (RRF) and not the model based rerankers.
This adds support for keeping all scores in the base reranker so that
all model based rerankers can use it. Its a slower path than keeping
just the relevance score but can be useful in debugging
2025-07-15 21:03:03 +05:30
BubbleCal
902fb83d54 fix: set_lance_version may miss features when upgrading lance (#2510)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-15 20:11:10 +08:00
BubbleCal
779118339f chore: upgrade lance to 0.31.2-beta.3 (#2508)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-15 17:08:11 +08:00
BubbleCal
03b62599d7 feat: support ngram tokenizer (#2507)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-15 16:36:08 +08:00
Benjamin Schmidt
4c999fb651 chore: fix cleanupOlderThan docs (#2504)
Thanks for all your work.

The docstring for `OptimizeOptions ` seems to reference a non-existent
method on `Table`. I believe this is the correct example for
`cleanupOlderThan`.

This also appears in the generated docs, but I assume they live
downstream from this code?
2025-07-15 16:23:10 +08:00
Lance Release
6d23d32ab5 Bump version: 0.21.1-beta.2 → 0.21.1 2025-07-10 21:36:59 +00:00
Lance Release
704cec34e1 Bump version: 0.21.1-beta.1 → 0.21.1-beta.2 2025-07-10 21:36:26 +00:00
Lance Release
a300a238db Bump version: 0.24.1-beta.2 → 0.24.1 2025-07-10 21:36:02 +00:00
Lance Release
a41ff1df0a Bump version: 0.24.1-beta.1 → 0.24.1-beta.2 2025-07-10 21:36:02 +00:00
Weston Pace
77b005d849 feat: update lance to 0.31.1 (#2501)
This is preparation for a stable release
2025-07-10 14:35:29 -07:00
CyrusAttoun
167fccc427 fix: change 'return' to 'raise' for unimplemented remote table function (#2484)
just noticed that we're doing a 'return' instead of a 'raise' while
trying to get remote functionality working for my project. I went ahead
and implemented tests for both of the unimplemented functions (to_pandas
and to_arrow) while I was in there.

---------

Co-authored-by: Cyrus Attoun <jattoun1@gmail.com>
2025-07-09 14:27:08 -07:00
Lance Release
2bffbcefa5 Bump version: 0.21.1-beta.0 → 0.21.1-beta.1 2025-07-09 05:54:20 +00:00
Lance Release
905552f993 Bump version: 0.24.1-beta.0 → 0.24.1-beta.1 2025-07-09 05:53:28 +00:00
BubbleCal
e4898c9313 chore: sync node package-lock (#2491)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-09 12:34:03 +08:00
BubbleCal
cab36d94b2 feat: support to specify num_partitions and num_bits (#2488) 2025-07-09 11:36:09 +08:00
Weston Pace
b64252d4fd chore: don't require exact version of half (#2489)
I can't find any reason for pinning this dependency and the fact that it
is pinned can be kind of annoying to use downstream (e.g. datafusion
currently requires >= 2.6).
2025-07-08 08:36:04 -07:00
Lance Release
6fc006072c Bump version: 0.21.0 → 0.21.1-beta.0 2025-07-07 21:01:30 +00:00
Lance Release
d4bb59b542 Bump version: 0.24.0 → 0.24.1-beta.0 2025-07-07 21:00:38 +00:00
Wyatt Alt
6b2dd6de51 chore: update lance to 31.1-beta.2 (#2487) 2025-07-07 12:53:16 -07:00
BubbleCal
dbccd9e4f1 chore: upgrade lance to 0.31.1-beta.1 (#2486)
this also upgrades:
- datafusion 47.0 -> 48.0
- half 2.5.0 -> 2.6.0

to be consistent with lance

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-07 22:16:43 +08:00
Will Jones
b12ebfed4c fix: only monotonically update dataset (#2479)
Make sure we only update the latest version if it's actually newer. This
is important if there are concurrent queries, as they can take different
amounts of time.
2025-07-01 08:29:37 -07:00
Weston Pace
1dadb2aefa feat: upgrade to lance 0.31.0-beta.1 (#2469)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Updated dependencies to newer versions for improved compatibility and
stability.

* **Refactor**
* Improved internal handling of data ranges and stream lifetimes for
enhanced performance and reliability.
* Simplified code style for Python query object conversions without
affecting functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-30 11:10:53 -07:00
Haoyu Weng
eb9784d7f2 feat(python): batch Ollama embed calls (#2453)
Other embedding integrations such as Cohere and OpenAI already send
requests in batches. We should do that for Ollama too to improve
throughput.

The Ollama [`.embed`
API](63ca747622/ollama/_client.py (L359-L378))
was added in version 0.3.0 (almost a year ago) so I updated the version
requirement in pyproject.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved compatibility with newer versions of the "ollama" package by
requiring version 0.3.0 or higher.
- Enhanced embedding generation to process batches of texts more
efficiently and reliably.
- **Refactor**
	- Improved type consistency and clarity for embedding-related methods.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-30 08:28:14 -07:00
Kilerd Chan
ba755626cc fix: expose parsing error coming from invalid object store uri (#2475)
this PR is to expose the error from `ListingCatalog::open_path` which
unwrap the Result coming from `ObjectStore::from_uri` to avoid panic
2025-06-30 10:33:18 +08:00
Keming
7760799cb8 docs: fix multivector notebook markdown style (#2447)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Improved formatting and clarity in instructional text within the
Multivector on LanceDB notebook.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-27 15:34:01 -07:00
Will Jones
4beb2d2877 fix(python): make sure explain_plan works with FTS queries (#2466)
## Summary

Fixes issue #2465 where FTS explain plans only showed basic `LanceScan`
instead of detailed execution plans with FTS query details, limits, and
offsets.

## Root Cause

The `FTSQuery::explain_plan()` and `analyze_plan()` methods were missing
the `.full_text_search()` call before calling explain/analyze plan,
causing them to operate on the base query without FTS context.

## Changes

- **Fixed** `explain_plan()` and `analyze_plan()` in `src/query.rs` to
call `.full_text_search()`
- **Added comprehensive test coverage** for FTS explain plans with
limits, offsets, and filters
- **Updated existing tests** to expect correct behavior instead of buggy
behavior

## Before/After

**Before (broken):**
```
LanceScan: uri=..., projection=[...], row_id=false, row_addr=false, ordered=true
```

**After (fixed):**
```
ProjectionExec: expr=[id@2 as id, text@3 as text, _score@1 as _score]
  Take: columns="_rowid, _score, (id), (text)"
    CoalesceBatchesExec: target_batch_size=1024
      GlobalLimitExec: skip=2, fetch=4
        MatchQuery: query=test
```

## Test Plan

- [x] All new FTS explain plan tests pass 
- [x] Existing tests continue to pass
- [x] FTS queries now show proper execution plans with MatchQuery,
limits, filters

Closes #2465

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Added new test cases to verify explain plan output for full-text
search, vector queries with pagination, and queries with filters.

* **Bug Fixes**
* Improved the accuracy of explain plan and analysis output for
full-text search queries, ensuring the correct query details are
reflected.

* **Refactor**
* Enhanced the formatting and hierarchical structure of execution plans
for hybrid queries, providing clearer and more detailed plan
representations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-26 23:35:14 -07:00
Lance Release
a00b8595d1 Bump version: 0.21.0-beta.0 → 0.21.0 2025-06-20 05:47:06 +00:00
Lance Release
9c8314b4fd Bump version: 0.20.1-beta.2 → 0.21.0-beta.0 2025-06-20 05:46:27 +00:00
Lance Release
c625b6f2b2 Bump version: 0.24.0-beta.0 → 0.24.0 2025-06-20 05:46:05 +00:00
Lance Release
bec8fe6547 Bump version: 0.23.1-beta.2 → 0.24.0-beta.0 2025-06-20 05:46:04 +00:00
BubbleCal
dc1150c011 chore: upgrade lance to 0.30.0 (#2451)
lance [release
details](https://github.com/lancedb/lance/releases/tag/v0.30.0)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated dependency specifications to use exact version numbers instead
of referencing a git repository and tag.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-20 11:27:20 +08:00
Will Jones
afaefc6264 ci: fix package lock again (#2449)
We are able to push commits over here:
cb7293e073/.github/workflows/make-release-commit.yml (L88-L95)

So I think it's safe to assume this will work.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated workflow configuration to improve authentication and branch
targeting for automated release processes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-19 08:51:48 -07:00
BubbleCal
cb70ff8cee feat!: switch default FTS to native lance FTS (#2428)
This switches the default FTS to native lance FTS for Python sync table
API, the other APIs have switched to native implementation already

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- The default behavior for creating a full-text search index now uses
the new implementation rather than the legacy one.
- **Bug Fixes**
- Improved handling and error messages for phrase queries in full-text
search.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-19 10:38:34 +08:00
BubbleCal
cbb5a841b1 feat: support prefix matching and must_not clause (#2441) 2025-06-19 10:32:32 +08:00
Lance Release
c72f6770fd Bump version: 0.20.1-beta.1 → 0.20.1-beta.2 2025-06-18 23:33:57 +00:00
Lance Release
e5a80a5e86 Bump version: 0.23.1-beta.1 → 0.23.1-beta.2 2025-06-18 23:33:05 +00:00
Will Jones
8d0a7fad1f ci: try again to fix node lockfiles (#2445)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated the release workflow to explicitly check out the main branch
during the publishing process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-18 14:45:39 -07:00
LuQQiu
b80d4d0134 chore: update Lance to v0.30.0-beta.1 (#2444)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies for improved stability and
compatibility.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-18 14:15:39 -07:00
satya-nutella
9645fe52c2 fix: improve error handling and embedding logic in arrow.ts (#2433)
- Enhanced error messages for schema inference failures to suggest
providing an explicit schema.
- Updated embedding application logic to check for existing destination
columns, allowing for filling embeddings in columns that are all null.
- Added comments for clarity on handling existing columns during
embedding application.

Fixes https://github.com/lancedb/lancedb/issues/2183

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved error messages for schema inference to enhance readability.
- Prevented redundant embedding application by skipping columns that
already contain data, avoiding unnecessary errors and computations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-18 12:45:11 -07:00
Lance Release
b77314168d Bump version: 0.20.1-beta.0 → 0.20.1-beta.1 2025-06-17 23:22:50 +00:00
Lance Release
e08d45e090 Bump version: 0.23.1-beta.0 → 0.23.1-beta.1 2025-06-17 23:22:00 +00:00
Will Jones
2e3ddb8382 ci: fix lockfile failure for vectordb node (#2443)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated release workflow to set a specific Git user name and email for
automated commits during the package publishing process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-17 15:14:11 -07:00
Wyatt Alt
627ca4c810 chore: update lance to v0.29.1-beta.2 (#2442)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies to use a newer version of the Lance
library.
- **New Features**
- Added support for a new query occurrence type labeled "MUST NOT" in
search filters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-17 14:02:13 -07:00
Lance Release
f8dae4ffe9 Bump version: 0.20.0 → 0.20.1-beta.0 2025-06-16 16:30:14 +00:00
Lance Release
9eb6119468 Bump version: 0.23.0 → 0.23.1-beta.0 2025-06-16 16:29:22 +00:00
Weston Pace
59b57e30ed feat: add maximum and minimum nprobes properties (#2430)
This exposes the maximum_nprobes and minimum_nprobes feature that was
added in https://github.com/lancedb/lance/pull/3903

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for specifying minimum and maximum probe counts in
vector search queries, allowing finer control over search behavior.
- Users can now independently set minimum and maximum probes for vector
and hybrid queries via new methods and parameters in Python, Node.js,
and Rust APIs.

- **Bug Fixes**
- Improved parameter validation to ensure correct usage of minimum and
maximum probe values.

- **Tests**
- Expanded test coverage to validate correct handling, serialization,
and error cases for the new probe parameters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-13 15:18:29 -07:00
BubbleCal
fec8d58f06 feat: support a bunch or FTS features in JS SDK (#2431)
- operator for match query
- slop for phrase query
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced support for boolean full-text search queries with AND/OR
logic and occurrence conditions.
- Added operator options for match and multi-match queries to control
term combination logic.
- Enabled phrase queries to specify proximity (slop) for flexible phrase
matching.
- Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery`
class for enhanced query expressiveness.

- **Bug Fixes**
- Improved validation and error handling for invalid operator and
occurrence inputs in full-text queries.

- **Tests**
- Expanded test coverage with new cases for boolean queries and
operator-based full-text searches.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-12 17:04:19 +08:00
BubbleCal
84ded9d678 feat: support new FTS features in python SDK (#2411)
- AND operator
- phrase query slop param
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for combining full-text search queries using AND/OR
operators, enabling more flexible query composition.
- Introduced new query types and parameters, including boolean queries,
operator selection, occurrence constraints, and phrase slop for advanced
search scenarios.
- Enhanced asynchronous search to accept rich full-text query objects
directly.

- **Bug Fixes**
- Improved handling and validation of full-text search queries in both
synchronous and asynchronous search operations.

- **Tests**
- Updated and expanded tests to cover new full-text query types and
their usage in search functions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-06 14:33:46 +08:00
Wyatt Alt
65696d9713 chore: update lance in lancedb (#2424)
This updates lance to v0.29.1-beta.1.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated workspace dependencies for improved consistency and
reliability. No changes to user-facing functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-04 19:06:51 -07:00
Will Jones
e2f2ea32e4 ci: fix vectordb release (#2422)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated the release workflow to include an additional step for
improved process reliability. No changes to user-facing functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-04 17:06:02 -07:00
Lance Release
d5f2eca754 Bump version: 0.20.0-beta.3 → 0.20.0 2025-06-04 21:08:31 +00:00
Lance Release
7fa455a8a5 Bump version: 0.20.0-beta.2 → 0.20.0-beta.3 2025-06-04 21:07:59 +00:00
Lance Release
8f42b5874e Bump version: 0.23.0-beta.3 → 0.23.0 2025-06-04 21:07:39 +00:00
Lance Release
274f19f560 Bump version: 0.23.0-beta.2 → 0.23.0-beta.3 2025-06-04 21:07:38 +00:00
Will Jones
fbcbc75b5b feat: upgrade lance to stable version (#2420)
Adds a script to change the lance dependency easily. To make this
change, I just had to run:

```bash
python ci/set_lance_version.py stable
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added a script to automate updating the Lance package version in
project dependencies.
- **Chores**
- Updated workflows to improve lockfile management and automate updates
during releases and publishing.
- Switched Lance dependencies from git-based references to fixed version
numbers for improved stability.
- Enhanced lockfile update script with an option to amend commits and
quieter output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-06-04 13:34:30 -07:00
Will Jones
008f389bd0 ci: commit updated Cargo.lock (#2418)
Follow up to #2416

Forgot to do `git add`.
Also need to delete old actions updating package lock.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
  - Removed legacy workflows related to updating package lock files.
- Improved the update lockfiles script to ensure updated lockfiles are
always included in amended commits.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-04 08:40:38 -07:00
Lance Release
91af6518d9 Updating package-lock.json 2025-06-04 07:15:07 +00:00
Lance Release
af6819762c Updating package-lock.json 2025-06-04 07:14:50 +00:00
Lance Release
7acece493d Bump version: 0.20.0-beta.1 → 0.20.0-beta.2 2025-06-04 07:14:39 +00:00
Lance Release
20e017fedc Bump version: 0.23.0-beta.1 → 0.23.0-beta.2 2025-06-04 07:13:44 +00:00
Jack Ye
74e578b3c8 feat: upgrade lance to v0.29.0-beta.2 (#2419)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated various internal dependencies to newer versions for improved
stability and compatibility.
  - Increased the version number for the Python package.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-03 15:16:26 -07:00
Lance Release
d92d9eb3d2 Updating package-lock.json 2025-06-03 16:28:18 +00:00
Lance Release
b6cdce7bc9 Updating package-lock.json 2025-06-03 16:28:02 +00:00
Lance Release
316b406265 Bump version: 0.20.0-beta.0 → 0.20.0-beta.1 2025-06-03 16:27:53 +00:00
Lance Release
8825c7c1dd Bump version: 0.23.0-beta.0 → 0.23.0-beta.1 2025-06-03 16:26:58 +00:00
David Myriel
81c85ff702 docs: announcement for Documentation (#2410)
Just letting people know where to look starting June 1st. 

Both docsites should be pointing to lancedb.github.io/documentation.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Added a notification banner to the documentation site informing users
about a new URL for accessing the latest documentation starting June
1st, 2025. The message includes a clickable link that opens in a new
tab.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-06-03 08:55:02 -07:00
Will Jones
570f2154d5 ci: automatically update Cargo.lock (#2416)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated workflow to ignore changes in the `Cargo.lock` file during
documentation checks, reducing unnecessary workflow failures.
- Enhanced release process by adding automated lockfile updates for
Node.js and Rust components.
- Removed an obsolete package-lock update job from the publishing
workflow to streamline releases.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-03 07:49:21 -07:00
Lance Release
0525c055fc Updating package-lock.json 2025-05-31 04:29:20 +00:00
Lance Release
38d11291da Updating package-lock.json 2025-05-31 03:48:11 +00:00
Lance Release
258e682574 Updating package-lock.json 2025-05-31 03:47:55 +00:00
Lance Release
d7afa600b8 Bump version: 0.19.2-beta.0 → 0.20.0-beta.0 2025-05-31 03:47:37 +00:00
Lance Release
5c7303ab2e Bump version: 0.22.2-beta.0 → 0.23.0-beta.0 2025-05-31 03:47:13 +00:00
Will Jones
5895ef4039 ci: revert unnecessary version bump (#2415)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Downgraded version numbers for the Node.js, Python, and Rust packages.
No other user-facing changes were made.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-30 16:51:14 -07:00
Jack Ye
0528cd858a fix: avoid failing list_indices for any unknown index (#2413)
Closes #2412 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Improved the reliability of listing indices by logging warnings for
errors and skipping problematic entries, ensuring successful results are
returned.
- Internal indices used for optimization are now excluded from the
visible list of indices.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-30 14:43:12 -07:00
Jack Ye
6582f43422 feat: upgrade lance to v0.29.0-beta.1 (#2414)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies for improved stability and
compatibility. No user-facing changes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-30 13:47:41 -07:00
BubbleCal
5c7f63388d feat!: upgrade lance to v0.28.0 (#2404)
this introduces some breaking changes in terms of rust API of creating
FTS index, and the default index params changed

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Updated default settings for full-text search (FTS) index creation:
stemming, stop word removal, and ASCII folding are now enabled by
default, while token position storage is disabled by default.

- **Refactor**
- Simplified and streamlined the configuration and handling of FTS index
parameters for improved maintainability and consistency across
interfaces.
- Enhanced serialization and request construction for FTS index
parameters to reduce manual handling and improve code clarity.
- Improved test coverage by explicitly enabling positional indexing in
FTS tests to support phrase queries.

- **Chores**
- Upgraded all internal dependencies related to FTS indexing to the
latest version for enhanced compatibility and performance.
- Updated package versions for Node.js, Python, and Rust components to
the latest beta releases.
- Improved CI workflows by adding Rust toolchain setup with formatting
and linting tools.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2025-05-29 15:19:24 -07:00
Renato Marroquin
d0bc671cac docs: add example for querying a lance table with SQL (#2389)
Adds example for querying a dataset with SQL

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Added new guides on querying LanceDB tables using SQL with DuckDB and
Apache Datafusion.
- Included detailed instructions for integrating LanceDB with Datafusion
in Python.
- Updated navigation to include Datafusion and SQL querying
documentation.
- Improved formatting in TypeScript and vectordb update examples for
consistency.

- **Tests**
- Added a new test demonstrating SQL querying on Lance tables via
DataFusion integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-05-29 06:14:38 -07:00
David Myriel
d37e17593d [doc] Add New Readme Page (#2405)
Added a new readme for better navigation, updated language and more
detail

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Updated the README with a modernized header, improved structure, and
clearer descriptions of features and architecture.
- Expanded and reorganized key features and product offerings for better
clarity.
- Simplified installation instructions and added a table of SDK
interfaces with documentation links.
- Enhanced community and contributor sections with new visuals and links
to social and support channels.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-27 17:45:17 +02:00
Lance Release
cb726d370e Updating package-lock.json 2025-05-23 22:36:54 +00:00
Lance Release
23ee132546 Updating package-lock.json 2025-05-23 21:58:58 +00:00
Lance Release
7fa090d330 Updating package-lock.json 2025-05-23 21:58:43 +00:00
Lance Release
07bc1c5397 Bump version: 0.19.1 → 0.19.2-beta.0 2025-05-23 21:58:31 +00:00
Lance Release
d7a9dbb9fc Bump version: 0.22.1 → 0.22.2-beta.0 2025-05-23 21:58:17 +00:00
Jack Ye
00487afc7d feat: upgrade lance to v0.27.3-beta.2 (#2403)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies for improved compatibility and
stability. No changes to user-facing features.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-23 14:53:13 -07:00
BubbleCal
1902d65aad docs: update the num_partitions recommendation (#2401) 2025-05-23 23:45:37 +08:00
Lance Release
c4fbb65b8e Updating package-lock.json 2025-05-22 07:06:03 +00:00
Lance Release
875ed7ae6f Updating package-lock.json 2025-05-22 05:58:59 +00:00
Lance Release
95a46a57ba Updating package-lock.json 2025-05-22 05:58:43 +00:00
Lance Release
51561e31a0 Bump version: 0.19.1-beta.6 → 0.19.1 2025-05-22 05:58:05 +00:00
Lance Release
7b19120578 Bump version: 0.19.1-beta.5 → 0.19.1-beta.6 2025-05-22 05:58:00 +00:00
Lance Release
745c34a6a9 Bump version: 0.22.1-beta.6 → 0.22.1 2025-05-22 05:57:20 +00:00
Lance Release
db8fa2454d Bump version: 0.22.1-beta.5 → 0.22.1-beta.6 2025-05-22 05:57:20 +00:00
Lei Xu
a67a7b4b42 chore: use stable lance (#2398)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated workspace dependencies to use a stable release version for
improved consistency and reliability. No changes to application features
or functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-21 22:34:29 -07:00
Lei Xu
496846e532 chore: bump lance version (#2397)
- Bump lance version and prepare a new release.
- Bump rust toolchain to 1.86, because GHA ubuntu does not have 1.83
`cargo-fmt` anymore
2025-05-21 14:15:55 -07:00
Ayush Chaurasia
dadcfebf8e docs: add logos in genkit docs page (#2391)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Added an integration banner image to the beginning of the
Genkitx-LanceDB documentation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-20 01:40:12 +05:30
Lance Release
67033dbd7f Updating package-lock.json 2025-05-16 00:25:41 +00:00
Lance Release
05a85cfc2a Updating package-lock.json 2025-05-15 23:44:27 +00:00
Lance Release
40c5d3d72b Updating package-lock.json 2025-05-15 23:44:10 +00:00
Lance Release
198f0f80c6 Bump version: 0.19.1-beta.4 → 0.19.1-beta.5 2025-05-15 23:43:32 +00:00
Lance Release
e3f2fd3892 Bump version: 0.22.1-beta.4 → 0.22.1-beta.5 2025-05-15 23:42:46 +00:00
Wyatt Alt
f401ccc599 chore: update lance to 0.27.1-beta.1 (#2388)
This is for fe14671f1

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies to newer versions for improved stability
and performance. No changes to features or functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-15 16:09:01 -07:00
Ayush Chaurasia
81b59139f8 docs: add genkit integration docs (#2383)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Added a comprehensive guide for integrating LanceDB with Genkit,
including installation instructions, setup, indexing, retrieval, and
building a custom RAG pipeline with example code and screenshots.
- Updated the documentation navigation to include the new Genkit
integration, making it accessible from the site menu.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-12 18:18:07 +05:30
ayush chaurasia
1026781ab6 Revert "update"
This reverts commit 9c699b8cd9.
2025-05-11 21:04:59 +05:30
ayush chaurasia
9c699b8cd9 update 2025-05-11 21:01:53 +05:30
Lance Release
34bec59bc3 Updating package-lock.json 2025-05-08 21:34:37 +00:00
Lance Release
a5fbbf0d66 Updating package-lock.json 2025-05-08 20:20:18 +00:00
Lance Release
b42721167b Updating package-lock.json 2025-05-08 20:20:00 +00:00
Lance Release
543dec9ff0 Bump version: 0.19.1-beta.3 → 0.19.1-beta.4 2025-05-08 20:19:17 +00:00
Lance Release
04f962f6b0 Bump version: 0.22.1-beta.3 → 0.22.1-beta.4 2025-05-08 20:18:40 +00:00
LuQQiu
19e896ff69 chore: add default for result structs (#2377)
add default for result structs, when values are not provided, will go
with the default values

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Improved internal handling of table operation results to support
default values. No changes to user-facing features or functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-08 13:09:11 -07:00
Will Jones
272e4103b2 feat: provide timeout parameter for merge_insert (#2378)
Provides the ability to set a timeout for merge insert. The default
underlying timeout is however long the first attempt takes, or if there
are multiple attempts, 30 seconds. This has two use cases:

1. Make the timeout shorter, when you want to fail if it takes too long.
2. Allow taking more time to do retries.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for specifying a timeout when performing merge insert
operations in Python, Node.js, and Rust APIs.
- Introduced a new option to control the maximum allowed execution time
for merge inserts, including retry timeout handling.

- **Documentation**
- Updated and added documentation to describe the new timeout option and
its usage in APIs.

- **Tests**
- Added and updated tests to verify correct timeout behavior during
merge insert operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-08 13:07:05 -07:00
Wyatt Alt
75c257ebb6 fix: return IndexNotExist on remote drop index 404 (#2380)
Prior to this commit, attempting to drop an index that did not exist
would return a TableNotFound error stating that the target table does
not exist -- even when it did exist. Instead, we now return an
IndexNotFound error.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved error handling when attempting to drop a non-existent index,
providing a more accurate error message.
- **Tests**
- Added a test to verify correct error reporting when dropping an index
that does not exist.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-07 17:24:05 -07:00
Wyatt Alt
9ee152eb42 fix: support __len__ on remote table (#2379)
This moves the __len__ method from LanceTable and RemoteTable to Table
so that child classes don't need to implement their own. In the process,
it fixes the implementation of RemoteTable's length method, which was
previously missing a return statement.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Centralized the table length functionality in the base table class,
simplifying subclass behavior.
- Removed redundant or non-functional length methods from specific table
classes.

- **Tests**
- Added a new test to verify correct table length reporting for remote
tables.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-07 17:23:39 -07:00
LuQQiu
c9ae1b1737 fix: add restore with tag in python and nodejs API (#2374)
add restore with tag API in python and nodejs API and add tests to guard
them

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- The restore functionality now supports using version tags in addition
to numeric version identifiers, allowing you to revert tables to a state
marked by a tag.
- **Bug Fixes**
  - Restoring with an unknown tag now properly raises an error.
- **Documentation**
- Updated documentation and examples to clarify that restore accepts
both version numbers and tags.
- **Tests**
- Added new tests to verify restore behavior with version tags and error
handling for unknown tags.
  - Added tests for checkout and restore operations involving tags.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-06 16:12:58 -07:00
Lance Release
89dc80c42a Updating package-lock.json 2025-05-06 03:53:49 +00:00
Wyatt Alt
7b020ac799 chore: run cargo update (#2376) 2025-05-05 20:26:42 -07:00
Lance Release
529e774bbb Updating package-lock.json 2025-05-06 02:45:45 +00:00
Lance Release
7c12239305 Updating package-lock.json 2025-05-06 02:45:29 +00:00
Lance Release
d83424d6b4 Bump version: 0.19.1-beta.2 → 0.19.1-beta.3 2025-05-06 02:45:06 +00:00
Lance Release
8bf89f887c Bump version: 0.22.1-beta.2 → 0.22.1-beta.3 2025-05-06 02:44:39 +00:00
LuQQiu
b2160b2304 fix: fix backward compatibility with the add API (#2375)
add API originally returns a struct with request_id, add backward
compatibility for that

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Improved handling of empty server responses for various data
operations to ensure consistent behavior across server versions.
- Added default values to version and numeric fields to prevent errors
when response data is incomplete.

- **Tests**
- Expanded tests to cover multiple server response scenarios, validating
correct version handling in data operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 19:26:27 -07:00
Lance Release
1bb82597be Updating package-lock.json 2025-05-06 01:21:13 +00:00
Lance Release
e4eee38b3c Updating package-lock.json 2025-05-06 00:09:39 +00:00
Lance Release
64fc2be503 Updating package-lock.json 2025-05-06 00:09:19 +00:00
Lance Release
dc8054e90d Bump version: 0.19.1-beta.1 → 0.19.1-beta.2 2025-05-06 00:08:55 +00:00
Lance Release
1684940946 Bump version: 0.22.1-beta.1 → 0.22.1-beta.2 2025-05-06 00:08:29 +00:00
LuQQiu
695813463c chore: reduce unneeded API calls for return version for write operations and improve test (#2373)
Reduce the duplicate code for remote write operation testing.
Avoid double call to remote to get version info, just return 0 instead
of suddenly adding extra API calls for end users when they are using old
servers.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added version tracking to table operation results, allowing users to
see the commit version associated with add, update, delete, merge, and
column modification operations.
- **Bug Fixes**
- Improved compatibility with legacy servers by standardizing version
information as zero when the server does not return a version.
- **Documentation**
- Clarified the meaning of the version field in operation results,
especially for cases involving legacy server responses.
- **Tests**
- Enhanced test coverage to verify correct behavior with both legacy and
modern server responses.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 16:47:19 -07:00
LuQQiu
ed594b0f76 feat: return version for all write operations (#2368)
return version info for all write operations (add, update, merge_insert
and column modification operations)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Table modification operations (add, update, delete, merge,
add/alter/drop columns) now return detailed result objects including
version numbers and operation statistics.
- Result objects provide clearer feedback such as rows affected and new
table version after each operation.

- **Documentation**
- Updated documentation to describe new result objects and their fields
for all relevant table operations.
- Added documentation for new result interfaces and updated method
return types in Node.js and Python APIs.

- **Tests**
- Enhanced test coverage to assert correctness of returned versioning
and operation metadata after table modifications.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 14:25:34 -07:00
245 changed files with 14720 additions and 17770 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.19.1-beta.1"
current_version = "0.22.0-beta.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.
@@ -50,11 +50,6 @@ pre_commit_hooks = [
optional_value = "final"
values = ["beta", "final"]
[[tool.bumpversion.files]]
filename = "node/package.json"
replace = "\"version\": \"{new_version}\","
search = "\"version\": \"{current_version}\","
[[tool.bumpversion.files]]
filename = "nodejs/package.json"
replace = "\"version\": \"{new_version}\","
@@ -66,39 +61,8 @@ glob = "nodejs/npm/*/package.json"
replace = "\"version\": \"{new_version}\","
search = "\"version\": \"{current_version}\","
# vectodb node binary packages
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-darwin-arm64\": \"{new_version}\""
search = "\"@lancedb/vectordb-darwin-arm64\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-darwin-x64\": \"{new_version}\""
search = "\"@lancedb/vectordb-darwin-x64\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-arm64-gnu\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-arm64-gnu\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-linux-x64-gnu\": \"{new_version}\""
search = "\"@lancedb/vectordb-linux-x64-gnu\": \"{current_version}\""
[[tool.bumpversion.files]]
glob = "node/package.json"
replace = "\"@lancedb/vectordb-win32-x64-msvc\": \"{new_version}\""
search = "\"@lancedb/vectordb-win32-x64-msvc\": \"{current_version}\""
# Cargo files
# ------------
[[tool.bumpversion.files]]
filename = "rust/ffi/node/Cargo.toml"
replace = "\nversion = \"{new_version}\""
search = "\nversion = \"{current_version}\""
[[tool.bumpversion.files]]
filename = "rust/lancedb/Cargo.toml"
replace = "\nversion = \"{new_version}\""

View File

@@ -5,8 +5,8 @@ on:
tags-ignore:
# We don't publish pre-releases for Rust. Crates.io is just a source
# distribution, so we don't need to publish pre-releases.
- 'v*-beta*'
- '*-v*' # for example, python-vX.Y.Z
- "v*-beta*"
- "*-v*" # for example, python-vX.Y.Z
env:
# This env var is used by Swatinem/rust-cache@v2 for the cache
@@ -19,6 +19,8 @@ env:
jobs:
build:
runs-on: ubuntu-22.04
permissions:
id-token: write
timeout-minutes: 30
# Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v')
@@ -31,6 +33,8 @@ jobs:
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- uses: rust-lang/crates-io-auth-action@v1
id: auth
- name: Publish the package
run: |
cargo publish -p lancedb --all-features --token ${{ secrets.CARGO_REGISTRY_TOKEN }}
cargo publish -p lancedb --all-features --token ${{ steps.auth.outputs.token }}

View File

@@ -56,22 +56,11 @@ jobs:
with:
node-version: 20
cache: 'npm'
cache-dependency-path: node/package-lock.json
- name: Install node dependencies
working-directory: node
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Build node
working-directory: node
run: |
npm ci
npm run build
npm run tsc
- name: Create markdown files
working-directory: node
run: |
npx typedoc --plugin typedoc-plugin-markdown --out ../docs/src/javascript src/index.ts
- name: Build docs
working-directory: docs
run: |

View File

@@ -58,51 +58,3 @@ jobs:
run: |
cd docs/test/python
for d in *; do cd "$d"; echo "$d".py; python "$d".py; cd ..; done
test-node:
name: Test doc nodejs code
runs-on: ubuntu-24.04
timeout-minutes: 60
strategy:
fail-fast: false
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
lfs: true
- name: Print CPU capabilities
run: cat /proc/cpuinfo
- name: Set up Node
uses: actions/setup-node@v4
with:
node-version: 20
- name: Install protobuf
run: |
sudo apt update
sudo apt install -y protobuf-compiler
- name: Install dependecies needed for ubuntu
run: |
sudo apt install -y libssl-dev
rustup update && rustup default
- name: Rust cache
uses: swatinem/rust-cache@v2
- name: Install node dependencies
run: |
sudo swapoff -a
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
sudo swapon --show
cd node
npm ci
npm run build-release
cd ../docs
npm install
- name: Test
env:
LANCEDB_URI: ${{ secrets.LANCEDB_URI }}
LANCEDB_DEV_API_KEY: ${{ secrets.LANCEDB_DEV_API_KEY }}
run: |
cd docs
npm t

View File

@@ -35,6 +35,9 @@ jobs:
- uses: Swatinem/rust-cache@v2
with:
workspaces: java/core/lancedb-jni
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt
- name: Run cargo fmt
run: cargo fmt --check
working-directory: ./java/core/lancedb-jni
@@ -68,6 +71,9 @@ jobs:
- uses: Swatinem/rust-cache@v2
with:
workspaces: java/core/lancedb-jni
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt
- name: Run cargo fmt
run: cargo fmt --check
working-directory: ./java/core/lancedb-jni
@@ -110,4 +116,3 @@ jobs:
-Djdk.reflect.useDirectMethodHandle=false \
-Dio.netty.tryReflectionSetAccessible=true"
JAVA_HOME=$JAVA_17 mvn clean test

View File

@@ -84,6 +84,7 @@ jobs:
run: |
pip install bump-my-version PyGithub packaging
bash ci/bump_version.sh ${{ inputs.type }} ${{ inputs.bump-minor }} v $COMMIT_BEFORE_BUMP
bash ci/update_lockfiles.sh --amend
- name: Push new version tag
if: ${{ !inputs.dry_run }}
uses: ad-m/github-push-action@master
@@ -92,11 +93,3 @@ jobs:
github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}
branch: ${{ github.ref }}
tags: true
- uses: ./.github/workflows/update_package_lock
if: ${{ !inputs.dry_run && inputs.other }}
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
- uses: ./.github/workflows/update_package_lock_nodejs
if: ${{ !inputs.dry_run && inputs.other }}
with:
github_token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -1,147 +0,0 @@
name: Node
on:
push:
branches:
- main
pull_request:
paths:
- node/**
- rust/ffi/node/**
- .github/workflows/node.yml
- docker-compose.yml
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
#
# Use native CPU to accelerate tests if possible, especially for f16
# target-cpu=haswell fixes failing ci build
RUSTFLAGS: "-C debuginfo=1 -C target-cpu=haswell -C target-feature=+f16c,+avx2,+fma"
RUST_BACKTRACE: "1"
jobs:
linux:
name: Linux (Node ${{ matrix.node-version }})
timeout-minutes: 30
strategy:
matrix:
node-version: [ "18", "20" ]
runs-on: "ubuntu-22.04"
defaults:
run:
shell: bash
working-directory: node
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
cache-dependency-path: node/package-lock.json
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Build
run: |
npm ci
npm run build
npm run pack-build
npm install --no-save ./dist/lancedb-vectordb-*.tgz
# Remove index.node to test with dependency installed
rm index.node
- name: Test
run: npm run test
macos:
timeout-minutes: 30
runs-on: "macos-13"
defaults:
run:
shell: bash
working-directory: node
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
with:
node-version: 20
cache: 'npm'
cache-dependency-path: node/package-lock.json
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: brew install protobuf
- name: Build
run: |
npm ci
npm run build
npm run pack-build
npm install --no-save ./dist/lancedb-vectordb-*.tgz
# Remove index.node to test with dependency installed
rm index.node
- name: Test
run: |
npm run test
aws-integtest:
timeout-minutes: 45
runs-on: "ubuntu-22.04"
defaults:
run:
shell: bash
working-directory: node
env:
AWS_ACCESS_KEY_ID: ACCESSKEY
AWS_SECRET_ACCESS_KEY: SECRETKEY
AWS_DEFAULT_REGION: us-west-2
# this one is for s3
AWS_ENDPOINT: http://localhost:4566
# this one is for dynamodb
DYNAMODB_ENDPOINT: http://localhost:4566
ALLOW_HTTP: true
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
with:
node-version: 20
cache: 'npm'
cache-dependency-path: node/package-lock.json
- name: start local stack
run: docker compose -f ../docker-compose.yml up -d --wait
- name: create s3
run: aws s3 mb s3://lancedb-integtest --endpoint $AWS_ENDPOINT
- name: create ddb
run: |
aws dynamodb create-table \
--table-name lancedb-integtest \
--attribute-definitions '[{"AttributeName": "base_uri", "AttributeType": "S"}, {"AttributeName": "version", "AttributeType": "N"}]' \
--key-schema '[{"AttributeName": "base_uri", "KeyType": "HASH"}, {"AttributeName": "version", "KeyType": "RANGE"}]' \
--provisioned-throughput '{"ReadCapacityUnits": 10, "WriteCapacityUnits": 10}' \
--endpoint-url $DYNAMODB_ENDPOINT
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Build
run: |
npm ci
npm run build
npm run pack-build
npm install --no-save ./dist/lancedb-vectordb-*.tgz
# Remove index.node to test with dependency installed
rm index.node
- name: Test
run: npm run integration-test

View File

@@ -47,6 +47,9 @@ jobs:
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
- name: Lint
run: |
cargo fmt --all -- --check
@@ -76,7 +79,7 @@ jobs:
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
cache-dependency-path: node/package-lock.json
cache-dependency-path: nodejs/package-lock.json
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |
@@ -113,7 +116,7 @@ jobs:
set -e
npm ci
npm run docs
if ! git diff --exit-code; then
if ! git diff --exit-code -- . ':(exclude)Cargo.lock'; then
echo "Docs need to be updated"
echo "Run 'npm run docs', fix any warnings, and commit the changes."
exit 1
@@ -134,7 +137,7 @@ jobs:
with:
node-version: 20
cache: 'npm'
cache-dependency-path: node/package-lock.json
cache-dependency-path: nodejs/package-lock.json
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |

View File

@@ -365,202 +365,3 @@ jobs:
ARGS="$ARGS --tag preview"
fi
npm publish $ARGS
# ----------------------------------------------------------------------------
# vectordb release (legacy)
# ----------------------------------------------------------------------------
# TODO: delete this when we drop vectordb
node:
name: vectordb Typescript
runs-on: ubuntu-latest
defaults:
run:
shell: bash
working-directory: node
steps:
- name: Checkout
uses: actions/checkout@v4
- uses: actions/setup-node@v3
with:
node-version: 20
cache: "npm"
cache-dependency-path: node/package-lock.json
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Build
run: |
npm ci
npm run tsc
npm pack
- name: Upload Linux Artifacts
uses: actions/upload-artifact@v4
with:
name: node-package
path: |
node/vectordb-*.tgz
node-macos:
name: vectordb ${{ matrix.config.arch }}
strategy:
matrix:
config:
- arch: x86_64-apple-darwin
runner: macos-13
- arch: aarch64-apple-darwin
# xlarge is implicitly arm64.
runner: macos-14
runs-on: ${{ matrix.config.runner }}
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install system dependencies
run: brew install protobuf
- name: Install npm dependencies
run: |
cd node
npm ci
- name: Build MacOS native node modules
run: bash ci/build_macos_artifacts.sh ${{ matrix.config.arch }}
- name: Upload Darwin Artifacts
uses: actions/upload-artifact@v4
with:
name: node-native-darwin-${{ matrix.config.arch }}
path: |
node/dist/lancedb-vectordb-darwin*.tgz
node-linux-gnu:
name: vectordb (${{ matrix.config.arch}}-unknown-linux-gnu)
runs-on: ${{ matrix.config.runner }}
strategy:
fail-fast: false
matrix:
config:
- arch: x86_64
runner: ubuntu-latest
- arch: aarch64
# For successful fat LTO builds, we need a large runner to avoid OOM errors.
runner: warp-ubuntu-latest-arm64-4x
steps:
- name: Checkout
uses: actions/checkout@v4
# To avoid OOM errors on ARM, we create a swap file.
- name: Configure aarch64 build
if: ${{ matrix.config.arch == 'aarch64' }}
run: |
free -h
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo "/swapfile swap swap defaults 0 0" >> sudo /etc/fstab
# print info
swapon --show
free -h
- name: Build Linux Artifacts
run: |
bash ci/build_linux_artifacts.sh ${{ matrix.config.arch }} ${{ matrix.config.arch }}-unknown-linux-gnu
- name: Upload Linux Artifacts
uses: actions/upload-artifact@v4
with:
name: node-native-linux-${{ matrix.config.arch }}-gnu
path: |
node/dist/lancedb-vectordb-linux*.tgz
node-windows:
name: vectordb ${{ matrix.target }}
runs-on: windows-2022
strategy:
fail-fast: false
matrix:
target: [x86_64-pc-windows-msvc]
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Protoc v21.12
working-directory: C:\
run: |
New-Item -Path 'C:\protoc' -ItemType Directory
Set-Location C:\protoc
Invoke-WebRequest https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-win64.zip -OutFile C:\protoc\protoc.zip
7z x protoc.zip
Add-Content $env:GITHUB_PATH "C:\protoc\bin"
shell: powershell
- name: Install npm dependencies
run: |
cd node
npm ci
- name: Build Windows native node modules
run: .\ci\build_windows_artifacts.ps1 ${{ matrix.target }}
- name: Upload Windows Artifacts
uses: actions/upload-artifact@v4
with:
name: node-native-windows
path: |
node/dist/lancedb-vectordb-win32*.tgz
release:
name: vectordb NPM Publish
needs: [node, node-macos, node-linux-gnu, node-windows]
runs-on: ubuntu-latest
# Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v')
steps:
- uses: actions/download-artifact@v4
with:
pattern: node-*
- name: Display structure of downloaded files
run: ls -R
- uses: actions/setup-node@v3
with:
node-version: 20
registry-url: "https://registry.npmjs.org"
- name: Publish to NPM
env:
NODE_AUTH_TOKEN: ${{ secrets.LANCEDB_NPM_REGISTRY_TOKEN }}
run: |
# Tag beta as "preview" instead of default "latest". See lancedb
# npm publish step for more info.
if [[ $GITHUB_REF =~ refs/tags/v(.*)-beta.* ]]; then
PUBLISH_ARGS="--tag preview"
fi
mv */*.tgz .
for filename in *.tgz; do
npm publish $PUBLISH_ARGS $filename
done
- name: Deprecate
env:
NODE_AUTH_TOKEN: ${{ secrets.LANCEDB_NPM_REGISTRY_TOKEN }}
# We need to deprecate the old package to avoid confusion.
# Each time we publish a new version, it gets undeprecated.
run: npm deprecate vectordb "Use @lancedb/lancedb instead."
- name: Notify Slack Action
uses: ravsamhq/notify-slack-action@2.3.0
if: ${{ always() }}
with:
status: ${{ job.status }}
notify_when: "failure"
notification_title: "{workflow} is failing"
env:
SLACK_WEBHOOK_URL: ${{ secrets.ACTION_MONITORING_SLACK }}
update-package-lock:
if: startsWith(github.ref, 'refs/tags/v')
needs: [release]
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- name: Checkout
uses: actions/checkout@v4
with:
ref: main
token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}
fetch-depth: 0
lfs: true
- uses: ./.github/workflows/update_package_lock
with:
github_token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -24,8 +24,8 @@ runs:
- name: pytest (with integration)
shell: bash
if: ${{ inputs.integration == 'true' }}
run: pytest -m "not slow" -x -v --durations=30 python/python/tests
run: pytest -m "not slow" -vv --durations=30 python/python/tests
- name: pytest (no integration tests)
shell: bash
if: ${{ inputs.integration != 'true' }}
run: pytest -m "not slow and not s3_test" -x -v --durations=30 python/python/tests
run: pytest -m "not slow and not s3_test" -vv --durations=30 python/python/tests

View File

@@ -40,6 +40,9 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
@@ -160,8 +163,8 @@ jobs:
strategy:
matrix:
target:
- x86_64-pc-windows-msvc
- aarch64-pc-windows-msvc
- x86_64-pc-windows-msvc
- aarch64-pc-windows-msvc
defaults:
run:
working-directory: rust/lancedb

View File

@@ -1,33 +0,0 @@
name: update_package_lock
description: "Update node's package.lock"
inputs:
github_token:
required: true
description: "github token for the repo"
runs:
using: "composite"
steps:
- uses: actions/setup-node@v3
with:
node-version: 20
- name: Set git configs
shell: bash
run: |
git config user.name 'Lance Release'
git config user.email 'lance-dev@lancedb.com'
- name: Update package-lock.json file
working-directory: ./node
run: |
npm install
git add package-lock.json
git commit -m "Updating package-lock.json"
shell: bash
- name: Push changes
if: ${{ inputs.dry_run }} == "false"
uses: ad-m/github-push-action@master
with:
github_token: ${{ inputs.github_token }}
branch: main
tags: true

View File

@@ -1,33 +0,0 @@
name: update_package_lock_nodejs
description: "Update nodejs's package.lock"
inputs:
github_token:
required: true
description: "github token for the repo"
runs:
using: "composite"
steps:
- uses: actions/setup-node@v3
with:
node-version: 20
- name: Set git configs
shell: bash
run: |
git config user.name 'Lance Release'
git config user.email 'lance-dev@lancedb.com'
- name: Update package-lock.json file
working-directory: ./nodejs
run: |
npm install
git add package-lock.json
git commit -m "Updating package-lock.json"
shell: bash
- name: Push changes
if: ${{ inputs.dry_run }} == "false"
uses: ad-m/github-push-action@master
with:
github_token: ${{ inputs.github_token }}
branch: main
tags: true

3
.gitignore vendored
View File

@@ -31,9 +31,6 @@ python/dist
*.node
**/node_modules
**/.DS_Store
node/dist
node/examples/**/package-lock.json
node/examples/**/dist
nodejs/lancedb/native*
dist

80
CLAUDE.md Normal file
View File

@@ -0,0 +1,80 @@
LanceDB is a database designed for retrieval, including vector, full-text, and hybrid search.
It is a wrapper around Lance. There are two backends: local (in-process like SQLite) and
remote (against LanceDB Cloud).
The core of LanceDB is written in Rust. There are bindings in Python, Typescript, and Java.
Project layout:
* `rust/lancedb`: The LanceDB core Rust implementation.
* `python`: The Python bindings, using PyO3.
* `nodejs`: The Typescript bindings, using napi-rs
* `java`: The Java bindings
Common commands:
* Check for compiler errors: `cargo check --quiet --features remote --tests --examples`
* Run tests: `cargo test --quiet --features remote --tests`
* Run specific test: `cargo test --quiet --features remote -p <package_name> --test <test_name>`
* Lint: `cargo clippy --quiet --features remote --tests --examples`
* Format: `cargo fmt --all`
Before committing changes, run formatting.
## Coding tips
* When writing Rust doctests for things that require a connection or table reference,
write them as a function instead of a fully executable test. This allows type checking
to run but avoids needing a full test environment. For example:
```rust
/// ```
/// use lance_index::scalar::FullTextSearchQuery;
/// use lancedb::query::{QueryBase, ExecutableQuery};
///
/// # use lancedb::Table;
/// # async fn query(table: &Table) -> Result<(), Box<dyn std::error::Error>> {
/// let results = table.query()
/// .full_text_search(FullTextSearchQuery::new("hello world".into()))
/// .execute()
/// .await?;
/// # Ok(())
/// # }
/// ```
```
## Example plan: adding a new method on Table
Adding a new method involves first adding it to the Rust core, then exposing it
in the Python and TypeScript bindings. There are both local and remote tables.
Remote tables are implemented via a HTTP API and require the `remote` cargo
feature flag to be enabled. Python has both sync and async methods.
Rust core changes:
1. Add method on `Table` struct in `rust/lancedb/src/table.rs` (calls `BaseTable` trait).
2. Add method to `BaseTable` trait in `rust/lancedb/src/table.rs`.
3. Implement new trait method on `NativeTable` in `rust/lancedb/src/table.rs`.
* Test with unit test in `rust/lancedb/src/table.rs`.
4. Implement new trait method on `RemoteTable` in `rust/lancedb/src/remote/table.rs`.
* Test with unit test in `rust/lancedb/src/remote/table.rs` against mocked endpoint.
Python bindings changes:
1. Add PyO3 method binding in `python/src/table.rs`. Run `make develop` to compile bindings.
2. Add types for PyO3 method in `python/python/lancedb/_lancedb.pyi`.
3. Add method to `AsyncTable` class in `python/python/lancedb/table.py`.
4. Add abstract method to `Table` abstract base class in `python/python/lancedb/table.py`.
5. Add concrete sync method to `LanceTable` class in `python/python/lancedb/table.py`.
* Should use `LOOP.run()` to call the corresponding `AsyncTable` method.
6. Add concrete sync method to `RemoteTable` class in `python/python/lancedb/remote/table.py`.
7. Add unit test in `python/tests/test_table.py`.
TypeScript bindings changes:
1. Add napi-rs method binding on `Table` in `nodejs/src/table.rs`.
2. Run `npm run build` to generate TypeScript definitions.
3. Add typescript method on abstract class `Table` in `nodejs/src/table.ts`.
4. Add concrete method on `LocalTable` class in `nodejs/src/native_table.ts`.
* Note: despite the name, this class is also used for remote tables.
5. Add test in `nodejs/__test__/table.test.ts`.
6. Run `npm run docs` to generate TypeScript documentation.

3304
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,11 +1,5 @@
[workspace]
members = [
"rust/ffi/node",
"rust/lancedb",
"nodejs",
"python",
"java/core/lancedb-jni",
]
members = ["rust/lancedb", "nodejs", "python", "java/core/lancedb-jni"]
# Python package needs to be built by maturin.
exclude = ["python"]
resolver = "2"
@@ -21,55 +15,51 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.27.0", "features" = ["dynamodb"], tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance = { "version" = "=0.35.0", default-features = false, "features" = ["dynamodb"] }
lance-io = { "version" = "=0.35.0", default-features = false }
lance-index = { "version" = "=0.35.0" }
lance-linalg = { "version" = "=0.35.0" }
lance-table = { "version" = "=0.35.0" }
lance-testing = { "version" = "=0.35.0" }
lance-datafusion = { "version" = "=0.35.0" }
lance-encoding = { "version" = "=0.35.0" }
# Note that this one does not include pyarrow
arrow = { version = "54.1", optional = false }
arrow-array = "54.1"
arrow-data = "54.1"
arrow-ipc = "54.1"
arrow-ord = "54.1"
arrow-schema = "54.1"
arrow-arith = "54.1"
arrow-cast = "54.1"
arrow = { version = "55.1", optional = false }
arrow-array = "55.1"
arrow-data = "55.1"
arrow-ipc = "55.1"
arrow-ord = "55.1"
arrow-schema = "55.1"
arrow-arith = "55.1"
arrow-cast = "55.1"
async-trait = "0"
datafusion = { version = "46.0", default-features = false }
datafusion-catalog = "46.0"
datafusion-common = { version = "46.0", default-features = false }
datafusion-execution = "46.0"
datafusion-expr = "46.0"
datafusion-physical-plan = "46.0"
datafusion = { version = "48.0", default-features = false }
datafusion-catalog = "48.0"
datafusion-common = { version = "48.0", default-features = false }
datafusion-execution = "48.0"
datafusion-expr = "48.0"
datafusion-physical-plan = "48.0"
env_logger = "0.11"
half = { "version" = "=2.4.1", default-features = false, features = [
half = { "version" = "2.6.0", default-features = false, features = [
"num-traits",
] }
futures = "0"
log = "0.4"
moka = { version = "0.12", features = ["future"] }
object_store = "0.11.0"
object_store = "0.12.0"
pin-project = "1.0.7"
snafu = "0.8"
url = "2"
num-traits = "0.2"
rand = "0.8"
rand = "0.9"
regex = "1.10"
lazy_static = "1"
semver = "1.0.25"
crunchy = "0.2.4"
# Temporary pins to work around downstream issues
# https://github.com/apache/arrow-rs/commit/2fddf85afcd20110ce783ed5b4cdeb82293da30b
chrono = "=0.4.39"
chrono = "=0.4.41"
# https://github.com/RustCrypto/formats/issues/1684
base64ct = "=1.6.0"
# Workaround for: https://github.com/eira-fransham/crunchy/issues/13
crunchy = "=0.2.2"
# Workaround for: https://github.com/Lokathor/bytemuck/issues/306
bytemuck_derive = ">=1.8.1, <1.9.0"

129
README.md
View File

@@ -1,94 +1,97 @@
<a href="https://cloud.lancedb.com" target="_blank">
<img src="https://github.com/user-attachments/assets/92dad0a2-2a37-4ce1-b783-0d1b4f30a00c" alt="LanceDB Cloud Public Beta" width="100%" style="max-width: 100%;">
</a>
<div align="center">
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/user-attachments/assets/ac270358-333e-4bea-a132-acefaa94040e">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/user-attachments/assets/b864d814-0d29-4784-8fd9-807297c758c0">
<img alt="LanceDB Logo" src="https://github.com/user-attachments/assets/b864d814-0d29-4784-8fd9-807297c758c0" width=300>
</picture>
[![LanceDB](docs/src/assets/hero-header.png)](https://lancedb.com)
[![Website](https://img.shields.io/badge/-Website-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://lancedb.com/)
[![Blog](https://img.shields.io/badge/Blog-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://blog.lancedb.com/)
[![Discord](https://img.shields.io/badge/-Discord-100000?style=for-the-badge&logo=discord&logoColor=white&labelColor=645cfb&color=645cfb)](https://discord.gg/zMM32dvNtd)
[![Twitter](https://img.shields.io/badge/-Twitter-100000?style=for-the-badge&logo=x&logoColor=white&labelColor=645cfb&color=645cfb)](https://twitter.com/lancedb)
[![LinkedIn](https://img.shields.io/badge/-LinkedIn-100000?style=for-the-badge&logo=linkedin&logoColor=white&labelColor=645cfb&color=645cfb)](https://www.linkedin.com/company/lancedb/)
**Search More, Manage Less**
<a href='https://github.com/lancedb/vectordb-recipes/tree/main' target="_blank"><img alt='LanceDB' src='https://img.shields.io/badge/VectorDB_Recipes-100000?style=for-the-badge&logo=LanceDB&logoColor=white&labelColor=645cfb&color=645cfb'/></a>
<a href='https://lancedb.github.io/lancedb/' target="_blank"><img alt='lancdb' src='https://img.shields.io/badge/DOCS-100000?style=for-the-badge&logo=lancdb&logoColor=white&labelColor=645cfb&color=645cfb'/></a>
[![Blog](https://img.shields.io/badge/Blog-12100E?style=for-the-badge&logoColor=white)](https://blog.lancedb.com/)
[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/zMM32dvNtd)
[![Twitter](https://img.shields.io/badge/Twitter-%231DA1F2.svg?style=for-the-badge&logo=Twitter&logoColor=white)](https://twitter.com/lancedb)
[![Gurubase](https://img.shields.io/badge/Gurubase-Ask%20LanceDB%20Guru-006BFF?style=for-the-badge)](https://gurubase.io/g/lancedb)
<img src="docs/src/assets/lancedb.png" alt="LanceDB" width="50%">
</p>
# **The Multimodal AI Lakehouse**
<img max-width="750px" alt="LanceDB Multimodal Search" src="https://github.com/lancedb/lancedb/assets/917119/09c5afc5-7816-4687-bae4-f2ca194426ec">
[**How to Install** ](#how-to-install) ✦ [**Detailed Documentation**](https://lancedb.github.io/lancedb/) ✦ [**Tutorials and Recipes**](https://github.com/lancedb/vectordb-recipes/tree/main) ✦ [**Contributors**](#contributors)
**The ultimate multimodal data platform for AI/ML applications.**
LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease.
LanceDB is a central location where developers can build, train and analyze their AI workloads.
</p>
</div>
<hr />
<br>
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
## **Demo: Multimodal Search by Keyword, Vector or with SQL**
<img max-width="750px" alt="LanceDB Multimodal Search" src="https://github.com/lancedb/lancedb/assets/917119/09c5afc5-7816-4687-bae4-f2ca194426ec">
The key features of LanceDB include:
## **Star LanceDB to get updates!**
* Production-scale vector search with no servers to manage.
<details>
<summary>⭐ Click here ⭐ to see how fast we're growing!</summary>
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=lancedb/lancedb&theme=dark&type=Date">
<img width="100%" src="https://api.star-history.com/svg?repos=lancedb/lancedb&theme=dark&type=Date">
</picture>
</details>
* Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
## **Key Features**:
* Support for vector similarity search, full-text search and SQL.
- **Fast Vector Search**: Search billions of vectors in milliseconds with state-of-the-art indexing.
- **Comprehensive Search**: Support for vector similarity search, full-text search and SQL.
- **Multimodal Support**: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
- **Advanced Features**: Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure. GPU support in building vector index.
* Native Python and Javascript/Typescript support.
### **Products**:
- **Open Source & Local**: 100% open source, runs locally or in your cloud. No vendor lock-in.
- **Cloud and Enterprise**: Production-scale vector search with no servers to manage. Complete data sovereignty and security.
* Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure.
### **Ecosystem**:
- **Columnar Storage**: Built on the Lance columnar format for efficient storage and analytics.
- **Seamless Integration**: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and Javascript/Typescript support.
- **Rich Ecosystem**: Integrations with [**LangChain** 🦜️🔗](https://python.langchain.com/docs/integrations/vectorstores/lancedb/), [**LlamaIndex** 🦙](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html), Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
* GPU support in building vector index(*).
## **How to Install**:
* Ecosystem integrations with [LangChain 🦜️🔗](https://python.langchain.com/docs/integrations/vectorstores/lancedb/), [LlamaIndex 🦙](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html), Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
Follow the [Quickstart](https://lancedb.github.io/lancedb/basic/) doc to set up LanceDB locally.
LanceDB's core is written in Rust 🦀 and is built using <a href="https://github.com/lancedb/lance">Lance</a>, an open-source columnar format designed for performant ML workloads.
**API & SDK:** We also support Python, Typescript and Rust SDKs
## Quick Start
| Interface | Documentation |
|-----------|---------------|
| Python SDK | https://lancedb.github.io/lancedb/python/python/ |
| Typescript SDK | https://lancedb.github.io/lancedb/js/globals/ |
| Rust SDK | https://docs.rs/lancedb/latest/lancedb/index.html |
| REST API | https://docs.lancedb.com/api-reference/introduction |
**Javascript**
```shell
npm install @lancedb/lancedb
```
## **Join Us and Contribute**
```javascript
import * as lancedb from "@lancedb/lancedb";
We welcome contributions from everyone! Whether you're a developer, researcher, or just someone who wants to help out.
const db = await lancedb.connect("data/sample-lancedb");
const table = await db.createTable("vectors", [
{ id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
{ id: 2, vector: [1.1, 1.2], item: "bar", price: 50 },
], {mode: 'overwrite'});
If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our [**Discord**](https://discord.gg/G5DcmnZWKB) server.
[**Check out the GitHub Issues**](https://github.com/lancedb/lancedb/issues) if you would like to work on the features that are planned for the future. If you have any suggestions or feature requests, please feel free to open an issue on GitHub.
## **Contributors**
<a href="https://github.com/lancedb/lancedb/graphs/contributors">
<img src="https://contrib.rocks/image?repo=lancedb/lancedb" />
</a>
const query = table.vectorSearch([0.1, 0.3]).limit(2);
const results = await query.toArray();
## **Stay in Touch With Us**
<div align="center">
// You can also search for rows by specific criteria without involving a vector search.
const rowsByCriteria = await table.query().where("price >= 10").toArray();
```
</br>
**Python**
```shell
pip install lancedb
```
[![Website](https://img.shields.io/badge/-Website-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://lancedb.com/)
[![Blog](https://img.shields.io/badge/Blog-100000?style=for-the-badge&labelColor=645cfb&color=645cfb)](https://blog.lancedb.com/)
[![Discord](https://img.shields.io/badge/-Discord-100000?style=for-the-badge&logo=discord&logoColor=white&labelColor=645cfb&color=645cfb)](https://discord.gg/zMM32dvNtd)
[![Twitter](https://img.shields.io/badge/-Twitter-100000?style=for-the-badge&logo=x&logoColor=white&labelColor=645cfb&color=645cfb)](https://twitter.com/lancedb)
[![LinkedIn](https://img.shields.io/badge/-LinkedIn-100000?style=for-the-badge&logo=linkedin&logoColor=white&labelColor=645cfb&color=645cfb)](https://www.linkedin.com/company/lancedb/)
```python
import lancedb
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_pandas()
```
## Blogs, Tutorials & Videos
* 📈 <a href="https://blog.lancedb.com/benchmarking-random-access-in-lance/">2000x better performance with Lance over Parquet</a>
* 🤖 <a href="https://github.com/lancedb/vectordb-recipes/tree/main/examples/Youtube-Search-QA-Bot">Build a question and answer bot with LanceDB</a>
</div>

View File

@@ -1,22 +0,0 @@
#!/bin/bash
set -e
ARCH=${1:-x86_64}
TARGET_TRIPLE=${2:-x86_64-unknown-linux-gnu}
# We pass down the current user so that when we later mount the local files
# into the container, the files are accessible by the current user.
pushd ci/manylinux_node
docker build \
-t lancedb-node-manylinux \
--build-arg="ARCH=$ARCH" \
--build-arg="DOCKER_USER=$(id -u)" \
--progress=plain \
.
popd
# We turn on memory swap to avoid OOM killer
docker run \
-v $(pwd):/io -w /io \
--memory-swap=-1 \
lancedb-node-manylinux \
bash ci/manylinux_node/build_vectordb.sh $ARCH $TARGET_TRIPLE

View File

@@ -1,34 +0,0 @@
# Builds the macOS artifacts (node binaries).
# Usage: ./ci/build_macos_artifacts.sh [target]
# Targets supported: x86_64-apple-darwin aarch64-apple-darwin
set -e
prebuild_rust() {
# Building here for the sake of easier debugging.
pushd rust/ffi/node
echo "Building rust library for $1"
export RUST_BACKTRACE=1
cargo build --release --target $1
popd
}
build_node_binaries() {
pushd node
echo "Building node library for $1"
npm run build-release -- --target $1
npm run pack-build -- --target $1
popd
}
if [ -n "$1" ]; then
targets=$1
else
targets="x86_64-apple-darwin aarch64-apple-darwin"
fi
echo "Building artifacts for targets: $targets"
for target in $targets
do
prebuild_rust $target
build_node_binaries $target
done

View File

@@ -1,42 +0,0 @@
# Builds the Windows artifacts (node binaries).
# Usage: .\ci\build_windows_artifacts.ps1 [target]
# Targets supported:
# - x86_64-pc-windows-msvc
# - i686-pc-windows-msvc
# - aarch64-pc-windows-msvc
function Prebuild-Rust {
param (
[string]$target
)
# Building here for the sake of easier debugging.
Push-Location -Path "rust/ffi/node"
Write-Host "Building rust library for $target"
$env:RUST_BACKTRACE=1
cargo build --release --target $target
Pop-Location
}
function Build-NodeBinaries {
param (
[string]$target
)
Push-Location -Path "node"
Write-Host "Building node library for $target"
npm run build-release -- --target $target
npm run pack-build -- --target $target
Pop-Location
}
$targets = $args[0]
if (-not $targets) {
$targets = "x86_64-pc-windows-msvc", "aarch64-pc-windows-msvc"
}
Write-Host "Building artifacts for targets: $targets"
foreach ($target in $targets) {
Prebuild-Rust $target
Build-NodeBinaries $target
}

View File

@@ -1,42 +0,0 @@
# Builds the Windows artifacts (nodejs binaries).
# Usage: .\ci\build_windows_artifacts_nodejs.ps1 [target]
# Targets supported:
# - x86_64-pc-windows-msvc
# - i686-pc-windows-msvc
# - aarch64-pc-windows-msvc
function Prebuild-Rust {
param (
[string]$target
)
# Building here for the sake of easier debugging.
Push-Location -Path "rust/lancedb"
Write-Host "Building rust library for $target"
$env:RUST_BACKTRACE=1
cargo build --release --target $target
Pop-Location
}
function Build-NodeBinaries {
param (
[string]$target
)
Push-Location -Path "nodejs"
Write-Host "Building nodejs library for $target"
$env:RUST_TARGET=$target
npm run build-release
Pop-Location
}
$targets = $args[0]
if (-not $targets) {
$targets = "x86_64-pc-windows-msvc", "aarch64-pc-windows-msvc"
}
Write-Host "Building artifacts for targets: $targets"
foreach ($target in $targets) {
Prebuild-Rust $target
Build-NodeBinaries $target
}

View File

@@ -1,27 +0,0 @@
# Many linux dockerfile with Rust, Node, and Lance dependencies installed.
# This container allows building the node modules native libraries in an
# environment with a very old glibc, so that we are compatible with a wide
# range of linux distributions.
ARG ARCH=x86_64
FROM quay.io/pypa/manylinux_2_28_${ARCH}
ARG ARCH=x86_64
ARG DOCKER_USER=default_user
# Protobuf is also installed as root.
COPY install_protobuf.sh install_protobuf.sh
RUN ./install_protobuf.sh ${ARCH}
ENV DOCKER_USER=${DOCKER_USER}
# Create a group and user, but only if it doesn't exist
RUN echo ${ARCH} && id -u ${DOCKER_USER} >/dev/null 2>&1 || adduser --user-group --create-home --uid ${DOCKER_USER} build_user
# We switch to the user to install Rust and Node, since those like to be
# installed at the user level.
USER ${DOCKER_USER}
COPY prepare_manylinux_node.sh prepare_manylinux_node.sh
RUN cp /prepare_manylinux_node.sh $HOME/ && \
cd $HOME && \
./prepare_manylinux_node.sh ${ARCH}

View File

@@ -1,13 +0,0 @@
#!/bin/bash
# Builds the node module for manylinux. Invoked by ci/build_linux_artifacts.sh.
set -e
ARCH=${1:-x86_64}
TARGET_TRIPLE=${2:-x86_64-unknown-linux-gnu}
#Alpine doesn't have .bashrc
FILE=$HOME/.bashrc && test -f $FILE && source $FILE
cd node
npm ci
npm run build-release
npm run pack-build -- -t $TARGET_TRIPLE

View File

@@ -1,15 +0,0 @@
#!/bin/bash
# Installs protobuf compiler. Should be run as root.
set -e
if [[ $1 == x86_64* ]]; then
ARCH=x86_64
else
# gnu target
ARCH=aarch_64
fi
PB_REL=https://github.com/protocolbuffers/protobuf/releases
PB_VERSION=23.1
curl -LO $PB_REL/download/v$PB_VERSION/protoc-$PB_VERSION-linux-$ARCH.zip
unzip protoc-$PB_VERSION-linux-$ARCH.zip -d /usr/local

View File

@@ -1,21 +0,0 @@
#!/bin/bash
set -e
install_node() {
echo "Installing node..."
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.34.0/install.sh | bash
source "$HOME"/.bashrc
nvm install --no-progress 18
}
install_rust() {
echo "Installing rust..."
curl https://sh.rustup.rs -sSf | bash -s -- -y
export PATH="$PATH:/root/.cargo/bin"
}
install_node
install_rust

265
ci/set_lance_version.py Normal file
View File

@@ -0,0 +1,265 @@
import argparse
import sys
import json
def run_command(command: str) -> str:
"""
Run a shell command and return stdout as a string.
If exit code is not 0, raise an exception with the stderr output.
"""
import subprocess
result = subprocess.run(command, shell=True, capture_output=True, text=True)
if result.returncode != 0:
raise Exception(f"Command failed with error: {result.stderr.strip()}")
return result.stdout.strip()
def get_latest_stable_version() -> str:
version_line = run_command("cargo info lance | grep '^version:'")
version = version_line.split(" ")[1].strip()
return version
def get_latest_preview_version() -> str:
lance_tags = run_command(
"git ls-remote --tags https://github.com/lancedb/lance.git | grep 'refs/tags/v[0-9beta.-]\\+$'"
).splitlines()
lance_tags = (
tag.split("refs/tags/")[1]
for tag in lance_tags
if "refs/tags/" in tag and "beta" in tag
)
from packaging.version import Version
latest = max(
(tag[1:] for tag in lance_tags if tag.startswith("v")), key=lambda t: Version(t)
)
return str(latest)
def extract_features(line: str) -> list:
"""
Extracts the features from a line in Cargo.toml.
Example: 'lance = { "version" = "=0.29.0", "features" = ["dynamodb"] }'
Returns: ['dynamodb']
"""
import re
match = re.search(r'"features"\s*=\s*\[\s*(.*?)\s*\]', line, re.DOTALL)
if match:
features_str = match.group(1)
return [f.strip('"') for f in features_str.split(",") if len(f) > 0]
return []
def extract_default_features(line: str) -> bool:
"""
Checks if default-features = false is present in a line in Cargo.toml.
Example: 'lance = { "version" = "=0.29.0", default-features = false, "features" = ["dynamodb"] }'
Returns: True if default-features = false is present, False otherwise
"""
import re
match = re.search(r'default-features\s*=\s*false', line)
return match is not None
def dict_to_toml_line(package_name: str, config: dict) -> str:
"""
Converts a configuration dictionary to a TOML dependency line.
Dictionary insertion order is preserved (Python 3.7+), so the caller
controls the order of fields in the output.
Args:
package_name: The name of the package (e.g., "lance", "lance-io")
config: Dictionary with keys like "version", "path", "git", "tag", "features", "default-features"
The order of keys in this dict determines the order in the output.
Returns:
A properly formatted TOML line with a trailing newline
"""
# If only version is specified, use simple format
if len(config) == 1 and "version" in config:
return f'{package_name} = "{config["version"]}"\n'
# Otherwise, use inline table format
parts = []
for key, value in config.items():
if key == "default-features" and not value:
parts.append("default-features = false")
elif key == "features":
parts.append(f'"features" = {json.dumps(value)}')
elif isinstance(value, str):
parts.append(f'"{key}" = "{value}"')
else:
# This shouldn't happen with our current usage
parts.append(f'"{key}" = {json.dumps(value)}')
return f'{package_name} = {{ {", ".join(parts)} }}\n'
def update_cargo_toml(line_updater):
"""
Updates the Cargo.toml file by applying the line_updater function to each line.
The line_updater function should take a line as input and return the updated line.
"""
with open("Cargo.toml", "r") as f:
lines = f.readlines()
new_lines = []
lance_line = ""
is_parsing_lance_line = False
for line in lines:
if line.startswith("lance"):
# Check if this is a single-line or multi-line entry
# Single-line entries either:
# 1. End with } (complete inline table)
# 2. End with " (simple version string)
# Multi-line entries start with { but don't end with }
if line.strip().endswith("}") or line.strip().endswith('"'):
# Single-line entry - process immediately
new_lines.append(line_updater(line))
elif "{" in line and not line.strip().endswith("}"):
# Multi-line entry - start accumulating
lance_line = line
is_parsing_lance_line = True
else:
# Single-line entry without quotes or braces (shouldn't happen but handle it)
new_lines.append(line_updater(line))
elif is_parsing_lance_line:
lance_line += line
if line.strip().endswith("}"):
new_lines.append(line_updater(lance_line))
lance_line = ""
is_parsing_lance_line = False
else:
# Keep the line unchanged
new_lines.append(line)
with open("Cargo.toml", "w") as f:
f.writelines(new_lines)
def set_stable_version(version: str):
"""
Sets lines to
lance = { "version" = "=0.29.0", default-features = false, "features" = ["dynamodb"] }
lance-io = { "version" = "=0.29.0", default-features = false }
...
"""
def line_updater(line: str) -> str:
package_name = line.split("=", maxsplit=1)[0].strip()
# Build config in desired order: version, default-features, features
config = {"version": f"={version}"}
if extract_default_features(line):
config["default-features"] = False
features = extract_features(line)
if features:
config["features"] = features
return dict_to_toml_line(package_name, config)
update_cargo_toml(line_updater)
def set_preview_version(version: str):
"""
Sets lines to
lance = { "version" = "=0.29.0", default-features = false, "features" = ["dynamodb"], "tag" = "v0.29.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-io = { "version" = "=0.29.0", default-features = false, "tag" = "v0.29.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
...
"""
def line_updater(line: str) -> str:
package_name = line.split("=", maxsplit=1)[0].strip()
base_version = version.split("-")[0] # Get the base version without beta suffix
# Build config in desired order: version, default-features, features, tag, git
config = {"version": f"={base_version}"}
if extract_default_features(line):
config["default-features"] = False
features = extract_features(line)
if features:
config["features"] = features
config["tag"] = f"v{version}"
config["git"] = "https://github.com/lancedb/lance.git"
return dict_to_toml_line(package_name, config)
update_cargo_toml(line_updater)
def set_local_version():
"""
Sets lines to
lance = { "path" = "../lance/rust/lance", default-features = false, "features" = ["dynamodb"] }
lance-io = { "path" = "../lance/rust/lance-io", default-features = false }
...
"""
def line_updater(line: str) -> str:
package_name = line.split("=", maxsplit=1)[0].strip()
# Build config in desired order: path, default-features, features
config = {"path": f"../lance/rust/{package_name}"}
if extract_default_features(line):
config["default-features"] = False
features = extract_features(line)
if features:
config["features"] = features
return dict_to_toml_line(package_name, config)
update_cargo_toml(line_updater)
parser = argparse.ArgumentParser(description="Set the version of the Lance package.")
parser.add_argument(
"version",
type=str,
help="The version to set for the Lance package. Use 'stable' for the latest stable version, 'preview' for latest preview version, or a specific version number (e.g., '0.1.0'). You can also specify 'local' to use a local path.",
)
args = parser.parse_args()
if args.version == "stable":
latest_stable_version = get_latest_stable_version()
print(
f"Found latest stable version: \033[1mv{latest_stable_version}\033[0m",
file=sys.stderr,
)
set_stable_version(latest_stable_version)
elif args.version == "preview":
latest_preview_version = get_latest_preview_version()
print(
f"Found latest preview version: \033[1mv{latest_preview_version}\033[0m",
file=sys.stderr,
)
set_preview_version(latest_preview_version)
elif args.version == "local":
set_local_version()
else:
# Parse the version number.
version = args.version
# Ignore initial v if present.
if version.startswith("v"):
version = version[1:]
if "beta" in version:
set_preview_version(version)
else:
set_stable_version(version)
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)

27
ci/update_lockfiles.sh Executable file
View File

@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
AMEND=false
for arg in "$@"; do
if [[ "$arg" == "--amend" ]]; then
AMEND=true
fi
done
# This updates the lockfile without building
cargo metadata --quiet > /dev/null
pushd nodejs || exit 1
npm install --package-lock-only --silent
popd
if git diff --quiet --exit-code; then
echo "No lockfile changes to commit; skipping amend."
elif $AMEND; then
git add Cargo.lock nodejs/package-lock.json
git commit --amend --no-edit
else
git add Cargo.lock nodejs/package-lock.json
git commit -m "Update lockfiles"
fi

View File

@@ -105,8 +105,7 @@ markdown_extensions:
nav:
- Home:
- LanceDB: index.md
- 👉 Quickstart: quickstart.md
- 🏃🏼‍♂️ Basic Usage: basic.md
- 🏃🏼‍♂️ Quick start: basic.md
- 📚 Concepts:
- Vector search: concepts/vector_search.md
- Indexing:
@@ -194,6 +193,7 @@ nav:
- Pandas and PyArrow: python/pandas_and_pyarrow.md
- Polars: python/polars_arrow.md
- DuckDB: python/duckdb.md
- Datafusion: python/datafusion.md
- LangChain:
- LangChain 🔗: integrations/langchain.md
- LangChain demo: notebooks/langchain_demo.ipynb
@@ -206,6 +206,7 @@ nav:
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- Genkit: integrations/genkit.md
- 🎯 Examples:
- Overview: examples/index.md
- 🐍 Python:
@@ -238,9 +239,7 @@ nav:
- 👾 JavaScript (lancedb): js/globals.md
- 🦀 Rust: https://docs.rs/lancedb/latest/lancedb/
- Getting Started:
- Quickstart: quickstart.md
- Basic Usage: basic.md
- Quick start: basic.md
- Concepts:
- Vector search: concepts/vector_search.md
- Indexing:
@@ -250,6 +249,7 @@ nav:
- Data management: concepts/data_management.md
- Guides:
- Working with tables: guides/tables.md
- Working with SQL: guides/sql_querying.md
- Building an ANN index: ann_indexes.md
- Vector Search: search.md
- Full-text search (native): fts.md
@@ -326,6 +326,7 @@ nav:
- Pandas and PyArrow: python/pandas_and_pyarrow.md
- Polars: python/polars_arrow.md
- DuckDB: python/duckdb.md
- Datafusion: python/datafusion.md
- LangChain 🦜️🔗↗: integrations/langchain.md
- LangChain.js 🦜️🔗↗: https://js.langchain.com/docs/integrations/vectorstores/lancedb
- LlamaIndex 🦙↗: integrations/llamaIndex.md
@@ -334,6 +335,7 @@ nav:
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- Genkit: integrations/genkit.md
- Examples:
- examples/index.md
- 🐍 Python:

View File

@@ -0,0 +1,5 @@
{% extends "base.html" %}
{% block announce %}
📚 Starting June 1st, 2025, please use <a href="https://lancedb.github.io/documentation" target="_blank" rel="noopener noreferrer">lancedb.github.io/documentation</a> for the latest docs.
{% endblock %}

12
docs/package-lock.json generated
View File

@@ -19,7 +19,7 @@
},
"../node": {
"name": "vectordb",
"version": "0.12.0",
"version": "0.21.2-beta.0",
"cpu": [
"x64",
"arm64"
@@ -65,11 +65,11 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.12.0",
"@lancedb/vectordb-darwin-x64": "0.12.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.12.0",
"@lancedb/vectordb-linux-x64-gnu": "0.12.0",
"@lancedb/vectordb-win32-x64-msvc": "0.12.0"
"@lancedb/vectordb-darwin-arm64": "0.21.2-beta.0",
"@lancedb/vectordb-darwin-x64": "0.21.2-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.21.2-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.21.2-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.21.2-beta.0"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",

View File

@@ -291,7 +291,7 @@ Product quantization can lead to approximately `16 * sizeof(float32) / 1 = 64` t
`num_partitions` is used to decide how many partitions the first level `IVF` index uses.
Higher number of partitions could lead to more efficient I/O during queries and better accuracy, but it takes much more time to train.
On `SIFT-1M` dataset, our benchmark shows that keeping each partition 1K-4K rows lead to a good latency / recall.
On `SIFT-1M` dataset, our benchmark shows that keeping each partition 4K-8K rows lead to a good latency / recall.
`num_sub_vectors` specifies how many Product Quantization (PQ) short codes to generate on each vector. The number should be a factor of the vector dimension. Because
PQ is a lossy compression of the original vector, a higher `num_sub_vectors` usually results in

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.7 MiB

BIN
docs/src/assets/lancedb.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 40 KiB

View File

@@ -1,4 +1,4 @@
# Basic Usage
# Quick start
!!! info "LanceDB can be run in a number of ways:"

View File

@@ -37,6 +37,10 @@ Depending on the use case and dataset, optimal compaction will have different re
- Its always better to use *batch* inserts rather than adding 1 row at a time (to avoid too small fragments). If single-row inserts are unavoidable, run compaction on a regular basis to merge them into larger fragments.
- Keep the number of fragments under 100, which is suitable for most use cases (for *really* large datasets of >500M rows, more fragments might be needed)
!!! note
LanceDB Cloud/Enterprise supports [auto-compaction](https://docs.lancedb.com/enterprise/architecture/architecture#write-path) which automatically optimizes fragments in the background as data changes.
## Deletion
Although Lance allows you to delete rows from a dataset, it does not actually delete the data immediately. It simply marks the row as deleted in the `DataFile` that represents a fragment. For a given version of the dataset, each fragment can have up to one deletion file (if no rows were ever deleted from that fragment, it will not have a deletion file). This is important to keep in mind because it means that the data is still there, and can be recovered if needed, as long as that version still exists based on your backup policy.
@@ -50,13 +54,9 @@ Reindexing is the process of updating the index to account for new data, keeping
Both LanceDB OSS and Cloud support reindexing, but the process (at least for now) is different for each, depending on the type of index.
When a reindex job is triggered in the background, the entire data is reindexed, but in the interim as new queries come in, LanceDB will combine results from the existing index with exhaustive kNN search on the new data. This is done to ensure that you're still searching on all your data, but it does come at a performance cost. The more data that you add without reindexing, the impact on latency (due to exhaustive search) can be noticeable.
In LanceDB OSS, re-indexing happens synchronously when you call either `create_index` or `optimize` on a table. In LanceDB Cloud, re-indexing happens asynchronously as you add and update data in your table.
### Vector reindex
By default, queries will search new data even if it has yet to be indexed. This is done using brute-force methods, such as kNN for vector search, and combined with the fast index search results. This is done to ensure that you're always searching over all your data, but it does come at a performance cost. Without reindexing, adding more data to a table will make queries slower and more expensive. This behavior can be disabled by setting the [fast_search](https://lancedb.github.io/lancedb/python/python/#lancedb.query.AsyncQuery.fast_search) parameter which will instruct the query to ignore un-indexed data.
* LanceDB Cloud supports incremental reindexing, where a background process will trigger a new index build for you automatically when new data is added to a dataset
* LanceDB Cloud/Enterprise supports [automatic incremental reindexing](https://docs.lancedb.com/core#vector-index) for vector, scalar, and FTS indices, where a background process will trigger a new index build for you automatically when new data is added or modified in a dataset
* LanceDB OSS requires you to manually trigger a reindex operation -- we are working on adding incremental reindexing to LanceDB OSS as well
### FTS reindex
FTS reindexing is supported in both LanceDB OSS and Cloud, but requires that it's manually rebuilt once you have a significant enough amount of new data added that needs to be reindexed. We [updated](https://github.com/lancedb/lancedb/pull/762) Tantivy's default heap size from 128MB to 1GB in LanceDB to make it much faster to reindex, by up to 10x from the default settings.

View File

@@ -0,0 +1,60 @@
# SQL Querying
You can use DuckDB and Apache Datafusion to query your LanceDB tables using SQL.
This guide will show how to query Lance tables them using both.
We will re-use the dataset [created previously](./tables.md):
```python
import lancedb
db = lancedb.connect("data/sample-lancedb")
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
table = db.create_table("pd_table", data=data)
```
## Querying a LanceDB Table with DuckDb
The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to DuckDB through the Arrow compatibility layer.
To query the resulting Lance dataset in DuckDB, all you need to do is reference the dataset by the same name in your SQL query.
```python
import duckdb
arrow_table = table.to_lance()
duckdb.query("SELECT * FROM arrow_table")
```
| vector | item | price |
| ----------- | ---- | ----- |
| [3.1, 4.1] | foo | 10.0 |
| [5.9, 26.5] | bar | 20.0 |
## Querying a LanceDB Table with Apache Datafusion
Have the required imports before doing any querying.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:import-lancedb"
--8<-- "python/python/tests/docs/test_guide_tables.py:import-session-context"
--8<-- "python/python/tests/docs/test_guide_tables.py:import-ffi-dataset"
```
Register the table created with the Datafusion session context.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:lance_sql_basic"
```
| vector | item | price |
| ----------- | ---- | ----- |
| [3.1, 4.1] | foo | 10.0 |
| [5.9, 26.5] | bar | 20.0 |

View File

@@ -0,0 +1,183 @@
### genkitx-lancedb
This is a lancedb plugin for genkit framework. It allows you to use LanceDB for ingesting and rereiving data using genkit framework.
![integration-banner-genkit](https://github.com/user-attachments/assets/a6cc28af-98e9-4425-b87c-7ab139bd7893)
### Installation
```bash
pnpm install genkitx-lancedb
```
### Usage
Adding LanceDB plugin to your genkit instance.
```ts
import { lancedbIndexerRef, lancedb, lancedbRetrieverRef, WriteMode } from 'genkitx-lancedb';
import { textEmbedding004, vertexAI } from '@genkit-ai/vertexai';
import { gemini } from '@genkit-ai/vertexai';
import { z, genkit } from 'genkit';
import { Document } from 'genkit/retriever';
import { chunk } from 'llm-chunk';
import { readFile } from 'fs/promises';
import path from 'path';
import pdf from 'pdf-parse/lib/pdf-parse';
const ai = genkit({
plugins: [
// vertexAI provides the textEmbedding004 embedder
vertexAI(),
// the local vector store requires an embedder to translate from text to vector
lancedb([
{
dbUri: '.db', // optional lancedb uri, default to .db
tableName: 'table', // optional table name, default to table
embedder: textEmbedding004,
},
]),
],
});
```
You can run this app with the following command:
```bash
genkit start -- tsx --watch src/index.ts
```
This'll add LanceDB as a retriever and indexer to the genkit instance. You can see it in the GUI view
<img width="1710" alt="Screenshot 2025-05-11 at 7 21 05PM" src="https://github.com/user-attachments/assets/e752f7f4-785b-4797-a11e-72ab06a531b7" />
**Testing retrieval on a sample table**
Let's see the raw retrieval results
<img width="1710" alt="Screenshot 2025-05-11 at 7 21 05PM" src="https://github.com/user-attachments/assets/b8d356ed-8421-4790-8fc0-d6af563b9657" />
On running this query, you'll 5 results fetched from the lancedb table, where each result looks something like this:
<img width="1417" alt="Screenshot 2025-05-11 at 7 21 18PM" src="https://github.com/user-attachments/assets/77429525-36e2-4da6-a694-e58c1cf9eb83" />
## Creating a custom RAG flow
Now that we've seen how you can use LanceDB for in a genkit pipeline, let's refine the flow and create a RAG. A RAG flow will consist of an index and a retreiver with its outputs postprocessed an fed into an LLM for final response
### Creating custom indexer flows
You can also create custom indexer flows, utilizing more options and features provided by LanceDB.
```ts
export const menuPdfIndexer = lancedbIndexerRef({
// Using all defaults, for dbUri, tableName, and embedder, etc
});
const chunkingConfig = {
minLength: 1000,
maxLength: 2000,
splitter: 'sentence',
overlap: 100,
delimiters: '',
} as any;
async function extractTextFromPdf(filePath: string) {
const pdfFile = path.resolve(filePath);
const dataBuffer = await readFile(pdfFile);
const data = await pdf(dataBuffer);
return data.text;
}
export const indexMenu = ai.defineFlow(
{
name: 'indexMenu',
inputSchema: z.string().describe('PDF file path'),
outputSchema: z.void(),
},
async (filePath: string) => {
filePath = path.resolve(filePath);
// Read the pdf.
const pdfTxt = await ai.run('extract-text', () =>
extractTextFromPdf(filePath)
);
// Divide the pdf text into segments.
const chunks = await ai.run('chunk-it', async () =>
chunk(pdfTxt, chunkingConfig)
);
// Convert chunks of text into documents to store in the index.
const documents = chunks.map((text) => {
return Document.fromText(text, { filePath });
});
// Add documents to the index.
await ai.index({
indexer: menuPdfIndexer,
documents,
options: {
writeMode: WriteMode.Overwrite,
} as any
});
}
);
```
<img width="1316" alt="Screenshot 2025-05-11 at 8 35 56PM" src="https://github.com/user-attachments/assets/e2a20ce4-d1d0-4fa2-9a84-f2cc26e3a29f" />
In your console, you can see the logs
<img width="511" alt="Screenshot 2025-05-11 at 7 19 14PM" src="https://github.com/user-attachments/assets/243f26c5-ed38-40b6-b661-002f40f0423a" />
### Creating custom retriever flows
You can also create custom retriever flows, utilizing more options and features provided by LanceDB.
```ts
export const menuRetriever = lancedbRetrieverRef({
tableName: "table", // Use the same table name as the indexer.
displayName: "Menu", // Use a custom display name.
export const menuQAFlow = ai.defineFlow(
{ name: "Menu", inputSchema: z.string(), outputSchema: z.string() },
async (input: string) => {
// retrieve relevant documents
const docs = await ai.retrieve({
retriever: menuRetriever,
query: input,
options: {
k: 3,
},
});
const extractedContent = docs.map(doc => {
if (doc.content && Array.isArray(doc.content) && doc.content.length > 0) {
if (doc.content[0].media && doc.content[0].media.url) {
return doc.content[0].media.url;
}
}
return "No content found";
});
console.log("Extracted content:", extractedContent);
const { text } = await ai.generate({
model: gemini('gemini-2.0-flash'),
prompt: `
You are acting as a helpful AI assistant that can answer
questions about the food available on the menu at Genkit Grub Pub.
Use only the context provided to answer the question.
If you don't know, do not make up an answer.
Do not add or change items on the menu.
Context:
${extractedContent.join('\n\n')}
Question: ${input}`,
docs,
});
return text;
}
);
```
Now using our retrieval flow, we can ask question about the ingsted PDF
<img width="1306" alt="Screenshot 2025-05-11 at 7 18 45PM" src="https://github.com/user-attachments/assets/86c66b13-7c12-4d5f-9d81-ae36bfb1c346" />

View File

@@ -0,0 +1,53 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BooleanQuery
# Class: BooleanQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BooleanQuery()
```ts
new BooleanQuery(queries): BooleanQuery
```
Creates an instance of BooleanQuery.
#### Parameters
* **queries**: [[`Occur`](../enumerations/Occur.md), [`FullTextQuery`](../interfaces/FullTextQuery.md)][]
An array of (Occur, FullTextQuery objects) to combine.
Occur specifies whether the query must match, or should match.
#### Returns
[`BooleanQuery`](BooleanQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)

View File

@@ -40,6 +40,8 @@ Creates an instance of MatchQuery.
- `boost`: The boost factor for the query (default is 1.0).
- `fuzziness`: The fuzziness level for the query (default is 0).
- `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
- `prefixLength`: The number of beginning characters being unchanged for fuzzy matching.
* **options.boost?**: `number`
@@ -47,6 +49,10 @@ Creates an instance of MatchQuery.
* **options.maxExpansions?**: `number`
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
* **options.prefixLength?**: `number`
#### Returns
[`MatchQuery`](MatchQuery.md)

View File

@@ -33,7 +33,7 @@ Construct a MergeInsertBuilder. __Internal use only.__
### execute()
```ts
execute(data): Promise<MergeStats>
execute(data, execOptions?): Promise<MergeResult>
```
Executes the merge insert operation
@@ -42,11 +42,13 @@ Executes the merge insert operation
* **data**: [`Data`](../type-aliases/Data.md)
* **execOptions?**: `Partial`&lt;[`WriteExecutionOptions`](../interfaces/WriteExecutionOptions.md)&gt;
#### Returns
`Promise`&lt;[`MergeStats`](../interfaces/MergeStats.md)&gt;
`Promise`&lt;[`MergeResult`](../interfaces/MergeResult.md)&gt;
Statistics about the merge operation: counts of inserted, updated, and deleted rows
the merge result
***

View File

@@ -38,9 +38,12 @@ Creates an instance of MultiMatchQuery.
* **options?**
Optional parameters for the multi-match query.
- `boosts`: An array of boost factors for each column (default is 1.0 for all).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
* **options.boosts?**: `number`[]
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MultiMatchQuery`](MultiMatchQuery.md)

View File

@@ -19,7 +19,10 @@ including methods to retrieve the query type and convert the query to a dictiona
### new PhraseQuery()
```ts
new PhraseQuery(query, column): PhraseQuery
new PhraseQuery(
query,
column,
options?): PhraseQuery
```
Creates an instance of `PhraseQuery`.
@@ -32,6 +35,12 @@ Creates an instance of `PhraseQuery`.
* **column**: `string`
The name of the column to search within.
* **options?**
Optional parameters for the phrase query.
- `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
* **options.slop?**: `number`
#### Returns
[`PhraseQuery`](PhraseQuery.md)

View File

@@ -14,7 +14,7 @@ A builder for LanceDB queries.
## Extends
- [`QueryBase`](QueryBase.md)&lt;`NativeQuery`&gt;
- `StandardQueryBase`&lt;`NativeQuery`&gt;
## Properties
@@ -26,7 +26,7 @@ protected inner: Query | Promise<Query>;
#### Inherited from
[`QueryBase`](QueryBase.md).[`inner`](QueryBase.md#inner)
`StandardQueryBase.inner`
## Methods
@@ -73,7 +73,7 @@ AnalyzeExec verbose=true, metrics=[]
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
`StandardQueryBase.analyzePlan`
***
@@ -107,7 +107,7 @@ single query)
#### Inherited from
[`QueryBase`](QueryBase.md).[`execute`](QueryBase.md#execute)
`StandardQueryBase.execute`
***
@@ -143,7 +143,7 @@ const plan = await table.query().nearestTo([0.5, 0.2]).explainPlan();
#### Inherited from
[`QueryBase`](QueryBase.md).[`explainPlan`](QueryBase.md#explainplan)
`StandardQueryBase.explainPlan`
***
@@ -164,7 +164,7 @@ Use [Table#optimize](Table.md#optimize) to index all un-indexed data.
#### Inherited from
[`QueryBase`](QueryBase.md).[`fastSearch`](QueryBase.md#fastsearch)
`StandardQueryBase.fastSearch`
***
@@ -194,7 +194,7 @@ Use `where` instead
#### Inherited from
[`QueryBase`](QueryBase.md).[`filter`](QueryBase.md#filter)
`StandardQueryBase.filter`
***
@@ -216,7 +216,7 @@ fullTextSearch(query, options?): this
#### Inherited from
[`QueryBase`](QueryBase.md).[`fullTextSearch`](QueryBase.md#fulltextsearch)
`StandardQueryBase.fullTextSearch`
***
@@ -241,7 +241,7 @@ called then every valid row from the table will be returned.
#### Inherited from
[`QueryBase`](QueryBase.md).[`limit`](QueryBase.md#limit)
`StandardQueryBase.limit`
***
@@ -325,6 +325,10 @@ nearestToText(query, columns?): Query
offset(offset): this
```
Set the number of rows to skip before returning results.
This is useful for pagination.
#### Parameters
* **offset**: `number`
@@ -335,7 +339,7 @@ offset(offset): this
#### Inherited from
[`QueryBase`](QueryBase.md).[`offset`](QueryBase.md#offset)
`StandardQueryBase.offset`
***
@@ -388,7 +392,7 @@ object insertion order is easy to get wrong and `Map` is more foolproof.
#### Inherited from
[`QueryBase`](QueryBase.md).[`select`](QueryBase.md#select)
`StandardQueryBase.select`
***
@@ -410,7 +414,7 @@ Collect the results as an array of objects.
#### Inherited from
[`QueryBase`](QueryBase.md).[`toArray`](QueryBase.md#toarray)
`StandardQueryBase.toArray`
***
@@ -436,7 +440,7 @@ ArrowTable.
#### Inherited from
[`QueryBase`](QueryBase.md).[`toArrow`](QueryBase.md#toarrow)
`StandardQueryBase.toArrow`
***
@@ -471,7 +475,7 @@ on the filter column(s).
#### Inherited from
[`QueryBase`](QueryBase.md).[`where`](QueryBase.md#where)
`StandardQueryBase.where`
***
@@ -493,4 +497,4 @@ order to perform hybrid search.
#### Inherited from
[`QueryBase`](QueryBase.md).[`withRowId`](QueryBase.md#withrowid)
`StandardQueryBase.withRowId`

View File

@@ -15,12 +15,11 @@ Common methods supported by all query types
## Extended by
- [`Query`](Query.md)
- [`VectorQuery`](VectorQuery.md)
- [`TakeQuery`](TakeQuery.md)
## Type Parameters
**NativeQueryType** *extends* `NativeQuery` \| `NativeVectorQuery`
**NativeQueryType** *extends* `NativeQuery` \| `NativeVectorQuery` \| `NativeTakeQuery`
## Implements
@@ -141,104 +140,6 @@ const plan = await table.query().nearestTo([0.5, 0.2]).explainPlan();
***
### fastSearch()
```ts
fastSearch(): this
```
Skip searching un-indexed data. This can make search faster, but will miss
any data that is not yet indexed.
Use [Table#optimize](Table.md#optimize) to index all un-indexed data.
#### Returns
`this`
***
### ~~filter()~~
```ts
filter(predicate): this
```
A filter statement to be applied to this query.
#### Parameters
* **predicate**: `string`
#### Returns
`this`
#### See
where
#### Deprecated
Use `where` instead
***
### fullTextSearch()
```ts
fullTextSearch(query, options?): this
```
#### Parameters
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;
#### Returns
`this`
***
### limit()
```ts
limit(limit): this
```
Set the maximum number of results to return.
By default, a plain search has no limit. If this method is not
called then every valid row from the table will be returned.
#### Parameters
* **limit**: `number`
#### Returns
`this`
***
### offset()
```ts
offset(offset): this
```
#### Parameters
* **offset**: `number`
#### Returns
`this`
***
### select()
```ts
@@ -328,37 +229,6 @@ ArrowTable.
***
### where()
```ts
where(predicate): this
```
A filter statement to be applied to this query.
The filter should be supplied as an SQL query string. For example:
#### Parameters
* **predicate**: `string`
#### Returns
`this`
#### Example
```ts
x > 10
y > 0 AND y < 100
x > 5 OR y = 'test'
Filtering performance can often be improved by creating a scalar index
on the filter column(s).
```
***
### withRowId()
```ts

View File

@@ -0,0 +1,88 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Session
# Class: Session
A session for managing caches and object stores across LanceDB operations.
Sessions allow you to configure cache sizes for index and metadata caches,
which can significantly impact memory use and performance. They can
also be re-used across multiple connections to share the same cache state.
## Constructors
### new Session()
```ts
new Session(indexCacheSizeBytes?, metadataCacheSizeBytes?): Session
```
Create a new session with custom cache sizes.
# Parameters
- `index_cache_size_bytes`: The size of the index cache in bytes.
Index data is stored in memory in this cache to speed up queries.
Defaults to 6GB if not specified.
- `metadata_cache_size_bytes`: The size of the metadata cache in bytes.
The metadata cache stores file metadata and schema information in memory.
This cache improves scan and write performance.
Defaults to 1GB if not specified.
#### Parameters
* **indexCacheSizeBytes?**: `null` \| `bigint`
* **metadataCacheSizeBytes?**: `null` \| `bigint`
#### Returns
[`Session`](Session.md)
## Methods
### approxNumItems()
```ts
approxNumItems(): number
```
Get the approximate number of items cached in the session.
#### Returns
`number`
***
### sizeBytes()
```ts
sizeBytes(): bigint
```
Get the current size of the session caches in bytes.
#### Returns
`bigint`
***
### default()
```ts
static default(): Session
```
Create a session with default cache sizes.
This is equivalent to creating a session with 6GB index cache
and 1GB metadata cache.
#### Returns
[`Session`](Session.md)

View File

@@ -40,7 +40,7 @@ Returns the name of the table
### add()
```ts
abstract add(data, options?): Promise<void>
abstract add(data, options?): Promise<AddResult>
```
Insert records into this Table.
@@ -54,14 +54,17 @@ Insert records into this Table.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`AddResult`](../interfaces/AddResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table
***
### addColumns()
```ts
abstract addColumns(newColumnTransforms): Promise<void>
abstract addColumns(newColumnTransforms): Promise<AddColumnsResult>
```
Add new columns with defined values.
@@ -76,14 +79,17 @@ Add new columns with defined values.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`AddColumnsResult`](../interfaces/AddColumnsResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table after adding the columns.
***
### alterColumns()
```ts
abstract alterColumns(columnAlterations): Promise<void>
abstract alterColumns(columnAlterations): Promise<AlterColumnsResult>
```
Alter the name or nullability of columns.
@@ -96,7 +102,10 @@ Alter the name or nullability of columns.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`AlterColumnsResult`](../interfaces/AlterColumnsResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table after altering the columns.
***
@@ -252,7 +261,7 @@ await table.createIndex("my_float_col");
### delete()
```ts
abstract delete(predicate): Promise<void>
abstract delete(predicate): Promise<DeleteResult>
```
Delete the rows that satisfy the predicate.
@@ -263,7 +272,10 @@ Delete the rows that satisfy the predicate.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`DeleteResult`](../interfaces/DeleteResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table
***
@@ -284,7 +296,7 @@ Return a brief description of the table
### dropColumns()
```ts
abstract dropColumns(columnNames): Promise<void>
abstract dropColumns(columnNames): Promise<DropColumnsResult>
```
Drop one or more columns from the dataset
@@ -303,7 +315,10 @@ then call ``cleanup_files`` to remove the old files.
#### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`DropColumnsResult`](../interfaces/DropColumnsResult.md)&gt;
A promise that resolves to an object
containing the new version number of the table after dropping the columns.
***
@@ -597,7 +612,7 @@ of the given query
#### Parameters
* **query**: `string` \| [`IntoVector`](../type-aliases/IntoVector.md) \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **query**: `string` \| [`IntoVector`](../type-aliases/IntoVector.md) \| [`MultiVector`](../type-aliases/MultiVector.md) \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
the query, a vector or string
* **queryType?**: `string`
@@ -659,6 +674,48 @@ console.log(tags); // { "v1": { version: 1, manifestSize: ... } }
***
### takeOffsets()
```ts
abstract takeOffsets(offsets): TakeQuery
```
Create a query that returns a subset of the rows in the table.
#### Parameters
* **offsets**: `number`[]
The offsets of the rows to return.
#### Returns
[`TakeQuery`](TakeQuery.md)
A builder that can be used to parameterize the query.
***
### takeRowIds()
```ts
abstract takeRowIds(rowIds): TakeQuery
```
Create a query that returns a subset of the rows in the table.
#### Parameters
* **rowIds**: `number`[]
The row ids of the rows to return.
#### Returns
[`TakeQuery`](TakeQuery.md)
A builder that can be used to parameterize the query.
***
### toArrow()
```ts
@@ -678,7 +735,7 @@ Return the table as an arrow table
#### update(opts)
```ts
abstract update(opts): Promise<void>
abstract update(opts): Promise<UpdateResult>
```
Update existing records in the Table
@@ -689,7 +746,10 @@ Update existing records in the Table
##### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`UpdateResult`](../interfaces/UpdateResult.md)&gt;
A promise that resolves to an object containing
the number of rows updated and the new version number
##### Example
@@ -700,7 +760,7 @@ table.update({where:"x = 2", values:{"vector": [10, 10]}})
#### update(opts)
```ts
abstract update(opts): Promise<void>
abstract update(opts): Promise<UpdateResult>
```
Update existing records in the Table
@@ -711,7 +771,10 @@ Update existing records in the Table
##### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`UpdateResult`](../interfaces/UpdateResult.md)&gt;
A promise that resolves to an object containing
the number of rows updated and the new version number
##### Example
@@ -722,7 +785,7 @@ table.update({where:"x = 2", valuesSql:{"x": "x + 1"}})
#### update(updates, options)
```ts
abstract update(updates, options?): Promise<void>
abstract update(updates, options?): Promise<UpdateResult>
```
Update existing records in the Table
@@ -745,10 +808,6 @@ repeatedly calilng this method.
* **updates**: `Record`&lt;`string`, `string`&gt; \| `Map`&lt;`string`, `string`&gt;
the
columns to update
Keys in the map should specify the name of the column to update.
Values in the map provide the new value of the column. These can
be SQL literal strings (e.g. "7" or "'foo'") or they can be expressions
based on the row being updated (e.g. "my_col + 1")
* **options?**: `Partial`&lt;[`UpdateOptions`](../interfaces/UpdateOptions.md)&gt;
additional options to control
@@ -756,7 +815,15 @@ repeatedly calilng this method.
##### Returns
`Promise`&lt;`void`&gt;
`Promise`&lt;[`UpdateResult`](../interfaces/UpdateResult.md)&gt;
A promise that resolves to an object
containing the number of rows updated and the new version number
Keys in the map should specify the name of the column to update.
Values in the map provide the new value of the column. These can
be SQL literal strings (e.g. "7" or "'foo'") or they can be expressions
based on the row being updated (e.g. "my_col + 1")
***
@@ -774,7 +841,7 @@ by `query`.
#### Parameters
* **vector**: [`IntoVector`](../type-aliases/IntoVector.md)
* **vector**: [`IntoVector`](../type-aliases/IntoVector.md) \| [`MultiVector`](../type-aliases/MultiVector.md)
#### Returns

View File

@@ -0,0 +1,265 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / TakeQuery
# Class: TakeQuery
A query that returns a subset of the rows in the table.
## Extends
- [`QueryBase`](QueryBase.md)&lt;`NativeTakeQuery`&gt;
## Properties
### inner
```ts
protected inner: TakeQuery | Promise<TakeQuery>;
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`inner`](QueryBase.md#inner)
## Methods
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
***
### execute()
```ts
protected execute(options?): RecordBatchIterator
```
Execute the query and return the results as an
#### Parameters
* **options?**: `Partial`&lt;[`QueryExecutionOptions`](../interfaces/QueryExecutionOptions.md)&gt;
#### Returns
[`RecordBatchIterator`](RecordBatchIterator.md)
#### See
- AsyncIterator
of
- RecordBatch.
By default, LanceDb will use many threads to calculate results and, when
the result set is large, multiple batches will be processed at one time.
This readahead is limited however and backpressure will be applied if this
stream is consumed slowly (this constrains the maximum memory used by a
single query)
#### Inherited from
[`QueryBase`](QueryBase.md).[`execute`](QueryBase.md#execute)
***
### explainPlan()
```ts
explainPlan(verbose): Promise<string>
```
Generates an explanation of the query execution plan.
#### Parameters
* **verbose**: `boolean` = `false`
If true, provides a more detailed explanation. Defaults to false.
#### Returns
`Promise`&lt;`string`&gt;
A Promise that resolves to a string containing the query execution plan explanation.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).explainPlan();
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`explainPlan`](QueryBase.md#explainplan)
***
### select()
```ts
select(columns): this
```
Return only the specified columns.
By default a query will return all columns from the table. However, this can have
a very significant impact on latency. LanceDb stores data in a columnar fashion. This
means we can finely tune our I/O to select exactly the columns we need.
As a best practice you should always limit queries to the columns that you need. If you
pass in an array of column names then only those columns will be returned.
You can also use this method to create new "dynamic" columns based on your existing columns.
For example, you may not care about "a" or "b" but instead simply want "a + b". This is often
seen in the SELECT clause of an SQL query (e.g. `SELECT a+b FROM my_table`).
To create dynamic columns you can pass in a Map<string, string>. A column will be returned
for each entry in the map. The key provides the name of the column. The value is
an SQL string used to specify how the column is calculated.
For example, an SQL query might state `SELECT a + b AS combined, c`. The equivalent
input to this method would be:
#### Parameters
* **columns**: `string` \| `string`[] \| `Record`&lt;`string`, `string`&gt; \| `Map`&lt;`string`, `string`&gt;
#### Returns
`this`
#### Example
```ts
new Map([["combined", "a + b"], ["c", "c"]])
Columns will always be returned in the order given, even if that order is different than
the order used when adding the data.
Note that you can pass in a `Record<string, string>` (e.g. an object literal). This method
uses `Object.entries` which should preserve the insertion order of the object. However,
object insertion order is easy to get wrong and `Map` is more foolproof.
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`select`](QueryBase.md#select)
***
### toArray()
```ts
toArray(options?): Promise<any[]>
```
Collect the results as an array of objects.
#### Parameters
* **options?**: `Partial`&lt;[`QueryExecutionOptions`](../interfaces/QueryExecutionOptions.md)&gt;
#### Returns
`Promise`&lt;`any`[]&gt;
#### Inherited from
[`QueryBase`](QueryBase.md).[`toArray`](QueryBase.md#toarray)
***
### toArrow()
```ts
toArrow(options?): Promise<Table<any>>
```
Collect the results as an Arrow
#### Parameters
* **options?**: `Partial`&lt;[`QueryExecutionOptions`](../interfaces/QueryExecutionOptions.md)&gt;
#### Returns
`Promise`&lt;`Table`&lt;`any`&gt;&gt;
#### See
ArrowTable.
#### Inherited from
[`QueryBase`](QueryBase.md).[`toArrow`](QueryBase.md#toarrow)
***
### withRowId()
```ts
withRowId(): this
```
Whether to return the row id in the results.
This column can be used to match results between different queries. For
example, to match results from a full text search and a vector search in
order to perform hybrid search.
#### Returns
`this`
#### Inherited from
[`QueryBase`](QueryBase.md).[`withRowId`](QueryBase.md#withrowid)

View File

@@ -16,7 +16,7 @@ This builder can be reused to execute the query many times.
## Extends
- [`QueryBase`](QueryBase.md)&lt;`NativeVectorQuery`&gt;
- `StandardQueryBase`&lt;`NativeVectorQuery`&gt;
## Properties
@@ -28,7 +28,7 @@ protected inner: VectorQuery | Promise<VectorQuery>;
#### Inherited from
[`QueryBase`](QueryBase.md).[`inner`](QueryBase.md#inner)
`StandardQueryBase.inner`
## Methods
@@ -91,7 +91,7 @@ AnalyzeExec verbose=true, metrics=[]
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
`StandardQueryBase.analyzePlan`
***
@@ -248,7 +248,7 @@ single query)
#### Inherited from
[`QueryBase`](QueryBase.md).[`execute`](QueryBase.md#execute)
`StandardQueryBase.execute`
***
@@ -284,7 +284,7 @@ const plan = await table.query().nearestTo([0.5, 0.2]).explainPlan();
#### Inherited from
[`QueryBase`](QueryBase.md).[`explainPlan`](QueryBase.md#explainplan)
`StandardQueryBase.explainPlan`
***
@@ -305,7 +305,7 @@ Use [Table#optimize](Table.md#optimize) to index all un-indexed data.
#### Inherited from
[`QueryBase`](QueryBase.md).[`fastSearch`](QueryBase.md#fastsearch)
`StandardQueryBase.fastSearch`
***
@@ -335,7 +335,7 @@ Use `where` instead
#### Inherited from
[`QueryBase`](QueryBase.md).[`filter`](QueryBase.md#filter)
`StandardQueryBase.filter`
***
@@ -357,7 +357,7 @@ fullTextSearch(query, options?): this
#### Inherited from
[`QueryBase`](QueryBase.md).[`fullTextSearch`](QueryBase.md#fulltextsearch)
`StandardQueryBase.fullTextSearch`
***
@@ -382,7 +382,54 @@ called then every valid row from the table will be returned.
#### Inherited from
[`QueryBase`](QueryBase.md).[`limit`](QueryBase.md#limit)
`StandardQueryBase.limit`
***
### maximumNprobes()
```ts
maximumNprobes(maximumNprobes): VectorQuery
```
Set the maximum number of probes used.
This controls the maximum number of partitions that will be searched. If this
number is greater than minimumNprobes then the excess partitions will _only_ be
searched if we have not found enough results. This can be useful when there is
a narrow filter to allow these queries to spend more time searching and avoid
potential false negatives.
#### Parameters
* **maximumNprobes**: `number`
#### Returns
[`VectorQuery`](VectorQuery.md)
***
### minimumNprobes()
```ts
minimumNprobes(minimumNprobes): VectorQuery
```
Set the minimum number of probes used.
This controls the minimum number of partitions that will be searched. This
parameter will impact every query against a vector index, regardless of the
filter. See `nprobes` for more details. Higher values will increase recall
but will also increase latency.
#### Parameters
* **minimumNprobes**: `number`
#### Returns
[`VectorQuery`](VectorQuery.md)
***
@@ -413,6 +460,10 @@ For best results we recommend tuning this parameter with a benchmark against
your actual data to find the smallest possible value that will still give
you the desired recall.
For more fine grained control over behavior when you have a very narrow filter
you can use `minimumNprobes` and `maximumNprobes`. This method sets both
the minimum and maximum to the same value.
#### Parameters
* **nprobes**: `number`
@@ -429,6 +480,10 @@ you the desired recall.
offset(offset): this
```
Set the number of rows to skip before returning results.
This is useful for pagination.
#### Parameters
* **offset**: `number`
@@ -439,7 +494,7 @@ offset(offset): this
#### Inherited from
[`QueryBase`](QueryBase.md).[`offset`](QueryBase.md#offset)
`StandardQueryBase.offset`
***
@@ -586,7 +641,7 @@ object insertion order is easy to get wrong and `Map` is more foolproof.
#### Inherited from
[`QueryBase`](QueryBase.md).[`select`](QueryBase.md#select)
`StandardQueryBase.select`
***
@@ -608,7 +663,7 @@ Collect the results as an array of objects.
#### Inherited from
[`QueryBase`](QueryBase.md).[`toArray`](QueryBase.md#toarray)
`StandardQueryBase.toArray`
***
@@ -634,7 +689,7 @@ ArrowTable.
#### Inherited from
[`QueryBase`](QueryBase.md).[`toArrow`](QueryBase.md#toarrow)
`StandardQueryBase.toArrow`
***
@@ -669,7 +724,7 @@ on the filter column(s).
#### Inherited from
[`QueryBase`](QueryBase.md).[`where`](QueryBase.md#where)
`StandardQueryBase.where`
***
@@ -691,4 +746,4 @@ order to perform hybrid search.
#### Inherited from
[`QueryBase`](QueryBase.md).[`withRowId`](QueryBase.md#withrowid)
`StandardQueryBase.withRowId`

View File

@@ -15,6 +15,14 @@ Enum representing the types of full-text queries supported.
## Enumeration Members
### Boolean
```ts
Boolean: "boolean";
```
***
### Boost
```ts

View File

@@ -0,0 +1,37 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Occur
# Enumeration: Occur
Enum representing the occurrence of terms in full-text queries.
- `Must`: The term must be present in the document.
- `Should`: The term should contribute to the document score, but is not required.
- `MustNot`: The term must not be present in the document.
## Enumeration Members
### Must
```ts
Must: "MUST";
```
***
### MustNot
```ts
MustNot: "MUST_NOT";
```
***
### Should
```ts
Should: "SHOULD";
```

View File

@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Operator
# Enumeration: Operator
Enum representing the logical operators used in full-text queries.
- `And`: All terms must match.
- `Or`: At least one term must match.
## Enumeration Members
### And
```ts
And: "AND";
```
***
### Or
```ts
Or: "OR";
```

View File

@@ -6,10 +6,13 @@
# Function: connect()
## connect(uri, options)
## connect(uri, options, session)
```ts
function connect(uri, options?): Promise<Connection>
function connect(
uri,
options?,
session?): Promise<Connection>
```
Connect to a LanceDB instance at the given URI.
@@ -29,6 +32,8 @@ Accepted formats:
* **options?**: `Partial`&lt;[`ConnectionOptions`](../interfaces/ConnectionOptions.md)&gt;
The options to use when connecting to the database
* **session?**: [`Session`](../classes/Session.md)
### Returns
`Promise`&lt;[`Connection`](../classes/Connection.md)&gt;
@@ -77,7 +82,7 @@ Accepted formats:
[ConnectionOptions](../interfaces/ConnectionOptions.md) for more details on the URI format.
### Example
### Examples
```ts
const conn = await connect({
@@ -85,3 +90,11 @@ const conn = await connect({
storageOptions: {timeout: "60s"}
});
```
```ts
const session = Session.default();
const conn = await connect({
uri: "/path/to/database",
session: session
});
```

View File

@@ -12,9 +12,12 @@
## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md)
- [Occur](enumerations/Occur.md)
- [Operator](enumerations/Operator.md)
## Classes
- [BooleanQuery](classes/BooleanQuery.md)
- [BoostQuery](classes/BoostQuery.md)
- [Connection](classes/Connection.md)
- [Index](classes/Index.md)
@@ -26,21 +29,28 @@
- [Query](classes/Query.md)
- [QueryBase](classes/QueryBase.md)
- [RecordBatchIterator](classes/RecordBatchIterator.md)
- [Session](classes/Session.md)
- [Table](classes/Table.md)
- [TagContents](classes/TagContents.md)
- [Tags](classes/Tags.md)
- [TakeQuery](classes/TakeQuery.md)
- [VectorColumnOptions](classes/VectorColumnOptions.md)
- [VectorQuery](classes/VectorQuery.md)
## Interfaces
- [AddColumnsResult](interfaces/AddColumnsResult.md)
- [AddColumnsSql](interfaces/AddColumnsSql.md)
- [AddDataOptions](interfaces/AddDataOptions.md)
- [AddResult](interfaces/AddResult.md)
- [AlterColumnsResult](interfaces/AlterColumnsResult.md)
- [ClientConfig](interfaces/ClientConfig.md)
- [ColumnAlteration](interfaces/ColumnAlteration.md)
- [CompactionStats](interfaces/CompactionStats.md)
- [ConnectionOptions](interfaces/ConnectionOptions.md)
- [CreateTableOptions](interfaces/CreateTableOptions.md)
- [DeleteResult](interfaces/DeleteResult.md)
- [DropColumnsResult](interfaces/DropColumnsResult.md)
- [ExecutableQuery](interfaces/ExecutableQuery.md)
- [FragmentStatistics](interfaces/FragmentStatistics.md)
- [FragmentSummaryStats](interfaces/FragmentSummaryStats.md)
@@ -54,7 +64,7 @@
- [IndexStatistics](interfaces/IndexStatistics.md)
- [IvfFlatOptions](interfaces/IvfFlatOptions.md)
- [IvfPqOptions](interfaces/IvfPqOptions.md)
- [MergeStats](interfaces/MergeStats.md)
- [MergeResult](interfaces/MergeResult.md)
- [OpenTableOptions](interfaces/OpenTableOptions.md)
- [OptimizeOptions](interfaces/OptimizeOptions.md)
- [OptimizeStats](interfaces/OptimizeStats.md)
@@ -65,7 +75,9 @@
- [TableStatistics](interfaces/TableStatistics.md)
- [TimeoutConfig](interfaces/TimeoutConfig.md)
- [UpdateOptions](interfaces/UpdateOptions.md)
- [UpdateResult](interfaces/UpdateResult.md)
- [Version](interfaces/Version.md)
- [WriteExecutionOptions](interfaces/WriteExecutionOptions.md)
## Type Aliases
@@ -74,6 +86,7 @@
- [FieldLike](type-aliases/FieldLike.md)
- [IntoSql](type-aliases/IntoSql.md)
- [IntoVector](type-aliases/IntoVector.md)
- [MultiVector](type-aliases/MultiVector.md)
- [RecordBatchLike](type-aliases/RecordBatchLike.md)
- [SchemaLike](type-aliases/SchemaLike.md)
- [TableLike](type-aliases/TableLike.md)

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / AddColumnsResult
# Interface: AddColumnsResult
## Properties
### version
```ts
version: number;
```

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / AddResult
# Interface: AddResult
## Properties
### version
```ts
version: number;
```

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / AlterColumnsResult
# Interface: AlterColumnsResult
## Properties
### version
```ts
version: number;
```

View File

@@ -70,6 +70,17 @@ Defaults to 'us-east-1'.
***
### session?
```ts
optional session: Session;
```
(For LanceDB OSS only): the session to use for this connection. Holds
shared caches and other session-specific state.
***
### storageOptions?
```ts

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DeleteResult
# Interface: DeleteResult
## Properties
### version
```ts
version: number;
```

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DropColumnsResult
# Interface: DropColumnsResult
## Properties
### version
```ts
version: number;
```

View File

@@ -23,7 +23,7 @@ whether to remove punctuation
### baseTokenizer?
```ts
optional baseTokenizer: "raw" | "simple" | "whitespace";
optional baseTokenizer: "raw" | "simple" | "whitespace" | "ngram";
```
The tokenizer to use when building the index.
@@ -71,6 +71,36 @@ tokens longer than this length will be ignored
***
### ngramMaxLength?
```ts
optional ngramMaxLength: number;
```
ngram max length
***
### ngramMinLength?
```ts
optional ngramMinLength: number;
```
ngram min length
***
### prefixOnly?
```ts
optional prefixOnly: boolean;
```
whether to only index the prefix of the token for ngram tokenizer
***
### removeStopWords?
```ts

View File

@@ -26,6 +26,18 @@ will be used to determine the most useful kind of index to create.
***
### name?
```ts
optional name: string;
```
Optional custom name for the index.
If not provided, a default name will be generated based on the column name.
***
### replace?
```ts
@@ -42,8 +54,27 @@ The default is true
***
### train?
```ts
optional train: boolean;
```
Whether to train the index with existing data.
If true (default), the index will be trained with existing data in the table.
If false, the index will be created empty and populated as new data is added.
Note: This option is only supported for scalar indices. Vector indices always train.
***
### waitTimeoutSeconds?
```ts
optional waitTimeoutSeconds: number;
```
Timeout in seconds to wait for index creation to complete.
If not specified, the method will return immediately after starting the index creation.

View File

@@ -0,0 +1,39 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MergeResult
# Interface: MergeResult
## Properties
### numDeletedRows
```ts
numDeletedRows: number;
```
***
### numInsertedRows
```ts
numInsertedRows: number;
```
***
### numUpdatedRows
```ts
numUpdatedRows: number;
```
***
### version
```ts
version: number;
```

View File

@@ -1,31 +0,0 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MergeStats
# Interface: MergeStats
## Properties
### numDeletedRows
```ts
numDeletedRows: bigint;
```
***
### numInsertedRows
```ts
numInsertedRows: bigint;
```
***
### numUpdatedRows
```ts
numUpdatedRows: bigint;
```

View File

@@ -8,7 +8,7 @@
## Properties
### indexCacheSize?
### ~~indexCacheSize?~~
```ts
optional indexCacheSize: number;
@@ -16,6 +16,11 @@ optional indexCacheSize: number;
Set the size of the index cache, specified as a number of entries
#### Deprecated
Use session-level cache configuration instead.
Create a Session with custom cache sizes and pass it to the connect() function.
The exact meaning of an "entry" will depend on the type of index:
- IVF: there is one entry for each IVF partition
- BTREE: there is one entry for the entire index

View File

@@ -24,10 +24,10 @@ The default is 7 days
// Delete all versions older than 1 day
const olderThan = new Date();
olderThan.setDate(olderThan.getDate() - 1));
tbl.cleanupOlderVersions(olderThan);
tbl.optimize({cleanupOlderThan: olderThan});
// Delete all versions except the current version
tbl.cleanupOlderVersions(new Date());
tbl.optimize({cleanupOlderThan: new Date()});
```
***

View File

@@ -44,3 +44,17 @@ optional readTimeout: number;
The timeout for reading data from the server in seconds. Default is 300
seconds (5 minutes). This can also be set via the environment variable
`LANCE_CLIENT_READ_TIMEOUT`, as an integer number of seconds.
***
### timeout?
```ts
optional timeout: number;
```
The overall timeout for the entire request in seconds. This includes
connection, send, and read time. If the entire request doesn't complete
within this time, it will fail. Default is None (no overall timeout).
This can also be set via the environment variable `LANCE_CLIENT_TIMEOUT`,
as an integer number of seconds.

View File

@@ -0,0 +1,23 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / UpdateResult
# Interface: UpdateResult
## Properties
### rowsUpdated
```ts
rowsUpdated: number;
```
***
### version
```ts
version: number;
```

View File

@@ -0,0 +1,26 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / WriteExecutionOptions
# Interface: WriteExecutionOptions
## Properties
### timeoutMs?
```ts
optional timeoutMs: number;
```
Maximum time to run the operation before cancelling it.
By default, there is a 30-second timeout that is only enforced after the
first attempt. This is to prevent spending too long retrying to resolve
conflicts. For example, if a write attempt takes 20 seconds and fails,
the second attempt will be cancelled after 10 seconds, hitting the
30-second timeout. However, a write that takes one hour and succeeds on the
first attempt will not be cancelled.
When this is set, the timeout is enforced on all attempts, including the first.

View File

@@ -0,0 +1,11 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MultiVector
# Type Alias: MultiVector
```ts
type MultiVector: IntoVector[];
```

View File

@@ -428,7 +428,7 @@
"\n",
"**Why?** \n",
"Embedding the UFO dataset and ingesting it into LanceDB takes **~2 hours on a T4 GPU**. To save time: \n",
"- **Use the pre-prepared table with index created ** (provided below) to proceed directly to step7: search. \n",
"- **Use the pre-prepared table with index created** (provided below) to proceed directly to **Step 7**: search. \n",
"- **Step 5a** contains the full ingestion code for reference (run it only if necessary). \n",
"- **Step 6** contains the details on creating the index on the multivector column"
]

View File

@@ -0,0 +1,53 @@
# Apache Datafusion
In Python, LanceDB tables can also be queried with [Apache Datafusion](https://datafusion.apache.org/), an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. This means you can write complex SQL queries to analyze your data in LanceDB.
This integration is done via [Datafusion FFI](https://docs.rs/datafusion-ffi/latest/datafusion_ffi/), which provides a native integration between LanceDB and Datafusion.
The Datafusion FFI allows to pass down column selections and basic filters to LanceDB, reducing the amount of scanned data when executing your query. Additionally, the integration allows streaming data from LanceDB tables which allows to do aggregation larger-than-memory.
We can demonstrate this by first installing `datafusion` and `lancedb`.
```shell
pip install datafusion lancedb
```
We will re-use the dataset [created previously](./pandas_and_pyarrow.md):
```python
import lancedb
from datafusion import SessionContext
from lance import FFILanceTableProvider
db = lancedb.connect("data/sample-lancedb")
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
lance_table = db.create_table("lance_table", data)
ctx = SessionContext()
ffi_lance_table = FFILanceTableProvider(
lance_table.to_lance(), with_row_id=True, with_row_addr=True
)
ctx.register_table_provider("ffi_lance_table", ffi_lance_table)
```
The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to Datafusion through the Datafusion FFI integration layer.
To query the resulting Lance dataset in Datafusion, you first need to register the dataset with Datafusion and then just reference it by the same name in your SQL query.
```python
ctx.table("ffi_lance_table")
ctx.sql("SELECT * FROM ffi_lance_table")
```
```
┌─────────────┬─────────┬────────┬─────────────────┬─────────────────┐
│ vector │ item │ price │ _rowid │ _rowaddr │
│ float[] │ varchar │ double │ bigint unsigned │ bigint unsigned │
├─────────────┼─────────┼────────┼─────────────────┼─────────────────┤
│ [3.1, 4.1] │ foo │ 10.0 │ 0 │ 0 │
│ [5.9, 26.5] │ bar │ 20.0 │ 1 │ 1 │
└─────────────┴─────────┴────────┴─────────────────┴─────────────────┘
```

View File

@@ -1,101 +0,0 @@
# Getting Started with LanceDB: A Minimal Vector Search Tutorial
Let's set up a LanceDB database, insert vector data, and perform a simple vector search. We'll use simple character classes like "knight" and "rogue" to illustrate semantic relevance.
## 1. Install Dependencies
Before starting, make sure you have the necessary packages:
```bash
pip install lancedb pandas numpy
```
## 2. Import Required Libraries
```python
import lancedb
import pandas as pd
import numpy as np
```
## 3. Connect to LanceDB
You can use a local directory to store your database:
```python
db = lancedb.connect("./lancedb")
```
## 4. Create Sample Data
Add sample text data and corresponding 4D vectors:
```python
data = pd.DataFrame([
{"id": "1", "vector": [1.0, 0.0, 0.0, 0.0], "text": "knight"},
{"id": "2", "vector": [0.9, 0.1, 0.0, 0.0], "text": "warrior"},
{"id": "3", "vector": [0.0, 1.0, 0.0, 0.0], "text": "rogue"},
{"id": "4", "vector": [0.0, 0.9, 0.1, 0.0], "text": "thief"},
{"id": "5", "vector": [0.5, 0.5, 0.0, 0.0], "text": "ranger"},
])
```
## 5. Create a Table in LanceDB
```python
table = db.create_table("rpg_classes", data=data, mode="overwrite")
```
Let's see how the table looks:
```python
print(data)
```
| id | vector | text |
|----|--------|------|
| 1 | [1.0, 0.0, 0.0, 0.0] | knight |
| 2 | [0.9, 0.1, 0.0, 0.0] | warrior |
| 3 | [0.0, 1.0, 0.0, 0.0] | rogue |
| 4 | [0.0, 0.9, 0.1, 0.0] | thief |
| 5 | [0.5, 0.5, 0.0, 0.0] | ranger |
## 6. Perform a Vector Search
Search for the most similar character classes to our query vector:
```python
# Query as if we are searching for "rogue"
results = table.search([0.95, 0.05, 0.0, 0.0]).limit(3).to_df()
print(results)
```
This will return the top 3 closest classes to the vector, effectively showing how LanceDB can be used for semantic search.
| id | vector | text | _distance |
|------|------------------------|----------|-----------|
| 3 | [0.0, 1.0, 0.0, 0.0] | rogue | 0.00 |
| 4 | [0.0, 0.9, 0.1, 0.0] | thief | 0.02 |
| 5 | [0.5, 0.5, 0.0, 0.0] | ranger | 0.50 |
Let's try searching for "knight"
```python
query_vector = [1.0, 0.0, 0.0, 0.0]
results = table.search(query_vector).limit(3).to_pandas()
print(results)
```
| id | vector | text | _distance |
|------|------------------------|----------|-----------|
| 1 | [1.0, 0.0, 0.0, 0.0] | knight | 0.00 |
| 2 | [0.9, 0.1, 0.0, 0.0] | warrior | 0.02 |
| 5 | [0.5, 0.5, 0.0, 0.0] | ranger | 0.50 |
## Next Steps
That's it - you just conducted vector search!
For more beginner tips, check out the [Basic Usage](basic.md) guide.

View File

@@ -30,7 +30,8 @@ excluded_globs = [
"../src/rag/advanced_techniques/*.md",
"../src/guides/scalar_index.md",
"../src/guides/storage.md",
"../src/search.md"
"../src/search.md",
"../src/guides/sql_querying.md",
]
python_prefix = "py"

View File

@@ -7,3 +7,4 @@ tantivy==0.20.1
--extra-index-url https://download.pytorch.org/whl/cpu
torch
polars>=0.19, <=1.3.0
datafusion

View File

@@ -0,0 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
wrapperVersion=3.3.2
distributionType=only-script
distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.9/apache-maven-3.9.9-bin.zip

37
java/README.md Normal file
View File

@@ -0,0 +1,37 @@
# LanceDB Java SDK
## Configuration and Initialization
### LanceDB Cloud
For LanceDB Cloud, use the simplified builder API:
```java
import com.lancedb.lance.namespace.LanceRestNamespace;
// If your DB url is db://example-db, then your database here is example-db
LanceRestNamespace namespace = LanceDBRestNamespaces.builder()
.apiKey("your_lancedb_cloud_api_key")
.database("your_database_name")
.build();
```
### LanceDB Enterprise
For Enterprise deployments, use your VPC endpoint:
```java
LanceRestNamespace namespace = LanceDBRestNamespaces.builder()
.apiKey("your_lancedb_enterprise_api_key")
.database("your-top-dir") // Your top level folder under your cloud bucket, e.g. s3://your-bucket/your-top-dir/
.hostOverride("http://<vpc_endpoint_dns_name>:80")
.build();
```
## Development
Build:
```shell
./mvnw install
```

View File

@@ -15,13 +15,16 @@ publish = false
crate-type = ["cdylib"]
[dependencies]
lancedb = { path = "../../../rust/lancedb" }
lancedb = { path = "../../../rust/lancedb", default-features = false }
lance = { workspace = true }
arrow = { workspace = true, features = ["ffi"] }
arrow-schema.workspace = true
tokio = "1.23"
tokio = "1.46"
jni = "0.21.1"
snafu.workspace = true
lazy_static.workspace = true
serde = { version = "^1" }
serde_json = { version = "1" }
[features]
default = ["lancedb/default"]

View File

@@ -8,18 +8,24 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.19.1-beta.1</version>
<version>0.22.0-beta.1</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-core</artifactId>
<name>LanceDB Core</name>
<name>${project.artifactId}</name>
<description>LanceDB Core</description>
<packaging>jar</packaging>
<properties>
<rust.release.build>false</rust.release.build>
</properties>
<dependencies>
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lance-namespace-core</artifactId>
<version>0.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>

View File

@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.22.0-beta.1</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-lance-namespace</artifactId>
<name>${project.artifactId}</name>
<description>LanceDB Java Integration with Lance Namespace</description>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lance-namespace-core</artifactId>
</dependency>
</dependencies>
</project>

View File

@@ -0,0 +1,146 @@
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import com.lancedb.lance.namespace.LanceRestNamespace;
import com.lancedb.lance.namespace.client.apache.ApiClient;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
/** Util class to help construct a {@link LanceRestNamespace} for LanceDB. */
public class LanceDbRestNamespaces {
private static final String DEFAULT_REGION = "us-east-1";
private static final String CLOUD_URL_PATTERN = "https://%s.%s.api.lancedb.com";
private String apiKey;
private String database;
private Optional<String> hostOverride = Optional.empty();
private Optional<String> region = Optional.empty();
private Map<String, String> additionalConfig = new HashMap<>();
private LanceDbRestNamespaces() {}
/**
* Create a new builder instance.
*
* @return A new LanceRestNamespaceBuilder
*/
public static LanceDbRestNamespaces builder() {
return new LanceDbRestNamespaces();
}
/**
* Set the API key (required).
*
* @param apiKey The LanceDB API key
* @return This builder
*/
public LanceDbRestNamespaces apiKey(String apiKey) {
if (apiKey == null || apiKey.trim().isEmpty()) {
throw new IllegalArgumentException("API key cannot be null or empty");
}
this.apiKey = apiKey;
return this;
}
/**
* Set the database name (required).
*
* @param database The database name
* @return This builder
*/
public LanceDbRestNamespaces database(String database) {
if (database == null || database.trim().isEmpty()) {
throw new IllegalArgumentException("Database cannot be null or empty");
}
this.database = database;
return this;
}
/**
* Set a custom host override (optional). When set, this overrides the default LanceDB Cloud URL
* construction. Use this for LanceDB Enterprise deployments.
*
* @param hostOverride The complete base URL (e.g., "http://your-vpc-endpoint:80")
* @return This builder
*/
public LanceDbRestNamespaces hostOverride(String hostOverride) {
this.hostOverride = Optional.ofNullable(hostOverride);
return this;
}
/**
* Set the region for LanceDB Cloud (optional). Defaults to "us-east-1" if not specified. This is
* ignored when hostOverride is set.
*
* @param region The AWS region (e.g., "us-east-1", "eu-west-1")
* @return This builder
*/
public LanceDbRestNamespaces region(String region) {
this.region = Optional.ofNullable(region);
return this;
}
/**
* Add additional configuration parameters.
*
* @param key The configuration key
* @param value The configuration value
* @return This builder
*/
public LanceDbRestNamespaces config(String key, String value) {
this.additionalConfig.put(key, value);
return this;
}
/**
* Build the LanceRestNamespace instance.
*
* @return A configured LanceRestNamespace
* @throws IllegalStateException if required parameters are missing
*/
public LanceRestNamespace build() {
// Validate required fields
if (apiKey == null) {
throw new IllegalStateException("API key is required");
}
if (database == null) {
throw new IllegalStateException("Database is required");
}
// Build configuration map
Map<String, String> config = new HashMap<>(additionalConfig);
config.put("headers.x-lancedb-database", database);
config.put("headers.x-api-key", apiKey);
// Determine base URL
String baseUrl;
if (hostOverride.isPresent()) {
baseUrl = hostOverride.get();
config.put("host_override", hostOverride.get());
} else {
String effectiveRegion = region.orElse(DEFAULT_REGION);
baseUrl = String.format(CLOUD_URL_PATTERN, database, effectiveRegion);
config.put("region", effectiveRegion);
}
// Create and configure ApiClient
ApiClient apiClient = new ApiClient();
apiClient.setBasePath(baseUrl);
return new LanceRestNamespace(apiClient, config);
}
}

259
java/mvnw vendored Executable file
View File

@@ -0,0 +1,259 @@
#!/bin/sh
# ----------------------------------------------------------------------------
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# ----------------------------------------------------------------------------
# ----------------------------------------------------------------------------
# Apache Maven Wrapper startup batch script, version 3.3.2
#
# Optional ENV vars
# -----------------
# JAVA_HOME - location of a JDK home dir, required when download maven via java source
# MVNW_REPOURL - repo url base for downloading maven distribution
# MVNW_USERNAME/MVNW_PASSWORD - user and password for downloading maven
# MVNW_VERBOSE - true: enable verbose log; debug: trace the mvnw script; others: silence the output
# ----------------------------------------------------------------------------
set -euf
[ "${MVNW_VERBOSE-}" != debug ] || set -x
# OS specific support.
native_path() { printf %s\\n "$1"; }
case "$(uname)" in
CYGWIN* | MINGW*)
[ -z "${JAVA_HOME-}" ] || JAVA_HOME="$(cygpath --unix "$JAVA_HOME")"
native_path() { cygpath --path --windows "$1"; }
;;
esac
# set JAVACMD and JAVACCMD
set_java_home() {
# For Cygwin and MinGW, ensure paths are in Unix format before anything is touched
if [ -n "${JAVA_HOME-}" ]; then
if [ -x "$JAVA_HOME/jre/sh/java" ]; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD="$JAVA_HOME/jre/sh/java"
JAVACCMD="$JAVA_HOME/jre/sh/javac"
else
JAVACMD="$JAVA_HOME/bin/java"
JAVACCMD="$JAVA_HOME/bin/javac"
if [ ! -x "$JAVACMD" ] || [ ! -x "$JAVACCMD" ]; then
echo "The JAVA_HOME environment variable is not defined correctly, so mvnw cannot run." >&2
echo "JAVA_HOME is set to \"$JAVA_HOME\", but \"\$JAVA_HOME/bin/java\" or \"\$JAVA_HOME/bin/javac\" does not exist." >&2
return 1
fi
fi
else
JAVACMD="$(
'set' +e
'unset' -f command 2>/dev/null
'command' -v java
)" || :
JAVACCMD="$(
'set' +e
'unset' -f command 2>/dev/null
'command' -v javac
)" || :
if [ ! -x "${JAVACMD-}" ] || [ ! -x "${JAVACCMD-}" ]; then
echo "The java/javac command does not exist in PATH nor is JAVA_HOME set, so mvnw cannot run." >&2
return 1
fi
fi
}
# hash string like Java String::hashCode
hash_string() {
str="${1:-}" h=0
while [ -n "$str" ]; do
char="${str%"${str#?}"}"
h=$(((h * 31 + $(LC_CTYPE=C printf %d "'$char")) % 4294967296))
str="${str#?}"
done
printf %x\\n $h
}
verbose() { :; }
[ "${MVNW_VERBOSE-}" != true ] || verbose() { printf %s\\n "${1-}"; }
die() {
printf %s\\n "$1" >&2
exit 1
}
trim() {
# MWRAPPER-139:
# Trims trailing and leading whitespace, carriage returns, tabs, and linefeeds.
# Needed for removing poorly interpreted newline sequences when running in more
# exotic environments such as mingw bash on Windows.
printf "%s" "${1}" | tr -d '[:space:]'
}
# parse distributionUrl and optional distributionSha256Sum, requires .mvn/wrapper/maven-wrapper.properties
while IFS="=" read -r key value; do
case "${key-}" in
distributionUrl) distributionUrl=$(trim "${value-}") ;;
distributionSha256Sum) distributionSha256Sum=$(trim "${value-}") ;;
esac
done <"${0%/*}/.mvn/wrapper/maven-wrapper.properties"
[ -n "${distributionUrl-}" ] || die "cannot read distributionUrl property in ${0%/*}/.mvn/wrapper/maven-wrapper.properties"
case "${distributionUrl##*/}" in
maven-mvnd-*bin.*)
MVN_CMD=mvnd.sh _MVNW_REPO_PATTERN=/maven/mvnd/
case "${PROCESSOR_ARCHITECTURE-}${PROCESSOR_ARCHITEW6432-}:$(uname -a)" in
*AMD64:CYGWIN* | *AMD64:MINGW*) distributionPlatform=windows-amd64 ;;
:Darwin*x86_64) distributionPlatform=darwin-amd64 ;;
:Darwin*arm64) distributionPlatform=darwin-aarch64 ;;
:Linux*x86_64*) distributionPlatform=linux-amd64 ;;
*)
echo "Cannot detect native platform for mvnd on $(uname)-$(uname -m), use pure java version" >&2
distributionPlatform=linux-amd64
;;
esac
distributionUrl="${distributionUrl%-bin.*}-$distributionPlatform.zip"
;;
maven-mvnd-*) MVN_CMD=mvnd.sh _MVNW_REPO_PATTERN=/maven/mvnd/ ;;
*) MVN_CMD="mvn${0##*/mvnw}" _MVNW_REPO_PATTERN=/org/apache/maven/ ;;
esac
# apply MVNW_REPOURL and calculate MAVEN_HOME
# maven home pattern: ~/.m2/wrapper/dists/{apache-maven-<version>,maven-mvnd-<version>-<platform>}/<hash>
[ -z "${MVNW_REPOURL-}" ] || distributionUrl="$MVNW_REPOURL$_MVNW_REPO_PATTERN${distributionUrl#*"$_MVNW_REPO_PATTERN"}"
distributionUrlName="${distributionUrl##*/}"
distributionUrlNameMain="${distributionUrlName%.*}"
distributionUrlNameMain="${distributionUrlNameMain%-bin}"
MAVEN_USER_HOME="${MAVEN_USER_HOME:-${HOME}/.m2}"
MAVEN_HOME="${MAVEN_USER_HOME}/wrapper/dists/${distributionUrlNameMain-}/$(hash_string "$distributionUrl")"
exec_maven() {
unset MVNW_VERBOSE MVNW_USERNAME MVNW_PASSWORD MVNW_REPOURL || :
exec "$MAVEN_HOME/bin/$MVN_CMD" "$@" || die "cannot exec $MAVEN_HOME/bin/$MVN_CMD"
}
if [ -d "$MAVEN_HOME" ]; then
verbose "found existing MAVEN_HOME at $MAVEN_HOME"
exec_maven "$@"
fi
case "${distributionUrl-}" in
*?-bin.zip | *?maven-mvnd-?*-?*.zip) ;;
*) die "distributionUrl is not valid, must match *-bin.zip or maven-mvnd-*.zip, but found '${distributionUrl-}'" ;;
esac
# prepare tmp dir
if TMP_DOWNLOAD_DIR="$(mktemp -d)" && [ -d "$TMP_DOWNLOAD_DIR" ]; then
clean() { rm -rf -- "$TMP_DOWNLOAD_DIR"; }
trap clean HUP INT TERM EXIT
else
die "cannot create temp dir"
fi
mkdir -p -- "${MAVEN_HOME%/*}"
# Download and Install Apache Maven
verbose "Couldn't find MAVEN_HOME, downloading and installing it ..."
verbose "Downloading from: $distributionUrl"
verbose "Downloading to: $TMP_DOWNLOAD_DIR/$distributionUrlName"
# select .zip or .tar.gz
if ! command -v unzip >/dev/null; then
distributionUrl="${distributionUrl%.zip}.tar.gz"
distributionUrlName="${distributionUrl##*/}"
fi
# verbose opt
__MVNW_QUIET_WGET=--quiet __MVNW_QUIET_CURL=--silent __MVNW_QUIET_UNZIP=-q __MVNW_QUIET_TAR=''
[ "${MVNW_VERBOSE-}" != true ] || __MVNW_QUIET_WGET='' __MVNW_QUIET_CURL='' __MVNW_QUIET_UNZIP='' __MVNW_QUIET_TAR=v
# normalize http auth
case "${MVNW_PASSWORD:+has-password}" in
'') MVNW_USERNAME='' MVNW_PASSWORD='' ;;
has-password) [ -n "${MVNW_USERNAME-}" ] || MVNW_USERNAME='' MVNW_PASSWORD='' ;;
esac
if [ -z "${MVNW_USERNAME-}" ] && command -v wget >/dev/null; then
verbose "Found wget ... using wget"
wget ${__MVNW_QUIET_WGET:+"$__MVNW_QUIET_WGET"} "$distributionUrl" -O "$TMP_DOWNLOAD_DIR/$distributionUrlName" || die "wget: Failed to fetch $distributionUrl"
elif [ -z "${MVNW_USERNAME-}" ] && command -v curl >/dev/null; then
verbose "Found curl ... using curl"
curl ${__MVNW_QUIET_CURL:+"$__MVNW_QUIET_CURL"} -f -L -o "$TMP_DOWNLOAD_DIR/$distributionUrlName" "$distributionUrl" || die "curl: Failed to fetch $distributionUrl"
elif set_java_home; then
verbose "Falling back to use Java to download"
javaSource="$TMP_DOWNLOAD_DIR/Downloader.java"
targetZip="$TMP_DOWNLOAD_DIR/$distributionUrlName"
cat >"$javaSource" <<-END
public class Downloader extends java.net.Authenticator
{
protected java.net.PasswordAuthentication getPasswordAuthentication()
{
return new java.net.PasswordAuthentication( System.getenv( "MVNW_USERNAME" ), System.getenv( "MVNW_PASSWORD" ).toCharArray() );
}
public static void main( String[] args ) throws Exception
{
setDefault( new Downloader() );
java.nio.file.Files.copy( java.net.URI.create( args[0] ).toURL().openStream(), java.nio.file.Paths.get( args[1] ).toAbsolutePath().normalize() );
}
}
END
# For Cygwin/MinGW, switch paths to Windows format before running javac and java
verbose " - Compiling Downloader.java ..."
"$(native_path "$JAVACCMD")" "$(native_path "$javaSource")" || die "Failed to compile Downloader.java"
verbose " - Running Downloader.java ..."
"$(native_path "$JAVACMD")" -cp "$(native_path "$TMP_DOWNLOAD_DIR")" Downloader "$distributionUrl" "$(native_path "$targetZip")"
fi
# If specified, validate the SHA-256 sum of the Maven distribution zip file
if [ -n "${distributionSha256Sum-}" ]; then
distributionSha256Result=false
if [ "$MVN_CMD" = mvnd.sh ]; then
echo "Checksum validation is not supported for maven-mvnd." >&2
echo "Please disable validation by removing 'distributionSha256Sum' from your maven-wrapper.properties." >&2
exit 1
elif command -v sha256sum >/dev/null; then
if echo "$distributionSha256Sum $TMP_DOWNLOAD_DIR/$distributionUrlName" | sha256sum -c >/dev/null 2>&1; then
distributionSha256Result=true
fi
elif command -v shasum >/dev/null; then
if echo "$distributionSha256Sum $TMP_DOWNLOAD_DIR/$distributionUrlName" | shasum -a 256 -c >/dev/null 2>&1; then
distributionSha256Result=true
fi
else
echo "Checksum validation was requested but neither 'sha256sum' or 'shasum' are available." >&2
echo "Please install either command, or disable validation by removing 'distributionSha256Sum' from your maven-wrapper.properties." >&2
exit 1
fi
if [ $distributionSha256Result = false ]; then
echo "Error: Failed to validate Maven distribution SHA-256, your Maven distribution might be compromised." >&2
echo "If you updated your Maven version, you need to update the specified distributionSha256Sum property." >&2
exit 1
fi
fi
# unzip and move
if command -v unzip >/dev/null; then
unzip ${__MVNW_QUIET_UNZIP:+"$__MVNW_QUIET_UNZIP"} "$TMP_DOWNLOAD_DIR/$distributionUrlName" -d "$TMP_DOWNLOAD_DIR" || die "failed to unzip"
else
tar xzf${__MVNW_QUIET_TAR:+"$__MVNW_QUIET_TAR"} "$TMP_DOWNLOAD_DIR/$distributionUrlName" -C "$TMP_DOWNLOAD_DIR" || die "failed to untar"
fi
printf %s\\n "$distributionUrl" >"$TMP_DOWNLOAD_DIR/$distributionUrlNameMain/mvnw.url"
mv -- "$TMP_DOWNLOAD_DIR/$distributionUrlNameMain" "$MAVEN_HOME" || [ -d "$MAVEN_HOME" ] || die "fail to move MAVEN_HOME"
clean || :
exec_maven "$@"

View File

@@ -6,11 +6,10 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.19.1-beta.1</version>
<version>0.22.0-beta.1</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>
<description>LanceDB vector database Java API</description>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
<url>http://lancedb.com/</url>
<developers>
@@ -29,6 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-namespace.verison>0.0.1</lance-namespace.verison>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
@@ -52,6 +52,7 @@
<modules>
<module>core</module>
<module>lance-namespace</module>
</modules>
<scm>
@@ -62,6 +63,11 @@
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lance-namespace-core</artifactId>
<version>${lance-namespace.verison}</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>

View File

@@ -1,22 +0,0 @@
module.exports = {
env: {
browser: true,
es2021: true
},
extends: 'standard-with-typescript',
overrides: [
],
parserOptions: {
project: './tsconfig.json',
ecmaVersion: 'latest',
sourceType: 'module'
},
rules: {
"@typescript-eslint/method-signature-style": "off",
"@typescript-eslint/quotes": "off",
"@typescript-eslint/semi": "off",
"@typescript-eslint/explicit-function-return-type": "off",
"@typescript-eslint/space-before-function-paren": "off",
"@typescript-eslint/indent": "off",
}
}

View File

@@ -1,4 +0,0 @@
gen_test_data.py
index.node
dist/lancedb*.tgz
vectordb*.tgz

View File

@@ -1,64 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.1.5] - 2023-06-00
### Added
- Support for macOS X86
## [0.1.4] - 2023-06-03
### Added
- Select / Project query API
### Changed
- Deprecated created_index in favor of createIndex
## [0.1.3] - 2023-06-01
### Added
- Support S3 and Google Cloud Storage
- Embedding functions support
- OpenAI embedding function
## [0.1.2] - 2023-05-27
### Added
- Append records API
- Extra query params to to nodejs client
- Create_index API
### Fixed
- bugfix: string columns should be converted to Utf8Array (#94)
## [0.1.1] - 2023-05-16
### Added
- create_table API
- limit parameter for queries
- Typescript / JavaScript examples
- Linux support
## [0.1.0] - 2023-05-16
### Added
- Initial JavaScript / Node.js library for LanceDB
- Read-only api to query LanceDB datasets
- Supports macOS arm only
## [pre-0.1.0]
- Various prototypes / test builds

View File

@@ -1,66 +0,0 @@
# LanceDB
A JavaScript / Node.js library for [LanceDB](https://github.com/lancedb/lancedb).
**DEPRECATED: This library is deprecated. Please use the new client,
[@lancedb/lancedb](https://www.npmjs.com/package/@lancedb/lancedb).**
## Installation
```bash
npm install vectordb
```
This will download the appropriate native library for your platform. We currently
support:
* Linux (x86_64 and aarch64)
* MacOS (Intel and ARM/M1/M2)
* Windows (x86_64 only)
We do not yet support musl-based Linux (such as Alpine Linux) or aarch64 Windows.
## Usage
### Basic Example
```javascript
const lancedb = require('vectordb');
const db = await lancedb.connect('data/sample-lancedb');
const table = await db.createTable("my_table",
[{ id: 1, vector: [0.1, 1.0], item: "foo", price: 10.0 },
{ id: 2, vector: [3.9, 0.5], item: "bar", price: 20.0 }])
const results = await table.search([0.1, 0.3]).limit(20).execute();
console.log(results);
```
The [examples](./examples) folder contains complete examples.
## Development
To build everything fresh:
```bash
npm install
npm run build
```
Then you should be able to run the tests with:
```bash
npm test
```
### Fix lints
To run the linter and have it automatically fix all errors
```bash
npm run lint -- --fix
```
To build documentation
```bash
npx typedoc --plugin typedoc-plugin-markdown --out ../docs/src/javascript src/index.ts
```

View File

@@ -1,41 +0,0 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
'use strict'
async function example () {
const lancedb = require('vectordb')
// You need to provide an OpenAI API key, here we read it from the OPENAI_API_KEY environment variable
const apiKey = process.env.OPENAI_API_KEY
// The embedding function will create embeddings for the 'text' column(text in this case)
const embedding = new lancedb.OpenAIEmbeddingFunction('text', apiKey)
const db = await lancedb.connect('data/sample-lancedb')
const data = [
{ id: 1, text: 'Black T-Shirt', price: 10 },
{ id: 2, text: 'Leather Jacket', price: 50 }
]
const table = await db.createTable('vectors', data, embedding)
console.log(await db.tableNames())
const results = await table
.search('keeps me warm')
.limit(1)
.execute()
console.log(results[0].text)
}
example().then(_ => { console.log('All done!') })

View File

@@ -1,15 +0,0 @@
{
"name": "vectordb-example-js-openai",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "Lance Devs",
"license": "Apache-2.0",
"dependencies": {
"vectordb": "file:../..",
"openai": "^3.2.1"
}
}

View File

@@ -1,66 +0,0 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
'use strict'
async function example() {
const lancedb = require('vectordb')
// Import transformers and the all-MiniLM-L6-v2 model (https://huggingface.co/Xenova/all-MiniLM-L6-v2)
const { pipeline } = await import('@xenova/transformers')
const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
// Create embedding function from pipeline which returns a list of vectors from batch
// sourceColumn is the name of the column in the data to be embedded
//
// Output of pipe is a Tensor { data: Float32Array(384) }, so filter for the vector
const embed_fun = {}
embed_fun.sourceColumn = 'text'
embed_fun.embed = async function (batch) {
let result = []
for (let text of batch) {
const res = await pipe(text, { pooling: 'mean', normalize: true })
result.push(Array.from(res['data']))
}
return (result)
}
// Link a folder and create a table with data
const db = await lancedb.connect('data/sample-lancedb')
const data = [
{ id: 1, text: 'Cherry', type: 'fruit' },
{ id: 2, text: 'Carrot', type: 'vegetable' },
{ id: 3, text: 'Potato', type: 'vegetable' },
{ id: 4, text: 'Apple', type: 'fruit' },
{ id: 5, text: 'Banana', type: 'fruit' }
]
const table = await db.createTable('food_table', data, embed_fun)
// Query the table
const results = await table
.search("a sweet fruit to eat")
.metricType("cosine")
.limit(2)
.execute()
console.log(results.map(r => r.text))
}
example().then(_ => { console.log("Done!") })

View File

@@ -1,16 +0,0 @@
{
"name": "vectordb-example-js-transformers",
"version": "1.0.0",
"description": "Example for using transformers.js with lancedb",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "Lance Devs",
"license": "Apache-2.0",
"dependencies": {
"@xenova/transformers": "^2.4.1",
"vectordb": "file:../.."
}
}

View File

@@ -1,122 +0,0 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
'use strict'
const lancedb = require('vectordb')
const fs = require('fs/promises')
const readline = require('readline/promises')
const { stdin: input, stdout: output } = require('process')
const { Configuration, OpenAIApi } = require('openai')
// Download file from XYZ
const INPUT_FILE_NAME = 'data/youtube-transcriptions_sample.jsonl';
(async () => {
// You need to provide an OpenAI API key, here we read it from the OPENAI_API_KEY environment variable
const apiKey = process.env.OPENAI_API_KEY
// The embedding function will create embeddings for the 'context' column
const embedFunction = new lancedb.OpenAIEmbeddingFunction('context', apiKey)
// Connects to LanceDB
const db = await lancedb.connect('data/youtube-lancedb')
// Open the vectors table or create one if it does not exist
let tbl
if ((await db.tableNames()).includes('vectors')) {
tbl = await db.openTable('vectors', embedFunction)
} else {
tbl = await createEmbeddingsTable(db, embedFunction)
}
// Use OpenAI Completion API to generate and answer based on the context that LanceDB provides
const configuration = new Configuration({ apiKey })
const openai = new OpenAIApi(configuration)
const rl = readline.createInterface({ input, output })
try {
while (true) {
const query = await rl.question('Prompt: ')
const results = await tbl
.search(query)
.select(['title', 'text', 'context'])
.limit(3)
.execute()
// console.table(results)
const response = await openai.createCompletion({
model: 'text-davinci-003',
prompt: createPrompt(query, results),
max_tokens: 400,
temperature: 0,
top_p: 1,
frequency_penalty: 0,
presence_penalty: 0
})
console.log(response.data.choices[0].text)
}
} catch (err) {
console.log('Error: ', err)
} finally {
rl.close()
}
process.exit(1)
})()
async function createEmbeddingsTable (db, embedFunction) {
console.log(`Creating embeddings from ${INPUT_FILE_NAME}`)
// read the input file into a JSON array, skipping empty lines
const lines = (await fs.readFile(INPUT_FILE_NAME, 'utf-8'))
.toString()
.split('\n')
.filter(line => line.length > 0)
.map(line => JSON.parse(line))
const data = contextualize(lines, 20, 'video_id')
return await db.createTable('vectors', data, embedFunction)
}
// Each transcript has a small text column, we include previous transcripts in order to
// have more context information when creating embeddings
function contextualize (rows, contextSize, groupColumn) {
const grouped = []
rows.forEach(row => {
if (!grouped[row[groupColumn]]) {
grouped[row[groupColumn]] = []
}
grouped[row[groupColumn]].push(row)
})
const data = []
Object.keys(grouped).forEach(key => {
for (let i = 0; i < grouped[key].length; i++) {
const start = i - contextSize > 0 ? i - contextSize : 0
grouped[key][i].context = grouped[key].slice(start, i + 1).map(r => r.text).join(' ')
}
data.push(...grouped[key])
})
return data
}
// Creates a prompt by aggregating all relevant contexts
function createPrompt (query, context) {
let prompt =
'Answer the question based on the context below.\n\n' +
'Context:\n'
// need to make sure our prompt is not larger than max size
prompt = prompt + context.map(c => c.context).join('\n\n---\n\n').substring(0, 3750)
prompt = prompt + `\n\nQuestion: ${query}\nAnswer:`
return prompt
}

View File

@@ -1,15 +0,0 @@
{
"name": "vectordb-example-js-openai",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "Lance Devs",
"license": "Apache-2.0",
"dependencies": {
"vectordb": "file:../..",
"openai": "^3.2.1"
}
}

View File

@@ -1,36 +0,0 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
'use strict'
async function example () {
const lancedb = require('vectordb')
const db = await lancedb.connect('data/sample-lancedb')
const data = [
{ id: 1, vector: [0.1, 0.2], price: 10 },
{ id: 2, vector: [1.1, 1.2], price: 50 }
]
const table = await db.createTable('vectors', data)
console.log(await db.tableNames())
const results = await table
.search([0.1, 0.3])
.limit(20)
.execute()
console.log(results)
}
example()

View File

@@ -1,14 +0,0 @@
{
"name": "vectordb-example-js",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "Lance Devs",
"license": "Apache-2.0",
"dependencies": {
"vectordb": "file:../.."
}
}

View File

@@ -1,22 +0,0 @@
{
"name": "vectordb-example-ts",
"version": "1.0.0",
"description": "",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"scripts": {
"tsc": "tsc -b",
"build": "tsc"
},
"author": "Lance Devs",
"license": "Apache-2.0",
"devDependencies": {
"@types/node": "^18.16.2",
"ts-node": "^10.9.1",
"ts-node-dev": "^2.0.0",
"typescript": "*"
},
"dependencies": {
"vectordb": "file:../.."
}
}

View File

@@ -1,35 +0,0 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
import * as vectordb from 'vectordb';
async function example () {
const db = await vectordb.connect('data/sample-lancedb')
const data = [
{ id: 1, vector: [0.1, 0.2], price: 10 },
{ id: 2, vector: [1.1, 1.2], price: 50 }
]
const table = await db.createTable('vectors', data)
console.log(await db.tableNames())
const results = await table
.search([0.1, 0.3])
.limit(20)
.execute()
console.log(results)
}
example().then(_ => { console.log ("All done!") })

View File

@@ -1,10 +0,0 @@
{
"include": ["src/**/*.ts"],
"compilerOptions": {
"target": "es2016",
"module": "commonjs",
"declaration": true,
"outDir": "./dist",
"strict": true
}
}

View File

@@ -1,36 +0,0 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
const { currentTarget } = require('@neon-rs/load')
let nativeLib
try {
// When developing locally, give preference to the local built library
nativeLib = require('./index.node')
} catch {
try {
nativeLib = require(`@lancedb/vectordb-${currentTarget()}`)
} catch (e) {
throw new Error(`vectordb: failed to load native library.
You may need to run \`npm install @lancedb/vectordb-${currentTarget()}\`.
If that does not work, please file a bug report at https://github.com/lancedb/lancedb/issues
Source error: ${e}`)
}
}
// Dynamic require for runtime.
module.exports = nativeLib

5239
node/package-lock.json generated

File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff Show More