Commit Graph

430 Commits

Author SHA1 Message Date
Will Jones
b3a4efd587 fix: revert change default read_consistency_interval=5s (#2327)
This reverts commit a547c523c2 or #2281

The current implementation can cause panics and performance degradation.
I will bring this back with more testing in
https://github.com/lancedb/lancedb/pull/2311

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Enhanced clarity on read consistency settings with updated
descriptions and default behavior.
- Removed outdated warnings about eventual consistency from the
troubleshooting guide.

- **Refactor**
- Streamlined the handling of the read consistency interval across
integrations, now defaulting to "None" for improved performance.
  - Simplified internal logic to offer a more consistent experience.

- **Tests**
- Updated test expectations to reflect the new default representation
for the read consistency interval.
- Removed redundant tests related to "no consistency" settings for
streamlined testing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-04-14 08:48:15 -07:00
Lei Xu
080ea2f9a4 chore: fix 1.86 warnings (#2312)
Fix rust 1.86 warnings
2025-04-12 08:29:10 -05:00
Lance Release
56aa133ee6 Bump version: 0.19.0-beta.5 → 0.19.0-beta.6 2025-04-08 06:16:30 +00:00
BubbleCal
ec8271931f feat: support to create FTS index on list of strings (#2317)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated internal library dependencies to the latest beta version for
improved system stability.
- **Tests**
- Added automated tests to validate full-text search functionality on
list-based text fields.
- **Refactor**
- Enhanced the search processing logic to provide robust support for
list-type text data, ensuring more reliable results.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-08 14:12:35 +08:00
Lance Release
c298482ee1 Bump version: 0.19.0-beta.4 → 0.19.0-beta.5 2025-04-04 21:49:53 +00:00
Will Jones
657843d9e9 perf: remove redundant checkout latest (#2310)
This bug was introduced in https://github.com/lancedb/lancedb/pull/2281

Likely introduced during a rebase when fixing merge conflicts.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Updated the refresh process so that reloading now uses the existing
dataset version instead of automatically updating to the latest version.
This change may affect workflows that rely on immediate data updates
during refresh.
  
- **New Features**
- Introduced a new module for tracking I/O statistics in object store
operations, enhancing monitoring capabilities.
- Added a new test module to validate the functionality of the dataset
operations.

- **Bug Fixes**
- Reintroduced the `write_options` method in the `CreateTableBuilder`,
ensuring consistent functionality across different builder variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 12:56:02 -07:00
Will Jones
1cd76b8498 feat: add timeout to query execution options (#2288)
Closes #2287


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added configurable timeout support for query executions. Users can now
specify maximum wait times for queries, enhancing control over
long-running operations across various integrations.
- **Tests**
- Expanded test coverage to validate timeout behavior in both
synchronous and asynchronous query flows, ensuring timely error
responses when query execution exceeds the specified limit.
- Introduced a new test suite to verify query operations when a timeout
is reached, checking for appropriate error handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 12:34:41 -07:00
Lance Release
d4ea50fba1 Bump version: 0.19.0-beta.3 → 0.19.0-beta.4 2025-04-02 21:23:19 +00:00
Lance Release
bd62c2384f Bump version: 0.19.0-beta.2 → 0.19.0-beta.3 2025-04-02 09:28:14 +00:00
BubbleCal
e52ac79c69 fix: can't do structured FTS in python (#2300)
missed to support it in `search()` API and there were some pydantic
errors

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced full-text search capabilities by incorporating additional
parameters, enabling more flexible query definitions.
- Extended table search functionality to support full-text queries
alongside existing search types.

- **Tests**
- Introduced new tests that validate both structured and conditional
full-text search behaviors.
- Expanded test coverage for various query types, including MatchQuery,
BoostQuery, MultiMatchQuery, and PhraseQuery.

- **Bug Fixes**
- Fixed a logic issue in query processing to ensure correct handling of
full-text search queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-02 17:27:15 +08:00
Lance Release
a505bc3965 Bump version: 0.19.0-beta.1 → 0.19.0-beta.2 2025-04-01 17:28:21 +00:00
Weston Pace
1ee63984f5 feat: allow FSB to be used for btree indices (#2297)
We recently allowed this for lance but there was a check in lancedb as
well that was preventing it

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added support for indexing fixed-size binary data using B-tree
structures for efficient data storage and retrieval.
- **Tests**
- Implemented automated tests to ensure the new binary indexing works
correctly and meets the expected configuration.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-01 10:27:22 -07:00
Lance Release
e4485a630e Bump version: 0.19.0-beta.0 → 0.19.0-beta.1 2025-04-01 14:26:47 +00:00
Weston Pace
625bab3f21 feat: update to lance 0.25.3b1 (#2294)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated dependency versions for improved performance and
compatibility.

- **New Features**
- Added support for structured full-text search with expanded query
types (e.g., match, phrase, boost, multi-match) and flexible input
formats.
- Introduced a new method to check server support for structural
full-text search features.
- Enhanced the query system with new classes and interfaces for handling
various full-text queries.
- Expanded the functionality of existing methods to accept more complex
query structures, including updates to method signatures.

- **Bug Fixes**
  - Improved error handling and reporting for full-text search queries.

- **Refactor**
- Enhanced query processing with streamlined input handling and improved
error reporting, ensuring more robust and consistent search results
across platforms.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>
2025-04-01 06:36:42 -07:00
Lance Release
e67cd0baf9 Bump version: 0.18.3-beta.0 → 0.19.0-beta.0 2025-03-30 18:04:32 +00:00
LuQQiu
b9bdb8d937 fix: fix remote restore api to always checkout latest version (#2291)
Fix restore to always checkout latest version, following local restore
api implementation

a1d1833a40/rust/lancedb/src/table.rs (L1910)
Otherwise
table.create_table -> version 1
table.add_table -> version 2
table.checkout(1), table.restore() -> the version remains at 1 (should
checkout_latest inside restore method to update version to latest
version and allow write operation)
table.checkout_latest() -> version is 3
can do write operations
2025-03-29 22:46:57 -07:00
LuQQiu
a1d1833a40 feat: add analyze_plan api (#2280)
add analyze plan api to allow executing the queries and see runtime
metrics.
Which help identify the query IO overhead and help identify query
slowness
2025-03-28 14:28:52 -07:00
Will Jones
a547c523c2 feat!: change default read_consistency_interval=5s (#2281)
Previously, when we loaded the next version of the table, we would block
all reads with a write lock. Now, we only do that if
`read_consistency_interval=0`. Otherwise, we load the next version
asynchronously in the background. This should mean that
`read_consistency_interval > 0` won't have a meaningful impact on
latency.

Along with this change, I felt it was safe to change the default
consistency interval to 5 seconds. The current default is `None`, which
means we will **never** check for a new version by default. I think that
default is contrary to most users expectations.
2025-03-28 11:04:31 -07:00
Lance Release
346cbf8bf7 Bump version: 0.18.2-beta.0 → 0.18.3-beta.0 2025-03-28 16:03:31 +00:00
LuQQiu
cba14a5743 feat: add restore remote api (#2282) 2025-03-27 16:33:52 -07:00
LuQQiu
698f329598 feat: add explain plan remote api (#2263)
Add explain plan remote api
2025-03-26 11:22:40 -07:00
Lance Release
f97e751b3c Bump version: 0.18.1 → 0.18.2-beta.0 2025-03-21 20:02:59 +00:00
Weston Pace
9403254442 feat: add to_query_object method (#2239)
This PR adds a `to_query_object` method to the various query builders
(except not hybrid queries yet). This makes it possible to inspect the
query that is built.

In addition this PR does some normalization between the sync and async
query paths. A few custom defaults were removed in favor of None (with
the default getting set once, in rust).

Also, the synchronous to_batches method will now actually stream results

Also, the remote API now defaults to prefiltering
2025-03-21 13:01:51 -07:00
Samuel Colvin
7982d5c082 fix: correct rust install docs (#2253)
I'm pretty sure you mean `cargo add lancedb` here, `cargo install
lancedb` fails right now.
2025-03-21 10:12:53 -07:00
BubbleCal
7ff6ec7fe3 feat: upgrade to lance v0.25.0-beta.5 (#2248)
- adds `loss` into the index stats for vector index
- now `optimize` can retrain the vector index

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-21 10:12:23 -07:00
Will Jones
440a466a13 ci: remove OpenSSL as dependency in favor of rustls (#2242)
`object_store` already hard codes `rustls` as the TLS implementation, so
we have been shipping a mix of `rustls` and `openssl`. For simplicity of
builds, we should consolidate to one, and that has to be `rustls`.
2025-03-20 08:06:45 -07:00
Weston Pace
4e03ee82bc refactor: rework catalog/database options (#2213)
The `ConnectRequest` has a set of properties that only make sense for
listing databases / catalogs and a set of properties that only make
sense for remote databases.

This PR reduces all options to a single `HashMap<String, String>`. This
makes it easier to add new database / catalog implementations and makes
it clearer to users which options are applicable in which situations.

I don't believe there are any breaking changes here. The closest thing
is that I placed the `ConnectBuilder` methods `api_key`, `region`, and
`host_override` behind a `remote` feature gate. This is not strictly
needed and I could remove the feature gate but it seemed appropriate.
Since using these methods without the remote feature would have been
meaningless I don't feel this counts as a breaking change.

We could look at removing these methods entirely from the
`ConnectBuilder` (and encouraging users to use `RemoteDatabaseOptions`
instead) but I'm not sure how I feel about that.

Another approach we could take is to move these methods into a
`RemoteConnectBuilderExt` trait (and there could be a similar
`ListingConnectBuilderExt` trait to add methods for the listing database
/ catalog).

For now though my main goal is to simplify `ConnectRequest` as much as
possible (I see this being part of the key public API for database /
catalog integrations, similar to the `BaseTable`, `Catalog`, and
`Database` traits and I'd like it to be simple).
2025-03-18 10:13:59 -07:00
Weston Pace
46a6846d07 refactor: remove dataset reference from base table (#2226) 2025-03-17 06:27:33 -07:00
Bob Liu
5c00b2904c feat: add get dataset method on NativeTable (#2021)
I want to public the dataset method from native table, then I can use
more lance method like order_by which is not exposed in the lancedb
crate.
2025-03-13 11:15:28 -07:00
Gagan Bhullar
14677d7c18 fix: metric type inconsistency (#2122)
PR fixes #2113

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-03-12 10:28:37 -07:00
Will Jones
7747c9bcbf feat(node): parse arrow types in alterColumns() (#2208)
Previously, users could only specify new data types in `alterColumns` as
strings:

```ts
await tbl.alterColumns([
  path: "price",
  dataType: "float"
]);
```

But this has some problems:

1. It wasn't clear what were valid types
2. It was impossible to specify nested types, like lists and vector
columns.

This PR changes it to take an Arrow data type, similar to how the Python
API works. This allows casting vector types:

```ts
await tbl.alterColumns([
  {
    path: "vector",
    dataType: new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float16(), false),
    ),
  },
]);
```

Closes #2185
2025-03-12 09:57:36 -07:00
vinoyang
3750639b5f feat(rust): add connect_catalog method to support connect catalog via url (#2177) 2025-03-12 05:19:03 -07:00
Lance Release
de6739e7ec Bump version: 0.18.1-beta.0 → 0.18.1 2025-03-11 13:14:49 +00:00
Lance Release
495216efdb Bump version: 0.18.0 → 0.18.1-beta.0 2025-03-11 13:14:44 +00:00
Lance Release
e80a405dee Bump version: 0.18.0-beta.1 → 0.18.0 2025-03-10 23:13:18 +00:00
Lance Release
a53e19e386 Bump version: 0.18.0-beta.0 → 0.18.0-beta.1 2025-03-10 23:13:13 +00:00
Wyatt Alt
f86b20a564 fix: delete tables from DDB on drop_all_tables (#2194)
Prior to this commit, issuing drop_all_tables on a listing database with
an external manifest store would delete physical tables but leave
references behind in the manifest store. The table drop would succeed,
but subsequent creation of a table with the same name would fail with a
conflict.

With this patch, the external manifest store is updated to account for
the dropped tables so that dropped table names can be reused.
2025-03-10 15:00:53 -07:00
Weston Pace
bc49c4db82 feat: respect datafusion's batch size when running as a table provider (#2187)
Datafusion makes the batch size available as part of the `SessionState`.
We should use that to set the `max_batch_length` property in the
`QueryExecutionOptions`.
2025-03-07 05:53:36 -08:00
Weston Pace
d2eec46f17 feat: add support for streaming input to create_table (#2175)
This PR makes it possible to create a table using an asynchronous stream
of input data. Currently only a synchronous iterator is supported. There
are a number of follow-ups not yet tackled:

* Support for embedding functions (the embedding functions wrapper needs
to be re-written to be async, should be an easy lift)
* Support for async input into the remote table (the make_ipc_batch
needs to change to accept async input, leaving undone for now because I
think we want to support actual streaming uploads into the remote table
soon)
* Support for async input into the add function (pretty essential, but
it is a fairly distinct code path, so saving for a different PR)
2025-03-06 11:55:00 -08:00
vinoyang
374fe0ad95 feat(rust): introduce Catalog trait and implement ListingCatalog (#2148)
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-03-03 20:22:24 -08:00
Weston Pace
fa1b9ad5bd fix: don't use with_schema to remove schema metadata (#2162)
It seems that `RecordBatch::with_schema` is unable to remove schema
metadata from a batch. It fails with the error `target schema is not
superset of current schema`.

I'm not sure how the `test_metadata_erased` test is passing. Strangely,
the metadata was not present by the time the batch arrived at the
metadata eraser. I think maybe the schema metadata is only present in
the batch if there is a filter.

I've created a new unit test that makes sure the metadata is erased if
we have a filter also
2025-02-27 10:24:00 -08:00
BubbleCal
8877eb020d feat: record the server version for remote table (#2147)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-02-27 15:55:59 +08:00
Lance Release
84b110e0ef Bump version: 0.17.0 → 0.18.0-beta.0 2025-02-26 20:11:07 +00:00
Will Jones
5b12a47119 feat!: revert query limit to be unbounded for scans (#2151)
In earlier PRs (#1886, #1191) we made the default limit 10 regardless of
the query type. This was confusing for users and in many cases a
breaking change. Users would have queries that used to return all
results, but instead only returned the first 10, causing silent bugs.

Part of the cause was consistency: the Python sync API seems to have
always had a limit of 10, while newer APIs (Python async and Nodejs)
didn't.

This PR sets the default limit only for searches (vector search, FTS),
while letting scans (even with filters) be unbounded. It does this
consistently for all SDKs.

Fixes #1983
Fixes #1852
Fixes #2141
2025-02-26 10:32:14 -08:00
Lance Release
22bd8329f3 Bump version: 0.17.0-beta.0 → 0.17.0 2025-02-26 18:16:07 +00:00
Lance Release
a736fad149 Bump version: 0.16.1-beta.3 → 0.17.0-beta.0 2025-02-26 18:16:01 +00:00
Weston Pace
c4f99e82e5 feat: push filters down into DF table provider (#2128) 2025-02-25 14:46:28 -08:00
BubbleCal
f391ed828a fix: remote table doesn't apply the prefilter flag for FTS (#2145) 2025-02-24 21:37:43 +08:00
Lance Release
0f102f02c3 Bump version: 0.16.1-beta.2 → 0.16.1-beta.3 2025-02-20 03:38:01 +00:00
BubbleCal
14c9ff46d1 feat: support multivector on remote table (#2045)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-02-20 11:34:51 +08:00