Commit Graph

52 Commits

Author SHA1 Message Date
BubbleCal
96c66fd087 feat: support multivector for JS SDK (#2527)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-07-22 21:19:34 +08:00
BubbleCal
fec8d58f06 feat: support a bunch or FTS features in JS SDK (#2431)
- operator for match query
- slop for phrase query
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced support for boolean full-text search queries with AND/OR
logic and occurrence conditions.
- Added operator options for match and multi-match queries to control
term combination logic.
- Enabled phrase queries to specify proximity (slop) for flexible phrase
matching.
- Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery`
class for enhanced query expressiveness.

- **Bug Fixes**
- Improved validation and error handling for invalid operator and
occurrence inputs in full-text queries.

- **Tests**
- Expanded test coverage with new cases for boolean queries and
operator-based full-text searches.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-12 17:04:19 +08:00
Renato Marroquin
d0bc671cac docs: add example for querying a lance table with SQL (#2389)
Adds example for querying a dataset with SQL

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Added new guides on querying LanceDB tables using SQL with DuckDB and
Apache Datafusion.
- Included detailed instructions for integrating LanceDB with Datafusion
in Python.
- Updated navigation to include Datafusion and SQL querying
documentation.
- Improved formatting in TypeScript and vectordb update examples for
consistency.

- **Tests**
- Added a new test demonstrating SQL querying on Lance tables via
DataFusion integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-05-29 06:14:38 -07:00
Eileen Noonan
1620ba3508 docs: make table.update() nodejs guide consistent with API documentation (#2334)
The docs in the Guide here do not match the [API reference]
(https://lancedb.github.io/lancedb/js/classes/Table/#updateopts) for the
nodejs client.

I am writing an Elixir wrapper over the typescript library (Rust
forthcoming!) and confirmed in testing that the API reference is correct
vs the Guide.

Following the Guide docs, the error I got was:

"lance error: Invalid user input: Schema error: No field named bar.
Valid fields are foo. For a query of:

await table.update({foo: "buzz"}, { where: "foo = 'bar'"});
Over a table with a schema of just {foo: Utf8}.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Reformatted a code snippet in the guide to enhance readability by
splitting it into multiple lines for improved clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-21 08:38:16 -07:00
Adam Azzam
c42a201389 docs: remove trailing commas from AWS IAM Policies (#2324)
Before:

<img width="1173" alt="Screenshot 2025-04-08 at 10 58 50 AM"
src="https://github.com/user-attachments/assets/e5c69c45-ab68-488f-9c7f-e12f7ecbfaab"
/>

After:
<img width="1136" alt="Screenshot 2025-04-08 at 10 58 58 AM"
src="https://github.com/user-attachments/assets/108c11ea-09b3-49b5-9a50-b880e72a0270"
/>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Updated JSON policy examples in the storage guides to correct
formatting issues and enhance syntax clarity for readers.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-16 13:09:21 -07:00
Will Jones
b3a4efd587 fix: revert change default read_consistency_interval=5s (#2327)
This reverts commit a547c523c2 or #2281

The current implementation can cause panics and performance degradation.
I will bring this back with more testing in
https://github.com/lancedb/lancedb/pull/2311

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Enhanced clarity on read consistency settings with updated
descriptions and default behavior.
- Removed outdated warnings about eventual consistency from the
troubleshooting guide.

- **Refactor**
- Streamlined the handling of the read consistency interval across
integrations, now defaulting to "None" for improved performance.
  - Simplified internal logic to offer a more consistent experience.

- **Tests**
- Updated test expectations to reflect the new default representation
for the read consistency interval.
- Removed redundant tests related to "no consistency" settings for
streamlined testing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-04-14 08:48:15 -07:00
Will Jones
a547c523c2 feat!: change default read_consistency_interval=5s (#2281)
Previously, when we loaded the next version of the table, we would block
all reads with a write lock. Now, we only do that if
`read_consistency_interval=0`. Otherwise, we load the next version
asynchronously in the background. This should mean that
`read_consistency_interval > 0` won't have a meaningful impact on
latency.

Along with this change, I felt it was safe to change the default
consistency interval to 5 seconds. The current default is `None`, which
means we will **never** check for a new version by default. I think that
default is contrary to most users expectations.
2025-03-28 11:04:31 -07:00
QianZhu
b3b5362632 docs: replace Lancedb Cloud link (#2259)
* direct users to cloud.lancedb.com since LanceDB Cloud is in public
beta
* removed the `cast vector dimension` from alter columns as we don't
support it
2025-03-21 17:43:00 -07:00
Ayush Chaurasia
b9afd9c860 docs: add late interaction, multi-vector guide & link example (#2231)
1/2 docs update for this week. Addesses issues from this docs epic -
https://github.com/lancedb/lancedb/issues/1476
2025-03-20 20:29:32 +05:30
Will Jones
7747c9bcbf feat(node): parse arrow types in alterColumns() (#2208)
Previously, users could only specify new data types in `alterColumns` as
strings:

```ts
await tbl.alterColumns([
  path: "price",
  dataType: "float"
]);
```

But this has some problems:

1. It wasn't clear what were valid types
2. It was impossible to specify nested types, like lists and vector
columns.

This PR changes it to take an Arrow data type, similar to how the Python
API works. This allows casting vector types:

```ts
await tbl.alterColumns([
  {
    path: "vector",
    dataType: new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float16(), false),
    ),
  },
]);
```

Closes #2185
2025-03-12 09:57:36 -07:00
Will Jones
dba85f4d6f docs: user guide for merge insert (#2083)
Closes #2062
2025-01-31 10:03:21 -08:00
Josef Gugglberger
55ffc96e56 docs: update storage.md, fix Azure Sync connect example (#2010)
In the sync code example there was also an `await`.


![image](https://github.com/user-attachments/assets/4e1a1bd9-f2fb-4dbe-a9a6-1384ab63edbb)
2025-01-10 09:01:19 -08:00
QianZhu
17c9e9afea docs: add async examples to doc (#1941)
- added sync and async tabs for python examples
- moved python code to tests/docs

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-01-07 15:10:25 -08:00
Wyatt Alt
0b45ef93c0 docs: assorted copyedits (#1998)
This includes a handful of minor edits I made while reading the docs. In
addition to a few spelling fixes,
* standardize on "rerank" over "re-rank" in prose
* terminate sentences with periods or colons as appropriate
* replace some usage of dashes with colons, such as in "Try it yourself
- <link>"

All changes are surface-level. No changes to semantics or structure.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-01-06 15:04:48 -08:00
QianZhu
c0ee370f83 docs: improve schema evolution api examples (#1929) 2024-12-12 10:52:06 -08:00
Bert
239f725b32 feat(python)!: async-sync feature parity on Connections (#1905)
Closes #1791
Closes #1764
Closes #1897 (Makes this unnecessary)

BREAKING CHANGE: when using azure connection string `az://...` the call
to connect will fail if the azure storage credentials are not set. this
is breaking from the previous behaviour where the call would fail after
connect, when user invokes methods on the connection.
2024-12-05 14:54:39 -05:00
Will Jones
79eaa52184 feat: schema evolution APIs in all SDKs (#1851)
* Support `add_columns`, `alter_columns`, `drop_columns` in Remote SDK
and async Python
* Add `data_type` parameter to node
* Docs updates
2024-12-04 14:47:50 -08:00
QianZhu
3e9321fc40 docs: improve scalar index and filtering (#1874)
improved the docs on build a scalar index and pre-/post-filtering

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-11-25 11:30:57 -08:00
Emmanuel Ferdman
83ae52938a docs: update migration reference (#1837)
# PR Summary
PR fixes the `migration.md` reference in `docs/src/guides/tables.md`. On
the way, it also fixes some typos found in that document.

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2024-11-18 14:33:32 -08:00
Will Jones
587c0824af feat: flexible null handling and insert subschemas in Python (#1827)
* Test that we can insert subschemas (omit nullable columns) in Python.
* More work is needed to support this in Node. See:
https://github.com/lancedb/lancedb/issues/1832
* Test that we can insert data with nullable schema but no nulls in
non-nullable schema.
* Add `"null"` option for `on_bad_vectors` where we fill with null if
the vector is bad.
* Make null values not considered bad if the field itself is nullable.
2024-11-15 11:33:00 -08:00
Will Jones
0fd8a50bd7 ci(node): run examples in CI (#1796)
This is done as setup for a PR that will fix the OpenAI dependency
issue.

 * [x] FTS examples
 * [x] Setup mock openai
 * [x] Ran `npm audit fix`
 * [x] sentences embeddings test
 * [x] Double check formatting of docs examples
2024-11-13 11:10:56 -08:00
Olzhas Alexandrov
5ccd0edec2 docs: clarify infrastructure requirements for S3 Express One Zone (#1745) 2024-10-11 14:06:28 -06:00
Jon X
2b8e872be0 docs: removed the unnecessary fence code tag (#1599) 2024-09-05 14:40:38 +05:30
Lei Xu
5857cb4c6e docs: add a section to describe scalar index (#1495) 2024-08-16 18:48:29 -07:00
Cory Grinstead
69295548cc docs: minor updates for js migration guides (#1451)
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-07-22 10:26:49 -07:00
Ayush Chaurasia
bb2e624ff0 docs: add fine tuning section in retriever guide and minor fixes (#1438) 2024-07-11 17:34:29 +05:30
Cory Grinstead
31be9212da docs(nodejs): add @lancedb/lancedb examples everywhere (#1411)
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-07-10 13:29:03 -05:00
Will Jones
865ed99881 feat: dynamodb commit store support (#1410)
This allows users to specify URIs like:

```
s3+ddb://my_bucket/path?ddbTableName=myCommitTable
```

and it will support concurrent writes in S3.

* [x] Add dynamodb integration tests
* [x] Add modifications to get it working in Python sync API
* [x] Added section in documentation describing how to configure.

Closes #534

---------

Co-authored-by: universalmind303 <cory.grinstead@gmail.com>
2024-06-28 09:30:36 -07:00
Thomas J. Fan
a866b78a31 docs: fixes polars formatting in docs (#1400)
Currently, the whole polars section is formatted as a code block:
https://lancedb.github.io/lancedb/guides/tables/#from-a-polars-dataframe

This PR fixes the formatting.
2024-06-25 08:46:16 -07:00
Ayush Chaurasia
76fc16c7a1 docs: add retriever guide, address minor onboarding feedbacks & enhancement (#1326)
- Tried to address some onboarding feedbacks listed in
https://github.com/lancedb/lancedb/issues/1224
- Improve visibility of pydantic integration and embedding API. (Based
on onboarding feedback - Many ways of ingesting data, defining schema
but not sure what to use in a specific use-case)
- Add a guide that takes users through testing and improving retriever
performance using built-in utilities like hybrid-search and reranking
- Add some benchmarks for the above
- Add missing cohere docs

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-06-08 06:25:31 +05:30
Will Jones
becd649130 docs: add tip about using allow_http on local servers (#1277)
Based on user question
https://discord.com/channels/1030247538198061086/1197630499926057021/1237350091191222293
2024-05-07 10:15:26 -07:00
Will Jones
1d23af213b feat: expose storage options in LanceDB (#1204)
Exposes `storage_options` in LanceDB. This is provided for Python async,
Node `lancedb`, and Node `vectordb` (and Rust of course). Python
synchronous is omitted because it's not compatible with the PyArrow
filesystems we use there currently. In the future, we will move the sync
API to wrap the async one, and then it will get support for
`storage_options`.

1. Fixes #1168
2. Closes #1165
3. Closes #1082
4. Closes #439
5. Closes #897
6. Closes #642
7. Closes #281
8. Closes #114
9. Closes #990
10. Deprecating `awsCredentials` and `awsRegion`. Users are encouraged
to use `storageOptions` instead.
2024-04-10 10:12:04 -07:00
vincent d warmerdam
85a9ef472f Unhide Pydantic guides in Docs (#1122)
@wjones127 after fixing https://github.com/lancedb/lancedb/issues/1112 I
noticed something else on the docs. There's an odd chunk of the docs
missing
[here](https://lancedb.github.io/lancedb/guides/tables/#from-a-polars-dataframe).
I can see the heading, but after clicking it the contents don't show.

![CleanShot 2024-03-15 at 23 40
17@2x](https://github.com/lancedb/lancedb/assets/1019791/04784b19-0200-4c3f-ae17-7a8f871ef9bd)

Apon inspection it was a markdown issue, one tab too many on a whole
segment.

This PR fixes it. It looks like this now and the sections appear again:

![CleanShot 2024-03-15 at 23 42
32@2x](https://github.com/lancedb/lancedb/assets/1019791/c5aaec4c-1c37-474d-9fb0-641f4cf52626)
2024-04-05 16:32:47 -07:00
Will Jones
c5b0934bfb feat(node): add read_consistency_interval to Node and Rust (#1002)
This PR adds the same consistency semantics as was added in #828. It
*does not* add the same lazy-loading of tables, since that breaks some
existing tests.

This closes #998.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-04-05 16:30:40 -07:00
QianZhu
2e75b16403 make it explicit about the vector column data type (#916)
<img width="837" alt="Screenshot 2024-02-01 at 4 23 34 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/4f0f5c5a-2a24-4b00-aad1-ef80a593d964">
[
<img width="838" alt="Screenshot 2024-02-01 at 4 26 03 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/ca073bc8-b518-4be3-811d-8a7184416f07">
](url)

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-04-05 16:29:05 -07:00
QianZhu
1f2eafca75 arrow table/f16 example (#907) 2024-04-05 16:29:05 -07:00
Will Jones
d5be6c7a05 docs: provide AWS S3 cleanup and permissions advice (#903)
Adding some more quick advice for how to setup AWS S3 with LanceDB.

---------

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-04-05 16:28:56 -07:00
Will Jones
7d82e56f76 docs: document basics of configuring object storage (#832)
Created based on upstream PR https://github.com/lancedb/lance/pull/1849

Closes #681

---------

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-04-05 16:27:51 -07:00
Prashanth Rao
e6bb907d81 Docs updates incl. Polars (#827)
This PR makes the following aesthetic and content updates to the docs.

- [x] Fix max width issue on mobile: Content should now render more
cleanly and be more readable on smaller devices
- [x] Improve image quality of flowchart in data management page
- [x] Fix syntax highlighting in text at the bottom of the IVF-PQ
concepts page
- [x] Add example of Polars LazyFrames to docs (Integrations)
- [x] Add example of adding data to tables using Polars (guides)
2024-04-05 16:27:14 -07:00
Prashanth Rao
4d5d748acd docs: Updates and refactor (#683)
This PR makes incremental changes to the documentation.

* Closes #697
* Closes #698

- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.

- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756
and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices.
Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-04-05 16:27:12 -07:00
Chang She
72b39432e8 feat(python): add exist_ok option to create table (#813)
This mimics CREATE TABLE IF NOT EXISTS behavior.
We add `db.create_table(..., exist_ok=True)` parameter.
By default it is set to False, so trying to create
a table with the same name will raise an exception.
If set to True, then it only opens the table if it
already exists. If you pass in a schema, it will
be checked against the existing table to make sure
you get what you want. If you pass in data, it will
NOT be added to the existing table.
2024-04-05 16:26:35 -07:00
Chang She
7bac1131fb feat: add timezone handling for datetime in pydantic (#578)
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently
coerces to the timezone given in the column regardless of whether the
input had the matching timezone or not. This is probably not the right
behavior. Though we could just make it so the user has to make the
pydantic model do the validation instead of doing that at the pyarrow
conversion layer.
2024-04-05 16:24:47 -07:00
Will Jones
43662705ad docs: enhance Update user guide (#735)
Closes #705
2024-04-05 16:24:47 -07:00
Chang She
5376970e87 doc(javascript): minor improvement on docs for working with tables (#736)
Closes #639 
Closes #638
2024-04-05 16:24:47 -07:00
Rok Mihevc
377a564904 docs: switch python examples to be row based (#554) 2024-04-05 16:22:59 -07:00
Ayush Chaurasia
541b06664f [Docs] Improve visibility of table ops (#553)
A little verbose, but better than being non-discoverable 
![Screenshot from 2023-10-11
16-26-02](https://github.com/lancedb/lancedb/assets/15766192/9ba539a7-0cf8-4d9e-94e7-ce5d37c35df0)
2024-04-05 16:22:59 -07:00
Tan Li
e4c3a9346c [doc] make the tensor width differnt from height (#533) 2023-10-03 00:55:16 -07:00
Chang She
f20f19b804 feat: improve pydantic 1.x compat (#503) 2023-09-18 19:01:30 -07:00
Lei Xu
b315ea3978 [Python] Pydantic vector field with default value (#474)
Rename `lance.pydantic.vector` to `Vector` and deprecate `vector(dim)`
2023-09-08 22:35:31 -07:00
Ayush Chaurasia
cc916389a6 [DOCS] Major Docs Revamp (#435) 2023-08-22 14:06:26 -07:00