- operator for match query
- slop for phrase query
- boolean query
## Summary by CodeRabbit
- **New Features**
- Introduced support for boolean full-text search queries with AND/OR
logic and occurrence conditions.
- Added operator options for match and multi-match queries to control
term combination logic.
- Enabled phrase queries to specify proximity (slop) for flexible phrase
matching.
- Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery`
class for enhanced query expressiveness.
- **Bug Fixes**
- Improved validation and error handling for invalid operator and
occurrence inputs in full-text queries.
- **Tests**
- Expanded test coverage with new cases for boolean queries and
operator-based full-text searches.
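As a rough illustration only, usage from the Python API might look like the sketch below; the import path, the `query_type="fts"` argument, and the exact enum members are assumptions based on the names listed above, not verified signatures.
```python
# Sketch only: import path, constructor arguments, and enum members are
# assumed from the feature names above, not copied from the real API.
import lancedb
from lancedb.query import BooleanQuery, MatchQuery, Occur, Operator, PhraseQuery

table = lancedb.connect("data/sample-lancedb").open_table("docs")

# Match query requiring every term to appear (AND instead of the default OR).
match_all = MatchQuery("quick brown fox", "text", operator=Operator.AND)

# Phrase query tolerating up to two intervening tokens (slop).
near_phrase = PhraseQuery("quick fox", "text", slop=2)

# Boolean query combining both with occurrence conditions.
combined = BooleanQuery([(Occur.MUST, match_all), (Occur.SHOULD, near_phrase)])

results = table.search(combined, query_type="fts").limit(10).to_list()
```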
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Adds an example for querying a dataset with SQL.
## Summary by CodeRabbit
- **Documentation**
- Added new guides on querying LanceDB tables using SQL with DuckDB and
Apache Datafusion.
- Included detailed instructions for integrating LanceDB with Datafusion
in Python.
- Updated navigation to include Datafusion and SQL querying
documentation.
- Improved formatting in TypeScript and vectordb update examples for
consistency.
- **Tests**
- Added a new test demonstrating SQL querying on Lance tables via
DataFusion integration.
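For context, the DuckDB pattern those guides describe looks roughly like this (a sketch; the table name and columns are made up, and it assumes the synchronous Python API with `to_lance()`):
```python
import duckdb
import lancedb

db = lancedb.connect("data/sample-lancedb")
table = db.create_table(
    "items", data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0}]
)

# DuckDB can scan the underlying Lance dataset directly because it is
# Arrow-compatible; the SQL references the local variable by name.
items = table.to_lance()
print(duckdb.query("SELECT item, price FROM items WHERE price > 5").to_df())
```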
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
The docs in the Guide here do not match the [API reference](https://lancedb.github.io/lancedb/js/classes/Table/#updateopts) for the Node.js client.
I am writing an Elixir wrapper over the TypeScript library (Rust
forthcoming!) and confirmed in testing that the API reference is correct
and the Guide is not.
Following the Guide docs, the error I got was:
`lance error: Invalid user input: Schema error: No field named bar. Valid fields are foo.`
For a query of `await table.update({foo: "buzz"}, { where: "foo = 'bar'"});`
over a table with a schema of just `{foo: Utf8}`.
## Summary by CodeRabbit
- **Documentation**
- Reformatted a code snippet in the guide to enhance readability by
splitting it into multiple lines for improved clarity.
This reverts commit a547c523c2 (#2281).
The current implementation can cause panics and performance degradation.
I will bring this back with more testing in
https://github.com/lancedb/lancedb/pull/2311
## Summary by CodeRabbit
- **Documentation**
- Enhanced clarity on read consistency settings with updated
descriptions and default behavior.
- Removed outdated warnings about eventual consistency from the
troubleshooting guide.
- **Refactor**
- Streamlined the handling of the read consistency interval across
integrations, now defaulting to "None" for improved performance.
- Simplified internal logic to offer a more consistent experience.
- **Tests**
- Updated test expectations to reflect the new default representation
for the read consistency interval.
- Removed redundant tests related to "no consistency" settings for
streamlined testing.
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Previously, when we loaded the next version of the table, we would block
all reads with a write lock. Now, we only do that if
`read_consistency_interval=0`. Otherwise, we load the next version
asynchronously in the background. This should mean that
`read_consistency_interval > 0` won't have a meaningful impact on
latency.
Along with this change, I felt it was safe to change the default
consistency interval to 5 seconds. The current default is `None`, which
means we will **never** check for a new version by default. I think that
default is contrary to most users' expectations.
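For reference, opting into a bounded staleness window from Python looks roughly like this (a sketch; the table name is a placeholder and the interval value just mirrors the 5-second default proposed above):
```python
from datetime import timedelta

import lancedb

# Check for a newer table version at most every 5 seconds; with this change
# the refresh happens in the background instead of blocking reads.
db = lancedb.connect(
    "data/sample-lancedb", read_consistency_interval=timedelta(seconds=5)
)
table = db.open_table("my_table")

# timedelta(0) keeps strong consistency: every read checks for the latest
# version first, which is the case that still takes the write lock.
```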
* direct users to cloud.lancedb.com since LanceDB Cloud is in public
beta
* removed the `cast vector dimension` from alter columns as we don't
support it
Previously, users could only specify new data types in `alterColumns` as
strings:
```ts
await tbl.alterColumns([
  {
    path: "price",
    dataType: "float",
  },
]);
```
But this has some problems:
1. It wasn't clear what the valid types were.
2. It was impossible to specify nested types, like lists and vector
columns.
This PR changes it to take an Arrow data type, similar to how the Python
API works. This allows casting vector types:
```ts
await tbl.alterColumns([
  {
    path: "vector",
    dataType: new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float16(), false),
    ),
  },
]);
```
Closes #2185
This includes a handful of minor edits I made while reading the docs. In
addition to a few spelling fixes,
* standardize on "rerank" over "re-rank" in prose
* terminate sentences with periods or colons as appropriate
* replace some usage of dashes with colons, such as in "Try it yourself
- <link>"
All changes are surface-level. No changes to semantics or structure.
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
Closes #1791, closes #1764, closes #1897 (makes this unnecessary).
BREAKING CHANGE: when using an Azure connection string (`az://...`), the call
to connect will fail if the Azure storage credentials are not set. This breaks
from the previous behaviour, where connect would succeed and the failure would
only surface later, when the user invokes methods on the connection.
# PR Summary
This PR fixes the `migration.md` reference in `docs/src/guides/tables.md`.
Along the way, it also fixes some typos found in that document.
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* Test that we can insert subschemas (omit nullable columns) in Python.
* More work is needed to support this in Node. See:
https://github.com/lancedb/lancedb/issues/1832
* Test that we can insert data whose schema is nullable (but contains no
nulls) into a table with a non-nullable schema.
* Add `"null"` option for `on_bad_vectors` where we fill with null if
the vector is bad.
* Make null values not considered bad if the field itself is nullable.
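A sketch of what this enables in the Python API (schema and values are hypothetical; `on_bad_vectors="null"` is the new option described above):
```python
import lancedb
import pyarrow as pa

db = lancedb.connect("data/sample-lancedb")

# The vector column is nullable, so null vectors are no longer treated as bad.
schema = pa.schema([
    pa.field("id", pa.int64(), nullable=False),
    pa.field("vector", pa.list_(pa.float32(), 2), nullable=True),
])
table = db.create_table("items", schema=schema)

# Subschema insert: the nullable vector column can be omitted entirely.
table.add([{"id": 1}])

# A badly sized vector is replaced with null instead of raising or dropping.
table.add([{"id": 2, "vector": [1.0, 2.0, 3.0]}], on_bad_vectors="null")
```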
This is done as setup for a PR that will fix the OpenAI dependency
issue.
* [x] FTS examples
* [x] Set up mock OpenAI
* [x] Ran `npm audit fix`
* [x] Sentence embeddings test
* [x] Double check formatting of docs examples
This allows users to specify URIs like:
```
s3+ddb://my_bucket/path?ddbTableName=myCommitTable
```
and it will support concurrent writes in S3.
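For example (a sketch using the placeholder names from the URI above; the table name is hypothetical):
```python
import lancedb

# The ddbTableName query parameter names the DynamoDB table used for commit
# coordination, which is what makes concurrent S3 writers safe.
db = lancedb.connect("s3+ddb://my_bucket/path?ddbTableName=myCommitTable")
table = db.open_table("my_table")
table.add([{"vector": [1.0, 2.0], "item": "foo"}])
```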
* [x] Add dynamodb integration tests
* [x] Add modifications to get it working in Python sync API
* [x] Added section in documentation describing how to configure.
Closes #534
---------
Co-authored-by: universalmind303 <cory.grinstead@gmail.com>
- Tried to address some of the onboarding feedback listed in
https://github.com/lancedb/lancedb/issues/1224
- Improve visibility of the pydantic integration and the embedding API
(based on onboarding feedback: there are many ways of ingesting data and
defining schemas, but it isn't clear which to use for a specific use case).
- Add a guide that takes users through testing and improving retriever
performance using built-in utilities like hybrid search and reranking.
- Add some benchmarks for the above.
- Add missing Cohere docs.
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Exposes `storage_options` in LanceDB. This is provided for Python async,
Node `lancedb`, and Node `vectordb` (and Rust of course). Python
synchronous is omitted because it's not compatible with the PyArrow
filesystems we use there currently. In the future, we will move the sync
API to wrap the async one, and then it will get support for
`storage_options`.
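For instance, from the Python async API (a sketch; the keys shown are a few common object-store options, not an exhaustive or authoritative list):
```python
import asyncio

import lancedb


async def main():
    # Credentials and region are passed per connection via storage_options,
    # rather than through the deprecated Node-side awsCredentials/awsRegion.
    db = await lancedb.connect_async(
        "s3://my-bucket/lancedb",
        storage_options={
            "region": "us-east-1",
            "aws_access_key_id": "...",
            "aws_secret_access_key": "...",
        },
    )
    print(await db.table_names())


asyncio.run(main())
```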
1. Fixes #1168
2. Closes #1165
3. Closes #1082
4. Closes #439
5. Closes #897
6. Closes #642
7. Closes #281
8. Closes #114
9. Closes #990
10. Deprecates `awsCredentials` and `awsRegion`. Users are encouraged
to use `storageOptions` instead.
This PR adds the same consistency semantics that were added in #828. It
*does not* add the same lazy-loading of tables, since that breaks some
existing tests.
This closes #998.
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
This PR makes the following aesthetic and content updates to the docs.
- [x] Fix max width issue on mobile: Content should now render more
cleanly and be more readable on smaller devices
- [x] Improve image quality of flowchart in data management page
- [x] Fix syntax highlighting in text at the bottom of the IVF-PQ
concepts page
- [x] Add example of Polars LazyFrames to docs (Integrations)
- [x] Add example of adding data to tables using Polars (guides)
This PR makes incremental changes to the documentation.
* Closes #697
* Closes #698
- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.
- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes #746
- [x] Add simple Polars example to the integrations section. Closes #756
and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices.
Closes #787
---------
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
This mimics CREATE TABLE IF NOT EXISTS behavior.
We add a `db.create_table(..., exist_ok=True)` parameter.
By default it is set to False, so trying to create
a table with the same name will raise an exception.
If set to True, then it only opens the table if it
already exists. If you pass in a schema, it will
be checked against the existing table to make sure
you get what you want. If you pass in data, it will
NOT be added to the existing table.
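In code, that behaviour looks like this (sketch with a hypothetical schema):
```python
import lancedb
import pyarrow as pa

db = lancedb.connect("data/sample-lancedb")
schema = pa.schema([pa.field("id", pa.int64()), pa.field("text", pa.string())])

# First call creates the table; calling create_table again with the same
# name would raise by default.
table = db.create_table("docs", schema=schema)

# With exist_ok=True the existing table is opened instead. The schema is
# checked against the existing table, and any data passed here is NOT added.
table = db.create_table("docs", schema=schema, exist_ok=True)
```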
If you add timezone information in the Field annotation for a datetime,
that will now be passed to the pyarrow data type.
I'm not sure how pyarrow enforces timezones; right now, it silently
coerces to the timezone given in the column, regardless of whether the
input had a matching timezone or not. This is probably not the right
behavior, though we could just have the pydantic model do that validation
instead of doing it at the pyarrow conversion layer.