lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-16 11:30:41 +00:00

Author	SHA1	Message	Date
Lance Release	fec2a05629	Bump version: 0.22.2-beta.0 → 0.22.2-beta.1	2025-09-30 19:31:44 +00:00
Lance Release	e7e9e80b1d	Bump version: 0.22.1 → 0.22.2-beta.0	2025-09-24 22:54:54 +00:00
Will Jones	d617cdef4a	feat: add use_index parameter to merge insert operations (#2674)
## Summary
Exposes the `use_index` merge insert parameter, which was added upstream in https://github.com/lancedb/lance/pull/4688.
## API Examples
### Python
```python
# Force table scan
table.merge_insert(["id"]) \
    .when_not_matched_insert_all() \
    .use_index(False) \
    .execute(data)
```
### Node.js/TypeScript
```typescript
// Force table scan
await table.mergeInsert("id")
  .whenNotMatchedInsertAll()
  .useIndex(false)
  .execute(data);
```
### Rust
```rust
// Force table scan
let mut builder = table.merge_insert(&["id"]);
builder.when_not_matched_insert_all()
    .use_index(false);
builder.execute(data).await?;
```
🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-09-24 12:50:21 -07:00
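The sketch below illustrates why `use_index(False)` should change only how matches are found, not which rows get inserted. It is plain Python, not the LanceDB API. A set of keys stands in for a scalar index, and the scan path compares each incoming key against every existing row. `merge_insert_not_matched` is a hypothetical helper that models only `when_not_matched_insert_all`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_insert_not_matched(
    target: List[Row], source: List[Row], key: str, use_index: bool = True
) -> List[Row]:
    """Return target plus every source row whose key has no match in target."""
    if use_index:
        # Index-like path: build a key set once, then probe it per source row.
        indexed_keys = {row[key] for row in target}

        def is_matched(value: Any) -> bool:
            return value in indexed_keys
    else:
        # Scan path (the behaviour use_index(False) forces): compare against every target row.
        def is_matched(value: Any) -> bool:
            return any(row[key] == value for row in target)

    return target + [row for row in source if not is_matched(row[key])]


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(merge_insert_not_matched(existing, incoming, "id", use_index=False))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b'}, {'id': 3, 'v': 'c'}]
```

Both paths return the same rows. The index path avoids the quadratic scan, which is why forcing a scan is mainly useful when the index is stale or suspected to be wrong.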
Will Jones	356d7046fd	ci: fix test failure on main (#2677) The test was in the wrong position.	2025-09-24 09:46:04 -07:00
Will Jones	48e5caabda	ci(nodejs): lint for unused imports (#2673)	2025-09-23 18:49:42 -07:00
Lance Release	d6cc68f671	Bump version: 0.22.1-beta.4 → 0.22.1	2025-09-23 22:07:31 +00:00
Lance Release	55eacfa685	Bump version: 0.22.1-beta.3 → 0.22.1-beta.4	2025-09-23 22:06:45 +00:00
Neha Prasad	b0800b4b71	fix: undefined values should become null in nullable fields (#2658) ### Bug Fix: Undefined Values in Nullable Fields Issue: When inserting data with `undefined` values into nullable fields, LanceDB was incorrectly coercing them to default values (`false` for booleans, `NaN` for numbers, `""` for strings) instead of `null`. Fix: Modified the `makeVector()` function in `arrow.ts` to convert `undefined` values to `null` for nullable fields before passing data to Apache Arrow. Fixes: #2645 Result: `{ text: undefined, number: undefined, bool: undefined }` now correctly becomes `{ text: null, number: null, bool: null }` when the fields are marked as nullable in the schema. Files Changed: - `nodejs/lancedb/arrow.ts` (core fix) - `nodejs/__test__/arrow.test.ts` (test coverage) This ensures that nullable fields handle nulls as users expect. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-23 14:29:52 -07:00
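Python has no `undefined`, so this sketch uses a hypothetical `UNDEFINED` sentinel to model the normalization rule described above. The real fix lives in the TypeScript `makeVector()`. Here, in a nullable field, a missing value becomes `None` (Arrow null) instead of a type default such as `false`, `NaN`, or `""`. Non-nullable handling is deliberately not modelled.

```python
from typing import Any, List, Optional, Sequence

# Hypothetical sentinel standing in for JavaScript's `undefined`.
UNDEFINED = object()


def normalize_nullable(values: Sequence[Any], nullable: bool) -> List[Optional[Any]]:
    """Replace UNDEFINED with None in a nullable column before it is handed to Arrow."""
    if not nullable:
        # Non-nullable fields are outside the scope of this sketch; pass values through.
        return list(values)
    return [None if v is UNDEFINED else v for v in values]


row = {"text": UNDEFINED, "number": UNDEFINED, "bool": UNDEFINED}
print({name: normalize_nullable([v], nullable=True)[0] for name, v in row.items()})
# {'text': None, 'number': None, 'bool': None}
```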
Neha Prasad	1befebf614	fix(node): handle null values in nullable boolean fields (#2657) ### Solution Added special handling in the `makeVector` function for boolean arrays where all values are null. The fix creates a proper null bitmap using `makeData` and `arrowMakeVector` instead of relying on Apache Arrow's `vectorFromArray`, which doesn't handle this edge case correctly. Fixes: #2644 ### Changes - Added null value detection for boolean types in the `makeVector` function - Creates a proper Arrow data structure with a null bitmap when all boolean values are null - Preserves existing behavior for non-null boolean values and other data types - Fixes the boolean null value bug while maintaining backward compatibility --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-23 14:07:00 -07:00
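The fix depends on an explicit null bitmap, so it may help to see what that bitmap holds. The sketch below follows the Arrow columnar format, where bit `i` is 1 when slot `i` is valid and bits are packed least-significant-bit first. It is not the LanceDB code, and the helper names are hypothetical. For an all-null boolean column, every validity bit is 0 and the null count equals the length.

```python
from typing import List, Optional, Tuple


def pack_bits(flags: List[bool]) -> bytes:
    """Pack booleans into bytes using Arrow's least-significant-bit-first order."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def boolean_buffers(values: List[Optional[bool]]) -> Tuple[bytes, bytes, int]:
    """Return (validity bitmap, value bitmap, null count) for a boolean column."""
    validity = pack_bits([v is not None for v in values])
    data = pack_bits([bool(v) for v in values])  # null slots are stored as 0 bits
    null_count = sum(v is None for v in values)
    return validity, data, null_count


validity, data, nulls = boolean_buffers([None, None, None])
print(validity, nulls)  # b'\x00' 3
```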
Lance Release	05a4ea646a	Bump version: 0.22.1-beta.2 → 0.22.1-beta.3	2025-09-22 04:49:00 +00:00
Jack Ye	ff71d7e552	feat: support shallow clone (#2653) Support shallow cloning a dataset at a specific location to create a new dataset, using the shallow_clone feature in Lance. Also introduce a remote `clone` API so remote tables can use this functionality.	2025-09-21 21:28:40 -07:00
Neha Prasad	2261eb95a0	fix(node): handle undefined vector fields with embedding functions (#2655) - Fixes an issue where passing `{ vector: undefined }` with an embedding function threw a "Found field not in schema" error instead of calling the embedding function, as already happens for `null` or omitted fields. Changes: - Modified `rowPathsAndValues` to skip undefined values during schema inference - Added a test case verifying that undefined, null, and omitted vector fields all work correctly Before: `{ vector: undefined }` → Error After: `{ vector: undefined }` → Calls embedding function Closes #2647	2025-09-19 09:17:28 -07:00
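Here is a rough Python analogue of the `rowPathsAndValues` change: when a row is flattened into (path, value) pairs for schema inference, undefined values are skipped, so the field looks omitted and the embedding function can fill it. The `UNDEFINED` sentinel and the snake_case function are hypothetical stand-ins for the TypeScript originals.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical sentinel standing in for JavaScript's `undefined`.
UNDEFINED = object()

Path = Tuple[str, ...]


def row_paths_and_values(row: Dict[str, Any], prefix: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) pairs for schema inference, skipping undefined values."""
    for key, value in row.items():
        if value is UNDEFINED:
            continue  # treat like an omitted field so the embedding function can fill it
        path = prefix + (key,)
        if isinstance(value, dict):
            # Recurse into nested structs, extending the path.
            yield from row_paths_and_values(value, path)
        else:
            yield path, value


row = {"text": "hello", "vector": UNDEFINED, "meta": {"lang": "en"}}
print(list(row_paths_and_values(row)))
# [(('text',), 'hello'), (('meta', 'lang'), 'en')]
```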
Lance Release	b5a39bffec	Bump version: 0.22.1-beta.1 → 0.22.1-beta.2	2025-09-18 20:22:35 +00:00
Lance Release	6ea6884260	Bump version: 0.22.1-beta.0 → 0.22.1-beta.1	2025-09-10 20:49:43 +00:00
Jack Ye	8da74dcb37	feat: support per-request header override (#2631) ## Summary This PR introduces a `HeaderProvider`, which is called for every remote HTTP call to get the latest headers to inject. This is useful for features such as adding the latest auth token: the header provider can refresh tokens internally, and each request always sets the refreshed token. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-10 13:44:00 -07:00
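The following is a minimal sketch of the header-provider pattern described above. The class name, the `get_headers` method, the injected clock, and the fixed token lifetime are all hypothetical and not taken from the LanceDB API. The client would call the provider before every request, and the provider refreshes its token only once it has expired.

```python
import time
from typing import Callable, Dict


class RefreshingTokenHeaderProvider:
    """Hypothetical provider queried before each HTTP request for fresh headers."""

    def __init__(self, fetch_token: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_token = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._token = ""
        self._expires_at = float("-inf")  # forces a fetch on the first call

    def get_headers(self) -> Dict[str, str]:
        now = self._clock()
        if now >= self._expires_at:
            # Token missing or expired: refresh it before building headers.
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return {"Authorization": f"Bearer {self._token}"}
```

Because the provider is consulted per request rather than once at connection time, a long-lived connection keeps working across token rotations without being rebuilt.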
Lance Release	3c7419b392	Bump version: 0.22.0 → 0.22.1-beta.0	2025-09-10 14:24:58 +00:00
Jack Ye	9391ad1450	feat: support mTLS for remote database (#2638) This PR adds mTLS (mutual TLS) configuration support for the LanceDB remote HTTP client, allowing users to authenticate with client certificates and configure custom CA certificates for server verification. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-09 21:04:46 -07:00
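The two pieces of mTLS configuration named above are a custom CA for verifying the server and a client certificate for authenticating to it. The sketch below shows both with the Python standard library `ssl` module. It illustrates the TLS concepts only: `build_mtls_context` is a hypothetical helper, and its parameter names are not LanceDB's client config keys.

```python
import ssl
from typing import Optional


def build_mtls_context(ca_file: Optional[str] = None,
                       cert_file: Optional[str] = None,
                       key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context: custom CA for server verification, client cert for mTLS."""
    # cafile=None falls back to the system trust store.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # Present this certificate (and its key) when the server requests client auth.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A call such as `build_mtls_context("ca.pem", "client.pem", "client.key")` (file names hypothetical) would produce a context that both verifies the server against the custom CA and presents the client certificate.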
Lance Release	06d5612443	Bump version: 0.22.0-beta.2 → 0.22.0	2025-09-04 08:33:40 +00:00
Lance Release	45f96f4151	Bump version: 0.22.0-beta.1 → 0.22.0-beta.2	2025-09-04 08:33:09 +00:00
Lance Release	48f7b20daa	Bump version: 0.22.0-beta.0 → 0.22.0-beta.1	2025-09-03 17:51:36 +00:00
Jack Ye	e6f1da31dc	chore: upgrade lance to 0.34.0-beta.4 (#2621)	2025-09-02 21:33:55 -07:00
Lance Release	47747287b6	Bump version: 0.21.4-beta.1 → 0.22.0-beta.0	2025-08-29 21:20:57 +00:00
Wyatt Alt	981f8427e6	chore: update lance (#2610) Adds `storage_options` to object_store `wrap()` to adhere to an upstream Lance change.	2025-08-29 13:41:02 -07:00
Jack Ye	faf8973624	feat!: support multi-level namespace (#2603) This PR adds support for multi-level namespaces in a LanceDB database, following the Lance Namespace spec. This allows users to create namespaces inside a database connection and to perform create, drop, list, and list_tables operations in a namespace. (Other operations, such as update and describe, will come in a follow-up PR.) The 3 types of database connections behave as follows: 1. Local database connections continue to have just a flat list of tables, for backwards compatibility. 2. Remote database connections make REST API calls according to the APIs in the Lance Namespace spec. 3. Lance Namespace connections invoke the corresponding operations against the specific namespace implementation, which may behave differently for these APIs. All the table APIs now take an identifier instead of a name; for example, `/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is directly in the root namespace, the API call is identical. If the table is in a namespace, the full table ID should be used, with `$` as the default delimiter (`.` is a special character that causes issues with URL parsing, so `$` is used instead), for example `/v1/table/ns1$table1/create`. If a different delimiter is needed, users can configure `id_delimiter` in the client config, and it is then passed as a query parameter, for example `/v1/table/ns1__table1/create?delimiter=__`. The Python and Typescript APIs are kept backwards compatible, but the following Rust APIs are not: 1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>` is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace: &[String]) -> Result<()>` 2. `Connection::drop_all_tables(&self) -> Result<()>` is now `Connection::drop_all_tables(&self, namespace: &[String]) -> Result<()>`	2025-08-27 12:07:55 -07:00
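The identifier rules above can be sketched as a path builder. The namespace levels and the table name are joined with `$` by default. A custom delimiter is used for joining and is also echoed as a `delimiter` query parameter. `table_create_path` is a hypothetical helper, and it skips URL-encoding of the individual name components.

```python
from typing import List

DEFAULT_DELIMITER = "$"


def table_create_path(table_id: List[str], delimiter: str = DEFAULT_DELIMITER) -> str:
    """Build the create-table REST path for a table id (namespace levels + table name)."""
    path = f"/v1/table/{delimiter.join(table_id)}/create"
    if delimiter != DEFAULT_DELIMITER:
        # A non-default delimiter has to be sent so the server can split the id.
        path += f"?delimiter={delimiter}"
    return path


print(table_create_path(["table1"]))               # /v1/table/table1/create
print(table_create_path(["ns1", "table1"]))        # /v1/table/ns1$table1/create
print(table_create_path(["ns1", "table1"], "__"))  # /v1/table/ns1__table1/create?delimiter=__
```

A root-level table produces the same path as before the change, which is why existing root-namespace callers are unaffected.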
Lance Release	6839ac3509	Bump version: 0.21.4-beta.0 → 0.21.4-beta.1	2025-08-22 03:55:22 +00:00
Lance Release	d4a41b5663	Bump version: 0.21.3 → 0.21.4-beta.0	2025-08-19 22:56:52 +00:00
Vitali Lovich	d602e9f98c	fix: make cloud features optional (#2567) (#2568) This shrinks the size of local embedded builds that disable all the default features. Combined with https://github.com/lancedb/lance/pull/4362, and once the dependencies are updated to point to that fix, this fully resolves #2567. Verified by patching the workspace to redirect to my clone of lance with the PR applied:
```
cargo tree -p lancedb -e no-build -e no-dev --no-default-features -i aws-config | less
```
Lance itself also needs to change because many crates in that project depend on lance-io/default, and lancedb depends on them, which transitively enables the cloud features regardless. The PR in lance removes the dependency on lance-io/default from all sibling crates. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-08-15 16:46:52 -07:00
Will Jones	ad09234d59	feat: allow setting `train=False` and `name` on indices (#2586) Enables two new parameters when building indices: * `name`: Allows explicitly setting a name on the index. Default is `{col_name}_idx`. * `train` (default `True`): When set to `False`, an empty index is created immediately. The Lance upgrade also brings these behaviors from `cd76a993b8`: * When a scalar index is created on a Table, it is kept even if all rows are deleted or updated. * Scalar indices can be created on empty tables. They default to `train=False` if the table is empty. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-08-15 14:00:26 -07:00
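As a rough sketch of the defaults described above for scalar indices: the name falls back to `{col_name}_idx`, and `train` defaults to `True` except on an empty table, where it falls back to `False`. `resolve_index_options` and `IndexOptions` are hypothetical helpers, not LanceDB functions. The sketch simply honours an explicit `train` value, since the commit does not say how an explicit `train=True` on an empty table is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexOptions:
    name: str
    train: bool


def resolve_index_options(column: str, num_rows: int,
                          name: Optional[str] = None,
                          train: Optional[bool] = None) -> IndexOptions:
    """Fill in scalar-index defaults as described in #2586 (hypothetical helper)."""
    # Default name is `{col_name}_idx` unless the caller supplies one.
    resolved_name = name if name is not None else f"{column}_idx"
    if train is None:
        # `train` defaults to True, but an empty table yields an untrained (empty) index.
        train = num_rows > 0
    return IndexOptions(name=resolved_name, train=train)


print(resolve_index_options("price", num_rows=0))  # IndexOptions(name='price_idx', train=False)
```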
Lance Release	0c34ffb252	Bump version: 0.21.3-beta.0 → 0.21.3	2025-08-15 18:03:26 +00:00
Lance Release	d9f333d828	Bump version: 0.21.2 → 0.21.3-beta.0	2025-08-15 18:02:43 +00:00
Will Jones	dcf53c4506	fix: limit and offset support paginating through FTS and vector search results (#2592) Adds tests to ensure that users can paginate through simple scan, FTS, and vector search results using `limit` and `offset`. Tests upstream work: https://github.com/lancedb/lance/pull/4318 Closes #2459	2025-08-15 08:55:12 -07:00
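The generic loop below shows the `limit`/`offset` pagination pattern these tests cover. It assumes `fetch(limit, offset)` is any callable returning one page of results. For example, it could wrap a query's `.limit(...)` and `.offset(...)` calls, though the wrapper itself is hypothetical. Iteration stops at the first short or empty page.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(fetch: Callable[[int, int], List[T]], page_size: int) -> Iterator[List[T]]:
    """Yield pages by advancing `offset` by `page_size` until a short page is returned."""
    offset = 0
    while True:
        page = fetch(page_size, offset)
        if page:
            yield page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing further
        offset += page_size


rows = list(range(7))
print(list(paginate(lambda limit, offset: rows[offset:offset + limit], page_size=3)))
# [[0, 1, 2], [3, 4, 5], [6]]
```

Offset-based paging is only consistent if the result order is stable between calls, which is what ranked FTS and vector results need to guarantee for this to work.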
Weston Pace	ed640a76d9	feat: add take_offsets and take_row_ids (#2584) These operations have existed in lance for a long while, and many users need to drop down to lance for this capability. This PR adds the API and implements it using filters (e.g. `_rowid IN (...)`) so that it doesn't currently add any load to `BaseTable`. I'm not sure that is sustainable, as base table implementations may want to specialize how they handle this method. However, I figure it is a good starting point. In addition, unlike Lance, this API does not currently guarantee anything about the order of the take results. This is necessary for the fallback filter approach to work (SQL filters cannot guarantee result order).	2025-08-15 06:48:24 -07:00
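The filter-based fallback can be sketched in two parts. The first builds the `_rowid IN (...)` predicate. The second is an optional client-side step that restores the requested order, since the API makes no ordering guarantee. Both helpers are hypothetical. Rejecting an empty id list is a choice made in this sketch, because `IN ()` is not valid SQL.

```python
from typing import Any, Dict, Iterable, List


def row_id_filter(row_ids: Iterable[int]) -> str:
    """Build a `_rowid IN (...)` SQL predicate for a take-by-row-id fallback."""
    ids = [int(r) for r in row_ids]
    if not ids:
        raise ValueError("row_ids must not be empty")
    return f"_rowid IN ({', '.join(str(i) for i in ids)})"


def reorder_by_row_id(results: List[Dict[str, Any]], requested: List[int]) -> List[Dict[str, Any]]:
    """Restore the caller's requested order, since the filter returns rows in any order."""
    by_id = {r["_rowid"]: r for r in results}
    return [by_id[i] for i in requested if i in by_id]


print(row_id_filter([42, 7, 19]))  # _rowid IN (42, 7, 19)
```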
Weston Pace	16beaaa656	ci: fix broken CI checks (#2585)	2025-08-13 10:05:57 -07:00
Will Jones	9d683e4f0b	feat: infer vector columns when name contains 'vector' or 'embedding' (#2547) ## Summary - Enhanced vector column detection to use substring matching instead of exact matching - Now detects columns with names containing "vector" or "embedding" (case-insensitive) - Added integer vector support to Node.js implementation (matching Python) - Comprehensive test coverage for both float and integer vector types ## Changes ### Python (`python/python/lancedb/table.py`) - Updated `_infer_target_schema()` to use substring matching with helper function `_is_vector_column()` - Preserved original field names instead of forcing "vector" - Consolidated duplicate logic for better maintainability ### Node.js (`nodejs/lancedb/arrow.ts`) - Enhanced type inference with `nameSuggestsVectorColumn()` helper function - Added `isAllIntegers()` function with performance optimization (checks first 10 elements) - Implemented integer vector support using `Uint8` type (matching Python) - Improved type safety by removing `any` usage ### Tests - Python: Added `test_infer_target_schema_with_vector_embedding_names()` in `test_util.py` - Node.js: Added comprehensive test case in `arrow.test.ts` - Both test suites cover various naming patterns and integer/float vector types ## Examples of newly supported column names: - `user_vector`, `text_embedding`, `doc_embeddings` - `my_vector_field`, `embedding_model` - `VECTOR_COL`, `Vector_Mixed` (case-insensitive) - Both float and integer arrays are properly converted to fixed-size lists ## Test plan - [x] All existing tests pass (backward compatibility maintained) - [x] New tests pass for both Python and Node.js implementations - [x] Integer vector detection works correctly in Node.js - [x] Code passes linting and formatting checks - [x] Performance optimized for large vector arrays Fixes #2546 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-04 15:36:49 -07:00
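Below is a simplified Python rendering of the inference rules listed above. A column is treated as a vector when its name contains "vector" or "embedding" (case-insensitive). An array whose first 10 elements are all integers maps to a `uint8` fixed-size list. The `float32` choice for other numeric arrays and the type-string format are assumptions made for illustration. The helper names mirror, but are not, `nameSuggestsVectorColumn()` and `isAllIntegers()`.

```python
from typing import Any, Optional, Sequence


def name_suggests_vector_column(name: str) -> bool:
    """Case-insensitive substring match on 'vector' or 'embedding'."""
    lowered = name.lower()
    return "vector" in lowered or "embedding" in lowered


def is_all_integers(values: Sequence[Any], sample: int = 10) -> bool:
    """Inspect only the first `sample` elements, trading exactness for speed."""
    head = values[:sample]
    # bool is a subclass of int in Python, so exclude it explicitly.
    return len(head) > 0 and all(isinstance(v, int) and not isinstance(v, bool) for v in head)


def infer_vector_type(name: str, value: Any) -> Optional[str]:
    """Return a fixed-size-list type description, or None if not treated as a vector."""
    if not name_suggests_vector_column(name):
        return None
    if not isinstance(value, (list, tuple)) or len(value) == 0:
        return None
    element = "uint8" if is_all_integers(value) else "float32"  # float32 is an assumption
    return f"fixed_size_list<{element}>[{len(value)}]"


print(infer_vector_type("text_embedding", [0.1, 0.2, 0.3]))  # fixed_size_list<float32>[3]
print(infer_vector_type("VECTOR_COL", [1, 2, 3, 4]))          # fixed_size_list<uint8>[4]
print(infer_vector_type("title", [1, 2]))                      # None
```

Sampling only the first 10 elements keeps inference cheap on long vectors. The trade-off is that an array whose non-integer values appear only after position 10 would be misclassified as integer.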
Will Jones	02595dc475	feat: add overall timeout parameter to remote client (#2550) ## Summary - Adds an overall `timeout` parameter to `TimeoutConfig` that limits the total time for the entire request - Can be set via config or `LANCE_CLIENT_TIMEOUT` environment variable - Exposed in Python and Node.js bindings - Includes comprehensive tests ## Test plan - [x] Unit tests for Rust TimeoutConfig - [x] Integration tests for Python bindings - [x] Integration tests for Node.js bindings - [x] All existing tests pass 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-08-04 10:06:55 -07:00
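An overall timeout differs from a per-attempt timeout because it caps the total time across retries. The sketch below illustrates that with a single deadline, where each attempt receives only the budget that remains. Two points are assumptions not stated in the commit: that an explicit config value takes precedence over `LANCE_CLIENT_TIMEOUT`, and that the value is in seconds. Both helpers are hypothetical.

```python
import os
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def resolve_overall_timeout(configured: Optional[float]) -> Optional[float]:
    """Explicit config wins (assumption); else LANCE_CLIENT_TIMEOUT, assumed to be seconds."""
    if configured is not None:
        return configured
    raw = os.environ.get("LANCE_CLIENT_TIMEOUT")
    return float(raw) if raw else None


def call_with_deadline(attempt: Callable[[float], T], overall_timeout: float,
                       max_attempts: int = 3,
                       clock: Callable[[], float] = time.monotonic) -> T:
    """Retry `attempt`, passing each try only the time left in the overall budget."""
    deadline = clock() + overall_timeout
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # the whole request has used up its budget
        try:
            return attempt(remaining)
        except TimeoutError as err:
            last_error = err
    raise TimeoutError("overall request timeout exceeded") from last_error
```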
Reed Loden	f23327af79	fix: use SPDX-compliant license name for nodejs packages (#2558) Update the license field from `Apache 2.0` to `Apache-2.0` for all Node.js packages. This was causing GitHub's Dependency Review license check to fail with: > The validity of the licenses of the dependencies below could not be determined. Ensure that they are valid SPDX licenses	2025-08-04 09:54:53 -07:00
Mark McCaskey	fe76496a59	fix: `.nprobes` method in python bindings, improve error messages (#2556) Calling `nprobes` with a value greater than 20 fails with the following `minimum_nprobes` error:
```
self = <lancedb.query.AsyncVectorQuery object at 0x10b749720>, minimum_nprobes = 30

    def minimum_nprobes(self, minimum_nprobes: int) -> Self:
        """Set the minimum number of probes to use.

        See `nprobes` for more details.

        These partitions will be searched on every indexed vector query and will
        increase recall at the expense of latency.
        """
>       self._inner.minimum_nprobes(minimum_nprobes)
E       ValueError: Invalid input, minimum_nprobes must be less than or equal to maximum_nprobes

python/lancedb/query.py:2744: ValueError
```
Setting the max before the min seems reasonable, but it causes this reasonable case to fail:
```
def test_nprobes_min_max_works_sync(table):
    LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(2).maximum_nprobes(4).to_list()
```
with
```
self = <lancedb.query.AsyncVectorQuery object at 0x1203f1c90>, maximum_nprobes = 4

    def maximum_nprobes(self, maximum_nprobes: int) -> Self:
        """Set the maximum number of probes to use.

        See `nprobes` for more details.

        If this value is greater than `minimum_nprobes` then the excess partitions
        will be searched only if we have not found enough results. This can be
        useful when there is a narrow filter to allow these queries to spend more
        time searching and avoid potential false negatives.

        If this value is 0 then no limit will be applied and all partitions could
        be searched if needed to satisfy the limit.
        """
>       self._inner.maximum_nprobes(maximum_nprobes)
E       ValueError: Invalid input, maximum_nprobes must be greater than or equal to minimum_nprobes

python/lancedb/query.py:2761: ValueError
```
The case I care about is where min == max, but this solution handles it even if they differ. If both min and max are given, we first set both to the minimum and then set the max. This isn't 100% the same, since the minimum setter checks for 0 on the min and `.nprobes` does no sanity checking at all, but I figured this was the most reasonable and general solution without touching more of this code. While I was here, I noticed the error messages were a bit ambiguous, so I made them symmetric and clarified them.	2025-07-30 09:23:25 -07:00
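The ordering problem above can be reproduced with a small model of the two validated setters. In this sketch, `set_range` raises the ceiling first only when the new minimum exceeds the current maximum, so no intermediate state ever violates min ≤ max. This is one way to order the calls, close in spirit to the fix but not its code. `ProbeSettings` is hypothetical, its error wording is paraphrased, and treating a maximum of 0 as "no limit" follows the docstring quoted in the traceback.

```python
class ProbeSettings:
    """Hypothetical model of the min/max nprobes validation shown in the tracebacks."""

    def __init__(self, minimum: int, maximum: int) -> None:
        self.minimum = minimum
        self.maximum = maximum

    def set_minimum(self, value: int) -> None:
        if value <= 0:
            raise ValueError("minimum_nprobes must be greater than 0")
        if self.maximum != 0 and value > self.maximum:
            raise ValueError("minimum_nprobes must be <= maximum_nprobes")
        self.minimum = value

    def set_maximum(self, value: int) -> None:
        # A maximum of 0 means "no limit", so it is always accepted.
        if value != 0 and value < self.minimum:
            raise ValueError("maximum_nprobes must be >= minimum_nprobes")
        self.maximum = value

    def set_range(self, minimum: int, maximum: int) -> None:
        """Apply both bounds in an order that never violates min <= max mid-way."""
        if self.maximum != 0 and minimum > self.maximum:
            # Raise the ceiling to the new minimum first so set_minimum is accepted.
            self.set_maximum(minimum)
        self.set_minimum(minimum)
        self.set_maximum(maximum)


settings = ProbeSettings(minimum=20, maximum=20)
settings.set_range(30, 30)  # calling set_minimum(30) directly would raise here
print(settings.minimum, settings.maximum)  # 30 30
```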
Lance Release	70d9b04ba5	Bump version: 0.21.2-beta.2 → 0.21.2	2025-07-25 20:32:41 +00:00
Lance Release	b0d4a79c35	Bump version: 0.21.2-beta.1 → 0.21.2-beta.2	2025-07-25 20:31:50 +00:00
Will Jones	3d1f102087	feat: allow Python and Typescript users to create `Session`s (#2530) ## Summary - Exposes `Session` in Python and Typescript so users can set the `index_cache_size_bytes` and `metadata_cache_size_bytes` * The `Session` is attached to the `Connection`, and thus shared across all tables in that connection. - Adds deprecation warnings for table-level cache configuration 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-24 12:06:29 -07:00
Will Jones	fbff244ed8	chore: add claude md files (#2531) Gives basic context to Claude about how to do common tasks in the repo.	2025-07-23 12:20:36 -07:00
Lance Release	cceaf27d79	Bump version: 0.21.2-beta.0 → 0.21.2-beta.1	2025-07-22 15:41:13 +00:00
BubbleCal	96c66fd087	feat: support multivector for JS SDK (#2527) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-22 21:19:34 +08:00
Will Jones	88283110f4	fix: handle input with missing columns when using embedding functions (#2516) ## Summary Fixes #2515 by implementing comprehensive support for missing columns in Arrow table inputs when using embedding functions. ### Problem Previously, when an Arrow table was passed to `fromDataToBuffer` with missing columns and a schema containing embedding functions, the system would fail because `applyEmbeddingsFromMetadata` expected all columns to be present in the table. 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-18 15:54:25 -07:00
Lance Release	b3a637fdeb	Bump version: 0.21.1 → 0.21.2-beta.0	2025-07-18 16:03:28 +00:00
BubbleCal	03b62599d7	feat: support ngram tokenizer (#2507) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-15 16:36:08 +08:00
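The commit does not describe the tokenizer's parameters. The generic character n-gram tokenizer below is only meant to show what such a tokenizer produces: overlapping substrings that let a full-text index match on fragments of words. The lowercasing and the `min_n`/`max_n` bounds are assumptions of this sketch, not Lance's documented options.

```python
from typing import List


def ngram_tokenize(text: str, min_n: int = 3, max_n: int = 3) -> List[str]:
    """Split text into lowercase character n-grams of length min_n..max_n."""
    if not 1 <= min_n <= max_n:
        raise ValueError("require 1 <= min_n <= max_n")
    lowered = text.lower()
    tokens: List[str] = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the string.
        tokens.extend(lowered[i:i + n] for i in range(len(lowered) - n + 1))
    return tokens


print(ngram_tokenize("Lance"))  # ['lan', 'anc', 'nce']
```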
Benjamin Schmidt	4c999fb651	chore: fix cleanupOlderThan docs (#2504) Thanks for all your work. The docstring for `OptimizeOptions` seems to reference a non-existent method on `Table`. I believe this is the correct example for `cleanupOlderThan`. This also appears in the generated docs, but I assume they live downstream from this code?	2025-07-15 16:23:10 +08:00
Lance Release	6d23d32ab5	Bump version: 0.21.1-beta.2 → 0.21.1	2025-07-10 21:36:59 +00:00
Lance Release	704cec34e1	Bump version: 0.21.1-beta.1 → 0.21.1-beta.2	2025-07-10 21:36:26 +00:00
Lance Release	2bffbcefa5	Bump version: 0.21.1-beta.0 → 0.21.1-beta.1	2025-07-09 05:54:20 +00:00


377 Commits