lancedb

Mirror of https://github.com/lancedb/lancedb.git, synced 2025-12-22 21:09:58 +00:00

Author	SHA1	Message	Date
Will Jones	48e5caabda	ci(nodejs): lint for unused imports (#2673)	2025-09-23 18:49:42 -07:00
Lance Release	d6cc68f671	Bump version: 0.22.1-beta.4 → 0.22.1	2025-09-23 22:07:31 +00:00
Lance Release	55eacfa685	Bump version: 0.22.1-beta.3 → 0.22.1-beta.4	2025-09-23 22:06:45 +00:00
Lance Release	222e3264ab	Bump version: 0.25.1-beta.4 → 0.25.1 python-v0.25.1	2025-09-23 22:06:08 +00:00
Lance Release	13505026cb	Bump version: 0.25.1-beta.3 → 0.25.1-beta.4	2025-09-23 22:06:08 +00:00
Neha Prasad	b0800b4b71	fix: undefined values should become null in nullable fields (#2658) Bug fix: undefined values in nullable fields. Issue: When inserting data with `undefined` values into nullable fields, LanceDB incorrectly coerced them to default values (`false` for booleans, `NaN` for numbers, `""` for strings) instead of `null`. Fix: Modified the `makeVector()` function in `arrow.ts` to convert `undefined` values to `null` for nullable fields before passing the data to Apache Arrow. Result: `{ text: undefined, number: undefined, bool: undefined }` now correctly becomes `{ text: null, number: null, bool: null }` when the fields are marked as nullable in the schema. Files changed: `nodejs/lancedb/arrow.ts` (core fix) and `nodejs/__test__/arrow.test.ts` (test coverage). This ensures nullable fields receive nulls, as users expect. Fixes #2645. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-23 14:29:52 -07:00
Neha Prasad	1befebf614	fix(node): handle null values in nullable boolean fields (#2657) Solution: Added special handling in the `makeVector` function for boolean arrays in which all values are null. The fix creates a proper null bitmap using `makeData` and `arrowMakeVector` instead of relying on Apache Arrow's `vectorFromArray`, which doesn't handle this edge case correctly. Fixes #2644. Changes: added null-value detection for boolean types in `makeVector`; create a proper Arrow data structure with a null bitmap when all boolean values are null; preserve the existing behavior for non-null boolean values and other data types. This fixes the boolean null-value bug while maintaining backward compatibility. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-23 14:07:00 -07:00
Will Jones	1ab60fae7f	feat: upgrade Lance to v0.37.0 (#2672) Changelogs: https://github.com/lancedb/lance/releases/tag/v0.37.0, https://github.com/lancedb/lance/releases/tag/v0.36.0	2025-09-23 13:41:47 -07:00
Ayush Chaurasia	e921c90c1b	feat: support mean reciprocal rank reranker (#2671) The basic idea of MRR is described at https://www.evidentlyai.com/ranking-metrics/mean-reciprocal-rank-mrr. I've implemented a weighted version that lets the user set the relative weight of vector and FTS results. The gist is as follows.
### Scenario A: Document at rank 1 in one set, absent from the other
```
# Assuming equal weights: weight_vector = 0.5, weight_fts = 0.5
vector_rr = 1.0  # rank 1 → 1/1 = 1.0
fts_rr = 0.0     # absent → 0.0
weighted_mrr = 0.5 × 1.0 + 0.5 × 0.0 = 0.5
```
### Scenario B: Document at rank 1 in one set, rank 2 in the other
```
# Same weights: weight_vector = 0.5, weight_fts = 0.5
vector_rr = 1.0  # rank 1 → 1/1 = 1.0
fts_rr = 0.5     # rank 2 → 1/2 = 0.5
weighted_mrr = 0.5 × 1.0 + 0.5 × 0.5 = 0.5 + 0.25 = 0.75
```
With `return_score="all"`, the result looks something like this (taken from the reranker tests). Because this reranker scores by weighted rank, some results can share the same score.
```
    text                                               vector                                                _distance      _rowid     _score  _relevance_score
 0  I am your father                                   [-0.010703234, 0.069315575, 0.030076642, 0.002...  8.149148e-13  8589934598  10.978719          1.000000
 1  the ground beneath my feet                         [-0.09500901, 0.00092102867, 0.0755851, 0.0372...  1.376896e+00  8589934604        NaN          0.250000
 2  I find your lack of faith disturbing               [0.07525753, -0.0100010475, 0.09990541, 0.0209...           NaN  8589934595   3.483394          0.250000
 3  but I don't wanna die                              [0.033476487, -0.011235877, -0.057625435, -0.0...  1.538222e+00  8589934610   1.130355          0.238095
 4  if you strike me down I shall become more powe...  [0.00432201, 0.030120496, 5.3317923e-05, 0.033...  1.381086e+00  8589934594   0.715157          0.216667
 5  I see a salty message written in the eves          [-0.04213107, 0.0016004723, 0.061052393, -0.02...  1.638301e+00  8589934603   1.043785          0.133333
 6  but his son was mortal                             [0.012462767, 0.049041674, -0.057339743, -0.04...  1.421566e+00  8589934620        NaN          0.125000
 7  I've got a bad feeling about this                  [-0.06973199, -0.029960092, 0.02641632, -0.031...           NaN  8589934596   1.043785          0.125000
 8  now that's a name I haven't heard in a long time   [-0.014374257, -0.013588792, -0.07487557, 0.03...  1.597573e+00  8589934593   0.848772          0.118056
 9  he was a god                                       [-0.0258895, 0.11925236, -0.029397793, 0.05888...  1.423147e+00  8589934618        NaN          0.100000
10  I wish they would make another one                 [-0.14737535, -0.015304729, 0.04318139, -0.061...           NaN  8589934622   1.043785          0.100000
11  Kratos had a son                                   [-0.057455737, 0.13734367, -0.03537109, -0.000...  1.488075e+00  8589934617        NaN          0.083333
12  I don't wanna live like this                       [-0.0028891307, 0.015214227, 0.025183653, 0.08...           NaN  8589934609   1.043785          0.071429
13  I see a mansard roof through the trees             [0.052383978, 0.087759204, 0.014739997, 0.0239...           NaN  8589934602   1.043785          0.062500
14  great kid don't get cocky                          [-0.047043696, 0.054648954, -0.008509666, -0.0...  1.618125e+00  8589934592        NaN          0.055556
```	2025-09-23 18:25:18 +05:30
Lance Release	05a4ea646a	Bump version: 0.22.1-beta.2 → 0.22.1-beta.3	2025-09-22 04:49:00 +00:00
Lance Release	ebbeeff4e0	Bump version: 0.25.1-beta.2 → 0.25.1-beta.3 python-v0.25.1-beta.3	2025-09-22 04:47:42 +00:00
Jack Ye	407ca53f92	chore: increase pypi publish timeout and use warp runner for arm64 (#2670) Fixes failures like https://github.com/lancedb/lancedb/actions/runs/17840462235/job/50748940233: the ARM64 build cannot finish within 1 hour, and the x86-64 build sometimes cannot either.	2025-09-21 21:42:44 -07:00
Jack Ye	ff71d7e552	feat: support shallow clone (#2653) Supports shallow-cloning a dataset to create a new dataset at a specific location, using the `shallow_clone` feature in Lance. Also introduces a remote `clone` API so that remote tables support this functionality.	2025-09-21 21:28:40 -07:00
Neha Prasad	2261eb95a0	fix(node): handle undefined vector fields with embedding functions (#2655) Fixes an issue where passing `{ vector: undefined }` with an embedding function threw a "Found field not in schema" error instead of calling the embedding function, as already happens for `null` or omitted fields. Changes: modified `rowPathsAndValues` to skip undefined values during schema inference; added a test case verifying that undefined, null, and omitted vector fields all work correctly. Before: `{ vector: undefined }` → error. After: `{ vector: undefined }` → calls the embedding function. Closes #2647	2025-09-19 09:17:28 -07:00
Jack Ye	5b397e410b	chore: fix out of date tests with new namespace validation (#2663) Failure: https://github.com/lancedb/lancedb/actions/runs/17820044478/job/50660516344	2025-09-18 13:29:47 -07:00
Lance Release	b5a39bffec	Bump version: 0.22.1-beta.1 → 0.22.1-beta.2	2025-09-18 20:22:35 +00:00
Lance Release	5e1e9add07	Bump version: 0.25.1-beta.1 → 0.25.1-beta.2 python-v0.25.1-beta.2	2025-09-18 20:21:33 +00:00
Jack Ye	97e9938dfe	fix: add missing validations to namespace operations (#2659)	2025-09-17 23:27:04 -07:00
Weston Pace	1d4b92e01e	refactor: remove catalog implementation now that we have namespaces in database (#2662) We had previously prototyped a `Catalog` trait anticipating a three-tiered Catalog-Database-Table structure. Now that we have namespaces in the `Database`, we can support any tiering scheme, and the `Catalog` trait is no longer needed.	2025-09-17 08:40:20 -07:00
Le Duc Manh	4c9fc3044b	fix: use create to resolve variables (#2640) What: use `create` to resolve variable values. Reference: fixes #2181. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-12 13:07:32 -07:00
Jack Ye	0ebc8d45a8	chore: fix no lock build warnings and CI timeouts (#2650) Example CI failures: publish build timeout (https://github.com/lancedb/lancedb/actions/runs/17626482881/job/50084552906); doc test build timeout (https://github.com/lancedb/lancedb/actions/runs/17627058590/job/50086456818).	2025-09-11 15:30:35 -07:00
BubbleCal	f7d78c3420	feat: add 'target_partition_size' param (#2642) This exposes the `target_partition_size` parameter from Lance. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-09-11 22:56:16 +08:00
Lance Release	6ea6884260	Bump version: 0.22.1-beta.0 → 0.22.1-beta.1	2025-09-10 20:49:43 +00:00
Lance Release	b1d791a299	Bump version: 0.25.1-beta.0 → 0.25.1-beta.1 python-v0.25.1-beta.1	2025-09-10 20:48:56 +00:00
Jack Ye	8da74dcb37	feat: support per-request header override (#2631) Summary: This PR introduces a `HeaderProvider`, which is called for every remote HTTP request to get the latest headers to inject. This is useful for features such as auth tokens: the header provider can refresh tokens internally, so every request carries the refreshed token. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-10 13:44:00 -07:00
Lance Release	3c7419b392	Bump version: 0.22.0 → 0.22.1-beta.0	2025-09-10 14:24:58 +00:00
Lance Release	e612686fdb	Bump version: 0.25.0 → 0.25.1-beta.0 python-v0.25.1-beta.0	2025-09-10 14:24:07 +00:00
Wyatt Alt	e77d57a5b6	chore: update lance to 0.35.0-beta4 (#2639) Updates Lance to 0.35.0-beta4, which also brings in a DataFusion update. This pulls in a fix for a memory leak in index caching caused by a cyclic reference.	2025-09-10 06:19:35 -07:00
Jack Ye	9391ad1450	feat: support mTLS for remote database (#2638) This PR adds mTLS (mutual TLS) configuration support for the LanceDB remote HTTP client, allowing users to authenticate with client certificates and configure custom CA certificates for server verification. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-09 21:04:46 -07:00
LuQQiu	79960b254e	fix: add partition statistics to MetadataEraser (#2637) Some DataFusion optimizers rely on data statistics (e.g. total bytes, number of rows); if those statistics are not supplied, the optimizers cannot apply their optimizations. One example is the anti hash join, which can be rewritten from LeftAnti (left: big table, right: small table) to RightAnti (left: small table, right: big table). LeftAnti requires reading both the whole big table and the whole small table, while RightAnti only requires reading the whole left (small) table and supports limit pushdown, so only part of the big table needs to be read.	2025-09-09 09:13:22 -07:00
Xuanwo	d19c64e29b	chore: bump version for JSON support (#2633) Bumps Lance to the latest beta for JSON support. Signed-off-by: Xuanwo <github@xuanwo.io>	2025-09-05 12:26:28 -07:00
Lance Release	06d5612443	Bump version: 0.22.0-beta.2 → 0.22.0	2025-09-04 08:33:40 +00:00
Lance Release	45f96f4151	Bump version: 0.22.0-beta.1 → 0.22.0-beta.2	2025-09-04 08:33:09 +00:00
Lance Release	f744b785f8	Bump version: 0.25.0-beta.2 → 0.25.0 python-v0.25.0	2025-09-04 08:32:44 +00:00
Lance Release	2e3f745820	Bump version: 0.25.0-beta.1 → 0.25.0-beta.2	2025-09-04 08:32:43 +00:00
Jack Ye	683aaed716	chore: upgrade lance to 0.35.0 (#2625)	2025-09-04 01:31:13 -07:00
Lance Release	48f7b20daa	Bump version: 0.22.0-beta.0 → 0.22.0-beta.1	2025-09-03 17:51:36 +00:00
Lance Release	4dd399ca29	Bump version: 0.25.0-beta.0 → 0.25.0-beta.1 python-v0.25.0-beta.1	2025-09-03 17:50:41 +00:00
Jack Ye	e6f1da31dc	chore: upgrade lance to 0.34.0-beta.4 (#2621)	2025-09-02 21:33:55 -07:00
Wyatt Alt	a9ea785b15	fix: remote python sdk namespace typing (#2620) This changes the default values for some namespace parameters in the remote Python SDK from None to [], to match the underlying code it calls. Prior to this commit, failing to supply `namespace` with the remote SDK would cause an error because the underlying code it dispatches to does not consider None to be valid input.	2025-09-02 16:32:32 -07:00
Colin Patrick McCabe	cc38453391	fix!: fix doctest in query.py (#2622) Fixes the doctest in query.py to include `cumulative_cpu`, now that Lance includes it.	2025-09-02 15:47:32 -07:00
Lance Release	47747287b6	Bump version: 0.21.4-beta.1 → 0.22.0-beta.0	2025-08-29 21:20:57 +00:00
Lance Release	0847e666a0	Bump version: 0.24.4-beta.1 → 0.25.0-beta.0 python-v0.25.0-beta.0	2025-08-29 21:19:51 +00:00
Wyatt Alt	981f8427e6	chore: update lance (#2610) Adds `storage_options` to object_store `wrap()` to match an upstream Lance change.	2025-08-29 13:41:02 -07:00
Will Jones	f6846004ca	feat: add `name` parameter to remaining Python create index calls (#2617) Summary: This PR adds the missing `name` parameter to the `create_scalar_index` and `create_fts_index` methods in the Python SDK; it was inadvertently omitted when the parameter was added to `create_index` in PR #2586. Changes: added a `name: Optional[str] = None` parameter to the abstract `Table.create_scalar_index` and `Table.create_fts_index` methods; updated the `LanceTable` implementation to accept the `name` parameter and pass it to the underlying Rust layer; updated the `RemoteTable` implementation to accept and pass the `name` parameter; extended the tests to verify that custom index names work for both scalar and FTS indices. When `name` is not provided, a default name is generated (e.g., `{column}_idx`). Test plan: [x] added test cases for custom names in scalar index creation; [x] added test cases for custom names in FTS index creation; [x] verified existing tests continue to pass; [x] code formatting and linting checks pass. This ensures API consistency across all index creation methods in the LanceDB Python SDK. Fixes #2616 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-27 14:02:48 -07:00
Jack Ye	faf8973624	feat!: support multi-level namespace (#2603) This PR adds support for multi-level namespaces in a LanceDB database, following the Lance Namespace spec. Users can create namespaces inside a database connection and perform create, drop, list, and list_tables operations in a namespace (other operations, such as update and describe, will come in a follow-up PR). The three types of database connections behave as follows: 1. Local database connections continue to have just a flat list of tables, for backward compatibility. 2. Remote database connections make REST API calls according to the APIs in the Lance Namespace spec. 3. Lance Namespace connections invoke the corresponding operations against the specific namespace implementation, which may behave differently for these APIs. All table APIs now take an identifier instead of a name; for example, `/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is directly in the root namespace, the API call is identical. If the table is in a namespace, the full table ID should be used, with `$` as the default delimiter (`.` is a special character that causes issues with URL parsing, so `$` is used instead), for example `/v1/table/ns1$table1/create`. If a different delimiter is needed, users can configure `id_delimiter` in the client config, and it is passed as a query parameter, for example `/v1/table/ns1__table1/create?delimiter=__`. The Python and TypeScript APIs remain backward compatible, but the following Rust APIs do not: 1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>` is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace: &[String]) -> Result<()>` 2. `Connection::drop_all_tables(&self) -> Result<()>` is now `Connection::drop_all_tables(&self, namespace: &[String]) -> Result<()>`	2025-08-27 12:07:55 -07:00
Weston Pace	fabe37274f	feat: add __getitems__ method impl for torch integration (#2596) This allows a LanceDB Table to act as a PyTorch dataset.	2025-08-25 13:23:22 -07:00
Lance Release	6839ac3509	Bump version: 0.21.4-beta.0 → 0.21.4-beta.1	2025-08-22 03:55:22 +00:00
Lance Release	b88422e515	Bump version: 0.24.4-beta.0 → 0.24.4-beta.1 python-v0.24.4-beta.1	2025-08-22 03:54:34 +00:00
BubbleCal	8d60685ede	chore: upgrade lance to 0.33.0-beta.4 (#2604) Details: https://github.com/lancedb/lance/releases/tag/untagged-5191abd48c1fbe76f746 Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-08-21 21:18:48 +08:00


2217 Commits
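The per-request header idea in #2631 (a provider consulted before every HTTP call so that refreshed auth tokens are always used) can be illustrated with a small Python class. This is a sketch of the pattern only, not LanceDB's `HeaderProvider` interface: the class name `RefreshingHeaderProvider`, the `get_headers()` method, the `fetch_token` callback returning a `(token, ttl_seconds)` pair, and the bearer-token header format are all assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RefreshingHeaderProvider:
    """Hypothetical per-request header provider that caches and refreshes a token."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        refresh_margin: float = 30.0,
    ) -> None:
        self._fetch_token = fetch_token  # assumed to return (token, ttl_seconds)
        self._clock = clock
        self._refresh_margin = refresh_margin  # refresh this many seconds early
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get_headers(self) -> Dict[str, str]:
        # Called before every HTTP request, so each request sees a valid token
        if self._token is None or self._clock() >= self._expires_at - self._refresh_margin:
            token, ttl = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + ttl
        return {"Authorization": f"Bearer {self._token}"}


# Usage with a fake clock so the example is deterministic
now = [0.0]
provider = RefreshingHeaderProvider(lambda: ("token-1", 100.0), clock=lambda: now[0])
print(provider.get_headers())  # {'Authorization': 'Bearer token-1'}
```

Keeping the refresh logic inside the provider means the HTTP client only has to ask for headers on each request; it never needs to know when or how the token is renewed.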
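The client-side half of the mTLS setup described in #2638 (a client certificate for authentication, plus optional custom CA certificates for verifying the server) maps onto Python's standard `ssl` module roughly as shown below. This is an analogy only and does not show LanceDB's own configuration options; the helper name `make_mtls_context` and its parameters are hypothetical.

```python
import ssl
from typing import Optional


def make_mtls_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Hypothetical helper: build a client-side TLS context for mutual TLS."""
    # Verifies the server certificate and hostname; if a custom CA bundle is
    # given, it is used instead of the system default CAs.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # The client certificate (and its private key) is presented to the
        # server during the handshake, which is what makes the TLS "mutual".
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With real files, a call such as `make_mtls_context("ca.pem", "client.pem", "client.key")` (placeholder paths) would produce a context that can be passed to, for example, `http.client.HTTPSConnection(host, context=ctx)`.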
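The cost argument in #2637 (why row-count statistics let the optimizer swap LeftAnti for RightAnti) can be illustrated with a toy anti join in Python. This is not DataFusion code and does not model its hash-join operators; it only shows the idea from that commit message: when the small table is the build side (read in full) and the big table is streamed as the probe side, a limit lets the scan of the big table stop early. The function name and parameters are hypothetical.

```python
from typing import Hashable, Iterable, List, Optional


def anti_join_small_build(
    small_keys: Iterable[Hashable],
    big_keys: Iterable[Hashable],
    limit: Optional[int] = None,
) -> List[Hashable]:
    """Return keys from the big side that have no match on the small side.

    Toy model of the swapped plan: the hash table is built from the small
    input (read in full), and the big input is only streamed as a probe.
    """
    if limit == 0:
        return []
    build = set(small_keys)  # build side: the whole small table
    out: List[Hashable] = []
    for key in big_keys:  # probe side: streamed lazily
        if key not in build:
            out.append(key)
            if limit is not None and len(out) >= limit:
                break  # limit reached: the rest of the big side is never read
    return out


consumed = []


def big_table():
    # Stand-in for a large scan that records how many rows were actually read
    for key in range(1_000_000):
        consumed.append(key)
        yield key


print(anti_join_small_build([0, 1, 2], big_table(), limit=3))  # [3, 4, 5]
print(len(consumed))  # 6
```

In the unswapped orientation, the big table would be the build side and would have to be read completely before any output could be produced, so the limit could not cut the scan short.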
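The table-identifier scheme from the multi-level namespace commit (#2603) can be sketched as a small Python helper that turns a namespace path plus a table name into the REST route. The helper name `table_route` and its parameters are hypothetical and not part of any LanceDB SDK. It performs no URL-encoding, so it is only safe for delimiters such as `$` or `__` that need none.

```python
from typing import Sequence

DEFAULT_ID_DELIMITER = "$"


def table_route(
    table: str,
    namespace: Sequence[str] = (),
    operation: str = "create",
    delimiter: str = DEFAULT_ID_DELIMITER,
) -> str:
    """Hypothetical helper: build a REST path for a namespaced table."""
    # Join namespace levels and the table name into one identifier, e.g. "ns1$table1"
    table_id = delimiter.join([*namespace, table])
    route = f"/v1/table/{table_id}/{operation}"
    # A non-default delimiter is sent as a query parameter (no URL-encoding here)
    if delimiter != DEFAULT_ID_DELIMITER:
        route += f"?delimiter={delimiter}"
    return route


print(table_route("table1"))                           # /v1/table/table1/create
print(table_route("table1", ["ns1"]))                  # /v1/table/ns1$table1/create
print(table_route("table1", ["ns1"], delimiter="__"))  # /v1/table/ns1__table1/create?delimiter=__
```

A table in the root namespace produces the same path as before the change, which matches the backward-compatibility note in that commit.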
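For context on the `__getitems__` commit (#2596): in recent PyTorch versions, a `DataLoader` that batches a map-style dataset calls the dataset's `__getitems__` once with the whole list of batch indices if that method is defined, and otherwise calls `__getitem__` once per index. The class below is a hypothetical stand-in (a Python list plays the role of the table, and `fetch_calls` is an illustrative counter), not LanceDB's implementation; it only shows why the batched method means one storage access per batch instead of one per row.

```python
from typing import Any, Dict, List, Sequence


class InMemoryRows:
    """Hypothetical map-style dataset; a Python list stands in for a table."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows
        self.fetch_calls = 0  # counts how many times "storage" is accessed

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Per-row path: one storage access per sample
        self.fetch_calls += 1
        return self._rows[index]

    def __getitems__(self, indices: Sequence[int]) -> List[Dict[str, Any]]:
        # Batched path: one storage access for the whole batch, which is
        # where a real table could issue a single multi-row read
        self.fetch_calls += 1
        return [self._rows[i] for i in indices]


ds = InMemoryRows([{"id": i} for i in range(10)])
batch = ds.__getitems__([2, 5, 7])
print(batch, ds.fetch_calls)  # [{'id': 2}, {'id': 5}, {'id': 7}] 1
```

With `torch` installed, `torch.utils.data.DataLoader(ds, batch_size=3)` should route each batch through `__getitems__`; returning a list of per-row samples keeps the result compatible with the default collate function.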