lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-27 15:12:53 +00:00

Author	SHA1	Message	Date
Lance Release	e612686fdb	Bump version: 0.25.0 → 0.25.1-beta.0 python-v0.25.1-beta.0	2025-09-10 14:24:07 +00:00
Wyatt Alt	e77d57a5b6	chore: update lance to 0.35.0-beta4 (#2639 ) Updates lance to 0.35.0-beta4, which also incurs a datafusion update. This brings in a fix for a memory leak in index caching, resulting from a cyclical reference.	2025-09-10 06:19:35 -07:00
Jack Ye	9391ad1450	feat: support mTLS for remote database (#2638 ) This PR adds mTLS (mutual TLS) configuration support for the LanceDB remote HTTP client, allowing users to authenticate with client certificates and configure custom CA certificates for server verification. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-09 21:04:46 -07:00
LuQQiu	79960b254e	fix: add partition statistics to MetadataEraser (#2637 ) Some of the DataFusion optimizers optimize based on data statistics (e.g. total bytes, number of rows). If those statistics are not supplied, these optimizations cannot be applied. One example is Anti Hash Join, which can be optimized from LeftAnti (Left: big table, Right: small table) to RightAnti (Left: small table, Right: big table). LeftAnti requires reading the whole big and small tables, while RightAnti only requires reading the whole left table and supports limit pushdown to read only part of the big table.	2025-09-09 09:13:22 -07:00
Xuanwo	d19c64e29b	chore: bump version for JSON support (#2633 ) Bump version of lance to latest beta for JSON support. Signed-off-by: Xuanwo <github@xuanwo.io>	2025-09-05 12:26:28 -07:00
Lance Release	06d5612443	Bump version: 0.22.0-beta.2 → 0.22.0	2025-09-04 08:33:40 +00:00
Lance Release	45f96f4151	Bump version: 0.22.0-beta.1 → 0.22.0-beta.2	2025-09-04 08:33:09 +00:00
Lance Release	f744b785f8	Bump version: 0.25.0-beta.2 → 0.25.0 python-v0.25.0	2025-09-04 08:32:44 +00:00
Lance Release	2e3f745820	Bump version: 0.25.0-beta.1 → 0.25.0-beta.2	2025-09-04 08:32:43 +00:00
Jack Ye	683aaed716	chore: upgrade lance to 0.35.0 (#2625 )	2025-09-04 01:31:13 -07:00
Lance Release	48f7b20daa	Bump version: 0.22.0-beta.0 → 0.22.0-beta.1	2025-09-03 17:51:36 +00:00
Lance Release	4dd399ca29	Bump version: 0.25.0-beta.0 → 0.25.0-beta.1 python-v0.25.0-beta.1	2025-09-03 17:50:41 +00:00
Jack Ye	e6f1da31dc	chore: upgrade lance to 0.34.0-beta.4 (#2621 )	2025-09-02 21:33:55 -07:00
Wyatt Alt	a9ea785b15	fix: remote python sdk namespace typing (#2620 ) This changes the default values for some namespace parameters in the remote python SDK from None to [], to match the underlying code it calls. Prior to this commit, failing to supply "namespace" with the remote SDK would cause an error because the underlying code it dispatches to does not consider None to be valid input.	2025-09-02 16:32:32 -07:00
Colin Patrick McCabe	cc38453391	fix!: fix doctest in query.py (#2622 ) Fix doctest in query.py to include cumulative_cpu, now that lance includes that.	2025-09-02 15:47:32 -07:00
Lance Release	47747287b6	Bump version: 0.21.4-beta.1 → 0.22.0-beta.0	2025-08-29 21:20:57 +00:00
Lance Release	0847e666a0	Bump version: 0.24.4-beta.1 → 0.25.0-beta.0 python-v0.25.0-beta.0	2025-08-29 21:19:51 +00:00
Wyatt Alt	981f8427e6	chore: update lance (#2610 ) Adds storage_options to object_store wrap() to adhere to upstream lance change.	2025-08-29 13:41:02 -07:00
Will Jones	f6846004ca	feat: add `name` parameter to remaining Python create index calls (#2617 ) ## Summary This PR adds the missing `name` parameter to `create_scalar_index` and `create_fts_index` methods in the Python SDK, which was inadvertently omitted when it was added to `create_index` in PR #2586. ## Changes - Add `name: Optional[str] = None` parameter to abstract `Table.create_scalar_index` and `Table.create_fts_index` methods - Update `LanceTable` implementation to accept and pass the `name` parameter to the underlying Rust layer - Update `RemoteTable` implementation to accept and pass the `name` parameter - Enhanced tests to verify custom index names work correctly for both scalar and FTS indices - When `name` is not provided, default names are generated (e.g., `{column}_idx`) ## Test plan - [x] Added test cases for custom names in scalar index creation - [x] Added test cases for custom names in FTS index creation - [x] Verified existing tests continue to pass - [x] Code formatting and linting checks pass This ensures API consistency across all index creation methods in the LanceDB Python SDK. Fixes #2616 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-27 14:02:48 -07:00
Jack Ye	faf8973624	feat!: support multi-level namespace (#2603 ) This PR adds support for multi-level namespaces in a LanceDB database, according to the Lance Namespace spec. This allows users to create namespaces inside a database connection and perform create, drop, list, and list_tables in a namespace. (Other operations like update and describe will be in a follow-up PR.) The 3 types of database connections behave as follows: 1. Local database connections will continue to have just a flat list of tables for backwards compatibility. 2. Remote database connections will make REST API calls according to the APIs in the Lance Namespace spec. 3. Lance Namespace connections will invoke the corresponding operations against the specific namespace implementation, which could have different behaviors regarding these APIs. All the table APIs now take an identifier instead of a name, for example `/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is directly in the root namespace, the API call is identical. If the table is in a namespace, then the full table ID should be used, with `$` as the default delimiter (`.` is a special character and creates issues with URL parsing so `$` is used), for example `/v1/table/ns1$table1/create`. If a different delimiter needs to be used, the user can configure the `id_delimiter` in client config and that becomes a query parameter, for example `/v1/table/ns1__table1/create?delimiter=__` The Python and TypeScript APIs are kept backwards compatible, but the following Rust APIs are not: 1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>` is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace: &[String]) -> Result<()>` 2. `Connection::drop_all_tables(&self) -> Result<()>` is now `Connection::drop_all_tables(&self, name: impl AsRef<str>) -> Result<()>`	2025-08-27 12:07:55 -07:00
Weston Pace	fabe37274f	feat: add __getitems__ method impl for torch integration (#2596 ) This allows a lancedb Table to act as a torch dataset.	2025-08-25 13:23:22 -07:00
Lance Release	6839ac3509	Bump version: 0.21.4-beta.0 → 0.21.4-beta.1	2025-08-22 03:55:22 +00:00
Lance Release	b88422e515	Bump version: 0.24.4-beta.0 → 0.24.4-beta.1 python-v0.24.4-beta.1	2025-08-22 03:54:34 +00:00
BubbleCal	8d60685ede	chore: upgrade lance to 0.33.0-beta.4 (#2604 ) details: https://github.com/lancedb/lance/releases/tag/untagged-5191abd48c1fbe76f746 Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-08-21 21:18:48 +08:00
Jack Ye	04285a4a4e	feat(python): integrate with lance namespace (#2599 ) This PR integrates `lancedb` with `lance-namespace` so that users can use the LanceDB client to access Lance tables in any catalog service. In general, we expect most of the logic to be delegated to the existing `LanceDBConnection` and `LanceTable`, but the namespace implementation will control how a table is created and dropped, and describe where the table is stored along with any related storage options like access credentials. The implementation currently only supports a 1-level namespace that directly contains tables. We will introduce nested namespace support in a separate PR. Users are expected to use it in the following way: ```python >>> import lancedb >>> import pyarrow as pa >>> # Connect using GlueNamespace >>> db = lancedb.connect_namespace("glue", {"catalog_id": "123456789012"}) >>> # Create a table with schema >>> schema = pa.schema([ ... pa.field("id", pa.int64()), ... pa.field("vector", pa.list_(pa.float32(), 2)) ... ]) >>> table = db.create_table("my_table", schema=schema) >>> # List tables >>> db.table_names() ['my_table'] ```	2025-08-20 15:46:16 -07:00
Lance Release	d4a41b5663	Bump version: 0.21.3 → 0.21.4-beta.0	2025-08-19 22:56:52 +00:00
Lance Release	adc3daa462	Bump version: 0.24.3 → 0.24.4-beta.0 python-v0.24.4-beta.0	2025-08-19 22:56:05 +00:00
Will Jones	acbfa6c012	feat: upgrade lance to 0.33.0-beta.3 (#2598 ) Change logs: * [v0.33.0-beta.3](https://github.com/lancedb/lance/releases/tag/v0.33.0-beta.3) * [v0.33.0-beta.2](https://github.com/lancedb/lance/releases/tag/v0.33.0-beta.2) * [v0.33.0-beta.1](https://github.com/lancedb/lance/releases/tag/v0.33.0-beta.1) Important changes: * Row-level conflict resolution for delete operations * Fixes #2593 * Fix for keeping tombstones fields around, preventing cleanup of dropped columns.	2025-08-19 13:45:15 -07:00
Vitali Lovich	d602e9f98c	fix: make cloud features optional (#2567 ) (#2568 ) This shrinks the size of a local embedded build that can disable all the default features. When combined with https://github.com/lancedb/lance/pull/4362 and the dependencies are updated to point to the fix, this resolves #2567 fully. Verified by patching the workspace to redirect to my clone of lance with the PR applied. ``` cargo tree -p lancedb -e no-build -e no-dev --no-default-features -i aws-config | less ``` The reason that lance itself needs to change too is that many dependencies within that project depend on lance-io/default and lancedb depends on them which transitively ends up enabling the cloud regardless. The PR in lance removes the dependency on lance-io/default from all sibling crates. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-08-15 16:46:52 -07:00
Will Jones	ad09234d59	feat: allow setting `train=False` and `name` on indices (#2586 ) Enables two new parameters when building indices: * `name`: Allows explicitly setting a name on the index. Default is `{col_name}_idx`. * `train` (default `True`): When set to `False`, an empty index will be immediately created. The upgrade of Lance means there are also additional behaviors from `cd76a993b8`: * When a scalar index is created on a Table, it will be kept around even if all rows are deleted or updated. * Scalar indices can be created on empty tables. They will default to `train=False` if the table is empty. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-08-15 14:00:26 -07:00
Lance Release	0c34ffb252	Bump version: 0.21.3-beta.0 → 0.21.3	2025-08-15 18:03:26 +00:00
Lance Release	d9f333d828	Bump version: 0.21.2 → 0.21.3-beta.0	2025-08-15 18:02:43 +00:00
Lance Release	bb809abd4b	Bump version: 0.24.3-beta.0 → 0.24.3 python-v0.24.3	2025-08-15 18:02:04 +00:00
Lance Release	c87530f7a3	Bump version: 0.24.2 → 0.24.3-beta.0	2025-08-15 18:02:04 +00:00
Will Jones	1eb1beecd6	ci: remove more mentions of node (#2595 ) I promise this time I tested it locally :)	2025-08-15 11:01:02 -07:00
Yuval Lifshitz	ce550e6c45	feat: add missing rust examples (#2583 ) All 3 examples now run with: ``` cargo run --example simple cargo run --example full_text_search cargo run --example ivf_pq ``` Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com> Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-08-15 10:38:58 -07:00
Will Jones	d3bae1f3a3	ci: drop old node mention (#2594 ) This broke release here: https://github.com/lancedb/lancedb/actions/runs/16993824504/job/48179542912	2025-08-15 09:51:19 -07:00
Will Jones	dcf53c4506	fix: limit and offset support paginating through FTS and vector search results (#2592 ) Adds tests to ensure that users can paginate through simple scan, FTS, and vector search results using `limit` and `offset`. Tests upstream work: https://github.com/lancedb/lance/pull/4318 Closes #2459	2025-08-15 08:55:12 -07:00
Ryan Green	941eada703	docs: update indexing and compaction docs (#2362 ) ## Summary by CodeRabbit - Documentation - Clarified and expanded explanations of data management concepts in LanceDB. - Added notes on automatic background fragment compaction and incremental reindexing support in LanceDB Cloud/Enterprise. - Updated details on disabling interim exhaustive kNN search during background reindexing. - Improved formatting and removed outdated FTS reindexing subsection. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-08-15 12:41:47 -02:30
Weston Pace	ed640a76d9	feat: add take_offsets and take_row_ids (#2584 ) These operations have existed in lance for a long while and many users need to drop down to lance for this capability. This PR adds the API and implements it using filters (e.g. `_rowid IN (...)`) so that it doesn't currently add any load to `BaseTable`. I'm not sure that is sustainable as base table implementations may want to specialize how they handle this method. However, I figure it is a good starting point. In addition, unlike Lance, this API does not currently guarantee anything about the order of the take results. This is necessary for the fallback filter approach to work (SQL filters cannot guarantee result order).	2025-08-15 06:48:24 -07:00
Will Jones	296205ef96	feat: upgrade lance to v0.33.0 (#2591 ) https://github.com/lancedb/lance/releases/tag/v0.33.0	2025-08-14 12:11:19 -07:00
Weston Pace	16beaaa656	ci: fix broken CI checks (#2585 )	2025-08-13 10:05:57 -07:00
Tomoko Uchida	4ff87b1f4a	feat: add hybrid search example in Rust (#2579 ) Hello! I'm new to lancedb and interested in the Rust SDK. I couldn't find a good hybrid search example in Rust, so I created one. ## Usage ```bash $ cargo run --quiet --example hybrid_search --features=sentence-transformers Result: Python is a popular programming language. Result: Mount Everest is the highest mountain in the world. Result: The first computer programmer was Ada Lovelace. Result: Coffee is one of the most popular beverages in the world. Result: Basketball is a sport played with a ball and a hoop. ```	2025-08-12 08:22:19 -07:00
Shawn	0532ef2358	chore(deps): update crunchy to 0.2.4 (#2581 ) Hi, I'm trying to build goose (which relies on lancedb) for android/termux. I found out that some dependencies need to be updated. https://github.com/block/goose/pull/3890 0.2.4 update - nmathewson Fix cross-compilation between windows and non-windows. https://github.com/shawn111/lancedb/actions/runs/16871317860 windows and linux build passed https://github.com/shawn111/lancedb/actions/runs/16871859398 Signed-off-by: Shawn Wang <shawn111@gmail.com>	2025-08-11 18:00:00 -07:00
BubbleCal	dcf7334c1f	chore: upgrade lance to v0.32.2-beta.1 (#2580 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-08-08 17:00:54 +08:00
Will Jones	8ffe992a6f	fix: always uses slashes in table uris (#2575 ) Closes #2574	2025-08-05 12:12:57 -07:00
Will Jones	9d683e4f0b	feat: infer vector columns when name contains 'vector' or 'embedding' (#2547 ) ## Summary - Enhanced vector column detection to use substring matching instead of exact matching - Now detects columns with names containing "vector" or "embedding" (case-insensitive) - Added integer vector support to Node.js implementation (matching Python) - Comprehensive test coverage for both float and integer vector types ## Changes ### Python (`python/python/lancedb/table.py`) - Updated `_infer_target_schema()` to use substring matching with helper function `_is_vector_column()` - Preserved original field names instead of forcing "vector" - Consolidated duplicate logic for better maintainability ### Node.js (`nodejs/lancedb/arrow.ts`) - Enhanced type inference with `nameSuggestsVectorColumn()` helper function - Added `isAllIntegers()` function with performance optimization (checks first 10 elements) - Implemented integer vector support using `Uint8` type (matching Python) - Improved type safety by removing `any` usage ### Tests - Python: Added `test_infer_target_schema_with_vector_embedding_names()` in `test_util.py` - Node.js: Added comprehensive test case in `arrow.test.ts` - Both test suites cover various naming patterns and integer/float vector types ## Examples of newly supported column names: - `user_vector`, `text_embedding`, `doc_embeddings` - `my_vector_field`, `embedding_model` - `VECTOR_COL`, `Vector_Mixed` (case-insensitive) - Both float and integer arrays are properly converted to fixed-size lists ## Test plan - [x] All existing tests pass (backward compatibility maintained) - [x] New tests pass for both Python and Node.js implementations - [x] Integer vector detection works correctly in Node.js - [x] Code passes linting and formatting checks - [x] Performance optimized for large vector arrays Fixes #2546 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-04 15:36:49 -07:00
Will Jones	0a1ea1858d	chore: remove vectordb package (#2564 ) ```shell git rm -r rust/ffi git rm -r node git rm ci/build_windows_artifacts.ps1 git rm ci/build_windows_artifacts_nodejs.ps1 git rm ci/build_linux_artifacts.sh git rm ci/build_macos_artifacts.sh git rm -r ci/manylinux_node git rm .github/workflows/node.yml ```	2025-08-04 14:14:33 -07:00
Poornachandra.A.N	7d0127b376	feat(embeddings): add siglip embedding support to lancedb (#2499 ) ### Summary This PR adds SigLIP (Sigmoid Loss Image Pretraining) as a new embedding model in the LanceDB embedding registry. SigLIP improves image-text alignment performance using sigmoid-based contrastive loss and offers robust zero-shot generalization. Fixes #2498 ### What’s Implemented #### 1. `SigLIP` Embedding Class * Added `SigLIP` support under `python/lancedb/embeddings/siglip.py` * Implements: * `compute_source_embeddings` * `_batch_generate_embeddings` * Normalization logic * Batch-wise progress logging for image embedding #### 2. Registry Integration * Registered `SigLIP` in `embeddings/__init__.py` * `SigLIP` now usable via `connect(..., embedding="siglip")` #### 3. Evaluation Benchmark Support * Added SigLIP to `test_embeddings_slow.py` for side-by-side benchmarking with OpenCLIP and ImageBind ### New Test Methods #### `test_siglip` * End-to-end test to verify embeddings table creation and vector shape for SigLIP ![WhatsApp Image 2025-07-10 at 18 00 27_a3368163](https://github.com/user-attachments/assets/e5582ee1-80a3-43d7-a7a1-26ceecce9f4d) #### `test_siglip_vs_openclip_vs_imagebind_benchmark_full` * Benchmarks: * Recall\@1 / 5 / 10 * mAP (Mean Average Precision) * Embedding & Search Latency * Dimensionality reporting ![WhatsApp Image 2025-07-10 at 18 12 13_22c67a84](https://github.com/user-attachments/assets/455bf30f-62b7-4684-a3f3-ad52e2a1ffe5) ### Notes * SigLIP outputs 768D embeddings (vs 512D for OpenCLIP) * Benchmark shows competitive performance despite higher dimensionality * I'm still new to contributing to open-source and learning as I go. Please feel free to suggest any improvements — I'm happy to make changes!	2025-08-04 11:42:39 -07:00
Will Jones	02595dc475	feat: add overall timeout parameter to remote client (#2550 ) ## Summary - Adds an overall `timeout` parameter to `TimeoutConfig` that limits the total time for the entire request - Can be set via config or `LANCE_CLIENT_TIMEOUT` environment variable - Exposed in Python and Node.js bindings - Includes comprehensive tests ## Test plan - [x] Unit tests for Rust TimeoutConfig - [x] Integration tests for Python bindings - [x] Integration tests for Node.js bindings - [x] All existing tests pass 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-08-04 10:06:55 -07:00
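
The table identifier scheme described in faf8973624 (#2603) can be illustrated with a small sketch. This is not the LanceDB client code: the helper name `table_create_path` and its signature are hypothetical, and the sketch assumes only what the commit message states, namely that the namespace levels and table name are joined with `$` by default and that a non-default delimiter is also sent as a `delimiter` query parameter. URL escaping is deliberately left out.

```python
from typing import Sequence

# Default delimiter named in the commit message.
DEFAULT_DELIMITER = "$"


def table_create_path(
    name: str,
    namespace: Sequence[str] = (),
    delimiter: str = DEFAULT_DELIMITER,
) -> str:
    """Build the create-table path for a table ID (hypothetical helper).

    A table in the root namespace keeps its plain name; otherwise the
    namespace levels and the table name are joined with the delimiter.
    """
    table_id = delimiter.join([*namespace, name])
    path = f"/v1/table/{table_id}/create"
    # Per the commit message, a custom delimiter becomes a query parameter.
    if delimiter != DEFAULT_DELIMITER:
        path += f"?delimiter={delimiter}"
    return path


print(table_create_path("table1"))
print(table_create_path("table1", ["ns1"]))
print(table_create_path("table1", ["ns1"], delimiter="__"))
```

The three calls correspond to the three cases in the commit message: a root-namespace table, a namespaced table with the default delimiter, and a namespaced table with a configured `id_delimiter`.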


2041 Commits