lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-14 02:20:40 +00:00

Author	SHA1	Message	Date
fzowl	2adb10e6a8	feat: voyage-multimodal-3.5 (#2887 ) voyage-multimodal-3.5 support (text, image and video embeddings)	2026-01-02 15:14:52 -08:00
Jonathan Hsieh	1cf7b4b678	docs: remove incorrect "LanceDb Cloud only" from table_names params (#2893 ) The page_token and limit parameters for table_names() are supported by both local storage and LanceDB Cloud, not just Cloud as the docstring incorrectly stated. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-29 09:08:04 -08:00
Prashanth Rao	8ae4f42fbe	fix: add to_lance() and to_polars() stub methods for type-checkers (#2876 ) Adds `Table.to_lance()` and `Table.to_polars()` methods (non-abstract methods, defaulting to `NotImplementedError`) so type checkers like mypy, pyright and ty don’t flag them as unknown attributes on `Table`. Not making these abstract methods should keep existing remote/other `Table` implementations instantiable. This is a non-breaking change to existing functionality and is purely for the purpose of pleasing static type-checkers like mypy, ty and pyright.	2025-12-18 12:55:07 -05:00
BubbleCal	39a18baf59	feat: infer vector type to float32 if integers are out of uint8 range (#2856 ) ## Summary - infer integer vector columns as float32 when any value exceeds uint8 range or is negative - keep uint8 for integer vectors within range and nulls only - add sync/async tests covering large integer vector inference ## Testing - ./.venv/bin/pytest python/python/tests/test_table.py -k "large_int_vectors"	2025-12-08 17:10:25 +08:00
BubbleCal	a61461331c	feat: add IVF SQ index support and HNSW aliases (#2832 ) Adds IVF_SQ index config through Rust core and Python bindings, plus alias names IvfHnswSq/Pq for backward compatibility. Updates remote/table helpers and types to accept the new index type. Includes tests covering IVF SQ creation and alias usage.	2025-12-04 00:25:44 +08:00
Jack Ye	d1efc6ad8a	refactor!: use namespace models directly for namespace operations (#2806 ) 1. Use generated models in lance-namespace for request response models to avoid multiple layers of conversions 2. Make sure the API is consistent with the namespace spec 3. Deprecate the table_names API in favor of the list_tables API in namespace that allows full pagination support without the need to have sorted table names 4. Add describe_namespace API which was a miss in the original implementation	2025-12-02 22:41:04 -08:00
Jonathan Hsieh	44878dd9a5	feat: support stable row IDs via storage_options (#2831 ) Add support for enabling stable row IDs when creating tables via the `new_table_enable_stable_row_ids` storage option. Stable row IDs ensure that row identifiers remain constant after compaction, update, delete, and merge operations. This is useful for materialized views and other use cases that need to track source rows across these operations. The option can be set at two levels: - Connection level: applies to all tables created with that connection - Table level: per-table override via create_table storage_options 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-12-02 13:57:00 -08:00
LanceDB Robot	4b5bb2d76c	chore: update lance dependency to v1.0.0-beta.16 (#2835 ) ## Summary - bump all Lance crates to v1.0.0-beta.16 via ci/set_lance_version.py - refresh Cargo.lock (reqwest/opendal/etc.) to satisfy the new release ## Verification - cargo clippy --workspace --tests --all-features -- -D warnings - cargo fmt --all Triggered by [refs/tags/v1.0.0-beta.16](https://github.com/lance-format/lance/releases/tag/v1.0.0-beta.16) --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2025-12-01 23:07:03 -08:00
Prashanth Rao	a250d8e7df	docs: improve docstring for RabitQ in Python (#2808 ) This PR improves the docstring for `IVF_RQ` (RabitQ) in Python. The earlier version referred to it as "residual quantization", which is confusing to future readers of the code. In contrast, the TypeScript and Rust codebases defined `IVF_RQ` as RabitQ. So now the three languages use comments that are consistent with one another. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-24 13:35:19 +08:00
Jack Ye	0baf807be0	ci: use larger runner for doctest and fix failing tests (#2801 ) Currently test would fail after installing to around pytorch	2025-11-20 19:44:31 -08:00
Prashanth Rao	135dfdc7ec	docs: 404 and outdated URLs should now work (#2800 ) Did a full scan of all URLs that used to point to the old mkdocs pages, and now links to the appropriate pages on lancedb.com/docs or lance.org docs. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-20 11:14:20 -08:00
Jackson Hew	bb6b0bea0c	fix: .phrase_query() not working (#2781 ) The `self._query` value was not set when wrapping its copy `query` with quotation marks. The test for phrase queries has been updated to test the `.phrase_query()` method as well, which will catch this bug. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-11-20 10:32:37 -08:00
Jack Ye	0084eb238b	fix: use None default for namespace (#2797 ) Realized that using [] is an anti-pattern in python for defaults: https://docs.python-guide.org/writing/gotchas/	2025-11-20 10:23:41 -08:00
Colin Patrick McCabe	7d3f5348a7	feat: implement head() for remote tables (#2793 ) Implement the head() function for RemoteTable.	2025-11-19 12:49:34 -08:00
Jack Ye	1b78ccedaf	feat: support async namespace connection (#2788 ) Also fixes 2 bugs: 1. make the storage options provider serializable in Ray 2. fix the wrong table.to_table() URI for namespace-backed tables	2025-11-19 12:23:50 -08:00
Mykola Skrynnyk	ca8d118f78	feat(python): support `to_pydantic` in async (#2438 ) This request improves support for `pydantic` integration by adding a `to_pydantic` method to asynchronous queries and handling models that use `alias` in field definitions. Fixes #2436 and closes #2437. ## Summary by CodeRabbit - New Features - Added support for converting asynchronous query results to Pydantic models. - Bug Fixes - Simplified conversion of query results to Pydantic models for improved reliability. - Improved handling of field aliases and computed fields when mapping query results to Pydantic models. - Tests - Added tests to verify correct mapping of aliased and computed fields in both synchronous and asynchronous scenarios.	2025-11-19 11:20:14 -08:00
Wyatt Alt	386fc9e466	feat: add num_attempts to merge insert result (#2795 ) This pipes the num_attempts field from lance's merge insert result through lancedb. This allows callers of merge_insert to get a better idea of whether transaction conflicts are occurring.	2025-11-19 09:32:57 -08:00
Will Jones	1cf3917a87	ci: make rust ci faster, get ci green (#2782 ) * Add `ci` profile for smaller build caches. This had a meaningful impact in Lance, and I expect a similar impact here. https://github.com/lancedb/lance/pull/5236 * Get caching working in Rust. Previously was not working due to `workspaces: rust`. * Get caching working in NodeJs lint job. Previously wasn't working because we installed the toolchain after we called `- uses: Swatinem/rust-cache@v2`, which invalidates the cache locally. * Fix broken pytest from async io transition (`pytest.PytestRemovedIn9Warning`) * Altered `get_num_sub_vectors` to handle bug in case of 4-bit PQ. This was cause of `rust future panicked: unknown error`. Raised an issue upstream to change panic to error: https://github.com/lancedb/lance/issues/5257 * Call `npm run docs` to fix doc issue. * Disable flakey Windows test for consistency. It's just an OS-specific timer issue, not our fault. * Fix Windows absolute path handling in namespaces. Was causing CI failure `OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: `	2025-11-18 09:04:56 -08:00
Ryan Green	92dbec1f95	fix: convert schema metadata to strings for JsonArrowSchema (#2786 ) Fixes pydantic validation errors when creating materialized views with namespace. ``` > return JsonArrowSchema(fields=fields, metadata=schema.metadata) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E pydantic_core._pydantic_core.ValidationError: 4 validation errors for JsonArrowSchema E metadata.b'geneva::view::query' E Input should be a valid string [type=string_type, input_value=b'{"base":{"vector_column...t-image:latest\\"}"}}]}', input_type=bytes] E For further information visit https://errors.pydantic.dev/2.12/v/string_type ```	2025-11-17 13:18:20 -03:30
Jack Ye	e47f552a86	feat: support namespace credentials vending (#2778 ) Based on https://github.com/lancedb/lance/pull/4984 1. Bump to 1.0.0-beta.2 2. Use DirectoryNamespace in lance to perform all testing in python and rust for much better coverage 3. Refactor `ListingDatabase` to be able to accept location and namespace. This is because we have to leverage listing database (local lancedb connection) for using namespace, namespace only resolves the location and storage options but we don't want to bind all the way to rust since user will plug-in namespace from python side. And thus `ListingDatabase` needs to be able to accept location and namespace that are created from namespace connection. 4. For credentials vending, we also pass storage options provider all the way to rust layer, and the rust layer calls back to the python function to fetch next storage option. This is exactly the same thing we did in pylance.	2025-11-17 00:42:24 -08:00
Colin Patrick McCabe	1ff594a6a4	feat: bump lance version to 0.40-0-beta.2 (#2772 ) Bump the lance version to 0.40-0-beta.2.	2025-11-10 14:36:37 -08:00
Prashanth Rao	8e06b8bfe1	feat: pare down docs to only show API refs (#2770 ) This PR does the following: - Pare down the docs to only what's needed (Python, JS/TS API docs and a pointer to Rust docs) - Styling changes to be more in line with the main website theme The relative URLs remain unchanged, so assuming CI passes, there should be no breaking changes from the main docs site that points back here.	2025-11-10 12:04:57 -05:00
Weston Pace	aeac9c7644	feat: add python Permutation class to mimic hugging face dataset and provide pytorch dataloader (#2725 )	2025-11-06 16:15:33 -08:00
LuQQiu	8b94308cf2	feat: add fts udtf in sql (#2755 ) Support FTS feature parity in SQL to match current Python API capability. Add `.to_json()` method to FTS query classes to enable usage with SQL `fts()` UDTF. Related: https://github.com/lancedb/blog-lancedb/pull/147 query = MatchQuery("puppy", "text", fuzziness=2) result = client.execute(f"SELECT * FROM fts('table', '{query.to_json()}')") --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-31 10:06:19 -07:00
fzowl	93c2cf2f59	feat(voyageai): update voyage integration (#2713 ) Adding multimodal usage guide VoyageAI integration changes: - Adding voyage-3.5 and voyage-3.5-lite models - Adding voyage-context-3 model - Adding rerank-2.5 and rerank-2.5-lite models	2025-10-29 16:49:07 +05:30
LuQQiu	199904ab35	chore: update lance dependency to v0.38.3-beta.11 (#2749 ) ## Summary - Updated all Lance dependencies from v0.38.3-beta.9 to v0.38.3-beta.11 - Migrated `lance-namespace-impls` to use new granular cloud provider features (`dir-aws`, `dir-gcp`, `dir-azure`, `dir-oss`) instead of deprecated `dir` feature - Updated namespace connection API to use `ConnectBuilder` instead of deprecated `connect()` function ## API Changes The Lance team refactored the `lance-namespace-impls` package in v0.38.3-beta.11: 1. Feature flags: The single `dir` feature was split into cloud provider-specific features: - `dir-aws` for AWS S3 support - `dir-gcp` for Google Cloud Storage support - `dir-azure` for Azure Blob Storage support - `dir-oss` for Alibaba Cloud OSS support 2. Connection API: The `connect()` function was replaced with a `ConnectBuilder` pattern for more flexibility ## Testing - ✅ Ran `cargo clippy --workspace --tests --all-features -- -D warnings` - no warnings - ✅ Ran `cargo fmt --all` - code formatted - ✅ All changes verified and committed ## Related This update was triggered by the Lance release: https://github.com/lancedb/lance/releases/tag/v0.38.3-beta.11 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-27 19:10:26 -07:00
Weston Pace	4cfcd95320	feat: add a permutation reader that can read a permutation view (#2712 ) This adds a rust permutation builder. In the next PR I will have python bindings and integration with pytorch.	2025-10-17 05:00:23 -07:00
Ayush Chaurasia	3f2e3986e9	feat: expand support for multivector colpali models and enhancements (#2719 )	2025-10-17 14:36:32 +05:30
Weston Pace	8f8e06a2da	feat: add output_schema method to queries (#2717 ) This is a helper utility I need for some of my data loader work. It makes it easy to see the output schema even when a `select` has been applied.	2025-10-14 05:13:28 -07:00
Weston Pace	5a19cf15a6	feat: a utility for creating "permutation views" (#2552 ) I'm working on a lancedb version of pytorch data loading (and hopefully addressing https://github.com/lancedb/lance/issues/3727). However, rather than rely on pytorch for everything I'm moving some of the things that pytorch does into rust. This gives us more control over data loading (e.g. using shards or a hash-based split) and it allows permutations to be persistent. In particular I hope to be able to: * Create a persistent permutation * This permutation can handle splits, filtering, shuffling, and sharding * Create a rust data loader that can read a permutation (one or more splits), or a subset of a permutation (for DDP) * Create a python data loader that delegates to the rust data loader Eventually create integrations for other data loading libraries, including rust & node	2025-10-09 18:07:31 -07:00
BubbleCal	b59d1007d3	feat(index): add IVF_RQ index type (#2687 ) This exposes the IVF_RQ (RabitQ quantization) index type to lancedb --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-10-09 15:46:18 +08:00
Wyatt Alt	3594538509	fix: add name to index config and fix create_index typing (#2660 ) Co-authored-by: Mark McCaskey <markm@harvey.ai>	2025-10-08 04:41:30 -07:00
Ed Rogers	d0ce489b21	fix: use stdlib override when possible (#2699 ) ## Description of changes Fixes #2698 This PR uses [`typing.override`](https://docs.python.org/3/library/typing.html#typing.override) in favor of the [`overrides`](https://pypi.org/project/overrides/) dependency when possible. As of Python 3.12, the standard library offers `typing.override` to perform a static check on overridden methods. ### Motivation Currently, `overrides` is incompatible with Python 3.14. As a result, any package that attempts to import `overrides` using Python 3.14+ will raise an `AttributeError`. An [issue](https://github.com/mkorpela/overrides/issues/127) has been raised and a [pull request](https://github.com/mkorpela/overrides/pull/133) has been submitted to the GitHub repo for the `overrides` project. But the maintainer has been unresponsive. To ensure readiness for Python 3.14, this package (and any other package directly depending on `overrides`) should consider using `typing.override` instead. ### Impact The standard library added `typing.override` as of 3.12. As a result, this change will affect only users of Python 3.12+. Previous versions will continue to rely on `overrides`. Notably, the standard library implementation is slightly different than that of `overrides`. A thorough discussion of those differences is shown in [PEP 698](https://peps.python.org/pep-0698/), and it is also summarized nicely by the maintainer of `overrides` [here](https://github.com/mkorpela/overrides/issues/126#issuecomment-2401327116). There are 2 main ways that switching from `overrides` to `typing.override` will have an impact on developers of this repo. 1. `typing.override` does not implement any runtime checking. Instead, it provides information to type checkers. 2. The stdlib does not provide a mixin class to enforce override decorators on child classes. (Their reasoning for this is explained in [the PEP](https://peps.python.org/pep-0698/).) 
This PR disables that behavior entirely by replacing the `EnforceOverrides`.	2025-10-06 11:23:20 -07:00
Weston Pace	e07389a36c	feat: allow bitmap indexes on large-string, binary, large-binary, and bitmap (#2678 ) The underlying `pylance` already supported this, it was just blocked out by an over-eager validation function Closes #1981	2025-09-25 09:46:42 -07:00
Will Jones	d617cdef4a	feat: add use_index parameter to merge insert operations (#2674 ) ## Summary Exposes `use_index` Merge Insert parameter, which was created upstream in https://github.com/lancedb/lance/pull/4688. ## API Examples ### Python ```python # Force table scan table.merge_insert(["id"]) \ .when_not_matched_insert_all() \ .use_index(False) \ .execute(data) ``` ### Node.js/TypeScript ```typescript // Force table scan await table.mergeInsert("id") .whenNotMatchedInsertAll() .useIndex(false) .execute(data); ``` ### Rust ```rust // Force table scan let mut builder = table.merge_insert(&["id"]); builder.when_not_matched_insert_all() .use_index(false); builder.execute(data).await?; ``` 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-09-24 12:50:21 -07:00
Will Jones	1ab60fae7f	feat: upgrade Lance to v0.37.0 (#2672 ) Change logs: * https://github.com/lancedb/lance/releases/tag/v0.37.0 * https://github.com/lancedb/lance/releases/tag/v0.36.0	2025-09-23 13:41:47 -07:00
Ayush Chaurasia	e921c90c1b	feat: support mean reciprocal rank reranker (#2671 ) The basic idea of MRR is this - https://www.evidentlyai.com/ranking-metrics/mean-reciprocal-rank-mrr I've implemented a weighted version for allowing user to set weightage between vector and fts. The gist is something like this ### Scenario A: Document at rank 1 in one set, absent from another ``` # Assuming equal weights: weight_vector = 0.5, weight_fts = 0.5 vector_rr = 1.0 # rank 1 → 1/1 = 1.0 fts_rr = 0.0 # absent → 0.0 weighted_mrr = 0.5 × 1.0 + 0.5 × 0.0 = 0.5 ``` ### Scenario B: Document at rank 1 in one set, rank 2 in another ``` # Same weights: weight_vector = 0.5, weight_fts = 0.5 vector_rr = 1.0 # rank 1 → 1/1 = 1.0 fts_rr = 0.5 # rank 2 → 1/2 = 0.5 weighted_mrr = 0.5 × 1.0 + 0.5 × 0.5 = 0.5 + 0.25 = 0.75 ``` And so with `return_score="all"` the result looks something like this (this is from the reranker tests). Because this is a weighted rank based reranker, some results might have the same score ``` text vector _distance _rowid _score _relevance_score 0 I am your father [-0.010703234, 0.069315575, 0.030076642, 0.002... 8.149148e-13 8589934598 10.978719 1.000000 1 the ground beneath my feet [-0.09500901, 0.00092102867, 0.0755851, 0.0372... 1.376896e+00 8589934604 NaN 0.250000 2 I find your lack of faith disturbing [0.07525753, -0.0100010475, 0.09990541, 0.0209... NaN 8589934595 3.483394 0.250000 3 but I don't wanna die [0.033476487, -0.011235877, -0.057625435, -0.0... 1.538222e+00 8589934610 1.130355 0.238095 4 if you strike me down I shall become more powe... [0.00432201, 0.030120496, 5.3317923e-05, 0.033... 1.381086e+00 8589934594 0.715157 0.216667 5 I see a salty message written in the eves [-0.04213107, 0.0016004723, 0.061052393, -0.02... 1.638301e+00 8589934603 1.043785 0.133333 6 but his son was mortal [0.012462767, 0.049041674, -0.057339743, -0.04... 
1.421566e+00 8589934620 NaN 0.125000 7 I've got a bad feeling about this [-0.06973199, -0.029960092, 0.02641632, -0.031... NaN 8589934596 1.043785 0.125000 8 now that's a name I haven't heard in a long time [-0.014374257, -0.013588792, -0.07487557, 0.03... 1.597573e+00 8589934593 0.848772 0.118056 9 he was a god [-0.0258895, 0.11925236, -0.029397793, 0.05888... 1.423147e+00 8589934618 NaN 0.100000 10 I wish they would make another one [-0.14737535, -0.015304729, 0.04318139, -0.061... NaN 8589934622 1.043785 0.100000 11 Kratos had a son [-0.057455737, 0.13734367, -0.03537109, -0.000... 1.488075e+00 8589934617 NaN 0.083333 12 I don't wanna live like this [-0.0028891307, 0.015214227, 0.025183653, 0.08... NaN 8589934609 1.043785 0.071429 13 I see a mansard roof through the trees [0.052383978, 0.087759204, 0.014739997, 0.0239... NaN 8589934602 1.043785 0.062500 14 great kid don't get cocky [-0.047043696, 0.054648954, -0.008509666, -0.0... 1.618125e+00 8589934592 NaN 0.055556 ```	2025-09-23 18:25:18 +05:30
Jack Ye	ff71d7e552	feat: support shallow clone (#2653 ) Support shallow cloning a dataset at a specific location to create a new dataset, using the shallow_clone feature in Lance. Also introduce remote `clone` API for remote tables for this functionality.	2025-09-21 21:28:40 -07:00
Jack Ye	5b397e410b	chore: fix out of date tests with new namespace validation (#2663 ) Failure: https://github.com/lancedb/lancedb/actions/runs/17820044478/job/50660516344	2025-09-18 13:29:47 -07:00
Le Duc Manh	4c9fc3044b	fix: use create to resolve variables (#2640 ) # What - Use `create` to resolve variables values # Reference Fixes #2181 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-12 13:07:32 -07:00
BubbleCal	f7d78c3420	feat: add 'target_partition_size' param (#2642 ) this exposes the param `target_partition_size` from lance --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-09-11 22:56:16 +08:00
Jack Ye	8da74dcb37	feat: support per-request header override (#2631 ) ## Summary This PR introduces a `HeaderProvider` which is called for all remote HTTP calls to get the latest headers to inject. This is useful for features like adding the latest auth tokens where the header provider can auto-refresh tokens internally and each request always set the refreshed token. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-10 13:44:00 -07:00
Jack Ye	9391ad1450	feat: support mTLS for remote database (#2638 ) This PR adds mTLS (mutual TLS) configuration support for the LanceDB remote HTTP client, allowing users to authenticate with client certificates and configure custom CA certificates for server verification. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-09 21:04:46 -07:00
Wyatt Alt	a9ea785b15	fix: remote python sdk namespace typing (#2620 ) This changes the default values for some namespace parameters in the remote python SDK from None to [], to match the underlying code it calls. Prior to this commit, failing to supply "namespace" with the remote SDK would cause an error because the underlying code it dispatches to does not consider None to be valid input.	2025-09-02 16:32:32 -07:00
Colin Patrick McCabe	cc38453391	fix!: fix doctest in query.py (#2622 ) Fix doctest in query.py to include cumulative_cpu, now that lance includes that.	2025-09-02 15:47:32 -07:00
Will Jones	f6846004ca	feat: add `name` parameter to remaining Python create index calls (#2617 ) ## Summary This PR adds the missing `name` parameter to `create_scalar_index` and `create_fts_index` methods in the Python SDK, which was inadvertently omitted when it was added to `create_index` in PR #2586. ## Changes - Add `name: Optional[str] = None` parameter to abstract `Table.create_scalar_index` and `Table.create_fts_index` methods - Update `LanceTable` implementation to accept and pass the `name` parameter to the underlying Rust layer - Update `RemoteTable` implementation to accept and pass the `name` parameter - Enhanced tests to verify custom index names work correctly for both scalar and FTS indices - When `name` is not provided, default names are generated (e.g., `{column}_idx`) ## Test plan - [x] Added test cases for custom names in scalar index creation - [x] Added test cases for custom names in FTS index creation - [x] Verified existing tests continue to pass - [x] Code formatting and linting checks pass This ensures API consistency across all index creation methods in the LanceDB Python SDK. Fixes #2616 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-27 14:02:48 -07:00
Jack Ye	faf8973624	feat!: support multi-level namespace (#2603 ) This PR adds support for multi-level namespaces in a LanceDB database, according to the Lance Namespace spec. This allows users to create namespaces inside a database connection and perform create, drop, list, list_tables in a namespace. (other operations like update, describe will be in a follow-up PR) The 3 types of database connections behave like the following: 1. Local database connections will continue to have just a flat list of tables for backwards compatibility. 2. Remote database connections will make REST API calls according to the APIs in the Lance Namespace spec. 3. Lance Namespace connections will invoke the corresponding operations against the specific namespace implementation which could have different behaviors regarding these APIs. All the table APIs now take an identifier instead of a name, for example `/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is directly in the root namespace, the API call is identical. If the table is in a namespace, then the full table ID should be used, with `$` as the default delimiter (`.` is a special character and creates issues with URL parsing so `$` is used), for example `/v1/table/ns1$table1/create`. If a different delimiter needs to be passed in, users can configure the `id_delimiter` in client config and that becomes a query parameter, for example `/v1/table/ns1__table1/create?delimiter=__` The Python and Typescript APIs are kept backwards compatible, but the following Rust APIs are not: 1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>` is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace: &[String]) -> Result<()>` 2. `Connection::drop_all_tables(&self) -> Result<()>` is now `Connection::drop_all_tables(&self, name: impl AsRef<str>) -> Result<()>`	2025-08-27 12:07:55 -07:00
Weston Pace	fabe37274f	feat: add __getitems__ method impl for torch integration (#2596 ) This allows a lancedb Table to act as a torch dataset.	2025-08-25 13:23:22 -07:00
Jack Ye	04285a4a4e	feat(python): integrate with lance namespace (#2599 ) This PR integrates `lancedb` with `lance-namespace` so that users can use the LanceDB client to access Lance tables in any catalog service. In general, we expect most of the logic to be delegated to the existing `LanceDBConnection` and `LanceTable`, but the namespace implementation will control how tables are created and dropped, and describe where each table is stored with any related storage options like access credentials. The implementation currently only supports a 1 level namespace that directly contains tables. We will introduce nested namespace support in a separate PR. Users are expected to use it in the following way: ```python >>> import lancedb >>> import pyarrow as pa >>> # Connect using GlueNamespace >>> db = lancedb.connect_namespace("glue", {"catalog_id": "123456789012"}) >>> # Create a table with schema >>> schema = pa.schema([ ... pa.field("id", pa.int64()), ... pa.field("vector", pa.list_(pa.float32(), 2)) ... ]) >>> table = db.create_table("my_table", schema=schema) >>> # List tables >>> db.table_names() ['my_table'] ```	2025-08-20 15:46:16 -07:00
Will Jones	ad09234d59	feat: allow setting `train=False` and `name` on indices (#2586 ) Enables two new parameters when building indices: * `name`: Allows explicitly setting a name on the index. Default is `{col_name}_idx`. * `train` (default `True`): When set to `False`, an empty index will be immediately created. The upgrade of Lance means there are also additional behaviors from `cd76a993b8`: * When a scalar index is created on a Table, it will be kept around even if all rows are deleted or updated. * Scalar indices can be created on empty tables. They will default to `train=False` if the table is empty. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-08-15 14:00:26 -07:00
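
The weighted reciprocal-rank arithmetic described in commit e921c90c1b (#2671) can be checked with a few lines of plain Python. This is a minimal sketch, not LanceDB's reranker: the function name `weighted_reciprocal_rank` and its parameters are hypothetical, ranks are assumed to be 1-based, and `None` is assumed to mean the document is absent from that result set.

```python
from typing import Optional


def weighted_reciprocal_rank(
    vector_rank: Optional[int],
    fts_rank: Optional[int],
    weight_vector: float = 0.5,
    weight_fts: float = 0.5,
) -> float:
    """Combine the reciprocal ranks from two result sets into one score."""

    def reciprocal(rank: Optional[int]) -> float:
        # A document missing from a result set contributes 0.0;
        # otherwise its contribution is 1 / rank.
        return 0.0 if rank is None else 1.0 / rank

    return weight_vector * reciprocal(vector_rank) + weight_fts * reciprocal(fts_rank)


# Scenario A: rank 1 in the vector results, absent from the FTS results.
scenario_a = weighted_reciprocal_rank(vector_rank=1, fts_rank=None)

# Scenario B: rank 1 in the vector results, rank 2 in the FTS results.
scenario_b = weighted_reciprocal_rank(vector_rank=1, fts_rank=2)

print(scenario_a, scenario_b)  # 0.5 0.75
```

Because the score depends only on ranks and weights, two documents with the same pair of ranks receive the same score, which matches the commit's note that some results may tie.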

338 Commits
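
The table-identifier scheme from commit faf8973624 (#2603) can be illustrated with a small path builder. This is only a sketch under stated assumptions: `table_create_path` is a hypothetical helper, not part of the LanceDB client, and it performs no URL encoding. It also uses `None` rather than `[]` as the default for `namespace`, the pattern adopted in commit 0084eb238b (#2797).

```python
from typing import Optional, Sequence

DEFAULT_DELIMITER = "$"  # default delimiter named in the commit message


def table_create_path(
    name: str,
    namespace: Optional[Sequence[str]] = None,  # None, not a mutable [] default
    delimiter: str = DEFAULT_DELIMITER,
) -> str:
    """Build the REST path for creating a table (hypothetical helper)."""
    # The full table ID is the namespace levels followed by the table name.
    parts = [*(namespace or []), name]
    table_id = delimiter.join(parts)
    path = f"/v1/table/{table_id}/create"
    # A non-default delimiter is passed along as a query parameter.
    if delimiter != DEFAULT_DELIMITER:
        path += f"?delimiter={delimiter}"
    return path


print(table_create_path("table1"))                   # /v1/table/table1/create
print(table_create_path("table1", ["ns1"]))          # /v1/table/ns1$table1/create
print(table_create_path("table1", ["ns1"], "__"))    # /v1/table/ns1__table1/create?delimiter=__
```

A table in the root namespace yields the same path as before the change, which is why the commit describes that case as identical.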