lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 19:10:41 +00:00

Author	SHA1	Message	Date
Weston Pace	e07389a36c	feat: allow bitmap indexes on large-string, binary, large-binary, and bitmap (#2678 ) The underlying `pylance` already supported this, it was just blocked out by an over-eager validation function Closes #1981	2025-09-25 09:46:42 -07:00
Lance Release	247fb58400	Bump version: 0.25.1 → 0.25.2-beta.0	2025-09-24 22:54:09 +00:00
Will Jones	d617cdef4a	feat: add use_index parameter to merge insert operations (#2674 ) ## Summary Exposes `use_index` Merge Insert parameter, which was created upstream in https://github.com/lancedb/lance/pull/4688. ## API Examples ### Python ```python # Force table scan table.merge_insert(["id"]) \ .when_not_matched_insert_all() \ .use_index(False) \ .execute(data) ``` ### Node.js/TypeScript ```typescript // Force table scan await table.mergeInsert("id") .whenNotMatchedInsertAll() .useIndex(false) .execute(data); ``` ### Rust ```rust // Force table scan let mut builder = table.merge_insert(&["id"]); builder.when_not_matched_insert_all() .use_index(false); builder.execute(data).await?; ``` 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-09-24 12:50:21 -07:00
Lance Release	222e3264ab	Bump version: 0.25.1-beta.4 → 0.25.1	2025-09-23 22:06:08 +00:00
Lance Release	13505026cb	Bump version: 0.25.1-beta.3 → 0.25.1-beta.4	2025-09-23 22:06:08 +00:00
Will Jones	1ab60fae7f	feat: upgrade Lance to v0.37.0 (#2672 ) Change logs: * https://github.com/lancedb/lance/releases/tag/v0.37.0 * https://github.com/lancedb/lance/releases/tag/v0.36.0	2025-09-23 13:41:47 -07:00
Ayush Chaurasia	e921c90c1b	feat: support mean reciprocal rank reranker (#2671 ) The basic idea of MRR is this - https://www.evidentlyai.com/ranking-metrics/mean-reciprocal-rank-mrr I've implemented a weighted version for allowing user to set weightage between vector and fts. The gist is something like this ### Scenario A: Document at rank 1 in one set, absent from another ``` # Assuming equal weights: weight_vector = 0.5, weight_fts = 0.5 vector_rr = 1.0 # rank 1 → 1/1 = 1.0 fts_rr = 0.0 # absent → 0.0 weighted_mrr = 0.5 × 1.0 + 0.5 × 0.0 = 0.5 ``` ### Scenario B: Document at rank 1 in one set, rank 2 in another ``` # Same weights: weight_vector = 0.5, weight_fts = 0.5 vector_rr = 1.0 # rank 1 → 1/1 = 1.0 fts_rr = 0.5 # rank 2 → 1/2 = 0.5 weighted_mrr = 0.5 × 1.0 + 0.5 × 0.5 = 0.5 + 0.25 = 0.75 ``` And so with `return_score="all"` the result looks something like this (this is from the reranker tests). Because this is a weighted rank based reranker, some results might have the same score ``` text vector _distance _rowid _score _relevance_score 0 I am your father [-0.010703234, 0.069315575, 0.030076642, 0.002... 8.149148e-13 8589934598 10.978719 1.000000 1 the ground beneath my feet [-0.09500901, 0.00092102867, 0.0755851, 0.0372... 1.376896e+00 8589934604 NaN 0.250000 2 I find your lack of faith disturbing [0.07525753, -0.0100010475, 0.09990541, 0.0209... NaN 8589934595 3.483394 0.250000 3 but I don't wanna die [0.033476487, -0.011235877, -0.057625435, -0.0... 1.538222e+00 8589934610 1.130355 0.238095 4 if you strike me down I shall become more powe... [0.00432201, 0.030120496, 5.3317923e-05, 0.033... 1.381086e+00 8589934594 0.715157 0.216667 5 I see a salty message written in the eves [-0.04213107, 0.0016004723, 0.061052393, -0.02... 1.638301e+00 8589934603 1.043785 0.133333 6 but his son was mortal [0.012462767, 0.049041674, -0.057339743, -0.04... 
1.421566e+00 8589934620 NaN 0.125000 7 I've got a bad feeling about this [-0.06973199, -0.029960092, 0.02641632, -0.031... NaN 8589934596 1.043785 0.125000 8 now that's a name I haven't heard in a long time [-0.014374257, -0.013588792, -0.07487557, 0.03... 1.597573e+00 8589934593 0.848772 0.118056 9 he was a god [-0.0258895, 0.11925236, -0.029397793, 0.05888... 1.423147e+00 8589934618 NaN 0.100000 10 I wish they would make another one [-0.14737535, -0.015304729, 0.04318139, -0.061... NaN 8589934622 1.043785 0.100000 11 Kratos had a son [-0.057455737, 0.13734367, -0.03537109, -0.000... 1.488075e+00 8589934617 NaN 0.083333 12 I don't wanna live like this [-0.0028891307, 0.015214227, 0.025183653, 0.08... NaN 8589934609 1.043785 0.071429 13 I see a mansard roof through the trees [0.052383978, 0.087759204, 0.014739997, 0.0239... NaN 8589934602 1.043785 0.062500 14 great kid don't get cocky [-0.047043696, 0.054648954, -0.008509666, -0.0... 1.618125e+00 8589934592 NaN 0.055556 ```	2025-09-23 18:25:18 +05:30
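The two scenarios in the message above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not the reranker's actual code: the function name `weighted_mrr` and its parameters are hypothetical, ranks are assumed to be 1-based, and `None` is assumed to mean the document is absent from that result set.

```python
def weighted_mrr(vector_rank, fts_rank, weight_vector=0.5, weight_fts=0.5):
    """Weighted sum of reciprocal ranks (hypothetical helper, not the LanceDB API).

    A rank of None means the document did not appear in that result set,
    so its reciprocal rank contributes 0.0.
    """
    vector_rr = 1.0 / vector_rank if vector_rank else 0.0
    fts_rr = 1.0 / fts_rank if fts_rank else 0.0
    return weight_vector * vector_rr + weight_fts * fts_rr


# Scenario A: rank 1 in the vector results, absent from the FTS results.
scenario_a = weighted_mrr(1, None)
# Scenario B: rank 1 in the vector results, rank 2 in the FTS results.
scenario_b = weighted_mrr(1, 2)
print(scenario_a, scenario_b)
```

Because the score depends only on ranks and weights, two documents with mirrored ranks (for example rank 2 in one set and absent from the other) receive the same score, which is why ties appear in the sample output.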
Lance Release	ebbeeff4e0	Bump version: 0.25.1-beta.2 → 0.25.1-beta.3	2025-09-22 04:47:42 +00:00
Jack Ye	ff71d7e552	feat: support shallow clone (#2653 ) Support shallow cloning a dataset at a specific location to create a new dataset, using the shallow_clone feature in Lance. Also introduce remote `clone` API for remote tables for this functionality.	2025-09-21 21:28:40 -07:00
Jack Ye	5b397e410b	chore: fix out of date tests with new namespace validation (#2663 ) Failure: https://github.com/lancedb/lancedb/actions/runs/17820044478/job/50660516344	2025-09-18 13:29:47 -07:00
Lance Release	5e1e9add07	Bump version: 0.25.1-beta.1 → 0.25.1-beta.2	2025-09-18 20:21:33 +00:00
Le Duc Manh	4c9fc3044b	fix: use create to resolve variables (#2640 ) # What - Use `create` to resolve variables values # Reference Fixes #2181 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-12 13:07:32 -07:00
Jack Ye	0ebc8d45a8	chore: fix no lock build warnings and CI timeouts (#2650 ) Example CI failures: - publish build timeout: https://github.com/lancedb/lancedb/actions/runs/17626482881/job/50084552906 - doc test build timeout: https://github.com/lancedb/lancedb/actions/runs/17627058590/job/50086456818	2025-09-11 15:30:35 -07:00
BubbleCal	f7d78c3420	feat: add 'target_partition_size' param (#2642 ) this exposes the param `target_partition_size` from lance --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-09-11 22:56:16 +08:00
Lance Release	b1d791a299	Bump version: 0.25.1-beta.0 → 0.25.1-beta.1	2025-09-10 20:48:56 +00:00
Jack Ye	8da74dcb37	feat: support per-request header override (#2631 ) ## Summary This PR introduces a `HeaderProvider` which is called for all remote HTTP calls to get the latest headers to inject. This is useful for features like adding the latest auth tokens where the header provider can auto-refresh tokens internally and each request always set the refreshed token. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-10 13:44:00 -07:00
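The auto-refresh idea described above could look roughly like the following. This is a generic sketch, not the `HeaderProvider` interface from the PR: the class name `RefreshingTokenProvider`, the `get_headers` method, the `fetch_token` callback, and the fixed TTL are all assumptions made for illustration.

```python
import time


class RefreshingTokenProvider:
    """Hypothetical header provider that refreshes a bearer token when it expires."""

    def __init__(self, fetch_token, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable returning a fresh token string
        self._ttl = ttl_seconds          # assumed token lifetime in seconds
        self._clock = clock              # injectable clock, handy for tests
        self._token = None
        self._expires_at = 0.0

    def get_headers(self):
        """Return the headers to inject; called once per outgoing request."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return {"Authorization": f"Bearer {self._token}"}
```

Calling `get_headers()` before every request means callers never hold a stale token themselves; the refresh logic stays inside the provider.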
Lance Release	e612686fdb	Bump version: 0.25.0 → 0.25.1-beta.0	2025-09-10 14:24:07 +00:00
Jack Ye	9391ad1450	feat: support mTLS for remote database (#2638 ) This PR adds mTLS (mutual TLS) configuration support for the LanceDB remote HTTP client, allowing users to authenticate with client certificates and configure custom CA certificates for server verification. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-09 21:04:46 -07:00
Lance Release	f744b785f8	Bump version: 0.25.0-beta.2 → 0.25.0	2025-09-04 08:32:44 +00:00
Lance Release	2e3f745820	Bump version: 0.25.0-beta.1 → 0.25.0-beta.2	2025-09-04 08:32:43 +00:00
Lance Release	4dd399ca29	Bump version: 0.25.0-beta.0 → 0.25.0-beta.1	2025-09-03 17:50:41 +00:00
Wyatt Alt	a9ea785b15	fix: remote python sdk namespace typing (#2620 ) This changes the default values for some namespace parameters in the remote python SDK from None to [], to match the underlying code it calls. Prior to this commit, failing to supply "namespace" with the remote SDK would cause an error because the underlying code it dispatches to does not consider None to be valid input.	2025-09-02 16:32:32 -07:00
Colin Patrick McCabe	cc38453391	fix!: fix doctest in query.py (#2622 ) Fix doctest in query.py to include cumulative_cpu, now that lance includes that.	2025-09-02 15:47:32 -07:00
Lance Release	0847e666a0	Bump version: 0.24.4-beta.1 → 0.25.0-beta.0	2025-08-29 21:19:51 +00:00
Will Jones	f6846004ca	feat: add `name` parameter to remaining Python create index calls (#2617 ) ## Summary This PR adds the missing `name` parameter to `create_scalar_index` and `create_fts_index` methods in the Python SDK, which was inadvertently omitted when it was added to `create_index` in PR #2586. ## Changes - Add `name: Optional[str] = None` parameter to abstract `Table.create_scalar_index` and `Table.create_fts_index` methods - Update `LanceTable` implementation to accept and pass the `name` parameter to the underlying Rust layer - Update `RemoteTable` implementation to accept and pass the `name` parameter - Enhanced tests to verify custom index names work correctly for both scalar and FTS indices - When `name` is not provided, default names are generated (e.g., `{column}_idx`) ## Test plan - [x] Added test cases for custom names in scalar index creation - [x] Added test cases for custom names in FTS index creation - [x] Verified existing tests continue to pass - [x] Code formatting and linting checks pass This ensures API consistency across all index creation methods in the LanceDB Python SDK. Fixes #2616 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-27 14:02:48 -07:00
Jack Ye	faf8973624	feat!: support multi-level namespace (#2603 ) This PR adds support for multi-level namespaces in a LanceDB database, according to the Lance Namespace spec. This allows users to create namespaces inside a database connection and perform create, drop, list, and list_tables in a namespace. (Other operations like update and describe will be in a follow-up PR.) The 3 types of database connections behave as follows: 1. Local database connections will continue to have just a flat list of tables for backwards compatibility. 2. Remote database connections will make REST API calls according to the APIs in the Lance Namespace spec. 3. Lance Namespace connections will invoke the corresponding operations against the specific namespace implementation, which could have different behaviors regarding these APIs. All the table APIs now take an identifier instead of a name, for example `/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is directly in the root namespace, the API call is identical. If the table is in a namespace, then the full table ID should be used, with `$` as the default delimiter (`.` is a special character and creates issues with URL parsing, so `$` is used), for example `/v1/table/ns1$table1/create`. If a different delimiter needs to be passed in, the user can configure the `id_delimiter` in the client config and that becomes a query parameter, for example `/v1/table/ns1__table1/create?delimiter=__` The Python and Typescript APIs are kept backwards compatible, but the following Rust APIs are not: 1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>` is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace: &[String]) -> Result<()>` 2. `Connection::drop_all_tables(&self) -> Result<()>` is now `Connection::drop_all_tables(&self, name: impl AsRef<str>) -> Result<()>`	2025-08-27 12:07:55 -07:00
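The table ID rule described above can be sketched as a small path builder. This only mirrors the three example paths given in the message; the function name `table_create_path` is hypothetical, and the real client may encode or validate the ID differently.

```python
from urllib.parse import urlencode

DEFAULT_DELIMITER = "$"  # default delimiter named in the message above


def table_create_path(namespace, table_name, delimiter=DEFAULT_DELIMITER):
    """Build the create-table path for a table ID (hypothetical helper).

    `namespace` is a list of namespace levels; an empty list is the root namespace.
    """
    table_id = delimiter.join([*namespace, table_name])
    path = f"/v1/table/{table_id}/create"
    if delimiter != DEFAULT_DELIMITER:
        # A non-default delimiter is passed to the server as a query parameter.
        path += "?" + urlencode({"delimiter": delimiter})
    return path


print(table_create_path([], "table1"))
print(table_create_path(["ns1"], "table1"))
print(table_create_path(["ns1"], "table1", delimiter="__"))
```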
Weston Pace	fabe37274f	feat: add __getitems__ method impl for torch integration (#2596 ) This allows a lancedb Table to act as a torch dataset.	2025-08-25 13:23:22 -07:00
Lance Release	b88422e515	Bump version: 0.24.4-beta.0 → 0.24.4-beta.1	2025-08-22 03:54:34 +00:00
Jack Ye	04285a4a4e	feat(python): integrate with lance namespace (#2599 ) This PR integrates `lancedb` with `lance-namespace` so that users can use the LanceDB client to access Lance tables in any catalog service. In general, we expect most of the logic to be delegated to the existing `LanceDBConnection` and `LanceTable`, but the namespace implementation will control how a table is created and dropped, and describe where the table is stored along with any related storage options like access credentials. The implementation currently only supports a 1 level namespace that directly contains tables. We will introduce nested namespace support in a separate PR. Users are expected to use it in the following way: ```python >>> import lancedb >>> import pyarrow as pa >>> # Connect using GlueNamespace >>> db = lancedb.connect_namespace("glue", {"catalog_id": "123456789012"}) >>> # Create a table with schema >>> schema = pa.schema([ ... pa.field("id", pa.int64()), ... pa.field("vector", pa.list_(pa.float32(), 2)) ... ]) >>> table = db.create_table("my_table", schema=schema) >>> # List tables >>> db.table_names() ['my_table'] ```	2025-08-20 15:46:16 -07:00
Lance Release	adc3daa462	Bump version: 0.24.3 → 0.24.4-beta.0	2025-08-19 22:56:05 +00:00
Vitali Lovich	d602e9f98c	fix: make cloud features optional (#2567 ) (#2568 ) This shrinks the size of a local embedded build that can disable all the default features. When combined with https://github.com/lancedb/lance/pull/4362 and the dependencies are updated to point to the fix, this resolves #2567 fully. Verified by patching the workspace to redirect to my clone of lance with the PR applied. ``` cargo tree -p lancedb -e no-build -e no-dev --no-default-features -i aws-config \| less ``` The reason that lance itself needs to change too is that many dependencies within that project depend on lance-io/default and lancedb depends on them which transitively ends up enabling the cloud regardless. The PR in lance removes the dependency on lance-io/default from all sibling crates. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-08-15 16:46:52 -07:00
Will Jones	ad09234d59	feat: allow setting `train=False` and `name` on indices (#2586 ) Enables two new parameters when building indices: * `name`: Allows explicitly setting a name on the index. Default is `{col_name}_idx`. * `train` (default `True`): When set to `False`, an empty index will be immediately created. The upgrade of Lance means there are also additional behaviors from `cd76a993b8`: * When a scalar index is created on a Table, it will be kept around even if all rows are deleted or updated. * Scalar indices can be created on empty tables. They will default to `train=False` if the table is empty. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-08-15 14:00:26 -07:00
Lance Release	bb809abd4b	Bump version: 0.24.3-beta.0 → 0.24.3	2025-08-15 18:02:04 +00:00
Lance Release	c87530f7a3	Bump version: 0.24.2 → 0.24.3-beta.0	2025-08-15 18:02:04 +00:00
Weston Pace	ed640a76d9	feat: add take_offsets and take_row_ids (#2584 ) These operations have existed in lance for a long while and many users need to drop down to lance for this capability. This PR adds the API and implements it using filters (e.g. `_rowid IN (...)`) so that it doesn't currently add any load to `BaseTable`. I'm not sure that is sustainable as base table implementations may want to specialize how they handle this method. However, I figure it is a good starting point. In addition, unlike Lance, this API does not currently guarantee anything about the order of the take results. This is necessary for the fallback filter approach to work (SQL filters cannot guarantee result order).	2025-08-15 06:48:24 -07:00
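The filter-based fallback mentioned above can be illustrated as follows. This is a sketch of the idea only: `row_id_filter` and `restore_order` are hypothetical helpers, and rows are assumed to be plain dicts carrying a `_rowid` key.

```python
def row_id_filter(row_ids):
    """Build a SQL filter of the form `_rowid IN (...)` (hypothetical helper)."""
    ids = [int(r) for r in row_ids]  # int() rejects anything that is not a row id
    if not ids:
        raise ValueError("row_ids must not be empty")
    return "_rowid IN (" + ", ".join(map(str, ids)) + ")"


def restore_order(rows, row_ids):
    """Reorder filtered rows to match the requested ids, skipping missing ones."""
    by_id = {row["_rowid"]: row for row in rows}
    return [by_id[r] for r in row_ids if r in by_id]


print(row_id_filter([7, 3, 5]))
```

Since a SQL filter gives no ordering guarantee, a caller that needs the results in request order has to reorder them afterwards, which is what `restore_order` does here.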
Weston Pace	16beaaa656	ci: fix broken CI checks (#2585 )	2025-08-13 10:05:57 -07:00
Will Jones	9d683e4f0b	feat: infer vector columns when name contains 'vector' or 'embedding' (#2547 ) ## Summary - Enhanced vector column detection to use substring matching instead of exact matching - Now detects columns with names containing "vector" or "embedding" (case-insensitive) - Added integer vector support to Node.js implementation (matching Python) - Comprehensive test coverage for both float and integer vector types ## Changes ### Python (`python/python/lancedb/table.py`) - Updated `_infer_target_schema()` to use substring matching with helper function `_is_vector_column()` - Preserved original field names instead of forcing "vector" - Consolidated duplicate logic for better maintainability ### Node.js (`nodejs/lancedb/arrow.ts`) - Enhanced type inference with `nameSuggestsVectorColumn()` helper function - Added `isAllIntegers()` function with performance optimization (checks first 10 elements) - Implemented integer vector support using `Uint8` type (matching Python) - Improved type safety by removing `any` usage ### Tests - Python: Added `test_infer_target_schema_with_vector_embedding_names()` in `test_util.py` - Node.js: Added comprehensive test case in `arrow.test.ts` - Both test suites cover various naming patterns and integer/float vector types ## Examples of newly supported column names: - `user_vector`, `text_embedding`, `doc_embeddings` - `my_vector_field`, `embedding_model` - `VECTOR_COL`, `Vector_Mixed` (case-insensitive) - Both float and integer arrays are properly converted to fixed-size lists ## Test plan - [x] All existing tests pass (backward compatibility maintained) - [x] New tests pass for both Python and Node.js implementations - [x] Integer vector detection works correctly in Node.js - [x] Code passes linting and formatting checks - [x] Performance optimized for large vector arrays Fixes #2546 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-04 15:36:49 -07:00
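The substring rule described above is simple enough to sketch directly. The helper names below are illustrative and are not the functions from the PR; the 10-element sample size is taken from the message, everything else is an assumption.

```python
def name_suggests_vector_column(name):
    """Case-insensitive substring match on 'vector' or 'embedding' (illustrative)."""
    lowered = name.lower()
    return "vector" in lowered or "embedding" in lowered


def is_all_integers(values, sample=10):
    """Check only the first `sample` elements, trading accuracy for speed."""
    return all(float(v).is_integer() for v in values[:sample])


for column in ["user_vector", "text_embedding", "VECTOR_COL", "title"]:
    print(column, name_suggests_vector_column(column))
```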
Poornachandra.A.N	7d0127b376	feat(embeddings): add siglip embedding support to lancedb (#2499 ) ### Summary This PR adds SigLIP (Sigmoid Loss Image Pretraining) as a new embedding model in the LanceDB embedding registry. SigLIP improves image-text alignment performance using sigmoid-based contrastive loss and offers robust zero-shot generalization. Fixes #2498 ### What’s Implemented #### 1. `SigLIP` Embedding Class * Added `SigLIP` support under `python/lancedb/embeddings/siglip.py` * Implements: * `compute_source_embeddings` * `_batch_generate_embeddings` * Normalization logic * Batch-wise progress logging for image embedding #### 2. Registry Integration * Registered `SigLIP` in `embeddings/__init__.py` * `SigLIP` now usable via `connect(..., embedding="siglip")` #### 3. Evaluation Benchmark Support * Added SigLIP to `test_embeddings_slow.py` for side-by-side benchmarking with OpenCLIP and ImageBind ### New Test Methods #### `test_siglip` * End-to-end test to verify embeddings table creation and vector shape for SigLIP ![WhatsApp Image 2025-07-10 at 18 00 27_a3368163](https://github.com/user-attachments/assets/e5582ee1-80a3-43d7-a7a1-26ceecce9f4d) #### `test_siglip_vs_openclip_vs_imagebind_benchmark_full` * Benchmarks: * Recall\@1 / 5 / 10 * mAP (Mean Average Precision) * Embedding & Search Latency * Dimensionality reporting ![WhatsApp Image 2025-07-10 at 18 12 13_22c67a84](https://github.com/user-attachments/assets/455bf30f-62b7-4684-a3f3-ad52e2a1ffe5) ### Notes * SigLIP outputs 768D embeddings (vs 512D for OpenCLIP) * Benchmark shows competitive performance despite higher dimensionality * I'm still new to contributing to open-source and learning as I go. Please feel free to suggest any improvements — I'm happy to make changes!	2025-08-04 11:42:39 -07:00
Will Jones	02595dc475	feat: add overall timeout parameter to remote client (#2550 ) ## Summary - Adds an overall `timeout` parameter to `TimeoutConfig` that limits the total time for the entire request - Can be set via config or `LANCE_CLIENT_TIMEOUT` environment variable - Exposed in Python and Node.js bindings - Includes comprehensive tests ## Test plan - [x] Unit tests for Rust TimeoutConfig - [x] Integration tests for Python bindings - [x] Integration tests for Node.js bindings - [x] All existing tests pass 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-08-04 10:06:55 -07:00
Mark McCaskey	fe76496a59	fix: `.nprobes` method in python bindings, improve error messages (#2556 ) `nprobes` with a value greater than 20 fails with the minimum error: ``` self = <lancedb.query.AsyncVectorQuery object at 0x10b749720>, minimum_nprobes = 30 def minimum_nprobes(self, minimum_nprobes: int) -> Self: """Set the minimum number of probes to use. See `nprobes` for more details. These partitions will be searched on every indexed vector query and will increase recall at the expense of latency. """ > self._inner.minimum_nprobes(minimum_nprobes) E ValueError: Invalid input, minimum_nprobes must be less than or equal to maximum_nprobes python/lancedb/query.py:2744: ValueError ``` Putting the max set before the min seems reasonable but it causes this reasonable case to fail: ``` def test_nprobes_min_max_works_sync(table): LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(2).maximum_nprobes(4).to_list() ``` with ``` self = <lancedb.query.AsyncVectorQuery object at 0x1203f1c90>, maximum_nprobes = 4 def maximum_nprobes(self, maximum_nprobes: int) -> Self: """Set the maximum number of probes to use. See `nprobes` for more details. If this value is greater than `minimum_nprobes` then the excess partitions will be searched only if we have not found enough results. This can be useful when there is a narrow filter to allow these queries to spend more time searching and avoid potential false negatives. If this value is 0 then no limit will be applied and all partitions could be searched if needed to satisfy the limit. """ > self._inner.maximum_nprobes(maximum_nprobes) E ValueError: Invalid input, maximum_nprobes must be greater than or equal to minimum_nprobes python/lancedb/query.py:2761: ValueError ```. The case I care about is where min == max, but this solution handles it even if they're not. If both min and max exist, we set both to the minimum and then set the max. 
This isn't 100% the same as the minimum setter checks for 0 on the min and `.nprobes` does not do any sanity checking at all. But I figured this was the most reasonable and general solution without touching more of this code. As part of this I noticed the error messages were a bit ambiguous so I made them symmetric and clarified them while I was here.	2025-07-30 09:23:25 -07:00
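The ordering problem described above can be reproduced with a toy model. This is not the code from `query.py`: the `ProbeRange` class, its method names, and the default of 20 for both bounds are assumptions inferred from the message, and `set_nprobes` shows just one way to sidestep the mutual validation.

```python
class ProbeRange:
    """Toy model of mutually validated min/max probe settings (hypothetical)."""

    def __init__(self, minimum=20, maximum=20):
        self.minimum = minimum
        self.maximum = maximum  # 0 is treated as "no upper limit"

    def set_minimum(self, value):
        if self.maximum != 0 and value > self.maximum:
            raise ValueError("minimum_nprobes must be less than or equal to maximum_nprobes")
        self.minimum = value

    def set_maximum(self, value):
        if value != 0 and value < self.minimum:
            raise ValueError("maximum_nprobes must be greater than or equal to minimum_nprobes")
        self.maximum = value

    def set_nprobes(self, value):
        # Assign both bounds together so neither setter sees a stale counterpart.
        self.minimum = value
        self.maximum = value


probes = ProbeRange()
probes.set_nprobes(30)
print(probes.minimum, probes.maximum)
```

With the setters alone, raising the minimum to 30 fails against the old maximum of 20, while lowering the maximum first can fail against the old minimum; updating both in one step avoids either ordering.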
Lance Release	f79295c697	Bump version: 0.24.2-beta.2 → 0.24.2	2025-07-25 20:31:15 +00:00
Lance Release	381fad9b65	Bump version: 0.24.2-beta.1 → 0.24.2-beta.2	2025-07-25 20:31:15 +00:00
Tristan Zajonc	055bf91d3e	fix: handle empty list with schema in table creation (#2548 ) ## Summary Fixes IndexError when creating tables with empty list data and a provided schema. Previously, `_into_pyarrow_reader()` would attempt to access `data[0]` on empty lists, causing an IndexError. Now properly handles empty lists by using the provided schema. Also adds regression tests for GitHub issues #1968 and #303 to prevent future regressions with empty table scenarios. ## Changes - Fix IndexError in `_into_pyarrow_reader()` for empty list + schema case - Add Optional[pa.Schema] parameter to handle empty data gracefully - Add `test_create_table_empty_list_with_schema` for the IndexError fix - Add `test_create_empty_then_add_data` for issue #1968 - Add `test_search_empty_table` for issue #303 ## Test plan - [x] All new regression tests pass - [x] Existing tests continue to pass - [x] Code formatted with `make format`	2025-07-25 10:23:43 +08:00
Tristan Zajonc	10fa23e0d6	fix(python): expose register function in embeddings module (#2544 ) ## Summary Fixes #2541 Problem: The `register` function was not accessible via `from lancedb.embeddings import register` as documented, causing ImportError for users trying to create custom embedding functions. Solution: Added `register` to the exports in `python/lancedb/embeddings/__init__.py` to match the documented API and follow the same pattern as other registry functions (`get_registry`, `EmbeddingFunctionRegistry`). Root Cause: The function existed in `lancedb.embeddings.registry` but wasn't exposed through the main embeddings module interface. ## Changes - Add `register` to imports in `/python/python/lancedb/embeddings/__init__.py` ## Test Plan - [x] Verified `from lancedb.embeddings import register` works as documented - [x] Confirmed existing embedding tests pass - [x] Checked that the fix follows existing patterns (same as `get_registry`) - [x] Validated linting and formatting passes ## References Fixes #2541	2025-07-24 15:30:06 -07:00
yihong	43d9fc28b0	fix: can not build on python3.9 for dev (#2477 ) This patch fixes the dev build failing on Python 3.9. The cause is that the minimum supported version for ibm-watsonx-ai is Python 3.10; see `pyoven` for details: https://pyoven.org/package/ibm-watsonx-ai/ It also fixes a small Markdown lint issue. --------- Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-07-24 12:39:04 -07:00
aniaan	f45f0d0431	fix(python): correct type annotations in EmbeddingFunctionRegistry (#2478 ) - Fix register() method's alias parameter type from 'str = None' to 'Optional[str] = None' - Add return type annotation 'Type[EmbeddingFunction]' to get() method - Import Type from typing module for proper type hints	2025-07-24 12:31:49 -07:00
Tristan Zajonc	b9e3c36d82	fix: replace broken documentation URLs in error messages (#2533 ) Replaces broken 404 URL and unhelpful documentation links in type error messages with working URL and inline list of supported data types. Before: Points to https://lancedb.github.io/lance/read_and_write.html (404 error) After: Lists supported types inline and points to https://lancedb.github.io/lancedb/guides/tables/	2025-07-24 12:30:27 -07:00
Chen Chongchen	3cd7dd3375	fix: to_pydantic typing (#2517 ) currently, to_pydantic will always return LanceModel. If type checking is enabled in my project. I have to use `cast(data, List[RealModelType])` to solve type error. This PR uses generic to solve this problem.	2025-07-24 12:30:15 -07:00
Will Jones	3d1f102087	feat: allow Python and Typescript users to create `Session`s (#2530 ) ## Summary - Exposes `Session` in Python and Typescript so users can set the `index_cache_size_bytes` and `metadata_cache_size_bytes` * The `Session` is attached to the `Connection`, and thus shared across all tables in that connection. - Adds deprecation warnings for table-level cache configuration 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-24 12:06:29 -07:00
Tristan Zajonc	81afd8a42f	fix: use local random state in FTS test fixtures to prevent flaky failures (#2532 ) ## Summary Fixes intermittent CI failures in `test_search_fts[False]` where boolean FTS queries were returning fewer results than expected due to non-deterministic test data generation. ## Problem The test was using global `random` and `np.random` without seeding, causing the boolean query `MatchQuery("puppy", "text") & MatchQuery("runs", "text")` to sometimes return only 3 results instead of the expected 5, leading to `AssertionError: assert 3 == 5`. ## Solution - Replace global random calls with local `random.Random(42)` and `np.random.RandomState(42)` objects in test fixtures - Ensures deterministic test data while maintaining test isolation - No impact on other tests since random state is scoped to fixtures only ## Test Results - ✅ `test_search_fts[False]` now passes consistently - ✅ All other FTS tests continue to pass - ✅ No regression in other test suites (verified with `test_basic`) - ✅ Maintains existing test behavior and coverage	2025-07-24 11:30:02 -07:00
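The fixture change described above follows a common pattern: give each fixture its own seeded generator instead of touching the global one. A minimal sketch using only the standard library is shown below; the function name `make_fixture_rows` and the word lists are made up for illustration and are not the test's actual data.

```python
import random


def make_fixture_rows(n=5, seed=42):
    """Generate deterministic text rows from a local, seeded generator."""
    rng = random.Random(seed)  # local state; the global `random` module is untouched
    nouns = ["puppy", "car", "rabbit"]
    verbs = ["runs", "hits", "jumps"]
    return [f"{rng.choice(nouns)} {rng.choice(verbs)}" for _ in range(n)]


# Two calls with the same seed yield identical data.
print(make_fixture_rows() == make_fixture_rows())
```

Because the generator is local to the fixture, other tests that rely on the global random state see no change in behavior.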

846 Commits