lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-25 14:29:56 +00:00

Author	SHA1	Message	Date
Renato Marroquin	d0bc671cac	docs: add example for querying a lance table with SQL (#2389 ) Adds example for querying a dataset with SQL <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Added new guides on querying LanceDB tables using SQL with DuckDB and Apache Datafusion. - Included detailed instructions for integrating LanceDB with Datafusion in Python. - Updated navigation to include Datafusion and SQL querying documentation. - Improved formatting in TypeScript and vectordb update examples for consistency. - Tests - Added a new test demonstrating SQL querying on Lance tables via DataFusion integration. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-05-29 06:14:38 -07:00
David Myriel	d37e17593d	[doc] Add New Readme Page (#2405 ) Added a new readme for better navigation, updated language and more detail <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Updated the README with a modernized header, improved structure, and clearer descriptions of features and architecture. - Expanded and reorganized key features and product offerings for better clarity. - Simplified installation instructions and added a table of SDK interfaces with documentation links. - Enhanced community and contributor sections with new visuals and links to social and support channels. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-27 17:45:17 +02:00
BubbleCal	1902d65aad	docs: update the `num_partitions` recommendation (#2401 )	2025-05-23 23:45:37 +08:00
Ayush Chaurasia	dadcfebf8e	docs: add logos in genkit docs page (#2391 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Added an integration banner image to the beginning of the Genkitx-LanceDB documentation. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-20 01:40:12 +05:30
Ayush Chaurasia	81b59139f8	docs: add genkit integration docs (#2383 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Added a comprehensive guide for integrating LanceDB with Genkit, including installation instructions, setup, indexing, retrieval, and building a custom RAG pipeline with example code and screenshots. - Updated the documentation navigation to include the new Genkit integration, making it accessible from the site menu. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-12 18:18:07 +05:30
ayush chaurasia	1026781ab6	Revert "update" This reverts commit `9c699b8cd9`.	2025-05-11 21:04:59 +05:30
ayush chaurasia	9c699b8cd9	update	2025-05-11 21:01:53 +05:30
Will Jones	272e4103b2	feat: provide timeout parameter for merge_insert (#2378 ) Provides the ability to set a timeout for merge insert. The default underlying timeout is however long the first attempt takes, or if there are multiple attempts, 30 seconds. This has two use cases: 1. Make the timeout shorter, when you want to fail if it takes too long. 2. Allow taking more time to do retries. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added support for specifying a timeout when performing merge insert operations in Python, Node.js, and Rust APIs. - Introduced a new option to control the maximum allowed execution time for merge inserts, including retry timeout handling. - Documentation - Updated and added documentation to describe the new timeout option and its usage in APIs. - Tests - Added and updated tests to verify correct timeout behavior during merge insert operations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-08 13:07:05 -07:00
LuQQiu	ed594b0f76	feat: return version for all write operations (#2368 ) return version info for all write operations (add, update, merge_insert and column modification operations) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Table modification operations (add, update, delete, merge, add/alter/drop columns) now return detailed result objects including version numbers and operation statistics. - Result objects provide clearer feedback such as rows affected and new table version after each operation. - Documentation - Updated documentation to describe new result objects and their fields for all relevant table operations. - Added documentation for new result interfaces and updated method return types in Node.js and Python APIs. - Tests - Enhanced test coverage to assert correctness of returned versioning and operation metadata after table modifications. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-05 14:25:34 -07:00
Alex Pilon	f315f9665a	feat: implement bindings to return merge stats (#2367 ) Based on this comment: https://github.com/lancedb/lancedb/issues/2228#issuecomment-2730463075 and https://github.com/lancedb/lance/pull/2357 Here is my attempt at implementing bindings for returning merge stats from a `merge_insert.execute` call for lancedb. Note: I have almost no idea what I am doing in Rust but tried to follow existing code patterns and pay attention to compiler hints. - The change in nodejs binding appeared to be necessary to get compilation to work, presumably this could actually work properly by returning some kind of NAPI JS object of the stats data? - I am unsure of what to do with the remote/table.rs changes - necessary for compilation to work; I assume this is related to LanceDB cloud, but unsure the best way to handle that at this point. Proof of function: ```python import pandas as pd import lancedb db = lancedb.connect("/tmp/test.db") test_data = pd.DataFrame( { "title": ["Hello", "Test Document", "Example", "Data Sample", "Last One"], "id": [1, 2, 3, 4, 5], "content": [ "World", "This is a test", "Another example", "More test data", "Final entry", ], } ) table = db.create_table("documents", data=test_data, exist_ok=True, mode="overwrite") update_data = pd.DataFrame( { "title": [ "Hello, World", "Test Document, it's good", "Example", "Data Sample", "Last One", "New One", ], "id": [1, 2, 3, 4, 5, 6], "content": [ "World", "This is a test", "Another example", "More test data", "Final entry", "New content", ], } ) stats = ( table.merge_insert(on="id") .when_matched_update_all() .when_not_matched_insert_all() .execute(update_data) ) print(stats) ``` returns ``` {'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0} ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Merge-insert operations now return detailed statistics, including counts of inserted, updated, and deleted rows. - Bug Fixes - Tests updated to validate returned merge-insert statistics for accuracy. - Documentation - Method documentation improved to reflect new return values and clarify merge operation results. - Added documentation for the new `MergeStats` interface detailing operation statistics. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-01 10:00:20 -07:00
Ryan Green	af54e0ce06	feat: add table stats API (#2363 ) * Add a new "table stats" API to expose basic table and fragment statistics with local and remote table implementations ### Questions * This is using `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation - are there any concerns about performance for large datasets? ### Notes * bytes_on_disk seems to be stored at the column level but there does not seem to be a way to easily calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. - Enabled fetching of table statistics from remote sources through asynchronous requests. - Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. - Tests - Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-29 15:19:08 -02:30
LuQQiu	a9311c4dc0	feat: add list/create/delete/update/checkout tag API (#2353 ) Add the tag-related API to list existing tags, attach a tag to a version, update the tag version, delete a tag, get the version of a tag, and check out the version that the tag is bound to. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced table version tagging, allowing users to create, update, delete, and list human-readable tags for specific table versions. - Enabled checking out a table by either version number or tag name. - Added new interfaces for tag management in both Python and Node.js APIs, supporting synchronous and asynchronous workflows. - Bug Fixes - None. - Documentation - Updated documentation to describe the new tagging features, including usage examples. - Tests - Added comprehensive tests for tag creation, updating, deletion, listing, and version checkout by tag in both Python and Node.js environments. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-28 10:04:46 -07:00
Eileen Noonan	1620ba3508	docs: make table.update() nodejs guide consistent with API documentation (#2334 ) The docs in the Guide here do not match the [API reference](https://lancedb.github.io/lancedb/js/classes/Table/#updateopts) for the nodejs client. I am writing an Elixir wrapper over the typescript library (Rust forthcoming!) and confirmed in testing that the API reference is correct vs the Guide. Following the Guide docs, the error I got was: "lance error: Invalid user input: Schema error: No field named bar. Valid fields are foo." For a query of: await table.update({foo: "buzz"}, { where: "foo = 'bar'"}); Over a table with a schema of just {foo: Utf8}. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Reformatted a code snippet in the guide to enhance readability by splitting it into multiple lines for improved clarity. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 08:38:16 -07:00
Ryan Green	3ae90dde80	feat: add new table API to wait for async indexing (#2338 ) * Add new wait_for_index() table operation that polls until indices are created/fully indexed * Add an optional wait timeout parameter to all create_index operations * Python and NodeJS interfaces <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added optional waiting for index creation completion with configurable timeout. - Introduced methods to poll and wait for indices to be fully built across sync and async tables. - Extended index creation APIs to accept a wait timeout parameter. - Bug Fixes - Added a new timeout error variant for improved error reporting on index operations. - Tests - Added tests covering successful index readiness waiting, timeout scenarios, and missing index cases. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 08:41:21 -02:30
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency. - Bug Fixes - Ensured prewarming an index does not interfere with subsequent search operations. - Tests - Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js. - Chores - Updated dependencies for improved compatibility and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-04-17 15:14:36 -07:00
Adam Azzam	c42a201389	docs: remove trailing commas from AWS IAM Policies (#2324 ) Before: <img width="1173" alt="Screenshot 2025-04-08 at 10 58 50 AM" src="https://github.com/user-attachments/assets/e5c69c45-ab68-488f-9c7f-e12f7ecbfaab" /> After: <img width="1136" alt="Screenshot 2025-04-08 at 10 58 58 AM" src="https://github.com/user-attachments/assets/108c11ea-09b3-49b5-9a50-b880e72a0270" /> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Updated JSON policy examples in the storage guides to correct formatting issues and enhance syntax clarity for readers. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-16 13:09:21 -07:00
BubbleCal	2248aa9508	fix: bugs for new FTS APIs (#2314 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced full-text search capabilities with support for phrase queries, fuzzy matching, boosting, and multi-column matching. - Search methods now accept full-text query objects directly, improving query flexibility and precision. - Python and JavaScript SDKs updated to handle full-text queries seamlessly, including async search support. - Tests - Added comprehensive tests covering fuzzy search, phrase search, and boosted queries to ensure robust full-text search functionality. - Documentation - Updated query class documentation to reflect new constructor options and removal of deprecated methods for clarity and simplicity. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-15 11:51:35 +08:00
Will Jones	b3a4efd587	fix: revert change default read_consistency_interval=5s (#2327 ) This reverts commit `a547c523c2` or #2281 The current implementation can cause panics and performance degradation. I will bring this back with more testing in https://github.com/lancedb/lancedb/pull/2311 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Enhanced clarity on read consistency settings with updated descriptions and default behavior. - Removed outdated warnings about eventual consistency from the troubleshooting guide. - Refactor - Streamlined the handling of the read consistency interval across integrations, now defaulting to "None" for improved performance. - Simplified internal logic to offer a more consistent experience. - Tests - Updated test expectations to reflect the new default representation for the read consistency interval. - Removed redundant tests related to "no consistency" settings for streamlined testing. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-04-14 08:48:15 -07:00
Will Jones	1cd76b8498	feat: add timeout to query execution options (#2288 ) Closes #2287 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added configurable timeout support for query executions. Users can now specify maximum wait times for queries, enhancing control over long-running operations across various integrations. - Tests - Expanded test coverage to validate timeout behavior in both synchronous and asynchronous query flows, ensuring timely error responses when query execution exceeds the specified limit. - Introduced a new test suite to verify query operations when a timeout is reached, checking for appropriate error handling. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-04 12:34:41 -07:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280 ) Add an analyze plan API that executes a query and reports runtime metrics, which helps identify query IO overhead and query slowness.	2025-03-28 14:28:52 -07:00
Will Jones	a547c523c2	feat!: change default read_consistency_interval=5s (#2281 ) Previously, when we loaded the next version of the table, we would block all reads with a write lock. Now, we only do that if `read_consistency_interval=0`. Otherwise, we load the next version asynchronously in the background. This should mean that `read_consistency_interval > 0` won't have a meaningful impact on latency. Along with this change, I felt it was safe to change the default consistency interval to 5 seconds. The current default is `None`, which means we will never check for a new version by default. I think that default is contrary to most users' expectations.	2025-03-28 11:04:31 -07:00
QianZhu	b3b5362632	docs: replace Lancedb Cloud link (#2259 ) * direct users to cloud.lancedb.com since LanceDB Cloud is in public beta * removed the `cast vector dimension` from alter columns as we don't support it	2025-03-21 17:43:00 -07:00
Will Jones	abe06fee3d	feat(python): warn on fork (#2258 ) Closes #768	2025-03-21 17:18:10 -07:00
Will Jones	b2a38ac366	fix: make pylance optional again (#2209 ) The two remaining blockers were: * A method `with_embeddings` that was deprecated a year ago * A typecheck for `LanceDataset`	2025-03-21 11:26:32 -07:00
BubbleCal	bdb6c09c3b	feat: support binary vector and IVF_FLAT in TypeScript (#2221 ) resolve #2218 --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:57:08 -07:00
Will Jones	2bfdef2624	ci: refactor node releases (#2223 ) This PR fixes build issues associated with `aws-lc-rs`, while simplifying the build process. Previously, we used custom scripts for the musl and Windows ARM builds. These were complicated and prone to breaking. This PR switches to a setup that mirrors https://github.com/napi-rs/package-template/blob/main/.github/workflows/CI.yml. * linux glibc and musl builds now use the Docker images provided by the napi project * Windows ARM build now just cross compiles from Windows x64, which turns out to work quite well.	2025-03-21 10:56:29 -07:00
BubbleCal	7ff6ec7fe3	feat: upgrade to lance v0.25.0-beta.5 (#2248 ) - adds `loss` into the index stats for vector index - now `optimize` can retrain the vector index --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:12:23 -07:00
Ayush Chaurasia	b9afd9c860	docs: add late interaction, multi-vector guide & link example (#2231 ) 1/2 docs update for this week. Addresses issues from this docs epic - https://github.com/lancedb/lancedb/issues/1476	2025-03-20 20:29:32 +05:30
Ayush Chaurasia	ae1548b507	docs: add cloud & enterprise cta (#2235 ) 2/2 docs update this week - Add cloud & enterprise CTA - remove outdated projects/examples from landing page	2025-03-19 10:55:05 -07:00
Gagan Bhullar	14677d7c18	fix: metric type inconsistency (#2122 ) PR fixes #2113 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-03-12 10:28:37 -07:00
Will Jones	7747c9bcbf	feat(node): parse arrow types in `alterColumns()` (#2208 ) Previously, users could only specify new data types in `alterColumns` as strings: ```ts await tbl.alterColumns([ { path: "price", dataType: "float" } ]); ``` But this has some problems: 1. It wasn't clear which types were valid. 2. It was impossible to specify nested types, like lists and vector columns. This PR changes it to take an Arrow data type, similar to how the Python API works. This allows casting vector types: ```ts await tbl.alterColumns([ { path: "vector", dataType: new arrow.FixedSizeList( 2, new arrow.Field("item", new arrow.Float16(), false), ), }, ]); ``` Closes #2185	2025-03-12 09:57:36 -07:00
QianZhu	c9d6fc43a6	docs: use bypass_vector_index() instead of use_index=false (#2115 )	2025-03-12 09:31:09 -07:00
ayao227	dfe4ba8dad	chore: add reo integration (#2149 ) This PR adds reo integration to the lancedb documentation website.	2025-02-28 07:51:34 -08:00
Will Jones	7ac5f74c80	feat!: add variable store to embeddings registry (#2112 ) BREAKING CHANGE: embedding function implementations in Node need to now call `resolveVariables()` in their constructors and should not implement `toJSON()`. This tries to address the handling of secrets. In Node, they are currently lost. In Python, they are currently leaked into the table schema metadata. This PR introduces an in-memory variable store on the function registry. It also allows embedding function definitions to label certain config values as "sensitive", and the preprocessing logic will raise an error if users try to pass in hard-coded values. Closes #2110 Closes #521 --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-02-24 15:52:19 -08:00
Will Jones	ecdee4d2b1	feat(python): add search() method to async API (#2049 ) Reviving #1966. Closes #1938 The `search()` method can apply embeddings for the user. This simplifies hybrid search, so instead of writing: ```python vector_query = embeddings.compute_query_embeddings("flower moon")[0] await ( async_tbl.query() .nearest_to(vector_query) .nearest_to_text("flower moon") .to_pandas() ) ``` You can write: ```python await (await async_tbl.search("flower moon", query_type="hybrid")).to_pandas() ``` Unfortunately, we had to do a double-await here because `search()` needs to be async. This is because it often needs to do IO to retrieve and run an embedding function.	2025-02-24 14:19:25 -08:00
Lei Xu	6fa1f37506	docs: improve pydantic integration docs (#2136 ) Address usage mistakes in https://github.com/lancedb/lancedb/issues/2135. * Add example of how to use `LanceModel` and `Vector` decorator * Add test for pydantic doc * Fix the example to directly use LanceModel instead of calling `MyModel.to_arrow_schema()` in the example. * Add cross-reference link to pydantic doc site * Configure mkdocs to watch code changes in python directory.	2025-02-21 12:48:37 -08:00
Weston Pace	a7755cb313	docs: standardize node example prints (#2080 ) Minor cleanup to help debug future CI failures	2025-02-11 08:26:29 -08:00
Will Jones	2e3b34e79b	feat(node): support inserting and upserting subschemas (#2100 ) Fixes #2095 Closes #1832	2025-02-07 09:30:18 -08:00
Weston Pace	1a449fa49e	refactor: rename drop_db / drop_database to drop_all_tables, expose database from connection (#2098 ) If we start supporting external catalogs then "drop database" may be misleading (and not possible). We should be more clear that this is a utility method to drop all tables. This is also a nice chance for some consistency cleanup as it was `drop_db` in rust, `drop_database` in python, and non-existent in typescript. This PR also adds a public accessor to get the database trait from a connection. BREAKING CHANGE: the `drop_database` / `drop_db` methods are now deprecated.	2025-02-06 13:22:28 -08:00
Will Jones	16851389ea	feat: extra headers parameter in client options (#2091 ) Closes #1106 Unfortunately, these need to be set at the connection level. I investigated whether if we let users provide a callback they could use `AsyncLocalStorage` to access their context. However, it doesn't seem like NAPI supports this right now. I filed an issue: https://github.com/napi-rs/napi-rs/issues/2456	2025-02-04 17:26:45 -08:00
Weston Pace	c269524b2f	feat!: refactor ConnectionInternal into a Database trait (#2067 ) This opens up the door for more custom database implementations than the two we have today. The biggest change should be invisible: `ConnectionInternal` has been renamed to `Database`, made public, and refactored. However, there are a few breaking changes. `data_storage_version` and `enable_v2_manifest_paths` have been moved from options on `create_table` to options for the database which are now set via `storage_options`. Before: ``` db = connect(uri) tbl = db.create_table("my_table", data, data_storage_version="legacy", enable_v2_manifest_paths=True) ``` After: ``` db = connect(uri, storage_options={ "new_table_enable_v2_manifest_paths": "true", "new_table_data_storage_version": "legacy" }) tbl = db.create_table("my_table", data) ``` BREAKING CHANGE: the data_storage_version, enable_v2_manifest_paths options have moved from options to create_table to storage_options. BREAKING CHANGE: the use_legacy_format option has been removed, data_storage_version has replaced it for some time now	2025-02-04 14:35:14 -08:00
Will Jones	2fc174f532	docs: add sync/async tabs to quickstart (#2087 ) Closes #2033	2025-01-31 15:43:54 -08:00
Will Jones	dba85f4d6f	docs: user guide for merge insert (#2083 ) Closes #2062	2025-01-31 10:03:21 -08:00
Will Jones	e05c0cd87e	ci(node): check docs in CI (#2084 ) * Make `npm run docs` fail if there are any warnings. This will catch items missing from the API reference. * Add a check in our CI to make sure `npm run docs` runs without warnings and doesn't generate any new files (indicating it might be out-of-date). * Hide constructors that aren't user facing. * Remove unused enum `WriteMode`. Closes #2068	2025-01-30 16:06:06 -08:00
Weston Pace	e6b4f14c1f	docs: clarify upper case characters in column names need to be escaped (#2079 )	2025-01-29 09:34:43 -08:00
Will Jones	15f8f4d627	ci: check license headers (#2076 ) Based on the same workflow in Lance.	2025-01-29 08:27:07 -08:00
V	d999d72c8d	docs: pandas example (#2044 ) Fix example for section ## From pandas DataFrame	2025-01-24 11:37:47 -08:00
Will Jones	bcfc93cc88	fix(python): various fixes for async query builders (#2048 ) This includes several improvements and fixes to the Python Async query builders: 1. The API reference docs show all the methods for each builder 2. The hybrid query builder now has all the same setter methods as the vector search one, so you can now set things like `.distance_type()` on a hybrid query. 3. Re-rankers are now properly hooked up and tested for FTS and vector search. Previously the re-rankers were accidentally bypassed in unit tests, because the builders overrode `.to_arrow()`, but the unit test called `.to_batches()` which was only defined in the base class. Now all builders implement `.to_batches()` and leave `.to_arrow()` to the base class. 4. The `AsyncQueryBase` and `AsyncVectorQueryBase` setter methods now return `Self`, which provides the appropriate subclass as the type hint return value. Previously, `AsyncQueryBase` had them all hard-coded to `AsyncQuery`, which was unfortunate. (This required bringing in `typing-extensions` for older Python versions, but I think it's worth it.)	2025-01-20 16:14:34 -08:00
BubbleCal	214d0debf5	docs: claim LanceDB supports float16/float32/float64 for multivector (#2040 )	2025-01-21 07:04:15 +08:00

356 Commits