lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-01-07 12:22:59 +00:00

Author	SHA1	Message	Date
BubbleCal	bdb6c09c3b	feat: support binary vector and IVF_FLAT in TypeScript (#2221 ) resolve #2218 --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:57:08 -07:00
BubbleCal	7ff6ec7fe3	feat: upgrade to lance v0.25.0-beta.5 (#2248 ) - adds `loss` into the index stats for vector index - now `optimize` can retrain the vector index --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:12:23 -07:00
Will Jones	b595d8a579	fix(nodejs): workaround for apache-arrow null vector issue (#2244 ) Fixes #2240	2025-03-20 08:07:10 -07:00
Will Jones	7747c9bcbf	feat(node): parse arrow types in `alterColumns()` (#2208 ) Previously, users could only specify new data types in `alterColumns` as strings: ```ts await tbl.alterColumns([ { path: "price", dataType: "float" } ]); ``` But this has some problems: 1. It wasn't clear what were valid types 2. It was impossible to specify nested types, like lists and vector columns. This PR changes it to take an Arrow data type, similar to how the Python API works. This allows casting vector types: ```ts await tbl.alterColumns([ { path: "vector", dataType: new arrow.FixedSizeList( 2, new arrow.Field("item", new arrow.Float16(), false), ), }, ]); ``` Closes #2185	2025-03-12 09:57:36 -07:00
Will Jones	5b12a47119	feat!: revert query limit to be unbounded for scans (#2151 ) In earlier PRs (#1886, #1191) we made the default limit 10 regardless of the query type. This was confusing for users and in many cases a breaking change. Users would have queries that used to return all results, but instead only returned the first 10, causing silent bugs. Part of the cause was consistency: the Python sync API seems to have always had a limit of 10, while newer APIs (Python async and Nodejs) didn't. This PR sets the default limit only for searches (vector search, FTS), while letting scans (even with filters) be unbounded. It does this consistently for all SDKs. Fixes #1983 Fixes #1852 Fixes #2141	2025-02-26 10:32:14 -08:00
Will Jones	7ac5f74c80	feat!: add variable store to embeddings registry (#2112 ) BREAKING CHANGE: embedding function implementations in Node need to now call `resolveVariables()` in their constructors and should not implement `toJSON()`. This tries to address the handling of secrets. In Node, they are currently lost. In Python, they are currently leaked into the table schema metadata. This PR introduces an in-memory variable store on the function registry. It also allows embedding function definitions to label certain config values as "sensitive", and the preprocessing logic will raise an error if users try to pass in hard-coded values. Closes #2110 Closes #521 --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-02-24 15:52:19 -08:00
Will Jones	2e3b34e79b	feat(node): support inserting and upserting subschemas (#2100 ) Fixes #2095 Closes #1832	2025-02-07 09:30:18 -08:00
Will Jones	15f8f4d627	ci: check license headers (#2076 ) Based on the same workflow in Lance.	2025-01-29 08:27:07 -08:00
Will Jones	f059372137	feat: add `drop_index()` method (#2039 ) Closes #1665	2025-01-20 10:08:51 -08:00
Will Jones	31f9c30ffb	chore: fix test of error message (#2018 ) Addresses failure on `main`: https://github.com/lancedb/lancedb/actions/runs/12757756657/job/35558683317	2025-01-13 15:36:46 -08:00
BubbleCal	3c0a64be8f	feat: support distance range in queries (#1999 ) this also updates the docs --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-01-08 11:03:27 +08:00
BubbleCal	c3ebac1a92	feat(node): support FTS options in nodejs (#1934 ) Closes #1790 --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-12-12 08:19:04 -08:00
BubbleCal	3324e7d525	feat: support 4bit PQ (#1916 )	2024-12-10 10:36:03 +08:00
Will Jones	a43193c99b	fix(nodejs): upgrade arrow versions (#1924 ) Closes #1626	2024-12-09 15:37:11 -08:00
Will Jones	79eaa52184	feat: schema evolution APIs in all SDKs (#1851 ) * Support `add_columns`, `alter_columns`, `drop_columns` in Remote SDK and async Python * Add `data_type` parameter to node * Docs updates	2024-12-04 14:47:50 -08:00
QianZhu	2616a50502	fix: test errors after setting default limit (#1891 )	2024-11-26 16:03:16 -08:00
BubbleCal	b2f88f0b29	feat: support to specify ef search param (#1844 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-11-19 23:12:25 +08:00
Will Jones	587c0824af	feat: flexible null handling and insert subschemas in Python (#1827 ) * Test that we can insert subschemas (omit nullable columns) in Python. * More work is needed to support this in Node. See: https://github.com/lancedb/lancedb/issues/1832 * Test that we can insert data with nullable schema but no nulls in non-nullable schema. * Add `"null"` option for `on_bad_vectors` where we fill with null if the vector is bad. * Make null values not considered bad if the field itself is nullable.	2024-11-15 11:33:00 -08:00
Will Jones	abd75e0ead	feat: search multiple query vectors as one query (#1811 ) Allows users to pass multiple query vectors as part of a single query plan. This just runs the queries in parallel without any further optimization. It's mostly a convenience. Previously, I think this was only handled by the sync Python remote API. This makes it common across all SDKs. Closes https://github.com/lancedb/lancedb/issues/1803 ```python >>> import lancedb >>> import asyncio >>> >>> async def main(): ... db = await lancedb.connect_async("./demo") ... table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite") ... return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas() ... >>> asyncio.run(main()) query_index id vector _distance 0 2 2 [4.0, 5.0, 6.0] 0.0 1 1 2 [4.0, 5.0, 6.0] 0.0 2 0 1 [1.0, 2.0, 3.0] 0.0 ```	2024-11-13 16:05:16 -08:00
Will Jones	3604d20ad3	feat(python,node): support with_row_id in Python and remote (#1784 ) Needed to support hybrid search in Remote SDK.	2024-11-04 11:25:45 -08:00
Will Jones	96181ab421	feat: `fast_search` in Python and Node (#1623 ) Sometimes it is acceptable to users to only search indexed data and skip any new un-indexed data. For example, if un-indexed data will be shortly indexed and they don't mind the delay. In these cases, we can save a lot of CPU time in search, and provide better latency. Users can activate this on queries using `fast_search()`.	2024-11-01 09:29:09 -07:00
Will Jones	f958f4d2e8	feat: remote index stats (#1702 ) BREAKING CHANGE: the return value of `index_stats` method has changed and all `index_stats` APIs now take index name instead of UUID. Also several deprecated index statistics methods were removed. * Removes deprecated methods for individual index statistics * Aligns public `IndexStatistics` struct with API response from LanceDB Cloud. * Implements `index_stats` for remote Rust SDK and Python async API.	2024-09-27 12:10:00 -07:00
BubbleCal	4b79db72bf	docs: improve the docs and API param name (#1629 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-09-11 10:18:29 +08:00
Gagan Bhullar	205fc530cf	feat: expose hnsw indices (#1595 ) PR closes #1522 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-10 11:08:13 -07:00
BubbleCal	2bde5401eb	feat: support to build FTS without positions (#1621 )	2024-09-10 22:51:32 +08:00
Gagan Bhullar	bcc19665ce	feat(nodejs): expose offset (#1620 ) PR closes #1555	2024-09-09 11:54:40 -07:00
Gagan Bhullar	d2caa5e202	feat(nodejs): add delete unverified (#1530 ) PR fixes part of #1527	2024-08-14 08:53:53 -07:00
Lei Xu	694ca30c7c	feat(nodejs): add bitmap and label list index types in nodejs (#1532 )	2024-08-11 12:06:02 -07:00
BubbleCal	f9d5fa88a1	feat!: migrate FTS from tantivy to lance-index (#1483 ) Lance now supports FTS, so add it into lancedb Python, TypeScript and Rust SDKs. For Python, we still use tantivy based FTS by default because the lance FTS index now misses some features of tantivy. For Python: - Support to create lance based FTS index - Support to specify columns for full text search (only available for lance based FTS index) For TypeScript: - Change the search method so that it can accept both string and vector - Support full text search For Rust - Support full text search The others: - Update the FTS doc BREAKING CHANGE: - for Python, this renames the attached score column of FTS from "score" to "_score", this could be a breaking change for users that rely on the scores --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-08-08 15:33:15 +08:00
Will Jones	4f601a2d4c	fix: handle camelCase column names in select (#1460 ) Fixes #1385	2024-07-22 12:53:17 -07:00
Cory Grinstead	3b88f15774	fix(nodejs): lancedb arrow dependency (#1458 ) previously if you tried to install both vectordb and @lancedb/lancedb, you would get a peer dependency issue due to `vectordb` requiring `14.0.2` and `@lancedb/lancedb` requiring `15.0.0`. now `@lancedb/lancedb` should just work with any arrow version 13-17	2024-07-19 11:21:55 -05:00
Cory Grinstead	fdc949bafb	feat(nodejs): `update({values \| valuesSql})` (#1439 )	2024-07-10 14:09:39 -05:00
Cory Grinstead	b8ccea9f71	feat(nodejs): make tbl.search chainable (#1421 ) so this was annoying me when writing the docs. for a `search` query, one needed to chain `async` calls. ```ts const res = await (await tbl.search("greetings")).toArray() ``` now the promise will be deferred until the query is collected, leading to a more functional API ```ts const res = await tbl.search("greetings").toArray() ```	2024-07-02 14:31:57 -05:00
Nuvic	46c6ff889d	feat: add the explain_plan function (#1328 ) It's useful to see the underlying query plan for debugging purposes. This exposes LanceScanner's `explain_plan` function. Addresses #1288 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-07-02 11:10:01 -07:00
Cory Grinstead	79a1667753	feat(nodejs): feature parity [6/N] - make public interface work with multiple arrow versions (#1392 ) previously we didn't have great compatibility with other versions of apache arrow. This should bridge that gap a bit. depends on https://github.com/lancedb/lancedb/pull/1391 see actual diff here https://github.com/universalmind303/lancedb/compare/query-filter...universalmind303:arrow-compatibility	2024-06-25 11:10:08 -05:00
Cory Grinstead	55f88346d0	feat(nodejs): table.indexStats (#1361 ) closes https://github.com/lancedb/lancedb/issues/1359	2024-06-21 17:06:52 -05:00
Cory Grinstead	a797f5fe59	feat(nodejs): feature parity [5/N] - add `query.filter()` alias (#1391 ) to make the transition from `vectordb` to `@lancedb/lancedb` as seamless as possible, this adds `query.filter` with a deprecated tag. depends on https://github.com/lancedb/lancedb/pull/1390 see actual diff here https://github.com/universalmind303/lancedb/compare/list-indices-name...universalmind303:query-filter	2024-06-21 16:03:58 -05:00
Cory Grinstead	3cd84c9375	feat(nodejs): feature parity [4/N] - add 'name' to 'IndexConfig' for 'listIndices' (#1390 ) depends on https://github.com/lancedb/lancedb/pull/1386 see actual diff here https://github.com/universalmind303/lancedb/compare/create-table-args...universalmind303:list-indices-name	2024-06-21 15:45:02 -05:00
Cory Grinstead	bc19a75f65	feat(nodejs): merge insert (#1351 ) closes https://github.com/lancedb/lancedb/issues/1349	2024-06-11 15:05:15 -05:00
Cory Grinstead	70f92f19a6	feat(nodejs): table.search functionality (#1341 ) closes https://github.com/lancedb/lancedb/issues/1256	2024-06-04 14:04:03 -05:00
Cory Grinstead	d9fb6457e1	fix(nodejs): better support for f16 and f64 (#1343 ) closes https://github.com/lancedb/lancedb/issues/1292 closes https://github.com/lancedb/lancedb/issues/1293	2024-06-04 13:41:21 -05:00
paul n walsh	7c133ec416	feat(nodejs): table.toArrow function (#1282 ) Addresses https://github.com/lancedb/lancedb/issues/1254. --------- Co-authored-by: universalmind303 <cory.grinstead@gmail.com>	2024-05-31 13:24:21 -05:00
Cory Grinstead	bc139000bd	feat(nodejs): add compatibility across arrow versions (#1337 ) while adding some more docs & examples for the new js sdk, i ran across a few compatibility issues when using different arrow versions. This should fix those issues.	2024-05-29 17:36:34 -05:00
Cory Grinstead	dbea3a7544	feat: js embedding registry (#1308 ) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-05-29 13:12:19 -05:00
Weston Pace	4f512af024	feat: add the optimize function to nodejs and async python (#1257 ) The optimize function is pretty crucial for getting good performance when building a large scale dataset but it was only exposed in rust (many sync python users are probably doing this via to_lance today) This PR adds the optimize function to nodejs and to python. I left the function marked experimental because I think there will likely be changes to optimization (e.g. if we add features like "optimize on write"). I also only exposed the `cleanup_older_than` configuration parameter since this one is very commonly used and the rest have sensible defaults and we don't really know why we would recommend different values for these defaults anyways.	2024-05-20 07:09:31 -07:00
Cory Grinstead	055efdcdb6	refactor(nodejs): use biomejs instead of eslint & prettier (#1304 ) I've been noticing a lot of friction with the current toolchain for '/nodejs'. Particularly with the usage of eslint and prettier. [Biome](https://biomejs.dev/) is an all in one formatter & linter that replaces the need for two different ones that can potentially clash with one another. I've been using it in the [nodejs-polars](https://github.com/pola-rs/nodejs-polars) repo for quite some time & have found it much more pleasant to work with. --- One other small change included in this PR: use [ts-jest](https://www.npmjs.com/package/ts-jest) so we can run our tests without having to rebuild typescript code first	2024-05-14 11:11:18 -05:00
QianZhu	bdd07a5dfa	fix nodejs test (#1141 ) changed the error msg for query with wrong vector dim thus need this change to pass the nodejs tests.	2024-04-05 16:33:37 -07:00
Weston Pace	4180b44472	feat: refactor the query API and add query support to the python async API (#1113 ) In addition, there are also a number of changes in nodejs to the docstrings of existing methods because this PR adds a jsdoc linter.	2024-04-05 16:32:47 -07:00
Weston Pace	b6a522d483	feat: add list_indices to the async api (#1074 )	2024-04-05 16:32:15 -07:00
Weston Pace	9031ec6878	feat: add update to the async API (#1093 )	2024-04-05 16:32:15 -07:00


61 Commits