lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-06 12:30:40 +00:00

Author	SHA1	Message	Date
BubbleCal	c3ebac1a92	feat(node): support FTS options in nodejs (#1934 ) Closes #1790 --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-12-12 08:19:04 -08:00
BubbleCal	3324e7d525	feat: support 4bit PQ (#1916 )	2024-12-10 10:36:03 +08:00
Will Jones	a43193c99b	fix(nodejs): upgrade arrow versions (#1924 ) Closes #1626	2024-12-09 15:37:11 -08:00
Will Jones	79eaa52184	feat: schema evolution APIs in all SDKs (#1851 ) * Support `add_columns`, `alter_columns`, `drop_columns` in Remote SDK and async Python * Add `data_type` parameter to node * Docs updates	2024-12-04 14:47:50 -08:00
QianZhu	2616a50502	fix: test errors after setting default limit (#1891 )	2024-11-26 16:03:16 -08:00
BubbleCal	b2f88f0b29	feat: support to specify ef search param (#1844 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-11-19 23:12:25 +08:00
Will Jones	587c0824af	feat: flexible null handling and insert subschemas in Python (#1827 ) * Test that we can insert subschemas (omit nullable columns) in Python. * More work is needed to support this in Node. See: https://github.com/lancedb/lancedb/issues/1832 * Test that we can insert data with nullable schema but no nulls in non-nullable schema. * Add `"null"` option for `on_bad_vectors` where we fill with null if the vector is bad. * Make null values not considered bad if the field itself is nullable.	2024-11-15 11:33:00 -08:00
Will Jones	abd75e0ead	feat: search multiple query vectors as one query (#1811 ) Allows users to pass multiple query vector as part of a single query plan. This just runs the queries in parallel without any further optimization. It's mostly a convenience. Previously, I think this was only handled by the sync Python remote API. This makes it common across all SDKs. Closes https://github.com/lancedb/lancedb/issues/1803 ```python >>> import lancedb >>> import asyncio >>> >>> async def main(): ... db = await lancedb.connect_async("./demo") ... table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite") ... return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas() ... >>> asyncio.run(main()) query_index id vector _distance 0 2 2 [4.0, 5.0, 6.0] 0.0 1 1 2 [4.0, 5.0, 6.0] 0.0 2 0 1 [1.0, 2.0, 3.0] 0.0 ```	2024-11-13 16:05:16 -08:00
Will Jones	3604d20ad3	feat(python,node): support with_row_id in Python and remote (#1784 ) Needed to support hybrid search in Remote SDK.	2024-11-04 11:25:45 -08:00
Will Jones	96181ab421	feat: `fast_search` in Python and Node (#1623 ) Sometimes it is acceptable to users to only search indexed data and skip any new un-indexed data. For example, if un-indexed data will be shortly indexed and they don't mind the delay. In these cases, we can save a lot of CPU time in search, and provide better latency. Users can activate this on queries using `fast_search()`.	2024-11-01 09:29:09 -07:00
Will Jones	a324f4ad7a	feat(node): enable logging and show full errors (#1775 ) This exposes the `LANCEDB_LOG` environment variable in node, so that users can now turn on logging. In addition, fixes a bug where only the top-level error from Rust was being shown. This PR makes sure the full error chain is included in the error message. In the future, will improve this so the error chain is set on the [cause](https://nodejs.org/api/errors.html#errorcause) property of JS errors https://github.com/lancedb/lancedb/issues/1779 Fixes #1774	2024-10-29 15:13:34 -07:00
Will Jones	f3b6a1f55b	feat(node): bind remote SDK to rust implementation (#1730 ) Closes [#2509](https://github.com/lancedb/sophon/issues/2509) This is the Node.js analogue of #1700	2024-10-09 11:46:27 -06:00
Will Jones	f958f4d2e8	feat: remote index stats (#1702 ) BREAKING CHANGE: the return value of `index_stats` method has changed and all `index_stats` APIs now take index name instead of UUID. Also several deprecated index statistics methods were removed. * Removes deprecated methods for individual index statistics * Aligns public `IndexStatistics` struct with API response from LanceDB Cloud. * Implements `index_stats` for remote Rust SDK and Python async API.	2024-09-27 12:10:00 -07:00
LuQQiu	abeaae3d80	feat!: upgrade Lance to 0.18.0 (#1657 ) BREAKING CHANGE: default file format changed to Lance v2.0. Upgrade Lance to 0.18.0 Change notes: https://github.com/lancedb/lance/releases/tag/v0.18.0	2024-09-19 10:50:26 -07:00
BubbleCal	4b79db72bf	docs: improve the docs and API param name (#1629 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-09-11 10:18:29 +08:00
Gagan Bhullar	205fc530cf	feat: expose hnsw indices (#1595 ) PR closes #1522 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-10 11:08:13 -07:00
BubbleCal	2bde5401eb	feat: support to build FTS without positions (#1621 )	2024-09-10 22:51:32 +08:00
Gagan Bhullar	bcc19665ce	feat(nodejs): expose offset (#1620 ) PR closes #1555	2024-09-09 11:54:40 -07:00
Will Jones	2a6586d6fb	feat: add flag to enable faster manifest paths (#1612 ) The new V2 manifest path scheme makes discovering the latest version of a table constant time on object stores, regardless of the number of versions in the table. See benchmarks in the PR here: https://github.com/lancedb/lance/pull/2798 Closes #1583	2024-09-09 11:34:36 -07:00
Gagan Bhullar	d2caa5e202	feat(nodejs): add delete unverified (#1530 ) PR fixes part of #1527	2024-08-14 08:53:53 -07:00
Lei Xu	694ca30c7c	feat(nodejs): add bitmap and label list index types in nodejs (#1532 )	2024-08-11 12:06:02 -07:00
BubbleCal	f9d5fa88a1	feat!: migrate FTS from tantivy to lance-index (#1483 ) Lance now supports FTS, so add it into lancedb Python, TypeScript and Rust SDKs. For Python, we still use tantivy based FTS by default because the lance FTS index currently lacks some features of tantivy. For Python: - Support to create lance based FTS index - Support to specify columns for full text search (only available for lance based FTS index) For TypeScript: - Change the search method so that it can accept both string and vector - Support full text search For Rust - Support full text search The others: - Update the FTS doc BREAKING CHANGE: - for Python, this renames the attached score column of FTS from "score" to "_score", this could be a breaking change for users that rely on the scores --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-08-08 15:33:15 +08:00
Will Jones	61c05b51a0	fix(nodejs): address import issues in `lancedb` npm module (#1503 ) Fixes [#1496](https://github.com/lancedb/lancedb/issues/1496)	2024-08-05 16:30:27 -07:00
Will Jones	4f601a2d4c	fix: handle camelCase column names in select (#1460 ) Fixes #1385	2024-07-22 12:53:17 -07:00
Cory Grinstead	3b88f15774	fix(nodejs): lancedb arrow dependency (#1458 ) previously if you tried to install both vectordb and @lancedb/lancedb, you would get a peer dependency issue due to `vectordb` requiring `14.0.2` and `@lancedb/lancedb` requiring `15.0.0`. now `@lancedb/lancedb` should just work with any arrow version 13-17	2024-07-19 11:21:55 -05:00
Cory Grinstead	fdc949bafb	feat(nodejs): `update({values | valuesSql})` (#1439 )	2024-07-10 14:09:39 -05:00
Cory Grinstead	31be9212da	docs(nodejs): add @lancedb/lancedb examples everywhere (#1411 ) Co-authored-by: Will Jones <willjones127@gmail.com>	2024-07-10 13:29:03 -05:00
Cory Grinstead	b8ccea9f71	feat(nodejs): make tbl.search chainable (#1421 ) so this was annoying me when writing the docs. for a `search` query, one needed to chain `async` calls. ```ts const res = await (await tbl.search("greetings")).toArray() ``` now the promise will be deferred until the query is collected, leading to a more functional API ```ts const res = await tbl.search("greetings").toArray() ```	2024-07-02 14:31:57 -05:00
Nuvic	46c6ff889d	feat: add the explain_plan function (#1328 ) It's useful to see the underlying query plan for debugging purposes. This exposes LanceScanner's `explain_plan` function. Addresses #1288 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-07-02 11:10:01 -07:00
Cory Grinstead	5c3a88b6b2	feat(nodejs): add better typehints for registry (#1408 ) previously the `registry` would return `undefined | EmbeddingFunction` even for built in functions such as "openai". now it'll return the correct type for `getRegistry().get("openai")` as well as pass in the correct options type to `create` ### before ```ts const options: {model: 'not-a-real-model'} // this'd compile just fine, but result in runtime error const openai: EmbeddingFunction | undefined = getRegistry().get("openai").create(options) // this'd also compile fine const openai: EmbeddingFunction | undefined = getRegistry().get("openai").create({MODEL: ''}) ``` ### after ```ts const options: {model: 'not-a-real-model'} const openai: OpenAIEmbeddingFunction = getRegistry().get("openai").create(options) // Type '"not-a-real-model"' is not assignable to type '"text-embedding-ada-002" | "text-embedding-3-large" | "text-embedding-3-small" | undefined' ```	2024-07-01 12:49:42 -05:00
Will Jones	865ed99881	feat: dynamodb commit store support (#1410 ) This allows users to specify URIs like: ``` s3+ddb://my_bucket/path?ddbTableName=myCommitTable ``` and it will support concurrent writes in S3. * [x] Add dynamodb integration tests * [x] Add modifications to get it working in Python sync API * [x] Added section in documentation describing how to configure. Closes #534 --------- Co-authored-by: universalmind303 <cory.grinstead@gmail.com>	2024-06-28 09:30:36 -07:00
Cory Grinstead	79a1667753	feat(nodejs): feature parity [6/N] - make public interface work with multiple arrow versions (#1392 ) previously we didn't have great compatibility with other versions of apache arrow. This should bridge that gap a bit. depends on https://github.com/lancedb/lancedb/pull/1391 see actual diff here https://github.com/universalmind303/lancedb/compare/query-filter...universalmind303:arrow-compatibility	2024-06-25 11:10:08 -05:00
Cory Grinstead	55f88346d0	feat(nodejs): table.indexStats (#1361 ) closes https://github.com/lancedb/lancedb/issues/1359	2024-06-21 17:06:52 -05:00
Cory Grinstead	a797f5fe59	feat(nodejs): feature parity [5/N] - add `query.filter()` alias (#1391 ) to make the transition from `vectordb` to `@lancedb/lancedb` as seamless as possible, this adds `query.filter` with a deprecated tag. depends on https://github.com/lancedb/lancedb/pull/1390 see actual diff here https://github.com/universalmind303/lancedb/compare/list-indices-name...universalmind303:query-filter	2024-06-21 16:03:58 -05:00
Cory Grinstead	3cd84c9375	feat(nodejs): feature parity [4/N] - add 'name' to 'IndexConfig' for 'listIndices' (#1390 ) depends on https://github.com/lancedb/lancedb/pull/1386 see actual diff here https://github.com/universalmind303/lancedb/compare/create-table-args...universalmind303:list-indices-name	2024-06-21 15:45:02 -05:00
Cory Grinstead	33cc9b682f	feat(nodejs): feature parity [3/N] - `createTable({name, data, ...options})` (#1386 ) adds support for the `vectordb` syntax of `createTable({name, data, ...options})`. depends on https://github.com/lancedb/lancedb/pull/1380 see actual diff here https://github.com/universalmind303/lancedb/compare/table-name...universalmind303:create-table-args	2024-06-21 12:17:39 -05:00
Cory Grinstead	bc19a75f65	feat(nodejs): merge insert (#1351 ) closes https://github.com/lancedb/lancedb/issues/1349	2024-06-11 15:05:15 -05:00
Weston Pace	d5586c9c32	feat: make it possible to opt in to using the v2 format (#1352 ) This also exposed the max_batch_length configuration option in python/node (it was needed to verify if we are actually in v2 mode or not)	2024-06-04 21:52:14 -07:00
Cory Grinstead	70f92f19a6	feat(nodejs): table.search functionality (#1341 ) closes https://github.com/lancedb/lancedb/issues/1256	2024-06-04 14:04:03 -05:00
Cory Grinstead	d9fb6457e1	fix(nodejs): better support for f16 and f64 (#1343 ) closes https://github.com/lancedb/lancedb/issues/1292 closes https://github.com/lancedb/lancedb/issues/1293	2024-06-04 13:41:21 -05:00
paul n walsh	7c133ec416	feat(nodejs): table.toArrow function (#1282 ) Addresses https://github.com/lancedb/lancedb/issues/1254. --------- Co-authored-by: universalmind303 <cory.grinstead@gmail.com>	2024-05-31 13:24:21 -05:00
Cory Grinstead	bc139000bd	feat(nodejs): add compatibility across arrow versions (#1337 ) while adding some more docs & examples for the new js sdk, i ran across a few compatibility issues when using different arrow versions. This should fix those issues.	2024-05-29 17:36:34 -05:00
Cory Grinstead	dbea3a7544	feat: js embedding registry (#1308 ) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-05-29 13:12:19 -05:00
Weston Pace	4f512af024	feat: add the optimize function to nodejs and async python (#1257 ) The optimize function is pretty crucial for getting good performance when building a large scale dataset but it was only exposed in rust (many sync python users are probably doing this via to_lance today) This PR adds the optimize function to nodejs and to python. I left the function marked experimental because I think there will likely be changes to optimization (e.g. if we add features like "optimize on write"). I also only exposed the `cleanup_older_than` configuration parameter since this one is very commonly used and the rest have sensible defaults and we don't really know why we would recommend different values for these defaults anyways.	2024-05-20 07:09:31 -07:00
Cory Grinstead	055efdcdb6	refactor(nodejs): use biomejs instead of eslint & prettier (#1304 ) I've been noticing a lot of friction with the current toolchain for '/nodejs'. Particularly with the usage of eslint and prettier. [Biome](https://biomejs.dev/) is an all in one formatter & linter that replaces the need for two different ones that can potentially clash with one another. I've been using it in the [nodejs-polars](https://github.com/pola-rs/nodejs-polars) repo for quite some time & have found it much more pleasant to work with. --- One other small change included in this PR: use [ts-jest](https://www.npmjs.com/package/ts-jest) so we can run our tests without having to rebuild typescript code first	2024-05-14 11:11:18 -05:00
Will Jones	1d23af213b	feat: expose storage options in LanceDB (#1204 ) Exposes `storage_options` in LanceDB. This is provided for Python async, Node `lancedb`, and Node `vectordb` (and Rust of course). Python synchronous is omitted because it's not compatible with the PyArrow filesystems we use there currently. In the future, we will move the sync API to wrap the async one, and then it will get support for `storage_options`. 1. Fixes #1168 2. Closes #1165 3. Closes #1082 4. Closes #439 5. Closes #897 6. Closes #642 7. Closes #281 8. Closes #114 9. Closes #990 10. Deprecating `awsCredentials` and `awsRegion`. Users are encouraged to use `storageOptions` instead.	2024-04-10 10:12:04 -07:00
QianZhu	bdd07a5dfa	fix nodejs test (#1141 ) changed the error msg for query with wrong vector dim thus need this change to pass the nodejs tests.	2024-04-05 16:33:37 -07:00
Weston Pace	4180b44472	feat: refactor the query API and add query support to the python async API (#1113 ) In addition, there are also a number of changes in nodejs to the docstrings of existing methods because this PR adds a jsdoc linter.	2024-04-05 16:32:47 -07:00
Weston Pace	b6a522d483	feat: add list_indices to the async api (#1074 )	2024-04-05 16:32:15 -07:00
Weston Pace	9031ec6878	feat: add update to the async API (#1093 )	2024-04-05 16:32:15 -07:00


64 Commits