This PR originated from a request to add `Serialize` / `Deserialize` to
`lance_linalg::distance::DistanceType`. However, that is a strange
request for `lance_linalg` which shouldn't really have to worry about
`Serialize` / `Deserialize`. The problem is that `lancedb` re-uses
`DistanceType`, and things in `lancedb` do need to worry about
`Serialize` / `Deserialize` (because `lancedb` needs to support the
remote client).
On the bright side, separating the two types allows us to document the
distance type independently and allows `lance_linalg` to change
`DistanceType` in the future without worrying about backwards
compatibility.
This will make it easier for 3rd party integrations. They simply need to
implement `IntoArrow` for their types in order for those types to be
used in ingestion.
I know there's a larger effort to base the Python client on the core
Rust implementation, but in the meantime there have been several
issues (#1072 and #485) with some of the Azure Blob Storage calls due to
pyarrow not natively supporting an Azure backend. To this end, I've
added an optional import of [`adlfs`](https://pypi.org/project/adlfs/),
the fsspec implementation of Azure Blob Storage, and passed it to
`pyarrow.fs`. I've modified the existing test and manually verified it
with some real credentials to make sure it behaves as expected.
It should now be as simple as:
```python
import lancedb
db = lancedb.connect("az://blob_name/path")
table = db.open_table("test")
table.search(...)
```
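Under the hood this works by adapting the fsspec filesystem to the
`pyarrow.fs` interface. A minimal sketch of that pattern, assuming
`adlfs` is installed (the credentials are placeholders, and this is not
necessarily the exact code in the PR):
```python
import adlfs
import pyarrow.fs

# The fsspec implementation of Azure Blob Storage.
azure = adlfs.AzureBlobFileSystem(account_name="my_account", account_key="...")

# FSSpecHandler adapts any fsspec filesystem to the pyarrow.fs interface.
fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(azure))
```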
Thank you for this cool project, and we're excited to start using this
for real shortly! 🎉 And thanks to @dwhitena for bringing it to my
attention with his Prediction Guard posts.
Co-authored-by: christiandilorenzo <christian.dilorenzo@infiniaml.com>
The LanceDB embeddings registry allows users to annotate the pydantic
model used as table schema with the desired embedding function, e.g.:
```python
class Schema(LanceModel):
    id: str
    vector: Vector(openai.ndims()) = openai.VectorField()
    text: str = openai.SourceField()
```
Tables created like this do not require the user to calculate
embeddings explicitly, e.g. this works:
```python
table.add([{"id": "foo", "text": "rust all the things"}])
```
However, trying to construct pydantic model instances without a vector
doesn't work, because `vector` is a required field.
Instead, you need to add a default value:
```python
class Schema(LanceModel):
    id: str
    vector: Vector(openai.ndims()) = openai.VectorField(default=None)
    text: str = openai.SourceField()
```
then this completes without errors:
```python
table.add([Schema(id="foo", text="rust all the things")])
```
However, all of the vectors are then filled with zeros. To fix this,
`add_vector_col` needs an additional check so that embedding generation
is invoked when the vector values are missing.
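A hedged sketch of what that check could look like (the function
signature and the `compute_source_embeddings` call are illustrative,
not the exact implementation; real code would also cast the result to a
fixed-size list of the right dimension):
```python
import pyarrow as pa

def add_vector_col(data: pa.Table, func, source_col: str, vector_col: str) -> pa.Table:
    # Regenerate embeddings when the vector column is missing OR present
    # but entirely null (the `default=None` case above).
    has_col = vector_col in data.column_names
    if has_col and data[vector_col].null_count < len(data):
        return data  # the user supplied real vectors; nothing to do
    vectors = func.compute_source_embeddings(data[source_col].to_pylist())
    if has_col:
        data = data.drop_columns([vector_col])
    return data.append_column(vector_col, pa.array(vectors))
```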
In Rust and Node, we have been swallowing filter validation errors. If
there was an error in parsing the filter, then the filter was silently
ignored, returning unfiltered results.
Fixes #1081
Added a small bit of documentation for the `dim` feature, which is
provided by the new `text-embedding-3` model series and allows users to
shorten an embedding.
Happy to discuss the phrasing a bit, but I struggled quite a while to
get this working, so I wanted to help others who might want to use the
newer model too.
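For example, with the embeddings registry this looks roughly like the
following (hedged: `dim` is the parameter documented by this PR, and
the model name is just one member of the new series):
```python
from lancedb.embeddings import get_registry

# Request shortened 256-dimensional embeddings from the new model series.
openai = get_registry().get("openai").create(
    name="text-embedding-3-small", dim=256
)
```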
#1002 accidentally changed `checkout_latest` to do nothing if the table
was already in latest mode. This PR makes sure it forces a reload of the
table (if there is a newer version).
The synchronous `table_names` function in Python lancedb relies on
arrow's filesystem, which behaves slightly differently from
object_store. As a result, the function would not work properly on GCS.
However, the async `table_names` function uses object_store directly
and thus is accurate. In most cases we can fall back to the async
`table_names` function, and so this PR does so. The one case we cannot
is when the user is already in an async context (we can't start a new
async event loop). Soon, we can just redirect those users to the async
API instead of the sync API, so that case will eventually go away. For
now, we fall back to the old behavior.
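A sketch of that fallback logic (method names are hypothetical):
```python
import asyncio

def table_names(self):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running, so it is safe to drive the accurate
        # async implementation to completion.
        return asyncio.run(self._table_names_async())
    # Already inside an async context: we can't start a nested event loop,
    # so fall back to the old pyarrow-filesystem-based implementation.
    return self._table_names_arrow()
```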
The fact that we convert errors to strings makes them really hard to
work with. For example, in SaaS we want to know whether the underlying
`lance::Error` was the `InvalidInput` variant, so we can return a 400
instead of a 500.
1. Filtering with fts mutated the schema, which caused schema mismatch
problems with hybrid search, since it combines the fts and vector
search tables.
2. fts with a filter failed when `with_row_id` was set. This was
because the row id was calculated before filtering, which caused a size
mismatch when attaching it afterwards.
3. The fix for 1 meant that the row id is now attached before
filtering, but passing a filter to `to_lance` on a dataset that already
contains `_rowid` raises a panic from lance. So temporarily, in the
case where fts is used with a filter AND `with_row_id`, we force the
user onto the duckdb pathway.
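For reference, the combination from (2) and (3) looks like this in the
Python SDK (hedged; the exact builder methods may differ between
versions):
```python
results = (
    table.search("rust all the things", query_type="fts")
    .where("id = 'foo'")
    .with_row_id(True)
    .to_list()
)
```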
---------
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Arrow-js uses brittle `instanceof` checks throughout the code base.
These fail unless the library instance that produced the object is
exactly the same instance that vectordb is using. At a minimum, this
means that a user using arrow version 15 (or any version that doesn't
exactly match the version vectordb is using) will get strange errors
when they try to use vectordb.
However, there are even cases where the versions can be perfectly
identical and the `instanceof` check still fails. One such example is
when using `vite` (e.g. https://github.com/vitejs/vite/issues/3910).
This PR solves the problem in a rather brute-force, but workable,
fashion. If we encounter a schema that does not pass the `instanceof`
check, then we attempt to sanitize that schema by traversing the
object and, if it has all the correct properties, constructing an
appropriate `Schema` instance via deep cloning.
The eslint rules specify some formatting requirements that are rather
strict and conflict with vscode's default formatter. I was unable to
get auto-formatting to set up correctly. Also, eslint has quite
recently [given up on
formatting](https://eslint.org/blog/2023/10/deprecating-formatting-rules/)
and recommends using a 3rd-party formatter.
This PR adds prettier as the formatter. It restores the eslint rules to
their defaults. This does mean we now have the "no explicit any" check
back on. I know that rule is pedantic but it did help me catch a few
corner cases in type testing that weren't covered in the current code.
Leaving in draft as this is dependent on other PRs.
In order to add support for `add`, we needed to migrate the Rust
`Table` trait to a `Table` struct and a `TableInternal` trait (similar
to the way the connection is designed).
While doing this we also cleaned up some inconsistencies between the
SDKs:
* Python and Node are garbage-collected languages, and it can be
difficult to trigger something to be freed. The convention in these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying Rust
object (a sketch follows after this list).
* We made significant improvements to table creation in
cc5f2136a6
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories, and these
were not getting cleaned up. This is mostly harmless but annoying, so I
changed it up a bit to ensure we clean up tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display`, which is
hooked into Python's `__repr__`. Node has no concept of a regular "to
string" function, so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to live in Rust whenever
possible (though we still need to mention the default in
documentation).
* I changed the Python `AsyncConnection`/`AsyncTable` classes from
abstract classes with a single implementation to plain classes, because
we no longer have the remote implementation in Python.
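Put together, a hedged sketch of the resulting Python surface (the
names and method bodies are illustrative, not the exact
implementation):
```python
from typing import Optional

class AsyncTable:
    def __init__(self, inner):
        self._inner = inner  # handle to the underlying Rust Table struct

    def close(self):
        # Explicitly drops the Rust object instead of waiting on the GC.
        self._inner.close()

    def __repr__(self):
        # Delegates to the Rust std::fmt::Display implementation.
        return self._inner.display()

    async def add(self, data, mode: Optional[str] = None):
        # Optional parameters default to None so the real default can
        # live in Rust (but is still mentioned in the documentation).
        ...
```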
Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote work at a later date (we
should have the structure in place and correct, so there shouldn't be
any refactoring concerns).
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
The renaming of `vectordb` to `lancedb` broke the [quick start
docs](https://lancedb.github.io/lancedb/basic/#__tabbed_5_3) (it's
pointing to a non-existent directory). This PR fixes the code snippets
and the paths in the docs page.
Additionally, more fixes related to the indexing docs are below 👇🏽.