Commit Graph

853 Commits

Author SHA1 Message Date
qzhu
b7c816c919 add index_stats to python api 2024-03-12 16:28:15 -07:00
qzhu
34dd548bc8 init commit for test 2024-03-11 13:28:24 -07:00
Ivan Leo
553dae1607 Update default_embedding_functions.md (#1073)
Added a small bit of documentation for the `dim` feature which is
provided by the new `text-embedding-3` model series that allows users to
shorten an embedding.

Happy to discuss a bit on the phrasing but I struggled quite a bit with
getting it to work so wanted to help others who might want to use the
newer model too
2024-03-11 21:30:07 +05:30
Weston Pace
9c7e00eec3 Remove remote integration workflow (#1076) 2024-03-07 12:00:04 -08:00
Will Jones
a7d66032aa fix: Allow converting from NativeTable to Table (#1069) 2024-03-07 08:33:46 -08:00
Lance Release
7fb8a732a5 Updating package-lock.json 2024-03-07 01:05:09 +00:00
Lance Release
f393ac3b0d Updating package-lock.json 2024-03-06 23:26:48 +00:00
Lance Release
ca83354780 Bump version: 0.4.11 → 0.4.12 v0.4.12 2024-03-06 23:26:38 +00:00
Lance Release
272cbcad7a [python] Bump version: 0.6.1 → 0.6.2 python-v0.6.2 2024-03-06 16:28:50 +00:00
Will Jones
722fe1836c fix: make checkout_latest force a reload (#1064)
#1002 accidentally changed `checkout_latest` to do nothing if the table
was already in latest mode. This PR makes sure it forces a reload of the
table (if there is a newer version).
2024-03-05 11:51:47 -08:00
Lei Xu
d1983602c2 chore: bump lance to 0.10.2 (#1061) 2024-03-05 10:16:07 -08:00
Weston Pace
9148cd6d47 feat: page_token / limit to native table_names function. Use async table_names function from sync table_names function (#1059)
The synchronous table_names function in python lancedb relies on arrow's
filesystem which behaves slightly differently than object_store. As a
result, the function would not work properly in GCS.

However, the async table_names function uses object_store directly and
thus is accurate. In most cases we can fall back to using the async
table_names function and so this PR does so. The one case where we cannot is
if the user is already in an async context (we can't start a new async
event loop). Soon, we can just redirect those users to use the async API
instead of the sync API and so that case will eventually go away. For
now, we fall back to the old behavior.
2024-03-05 08:38:18 -08:00
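The fallback described in 9148cd6d47 can be sketched in Python roughly as follows. This is a minimal illustration, not the code from the PR: `table_names_async`, `table_names_legacy`, and `in_async_context` are hypothetical stand-ins, and the returned table names are made up.

```python
import asyncio
from typing import List


async def table_names_async() -> List[str]:
    # Hypothetical stand-in for the async, object_store-backed listing.
    return ["table_a", "table_b"]


def table_names_legacy() -> List[str]:
    # Hypothetical stand-in for the old arrow-filesystem-based listing.
    return ["table_a"]


def in_async_context() -> bool:
    # asyncio.get_running_loop() raises RuntimeError when no loop is running.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def table_names() -> List[str]:
    # A new event loop cannot be started from inside a running one,
    # so in that case keep the old synchronous behavior.
    if in_async_context():
        return table_names_legacy()
    return asyncio.run(table_names_async())


async def call_from_async() -> List[str]:
    # Simulates a user calling the sync API from async code.
    return table_names()
```

Called from plain synchronous code, `table_names()` takes the async path; called from inside a coroutine, as in `call_from_async`, it takes the legacy path.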
Will Jones
47dbb988bf feat: more accessible errors (#1025)
The fact that we convert errors to strings makes them really hard to
work with. For example, in SaaS we want to know whether the underlying
`lance::Error` was the `InvalidInput` variant, so we can return a 400
instead of a 500.
2024-03-05 07:57:11 -08:00
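To illustrate the motivation in 47dbb988bf, here is a small Python sketch that contrasts matching on an error string with matching on an error type. The class names and both helper functions are hypothetical and only mirror the idea of the `InvalidInput` variant; they are not the actual LanceDB or lance types.

```python
class LanceError(Exception):
    """Hypothetical base class standing in for a typed error."""


class InvalidInputError(LanceError):
    """Hypothetical counterpart of the `InvalidInput` variant."""


def status_from_string(message: str) -> int:
    # With stringified errors, the only option is fragile text matching.
    return 400 if "invalid input" in message.lower() else 500


def status_from_error(err: Exception) -> int:
    # With typed errors, the caller can branch on the variant directly.
    if isinstance(err, InvalidInputError):
        return 400
    return 500
```

The typed version keeps working if the wording of the message changes, which is the kind of robustness the commit message asks for.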
Chang She
6821536d44 doc(python): document the method in fts (#982)
Co-authored-by: prrao87 <prrao87@gmail.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
2024-03-04 16:42:24 -08:00
Ayush Chaurasia
d6f0663671 fix(python): Few fts patches (#1039)
1. filtering with fts mutated the schema, which caused schema mismatch
problems with hybrid search as it combines fts and vector search tables.
2. fts with filter failed with `with_row_id`. This was because row_id
was calculated before filtering, which caused a size mismatch on attaching
it after.
3. The fix for 1 meant that now row_id is attached before filtering, but
passing a filter to `to_lance` on a dataset that already contains
`_rowid` raises a panic from lance. So temporarily, in the case where fts is
used with a filter AND `with_row_id`, we just force the user onto the
duckdb pathway.

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-03-04 16:41:59 -08:00
Weston Pace
ea33b68c6c fix: sanitize foreign schemas (#1058)
Arrow-js uses brittle `instanceof` checks throughout the code base.
These fail unless the library instance that produced the object matches
exactly the same instance the vectordb is using. At a minimum, this
means that a user using arrow version 15 (or any version that doesn't
match exactly the version that vectordb is using) will get strange
errors when they try and use vectordb.

However, there are even cases where the versions can be perfectly
identical, and the instanceof check still fails. One such example is
when using `vite` (e.g. https://github.com/vitejs/vite/issues/3910)

This PR solves the problem in a rather brute force, but workable,
fashion. If we encounter a schema that does not pass the `instanceof`
check then we will attempt to sanitize that schema by traversing the
object and, if it has all the correct properties, constructing an
appropriate `Schema` instance via deep cloning.
2024-03-04 13:06:36 -08:00
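The sanitizing approach described in ea33b68c6c is specific to arrow-js, but the idea can be shown with a loose Python analogy: accept an object of the expected class as-is, otherwise check that it has the right properties and rebuild it. Everything below (`Field`, `Schema`, `sanitize_schema`, the sample data) is hypothetical and much simpler than a real Arrow schema.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List


@dataclass
class Field:
    name: str
    type: str


@dataclass
class Schema:
    fields: List[Field]


def sanitize_schema(obj: Any) -> Schema:
    # Fast path: the object already comes from "our" library instance.
    if isinstance(obj, Schema):
        return obj
    # Otherwise traverse the object and rebuild it if it looks like a schema.
    fields = getattr(obj, "fields", None)
    if fields is None:
        raise TypeError("object does not look like a schema: no 'fields'")
    clean = []
    for f in fields:
        if not (hasattr(f, "name") and hasattr(f, "type")):
            raise TypeError("field is missing 'name' or 'type'")
        clean.append(Field(name=str(f.name), type=str(f.type)))
    return Schema(fields=clean)


# A "foreign" schema: right shape, wrong class.
foreign = SimpleNamespace(fields=[SimpleNamespace(name="id", type="int64")])
clean = sanitize_schema(foreign)
```

After the call, `clean` is a genuine `Schema` instance, so later type checks against `Schema` succeed.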
Weston Pace
1453bf4e7a feat: reconfigure typescript linter / formatter for nodejs (#1042)
The eslint rules specify some formatting requirements that are rather
strict and conflict with vscode's default formatter. I was unable to get
auto-formatting set up correctly. Also, eslint has quite recently
[given up on
formatting](https://eslint.org/blog/2023/10/deprecating-formatting-rules/)
and recommends using a 3rd party formatter.

This PR adds prettier as the formatter. It restores the eslint rules to
their defaults. This does mean we now have the "no explicit any" check
back on. I know that rule is pedantic but it did help me catch a few
corner cases in type testing that weren't covered in the current code.
Leaving in draft as this is dependent on other PRs.
2024-03-04 10:49:08 -08:00
Weston Pace
abaf315baf feat: add support for add to async python API (#1037)
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).

While doing this we also cleaned up some inconsistencies between the
SDKs:

* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object.
* We made significant improvements to table creation in
cc5f2136a6
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying and so
I changed it up a bit to ensure we clean up tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.

Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-03-04 09:27:41 -08:00
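The `Optional[foo] = None` convention mentioned in abaf315baf can be sketched as below. The function names and the default value of 10 are hypothetical; `native_query` merely stands in for a native layer that owns the default.

```python
from typing import Optional

# Hypothetical default that would live in the native (rust) layer.
NATIVE_DEFAULT_LIMIT = 10


def native_query(limit: Optional[int]) -> int:
    # Stand-in for the native side: it resolves None to its own default.
    return NATIVE_DEFAULT_LIMIT if limit is None else limit


def query(limit: Optional[int] = None) -> int:
    """Return the effective limit.

    limit: maximum number of rows; defaults to 10 (resolved natively).
    """
    # The Python wrapper passes None through instead of hard-coding a default.
    return native_query(limit)
```

Note that the check is `limit is None`, so an explicit `0` is passed through rather than replaced by the default.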
Chang She
14b9277ac1 chore(rust): update rust version (#810) 2024-03-03 18:51:58 -08:00
Chang She
d621826b79 feat(python): allow user to override api url (#1054) 2024-03-03 18:29:47 -08:00
Chang She
08c0803ae1 chore(python): use pypi tantivy to speed up CI (#987) 2024-03-03 16:57:55 -08:00
Chang She
62632cb90b doc: fix docs deployment GHA (#1055) 2024-03-03 16:04:45 -08:00
Prashanth Rao
14566df213 [docs]: Fix issues with Rust code snippets in "quick start" (#1047)
The renaming of `vectordb` to `lancedb` broke the [quick start
docs](https://lancedb.github.io/lancedb/basic/#__tabbed_5_3) (it's
pointing to a non-existent directory). This PR fixes the code snippets
and the paths in the docs page.

Additionally, more fixes related to indexing docs below 👇🏽.
2024-03-03 15:59:57 -08:00
Louis Guitton
acfdf1b9cb Fix default_embedding_functions.md (#1043)
typo and broken table
2024-03-03 15:22:53 -08:00
Chang She
f95402af7c doc: fix langchain link (#1053) 2024-03-03 15:20:48 -08:00
Chang She
d14c9b6d9e feat(python): add model_names() method to openai embedding function (#1049)
small QoL improvement
2024-03-03 12:33:00 -08:00
QianZhu
c1af53b787 Add create scalar index to sdk (#1033) 2024-02-29 13:32:01 -08:00
Weston Pace
2a02d1394b feat: port create_table to the async python API and the remote rust API (#1031)
I've also started `ASYNC_MIGRATION.MD` to keep track of the breaking
changes from sync to async python.
2024-02-29 13:29:29 -08:00
Lance Release
085066d2a8 [python] Bump version: 0.6.0 → 0.6.1 python-v0.6.1 2024-02-29 19:48:16 +00:00
Rob Meng
adf1a38f4d fix: fix columns type for pydantic 2.x (#1045) 2024-02-29 14:47:56 -05:00
Weston Pace
294c33a42e feat: Initial remote table implementation for rust (#1024)
This will eventually replace the remote table implementations in python
and node.
2024-02-29 10:55:49 -08:00
Lance Release
245786fed7 [python] Bump version: 0.5.7 → 0.6.0 python-v0.6.0 2024-02-29 16:03:01 +00:00
BubbleCal
edd9a043f8 chore: enable test for dropping table (#1038)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-02-29 15:00:24 +08:00
natcharacter
38c09fc294 A simple base usage that install the dependencies necessary to use FT… (#1036)
A simple base usage that installs the dependencies necessary to use FTS
and Hybrid search

---------

Co-authored-by: Nat Roth <natroth@Nats-MacBook-Pro.local>
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-02-28 09:38:05 -08:00
Rob Meng
ebaa2dede5 chore: upgrade to lance 0.10.1 (#1034)
upgrade to lance 0.10.1 and update doc string to reflect dynamic
projection options
2024-02-28 11:06:46 -05:00
BubbleCal
ba7618a026 chore(rust): report the TableNotFound error while dropping non-exist table (#1022)
this will work after upgrading lance with
https://github.com/lancedb/lance/pull/1995 merged
see #884 for details

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-02-28 04:46:39 -08:00
Weston Pace
a6bcbd007b feat: add a basic async python client starting point (#1014)
This changes `lancedb` from a "pure python" setuptools project to a
maturin project and adds a rust lancedb dependency.

The async python client is extremely minimal (only `connect` and
`Connection.table_names` are supported). The purpose of this PR is to
get the infrastructure in place for building out the rest of the async
client.

Although this is not technically a breaking change (no APIs are
changing) it is still a considerable change in the way the wheels are
built because they now include the native shared library.
2024-02-27 04:52:02 -08:00
Will Jones
5af74b5aca feat: {add|alter|drop}_columns APIs (#1015)
Initial work for #959. This exposes the basic functionality for each in
all of the APIs. Will add user guide documentation in a later PR.
2024-02-26 11:04:53 -08:00
Weston Pace
8a52619bc0 refactor: change arrow from a direct dependency to a peer dependency (#984)
BREAKING CHANGE: users will now need to npm install `apache-arrow` and
`@apache-arrow/ts` themselves.
2024-02-23 14:08:39 -08:00
Lance Release
314d4c93e5 Updating package-lock.json 2024-02-23 05:11:22 +00:00
Lance Release
c5471ee694 Updating package-lock.json 2024-02-23 03:57:39 +00:00
Lance Release
4605359d3b Bump version: 0.4.10 → 0.4.11 v0.4.11 2024-02-23 03:57:28 +00:00
Weston Pace
f1596122e6 refactor: rename the rust crate from vectordb to lancedb (#1012)
This also renames the new experimental node package to lancedb. The
classic node package remains named vectordb.

The goal here is to avoid introducing piecemeal breaking changes to the
vectordb crate. Instead, once the new API is stabilized, we will
officially release the lancedb crate and deprecate the vectordb crate.
The same pattern will eventually happen with the npm package vectordb.
2024-02-22 19:56:39 -08:00
Will Jones
3aa0c40168 feat(node): add read_consistency_interval to Node and Rust (#1002)
This PR adds the same consistency semantics as was added in #828. It
*does not* add the same lazy-loading of tables, since that breaks some
existing tests.

This closes #998.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-02-22 15:04:30 -08:00
Lance Release
677b7c1fcc [python] Bump version: 0.5.6 → 0.5.7 python-v0.5.7 2024-02-22 20:07:12 +00:00
Lei Xu
8303a7197b chore: bump pylance to 0.9.18 (#1011) 2024-02-22 11:47:36 -08:00
Raghav Dixit
5fa9bfc4a8 python(feat): Imagebind embedding fn support (#1003)
Added imagebind fn support; steps to install are mentioned in the docstring.
pytest slow checks were done locally.

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2024-02-22 11:47:08 +05:30
Ayush Chaurasia
bf2e9d0088 Docs: add meta tags (#1006) 2024-02-21 23:22:47 +05:30
Weston Pace
f04590ddad refactor: rust vectordb API stabilization of the Connection trait (#993)
This is the start of a more comprehensive refactor and stabilization of
the Rust API. The `Connection` trait is cleaned up to not require
`lance` and to match the `Connection` trait in other APIs. In addition,
the concrete implementation `Database` is hidden.

BREAKING CHANGE: The struct `crate::connection::Database` is now gone.
Several examples opened a connection using `Database::connect` or
`Database::connect_with_params`. Users should now use
`vectordb::connect`.

BREAKING CHANGE: The `connect`, `create_table`, and `open_table` methods
now all return a builder object. This means that a call like
`conn.open_table(..., opt1, opt2)` will now become
`conn.open_table(...).opt1(opt1).opt2(opt2).execute()` In addition, the
structure of options has changed slightly. However, no options
capability has been removed.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-02-20 18:35:52 -08:00
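The builder-style call shape described in f04590ddad, `conn.open_table(...).opt1(opt1).opt2(opt2).execute()`, can be illustrated with a short Python sketch. The real change is in the Rust API; the classes below are hypothetical, `opt1` and `opt2` are the placeholder names from the commit message, and `execute` just returns the collected settings.

```python
from typing import Any, Dict


class OpenTableBuilder:
    """Hypothetical builder that collects options before running."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._options: Dict[str, Any] = {}

    def opt1(self, value: Any) -> "OpenTableBuilder":
        self._options["opt1"] = value
        return self  # returning self is what allows chaining

    def opt2(self, value: Any) -> "OpenTableBuilder":
        self._options["opt2"] = value
        return self

    def execute(self) -> Dict[str, Any]:
        # A real implementation would open the table here; this sketch
        # only reports what it would have been opened with.
        return {"name": self._name, **self._options}


class Connection:
    def open_table(self, name: str) -> OpenTableBuilder:
        # Nothing happens until execute() is called on the builder.
        return OpenTableBuilder(name)


result = Connection().open_table("my_table").opt1(1).opt2("x").execute()
```

One consequence of this design is that options left unset never appear in the call, so new options can be added later without changing existing call sites.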
Lance Release
62c5117def [python] Bump version: 0.5.5 → 0.5.6 python-v0.5.6 2024-02-20 20:45:02 +00:00