Compare commits

...

304 Commits

Author SHA1 Message Date
Lance Release
38321fa226 [python] Bump version: 0.3.3 → 0.3.4 2023-11-19 00:24:01 +00:00
Lance Release
22749c3fa2 Updating package-lock.json 2023-11-19 00:04:08 +00:00
Lance Release
123a49df77 Bump version: 0.3.7 → 0.3.8 2023-11-19 00:03:58 +00:00
Will Jones
a57aa4b142 chore: upgrade lance to v0.8.17 (#656)
Readying for the next Lance release.
2023-11-18 15:57:23 -08:00
Rok Mihevc
d8e3e54226 feat(python): expose index cache size (#655)
This is to enable https://github.com/lancedb/lancedb/issues/641.
Should be merged after https://github.com/lancedb/lance/pull/1587 is
released.
2023-11-18 14:17:40 -08:00
Ayush Chaurasia
ccfdf4853a [Docs]: Add Instructor embeddings and rate limit handler docs (#651) 2023-11-18 06:08:26 +05:30
Ayush Chaurasia
87e5d86e90 [Docs][SEO] Add sitemap and robots.txt (#645)
Sitemap improves SEO by ranking pages and tracking updates.
2023-11-18 06:08:13 +05:30
Aidan
1cf8a3e4e0 SaaS create_index API (#649) 2023-11-15 19:12:52 -05:00
Lance Release
5372843281 Updating package-lock.json 2023-11-15 03:15:10 +00:00
Lance Release
54677b8f0b Updating package-lock.json 2023-11-15 02:42:38 +00:00
Lance Release
ebcf9bf6ae Bump version: 0.3.6 → 0.3.7 2023-11-15 02:42:25 +00:00
Bert
797514bcbf fix: node remote implement table.countRows (#648) 2023-11-13 17:43:20 -05:00
Rok Mihevc
1c872ce501 feat: add RemoteTable.version in Python (#644)
Please note: this is not tested as we don't have a server here and
testing against a mock object wouldn't be that interesting.
2023-11-13 21:43:48 +01:00
Bert
479f471c14 fix: node send db header for GET requests (#646) 2023-11-11 16:33:25 -05:00
Ayush Chaurasia
ae0d2f2599 fix: Pydantic 1.x compat for weak_lru caching in embeddings API (#643)
Colab has pydantic 1.x by default and pydantic 1.x BaseModel objects
don't support weakref creation by default that we use to cache embedding
models
https://github.com/lancedb/lancedb/blob/main/python/lancedb/embeddings/utils.py#L206
. It needs to be added to slot.
2023-11-10 15:02:38 +05:30
Ayush Chaurasia
1e8678f11a Multi-task instructor model with quantization support & weak_lru cache for embedding function models (#612)
resolves #608
2023-11-09 12:34:18 +05:30
QianZhu
662968559d fix saas open_table and table_names issues (#640)
- added check whether a table exists in SaaS open_table
- remove prefilter not supported warning in SaaS search
- fixed issues for SaaS table_names
2023-11-07 17:34:38 -08:00
Rob Meng
9d895801f2 upgrade lance to 0.8.14 (#636)
upgrade lance
2023-11-07 19:01:29 -05:00
Rob Meng
80613a40fd skip missing file on mirrored dir when deleting (#635)
mirrored store is not garueeteed to have all the files. Ignore the ones
that doesn't exist.
2023-11-07 12:33:32 -05:00
Lei Xu
d43ef7f11e chore: apple silicon runner (#633)
Close #632
2023-11-06 21:04:32 -08:00
Lei Xu
554e068917 chore: improve create_table API consistency between local and remote SDK (#627) 2023-11-03 13:15:11 -07:00
Bert
567734dd6e fix: node remote connection handles non http errors (#624)
https://github.com/lancedb/lancedb/issues/623

Fixes issue trying to print response status when using remote client. If
the error is not an HTTP error (e.g. dns/network failure), there won't
be a response.
2023-11-03 10:24:56 -04:00
Ayush Chaurasia
1589499f89 Exponential standoff retry support for handling rate limited embedding functions (#614)
Users ingesting data using rate limited apis don't need to manually make
the process sleep for counter rate limits
resolves #579
2023-11-02 19:20:10 +05:30
Lance Release
682e95fa83 Updating package-lock.json 2023-11-01 22:20:49 +00:00
Lance Release
1ad5e7f2f0 Updating package-lock.json 2023-11-01 21:16:20 +00:00
Lance Release
ddb3ef4ce5 Bump version: 0.3.5 → 0.3.6 2023-11-01 21:16:06 +00:00
Lance Release
ef20b2a138 [python] Bump version: 0.3.2 → 0.3.3 2023-11-01 21:15:55 +00:00
Lei Xu
2e0f251bfd chore: bump lance to 8.10 (#622) 2023-11-01 14:14:38 -07:00
Ayush Chaurasia
2cb91e818d Disable posthog on docs & reduce sentry trace factor (#607)
- posthog charges per event and docs events are registered very
frequently. We can keep tracking them on GA
- Reduced sentry trace factor
2023-11-02 01:13:16 +05:30
Chang She
2835c76336 doc: node sdk now supports windows (#616) 2023-11-01 10:04:18 -07:00
Bert
8068a2bbc3 ci: cancel in progress runs on new push (#620) 2023-11-01 11:33:48 -04:00
Bert
24111d543a fix!: sort table names (#619)
https://github.com/lancedb/lance/issues/1385
2023-11-01 10:50:09 -04:00
QianZhu
7eec2b8f9a Qian/query option doc (#615)
- API documentation improvement for queries (table.search)
- a small bug fix for the remote API on create_table

![image](https://github.com/lancedb/lancedb/assets/1305083/712e9bd3-deb8-4d81-8cd0-d8e98ef68f4e)

![image](https://github.com/lancedb/lancedb/assets/1305083/ba22125a-8c36-4e34-a07f-e39f0136e62c)
2023-10-31 19:50:05 -07:00
Will Jones
b2b70ea399 increment pylance (#618) 2023-10-31 18:07:03 -07:00
Bert
e50a3c1783 added api docs for prefilter flag (#617)
Added the prefilter flag argument to the `LanceQueryBuilder.where`.

This should make it display here:

https://lancedb.github.io/lancedb/python/python/#lancedb.query.LanceQueryBuilder.select

And also in intellisense like this:
<img width="848" alt="image"
src="https://github.com/lancedb/lancedb/assets/5846846/e0c53f4f-96bc-411b-9159-680a6c4d0070">

Also adds some improved documentation about the `where` argument to this
method.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2023-10-31 16:39:32 -04:00
Weston Pace
b517134309 feat: allow prefiltering with index (#610)
Support for prefiltering with an index was added in lance version 0.8.7.
We can remove the lancedb check that prevents this. Closes #261
2023-10-31 13:11:03 -07:00
Lei Xu
6fb539b5bf doc: add doc to use GPU for indexing (#611) 2023-10-30 15:25:00 -07:00
Lance Release
f37fe120fd Updating package-lock.json 2023-10-26 22:30:16 +00:00
Lance Release
2e115acb9a Updating package-lock.json 2023-10-26 21:48:01 +00:00
Lance Release
27a638362d Bump version: 0.3.4 → 0.3.5 2023-10-26 21:47:44 +00:00
Bert
22a6695d7a fix conv version (#605) 2023-10-26 17:44:11 -04:00
Lance Release
57eff82ee7 Updating package-lock.json 2023-10-26 21:03:07 +00:00
Lance Release
7732f7d41c Bump version: 0.3.3 → 0.3.4 2023-10-26 21:02:52 +00:00
Bert
5ca98c326f feat: added dataset stats api to node (#604) 2023-10-26 17:00:48 -04:00
Bert
b55db397eb feat: added data stats apis (#596) 2023-10-26 13:10:17 -04:00
Rob Meng
c04d72ac8a expose remap index api (#603)
expose index remap options in `compact_files`
2023-10-25 22:10:37 -04:00
Rob Meng
28b02fb72a feat: expose optimize index api (#602)
expose `optimize_index` api.
2023-10-25 19:40:23 -04:00
Lance Release
f3cf986777 [python] Bump version: 0.3.1 → 0.3.2 2023-10-24 19:06:38 +00:00
Bert
c73fcc8898 update lance to 0.8.7 (#598) 2023-10-24 14:49:36 -04:00
Chang She
cd9debc3b7 fix(python): fix multiple embedding functions bug (#597)
Closes #594

The embedding functions are pydantic models so multiple instances with
the same parameters are considered ==, which means that if you have
multiple embedding columns it's possible for the embeddings to get
overwritten. Instead we use `is` instead of == to avoid this problem.

testing: modified unit test to include this case
2023-10-24 13:05:05 -04:00
Rob Meng
26a97ba997 feat: add checkout method to table to reuse existing store and connections (#593)
Prior to this PR, to get a new version of a table, we need to re-open
the table. This has a few downsides w.r.t. performance:
* Object store is recreated, which takes time and throws away existing
warm connections
* Commit handler is thrown aways as well, which also may contain warm
connections
2023-10-23 12:06:13 -04:00
Rob Meng
ce19fedb08 feat: include manifest files in mirrow store (#589) 2023-10-21 12:21:41 -04:00
Will Jones
14e8e48de2 Revert "[python] Bump version: 0.3.2 → 0.3.3"
This reverts commit c30faf6083.
2023-10-20 17:52:49 -07:00
Will Jones
c30faf6083 [python] Bump version: 0.3.2 → 0.3.3 2023-10-20 17:30:00 -07:00
Ayush Chaurasia
64a4f025bb [Docs]: Minor Fixes (#587)
* Filename typo
* Remove rick_morty csv as users won't really be able to use it.. We can
create a an executable colab and download it from a bucket or smth.
2023-10-20 16:14:35 +02:00
Ayush Chaurasia
6dc968e7d3 [Docs] Embeddings API: Add multi-lingual semantic search example (#582) 2023-10-20 18:40:49 +05:30
Ayush Chaurasia
06b5b69f1e [Docs]Versioning docs (#586)
closes #564

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-10-20 18:40:16 +05:30
Lance Release
6bd3a838fc Updating package-lock.json 2023-10-19 20:45:39 +00:00
Lance Release
f36fea8f20 Updating package-lock.json 2023-10-19 20:06:10 +00:00
Lance Release
0a30591729 Bump version: 0.3.2 → 0.3.3 2023-10-19 20:05:57 +00:00
Chang She
0ed39b6146 chore: bump lance version in python/rust lancedb (#584)
To include latest v0.8.6

Co-authored-by: Chang She <chang@lancedb.com>
2023-10-19 13:05:12 -07:00
Ayush Chaurasia
a8c7f80073 [Docs] Update embedding function docs (#581) 2023-10-18 13:04:42 +05:30
Ayush Chaurasia
0293bbe142 [Python]Embeddings API refactor (#580)
Sets things up for this -> https://github.com/lancedb/lancedb/issues/579
- Just separates out the registry/ingestion code from the function
implementation code
- adds a `get_registry` util
- package name "open-clip" -> "open-clip-torch"
2023-10-17 22:32:19 -07:00
Ayush Chaurasia
7372656369 [Docs] Add posthog telemetry to docs (#577)
Allows creation of funnels and user journeys
2023-10-17 21:11:59 -07:00
QianZhu
d46bc5dd6e list table pagination draft (#574) 2023-10-16 21:09:20 -07:00
Prashanth Rao
86efb11572 Add pyarrow date and timestamp type conversion from pydantic (#576) 2023-10-16 19:42:24 -07:00
Chang She
bb01ad5290 doc: fix broken link and add README (#573)
Fix broken link to embedding functions

testing: broken link was verified after local docs build to have been
repaired

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-10-16 16:13:07 -07:00
Lance Release
1b8cda0941 Updating package-lock.json 2023-10-16 16:10:07 +00:00
Lance Release
bc85a749a3 Updating package-lock.json 2023-10-16 15:12:15 +00:00
Lance Release
02c35d3457 Bump version: 0.3.1 → 0.3.2 2023-10-16 15:11:57 +00:00
Rob Meng
345c136cfb implement remote api calls for table mutation (#567)
Add more APIs to remote table for Node SDK
* `add` rows
* `overwrite` table with rows
* `create` table

This has been tested against dev stack
2023-10-16 11:07:58 -04:00
Rok Mihevc
043e388254 docs: show source of documented functions (#569) 2023-10-15 09:05:36 -07:00
Lei Xu
fe64fc4671 feat(python,js): deletion operation on remote tables (#568) 2023-10-14 15:47:19 -07:00
Rok Mihevc
6d66404506 docs: switch python examples to be row based (#554) 2023-10-14 14:07:43 -07:00
Lei Xu
eff94ecea8 chore: bump lance to 0.8.5 (#561)
Bump lance to 0.5.8
2023-10-14 12:38:43 -07:00
Ayush Chaurasia
7dfb555fea [DOCS][PYTHON] Update embeddings API docs & Example (#516)
This PR adds an overview of embeddings docs:
- 2 ways to vectorize your data using lancedb - explicit & implicit
- explicit - manually vectorize your data using `wit_embedding` function
- Implicit - automatically vectorize your data as it comes by ingesting
your embedding function details as table metadata
- Multi-modal example w/ disappearing embedding function
2023-10-14 07:56:07 +05:30
Lance Release
f762a669e7 Updating package-lock.json 2023-10-13 22:27:48 +00:00
Lance Release
0bdc7140dd Updating package-lock.json 2023-10-13 21:24:05 +00:00
Lance Release
8f6e955b24 Bump version: 0.3.0 → 0.3.1 2023-10-13 21:23:54 +00:00
Lance Release
1096da09da [python] Bump version: 0.3.0 → 0.3.1 2023-10-13 21:23:47 +00:00
Ayush Chaurasia
683824f1e9 Add cohere embedding function (#550) 2023-10-13 16:27:34 +05:30
Will Jones
db7bdefe77 feat: cleanup and compaction (#518)
#488
2023-10-11 12:49:12 -07:00
Ayush Chaurasia
e41894b071 [Docs] Improve visibility of table ops (#553)
A little verbose, but better than being non-discoverable 
![Screenshot from 2023-10-11
16-26-02](https://github.com/lancedb/lancedb/assets/15766192/9ba539a7-0cf8-4d9e-94e7-ce5d37c35df0)
2023-10-11 12:20:46 -07:00
Chang She
e1ae2bcbd8 feat: add to_list and to_pandas api's (#556)
Add `to_list` to return query results as list of python dict (so we're
not too pandas-centric). Closes #555

Add `to_pandas` API and add deprecation warning on `to_df`. Closes #545

Co-authored-by: Chang She <chang@lancedb.com>
2023-10-11 12:18:55 -07:00
Ankur Goyal
ababc3f8ec Use query.limit(..) in README (#543)
If you run the README javascript example in typescript, it complains
that the type of limit is a function and cannot be set to a number.
2023-10-11 11:54:14 -07:00
Ayush Chaurasia
a1377afcaa feat: telemetry, error tracking, CLI & config manager (#538)
Co-authored-by: Lance Release <lance-dev@lancedb.com>
Co-authored-by: Rob Meng <rob.xu.meng@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: rmeng <rob@lancedb.com>
Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
2023-10-08 23:11:39 +05:30
Lei Xu
a26c8f3316 feat: use GPU for index creation. (#540)
Bump lance to 0.8.3 to include GPU training

---------

Co-authored-by: Rob Meng <rob.xu.meng@gmail.com>
2023-10-05 20:49:00 -07:00
Josh Wein
88d8d7249e Typo cleanup (#539) 2023-10-05 23:07:28 -04:00
Rob Meng
0eb7c9ea0c fix stackoverflow (#542)
closes #541 

two functions was calling itself instead of routing to primary
2023-10-05 20:05:04 -04:00
Rob Meng
1db66c6980 implement mirroring object store (#537)
This PR implements a mirroring object store and allows and table to be
mirrored to a local path when param `mirroredStore` is set in the url
2023-10-04 21:23:34 -04:00
Lance Release
c58da8fc8a Updating package-lock.json 2023-10-03 22:59:02 +00:00
Lance Release
448c4a835d Updating package-lock.json 2023-10-03 22:09:00 +00:00
Lance Release
850f80de99 Bump version: 0.2.6 → 0.3.0 2023-10-03 22:08:44 +00:00
Lance Release
a022368426 [python] Bump version: 0.2.6 → 0.3.0 2023-10-03 21:48:22 +00:00
Lei Xu
8b815ef5a8 chore: upgrade lance to 0.8.1 (#536)
Bump to lance 0.8.1 for both javascript and python sdk
2023-10-03 14:29:18 -07:00
Tan Li
e4c3a9346c [doc] make the tensor width differnt from height (#533) 2023-10-03 00:55:16 -07:00
Prashanth Rao
1d1f8964d2 [DOCS][PYTHON] Update docs for clarity (#531)
I only modified those docs pages that are untouched in existing unmerged
PRs, so hopefully there are no merge conflicts!

1. The `tantivy-py` version specified in the docs doesn't work (pip
install fails), but with the latest version of pip and wheel installed
on my Mac, I was able to just `pip install tantivy` and FTS works great
for me. I updated the docs page to include this in
7ca4b757ce but can always modify to
another specific version in case this breaks any tests.
2. The `.add()` method for Python should take in a list of dicts as the
first option (to also align with the JS API), and additionally, users
can pass an existing pandas DataFrame to add to a table. Hope this makes
sense.
3. I've had multiple conversations with users who are unclear that the
terms "exhaustive", "flat" and "KNN" are all the same kind of search, so
I've updated the verbiage of this section to clarify this.
4. Fixed typos and improved clarity in the ANN indexes page.
2023-10-03 09:46:53 +05:30
Lance Release
d326146a40 [python] Bump version: 0.2.5 → 0.2.6 2023-10-01 17:48:59 +00:00
Chang She
693bca1eba feat(python): expose prefilter to lancedb (#522)
We have experimental support for prefiltering (without ANN) in pylance.
This means that we can now apply a filter BEFORE vector search is
performed. This can be done via the `.where(filter_string,
prefilter=True)` kwargs of the query.

Limitations:
- When connecting to LanceDB cloud, `prefilter=True` will raise
NotImplemented
- When an ANN index is present, `prefilter=True` will raise
NotImplemented
- This option is not available for full text search query
- This option is not available for empty search query (just
filter/project)

Additional changes in this PR:
- Bump pylance version to v0.8.0 which supports the experimental
prefiltering.

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-10-01 10:34:12 -07:00
Will Jones
343e274ea5 fix: define minimum dependency versions (#515)
Closes #513

For each of these, I found the minimum version that would allow the unit
tests to pass.
2023-09-29 09:04:49 -07:00
Rob Meng
a695fb8030 fix import attr to use import attrs (#510)
Thanks to #508, I used `attr` instead of the correct package `attrs`

s/attr/attrs
2023-09-23 00:30:56 -04:00
Hynek Schlawack
bc8670d7af [Python] Fix attrs dependency (#508)
The `attr` project is unrelated to `attrs` that also provides the `attr`
namespace (see also <https://hynek.me/articles/import-attrs/>).

It used to _usually_ work, because attrs is a dependency of aiohttp and
somehow took precedence over `attr`'s `attr`.

Yes, sorry, it's a mess.
2023-09-21 12:35:34 -04:00
Lance Release
74004161ff [python] Bump version: 0.2.4 → 0.2.5 2023-09-19 16:43:06 +00:00
Lance Release
34ddb1de6d Updating package-lock.json 2023-09-19 13:48:20 +00:00
Lance Release
1029fc9cb0 Updating package-lock.json 2023-09-19 12:19:23 +00:00
Lance Release
31c5df6d99 Bump version: 0.2.5 → 0.2.6 2023-09-19 12:19:05 +00:00
Rob Meng
dbf37a0434 fix: upgrade lance to 0.7.5 and add tests for searching empty dataset (#505)
This PR upgrade lance to `0.7.5`, which include fixes for searching an
empty dataset.

This PR also adds two tests in node SDK to make sure searching empty
dataset do no throw

Co-authored-by: rmeng <rob@lancedb.com>
2023-09-18 22:12:11 -07:00
Chang She
f20f19b804 feat: improve pydantic 1.x compat (#503) 2023-09-18 19:01:30 -07:00
Chang She
55207ce844 feat: add lancedb.__version__ (#504) 2023-09-18 18:51:51 -07:00
Chang She
c21f9cdda0 ci: fix docs build (#496)
python/python.md contains typos in the class references

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-09-18 13:07:21 -07:00
Rob Meng
bc38abb781 refactor connection string processing (#500)
in #486 `connect` started converting path into uri. However, the PR
didn't handle relative path and appended `file://` to relative path.

This PR changes the parsing strat to be more rational. If a path is
provided instead of url, we do not try anythinng special.

engine and engine params may only be specified when a url with schema is
provided

Co-authored-by: rmeng <rob@lancedb.com>
2023-09-18 12:38:00 -07:00
Rob Meng
731f86e44c add health check to wait for all service ready before next step (#501)
aws integration tests are flaky because we didn't wait for the services
to become healthy. (we only waited for the localstack service, this PR
adds wait for sub services)
2023-09-18 15:17:45 -04:00
Chang She
31dad71c94 multi-modal embedding-function (#484) 2023-09-16 21:23:51 -04:00
Will Jones
9585f550b3 fix: increase S3 timeouts (#494)
Closes #493
2023-09-15 20:21:34 -07:00
Lance Release
8dc2315479 [python] Bump version: 0.2.3 → 0.2.4 2023-09-15 14:23:26 +00:00
Rob Meng
f6bfb5da11 chore: upgrade lance to 0.7.4 (#491) 2023-09-14 16:02:23 -04:00
Lance Release
661fcecf38 [python] Bump version: 0.2.2 → 0.2.3 2023-09-14 17:48:32 +00:00
Lance Release
07fe284810 Updating package-lock.json 2023-09-10 23:58:06 +00:00
Lance Release
800bb691c3 Updating package-lock.json 2023-09-09 19:45:58 +00:00
Lance Release
ec24e09add Bump version: 0.2.4 → 0.2.5 2023-09-09 19:45:43 +00:00
Rob Meng
0554db03b3 progagate uri query string to lance; add aws integration tests (#486)
# WARNING: specifying engine is NOT a publicly supported feature in
lancedb yet. THE API WILL CHANGE.

This PR exposes dynamodb based commit to `vectordb` and JS SDK (will do
python in another PR since it's on a different release track)

This PR also added aws integration test using `localstack`

## What?
This PR adds uri parameters to DB connection string. User may specify
`engine` in the connection string to let LanceDB know that the user
wants to use an external store when reading and writing a table. User
may also pass any parameters required by the commitStore in the
connection string, these parameters will be propagated to lance.

e.g.
```
vectordb.connect("s3://my-db-bucket?engine=ddb&ddbTableName=my-commit-table")
```
will automatically convert table path to
```
s3+ddb://my-db-bucket/my_table.lance?&ddbTableName=my-commit-table
```
2023-09-09 13:33:16 -04:00
Lei Xu
b315ea3978 [Python] Pydantic vector field with default value (#474)
Rename `lance.pydantic.vector` to `Vector` and deprecate `vector(dim)`
2023-09-08 22:35:31 -07:00
Ayush Chaurasia
aa7806cf0d [Python]Fix record_batch_generator (#483)
Should fix - https://github.com/lancedb/lancedb/issues/482
2023-09-08 21:18:50 +05:30
Lei Xu
6799613109 feat: upgrade lance to 0.7.3 (#481) 2023-09-07 17:01:45 -07:00
Lei Xu
0f26915d22 [Rust] schema coerce and vector column inference (#476)
Split the rust core from #466 for easy review and less merge conflicts.
2023-09-06 10:00:46 -07:00
Chang She
32163063dc Fix up docs (#477) 2023-09-05 22:29:50 -07:00
Chang She
9a9a73a65d [python] Use pydantic for embedding function persistence (#467)
1. Support persistent embedding function so users can just search using
query string
2. Add fixed size list conversion for multiple vector columns
3. Add support for empty query (just apply select/where/limit).
4. Refactor and simplify some of the data prep code

---------

Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2023-09-05 21:30:45 -07:00
Ayush Chaurasia
52fa7f5577 [Docs] Small typo fixes (#460) 2023-09-02 22:17:19 +05:30
Chang She
0cba0f4f92 [python] Temporary update feature (#457)
Combine delete and append to make a temporary update feature that is
only enabled for the local python lancedb.

The reason why this is temporary is because it first has to load the
data that matches the where clause into memory, which is technical
unbounded.

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-08-30 00:25:26 -07:00
Will Jones
8391ffee84 chore: make crate more discoverable (#443)
A few small changes to make the Rust crate more discoverable.
2023-08-25 08:59:14 -07:00
Lance Release
fe8848efb9 [python] Bump version: 0.2.1 → 0.2.2 2023-08-24 23:18:10 +00:00
Chang She
213c313b99 Revert "Updating package-lock.json" (#455)
This reverts commit ab97e5d632.

Co-authored-by: Chang She <chang@lancedb.com>
2023-08-24 15:54:57 -07:00
Chang She
157e995a43 Revert "Bump version: 0.2.4 → 0.2.5" (#454)
This reverts commit 87e9a0250f.

I triggered the nodejs release commit GHA by mistake. Reverting it.
The tag will be removed manually.

Co-authored-by: Chang She <chang@lancedb.com>
2023-08-24 15:44:37 -07:00
Lance Release
ab97e5d632 Updating package-lock.json 2023-08-24 21:54:35 +00:00
Lance Release
87e9a0250f Bump version: 0.2.4 → 0.2.5 2023-08-24 21:54:18 +00:00
Chang She
e587a17a64 [python] Support schema evolution in local LanceDB (#452)
Previously if you needed to add a column to a table you'd have to
rewrite the whole table. Instead,
we use the merge functionality from Lance format
to incrementally add columns from another table
or dataframe.

---------

Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2023-08-24 14:40:49 -07:00
Chang She
2f1f9f6338 [python] improve restore functionality (#451)
Previously the temporary restore feature required copying data. The new
feature in pylance does not.

---------

Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2023-08-24 11:00:34 -07:00
Lance Release
a34fa4df26 Updating package-lock.json 2023-08-24 05:23:19 +00:00
Lance Release
e20979b335 Updating package-lock.json 2023-08-24 04:48:11 +00:00
Lance Release
08689c345d Bump version: 0.2.3 → 0.2.4 2023-08-24 04:47:57 +00:00
Lance Release
909b7e90cd [python] Bump version: 0.2.0 → 0.2.1 2023-08-24 04:00:11 +00:00
QianZhu
ae8486cc8f bump lance version to 0.6.5 for lancedb release (#453) 2023-08-23 20:59:03 -07:00
Tevin Wang
b8f32d082f Clean up docs testing - exclude by glob instead of by file (#450) 2023-08-24 07:30:37 +05:30
Jai
ea7522baa5 fix url to image in docs (#444) 2023-08-22 16:21:02 -07:00
Lance Release
8764741116 Updating package-lock.json 2023-08-22 21:11:28 +00:00
Ayush Chaurasia
cc916389a6 [DOCS] Major Docs Revamp (#435) 2023-08-22 14:06:26 -07:00
Lance Release
3d7d903d88 Updating package-lock.json 2023-08-22 20:15:13 +00:00
Lance Release
cc5e2d3e10 Bump version: 0.2.2 → 0.2.3 2023-08-22 20:14:58 +00:00
Rob Meng
30f5bc5865 expose awsRegion to be configurable (#441) 2023-08-22 16:00:14 -04:00
gsilvestrin
2737315cb2 feat(node): Create empty tables / Arrow Tables (#399)
- Supports creating an empty table as long as an Arrow Schema is provided
- Supports creating a table from an Arrow Table (can be passed as data)
- Simplified some Arrow code in the TS/FFI side
- removed createTableArrow method, it was never documented / tested.
2023-08-22 10:57:45 -07:00
Rob Meng
d52422603c use a lambda function to hide the value of credentials when printing a connection/table (#438)
Previously when logging the `LocalConnection` and `LocalTable` classes,
we would expose the aws creds inside them. This PR changes the stored
creds to a anonymous function to hide the creds
2023-08-21 23:06:44 -04:00
Ayush Chaurasia
f35f8e451f [DOCS] Update integrations + small typos (#432)
Depends on - https://github.com/lancedb/lancedb/pull/430

---------

Co-authored-by: Kevin Tse <NivekT@users.noreply.github.com>
2023-08-18 09:59:22 +05:30
Ayush Chaurasia
0b9924b432 Make creating (and adding to) tables via Iterators more flexible & intuitive (#430)
It improves the UX as iterators can be of any type supported by the
table (plus recordbatch) & there is no separate requirement.
Also expands the test cases for pydantic & arrow schema.
If this is looks good I'll update the docs.

Example usage:
```
class Content(LanceModel):
    vector: vector(2)
    item: str
    price: float

def make_batches():
    for _ in range(5):
        yield from [ 
        # pandas
        pd.DataFrame({
            "vector": [[3.1, 4.1], [1, 1]],
            "item": ["foo", "bar"],
            "price": [10.0, 20.0],
        }),
        
        # pylist
        [
            {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
            {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
        ],

        # recordbatch
        pa.RecordBatch.from_arrays(
            [
                pa.array([[3.1, 4.1], [5.9, 26.5]], pa.list_(pa.float32(), 2)),
                pa.array(["foo", "bar"]),
                pa.array([10.0, 20.0]),
            ], 
            ["vector", "item", "price"],
        ),

        # pydantic list
        [
            Content(vector=[3.1, 4.1], item="foo", price=10.0),
            Content(vector=[5.9, 26.5], item="bar", price=20.0),
        ]]

db = lancedb.connect("db")
tbl = db.create_table("tabley", make_batches(), schema=Content, mode="overwrite")

tbl.add(make_batches())
```
Same should with arrow schema.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2023-08-18 09:56:30 +05:30
Lance Release
ba416a571d Updating package-lock.json 2023-08-17 23:48:01 +00:00
Lance Release
13317ffb46 Updating package-lock.json 2023-08-17 23:07:51 +00:00
Lance Release
ca961567fe Bump version: 0.2.1 → 0.2.2 2023-08-17 23:07:36 +00:00
gsilvestrin
31a12a141d fix(node) Electron crashes when creating external buffer (#424) 2023-08-17 14:47:54 -07:00
Chang She
e3061d4cb4 [python] Temporary restore feature (#428)
This adds LanceTable.restore as a temporary feature. It reads data from
a previous version and creates
a new snapshot version using that data. This makes the version writeable
unlike checkout. This should be replaced once the feature is implemented
in pylance.

Co-authored-by: Chang She <chang@lancedb.com>
2023-08-14 20:10:29 -07:00
Lance Release
1fcc67fd2c Updating package-lock.json 2023-08-14 23:02:39 +00:00
Rob Meng
ac18812af0 fix moka version (#427) 2023-08-14 18:28:55 -04:00
Lance Release
8324e0f171 Bump version: 0.2.0 → 0.2.1 2023-08-14 22:22:24 +00:00
Rob Meng
f0bcb26f32 Upgrade lance and pass AWS creds when opening a table (#426) 2023-08-14 18:22:02 -04:00
Lance Release
b281c5255c Updating package-lock.json 2023-08-14 17:03:51 +00:00
Lance Release
d349d2a44a Updating package-lock.json 2023-08-14 16:06:52 +00:00
Lance Release
0699a6fa7b Bump version: 0.1.19 → 0.2.0 2023-08-14 16:06:36 +00:00
Lance Release
b1a5c251ba [python] Bump version: 0.1.16 → 0.2.0 2023-08-12 04:43:16 +00:00
Will Jones
722462c38b chore: upgrade Lance and rename score to _distance (#398)
BREAKING CHANGE: The `score` column has been renamed to `_distance` to
more accurately describe the semantics (smaller means closer / better).

---------

Co-authored-by: Lei Xu <lei@lancedb.com>
2023-08-11 21:42:33 -07:00
Ashis Kumar Naik
902a402951 implementation of drop_database (#418)
#416 Fixed.

added drop_database() method . This deletes all the tables from the
database with a single command.

---------

Signed-off-by: Ashis Kumar Naik <ashishami2002@gmail.com>
2023-08-11 20:59:56 -07:00
Rob Meng
2f2cb984d4 [breaking change] make schema a property (#414) 2023-08-11 18:58:41 -04:00
Lei Xu
9921b2a4e5 [Node] Use index by default (#422) 2023-08-11 15:26:44 -07:00
gsilvestrin
03b8f99dca feat(node) Remote drop table (#412) 2023-08-10 09:21:36 -07:00
Lei Xu
aa91f35a28 [Python][Remote] Raise meaningful exception for to_arrow() / to_pandas() (#413) 2023-08-08 14:40:09 -07:00
gsilvestrin
f227658e08 fix(node) Remove mpsc from JS SDK (#407)
- Callers / SDKs are responsible for keeping track of the last version of the Table
-  Remove the mpsc from Table and make all Table operations non-blocking
2023-08-08 10:35:43 -07:00
Rob Meng
fd65887d87 implement remote drop table call (#411)
Also moves `request_id` to header instead of request param
2023-08-08 13:24:16 -04:00
Weston Pace
4673958543 fix(docs) fix minor typo (#408) 2023-08-08 08:37:32 -07:00
Chang She
a54d1e5618 Automatically convert pydantic model (#400)
Saves users from having to explicitly call
`LanceModel.to_arrow_schema()` when creating an empty table.
See new docs for full details.

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-08-06 14:50:03 -07:00
Tevin Wang
8f7264f81d [Documentation Code Testing] temp fix for nodejs docs test hang (#404) 2023-08-06 13:13:35 -07:00
Ayush Chaurasia
44b8271fde [Docs] Allow edit suggestions and analytics (#394) 2023-08-06 22:53:35 +05:30
Ayush Chaurasia
74ef141b9c [Docs] add Tables guide (#381)
* Rename "Reference" -> "Guides" to create distinction b/w api reference
and user facing docs
* Add all the various ways to create, add and delete from table

Related - https://github.com/lancedb/lancedb/pull/391
2023-08-06 12:34:08 +05:30
gsilvestrin
b69b1e3ec8 fix(node) Unit tests hangs and don't exit (#396) 2023-08-04 20:18:23 -07:00
Ayush Chaurasia
bbfadfe58d [python] Allow adding via iterators (#391)
Makes the following work so all the formats accepted by `create_table()`
are also accepted by `add()`
```
import lancedb
import pyarrow as pa

db = lancedb.connect("/tmp")

def make_batches():
    for i in range(5):
        yield pa.RecordBatch.from_arrays(
            [
                pa.array([[3.1, 4.1], [5.9, 26.5]]),
                pa.array(["foo", "bar"]),
                pa.array([10.0, 20.0]),
            ],
            ["vector", "item", "price"],
        )

schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32())),
    pa.field("item", pa.utf8()),
    pa.field("price", pa.float32()),
])

tbl = db.create_table("table4", make_batches(), schema=schema)
tbl.add(make_batches())
```
2023-08-04 12:49:44 -07:00
Leon Yee
cf977866d8 [WIP] Workflow to trigger vectordb-recipes workflow (#371) 2023-08-02 11:27:08 -07:00
gsilvestrin
3ff3068a1e fix(node) Give preference to local index.node lib (#393) 2023-08-01 15:29:15 -07:00
gsilvestrin
593b5939be feat(node): Improve concurrency (#376)
- Moved computation out of JS main thread by using a mpsc
- Removes the Arc/Mutex since Table is owned by JsTable now
- Moved table / query methods to their own files 
- Fixed js-transformers example
2023-08-01 14:22:04 -07:00
Lei Xu
f0e1290ae6 Restrict semver version to 3.0 (#389) 2023-07-31 22:26:24 -07:00
Chang She
4b45128bd6 add LanceModel to docs (#386)
Co-authored-by: Chang She <chang@lancedb.com>
2023-07-31 15:12:02 -04:00
Lance Release
b06e214d29 [python] Bump version: 0.1.15 → 0.1.16 2023-07-31 18:32:40 +00:00
Chang She
c1f8feb6ed make pandas an optional dependency in lancedb as well (#385) 2023-07-31 14:08:58 -04:00
Chang She
cada35d5b7 Improve pydantic integration (#384) 2023-07-31 12:16:44 -04:00
Chang She
2d25c263e9 Implement drop table if exists (#383) 2023-07-31 10:25:09 +02:00
gsilvestrin
bcd7f66dc7 fix(node): Handle overflows in the node bridge (#372)
- Fixes many numeric conversions that results in hard to reproduce issues
- JsObjectExt extends JsObject with safe methods to extract numericvalues
2023-07-28 13:15:21 -07:00
gsilvestrin
1daecac648 fix(python): Pin pylance and add pandas as test dependency (#373) 2023-07-27 15:21:45 -07:00
Lance Release
b8e656b2a7 Updating package-lock.json 2023-07-27 21:53:30 +00:00
Lance Release
ff7c1193a7 Updating package-lock.json 2023-07-27 21:06:32 +00:00
Lance Release
6d70e7c29b Bump version: 0.1.18 → 0.1.19 2023-07-27 21:06:17 +00:00
gsilvestrin
73cc12ecc5 fix(node): Relax EmbeddingFunction type guard (#370) 2023-07-27 12:51:59 -07:00
gsilvestrin
6036cf48a7 fix(node) Replace panic errors with friendlier ones (#366)
- Implement Result/Error in the node FFI
- Implement a trait (ResultExt) to make error handling less verbose
- Refactor some parts of the code that touch arrow into arrow.rs
2023-07-26 13:44:58 -07:00
Ayush Chaurasia
15f4787cc8 [Docs]: Add badges, CTA and updates examples (#358)
<img width="1054" alt="Screenshot 2023-07-24 at 6 13 00 PM"
src="https://github.com/lancedb/lancedb/assets/15766192/a263a17e-66d0-4591-adc7-b520aa5b23f6">
Is this a problem? Are we using metadata to track usage or something?
2023-07-26 16:35:46 +05:30
Lance Release
0e4050e706 [python] Bump version: 0.1.14 → 0.1.15 2023-07-25 18:58:44 +00:00
Rob Meng
147796ffcd bump lance version for vectordb, fix minor bugs in lancedb remote client (#365) 2023-07-24 21:30:57 -04:00
Lance Release
6fd465ceef Updating package-lock.json 2023-07-24 20:02:35 +00:00
Lance Release
e2e5a0fb83 Updating package-lock.json 2023-07-24 19:27:32 +00:00
Lance Release
ff8d5a6d51 Bump version: 0.1.17 → 0.1.18 2023-07-24 19:27:17 +00:00
Will Jones
8829988ada ci: build node in manylinux docker container (#350)
Closes #359

TODO:
 * [x] test in a sample of Linux distro docker containers
2023-07-24 11:31:47 -07:00
gsilvestrin
80a32be121 bugfix(node): make WriteMode optional when specifying embeddings (#336) 2023-07-24 11:26:43 -07:00
Rob Meng
8325979bb8 dont print apikey in remote client toString, add hostoverride to python client (#353) 2023-07-23 18:44:00 -04:00
lindt
ed5ff5a482 [docs] typo fix (#352)
Co-authored-by: Stefan Rohe <think@eduroam152-169.nbk.vse.cz>
2023-07-22 11:18:58 +02:00
Lance Release
2c9371dcc4 Updating package-lock.json 2023-07-21 23:18:22 +00:00
Lance Release
6d5621da4a Updating package-lock.json 2023-07-21 22:39:21 +00:00
Lance Release
380c1572f3 Bump version: 0.1.16 → 0.1.17 2023-07-21 22:39:06 +00:00
gsilvestrin
4383848d53 feat(node): Add Linux ARM build (#348) 2023-07-21 15:33:02 -07:00
gsilvestrin
473c43860c bugfix: Set Github token when pushing changes (#351) 2023-07-21 15:31:44 -07:00
gsilvestrin
17cf244e53 Updating package-lock.json (#347) 2023-07-20 14:44:10 -07:00
Leon Yee
0b60694df4 [docs] typo fix (#346) 2023-07-20 14:33:56 -07:00
Lance Release
600da476e8 Updating package-lock.json 2023-07-20 20:24:54 +00:00
Lance Release
458217783c Bump version: 0.1.15 → 0.1.16 2023-07-20 20:24:37 +00:00
gsilvestrin
21b1a71a6b bugfix(node): Don't persist credentials on make-release-commit.yml (#345) 2023-07-20 13:24:06 -07:00
gsilvestrin
2d899675e8 bugfix(node): Make release task can't push to repo (#344) 2023-07-20 13:15:29 -07:00
Lance Release
1cbfc1bbf4 [python] Bump version: 0.1.13 → 0.1.14 2023-07-20 20:06:15 +00:00
gsilvestrin
a2bb497135 feat(node) Move native packages to @lancedb NPM org (#341)
- Move native packages to @lancedb org
- Move package-lock.json update to a reusable action and created a target to run it manually.
2023-07-20 12:54:39 -07:00
Will Jones
0cf40c8da3 fix: only use util function to build filesystem (#339) 2023-07-20 10:41:50 -07:00
Rob Meng
8233c689c3 fix remote SDK (#342) 2023-07-20 02:01:13 -04:00
gsilvestrin
6e24e731b8 Updating package-lock.json (#338) 2023-07-18 21:10:18 -07:00
Lance Release
f4ce86e12c [python] Bump version: 0.1.12 → 0.1.13 2023-07-19 03:09:50 +00:00
Lance Release
0664eaec82 Bump version: 0.1.14 → 0.1.15 2023-07-19 02:54:10 +00:00
Lei Xu
63acdc2069 [Python] Support pydantic v1 as well (#337)
Support both Pydantic v1 and v2 (breaking changes)
2023-07-18 19:53:09 -07:00
Rob Meng
a636bb1075 add support for host override (#335) 2023-07-18 21:21:39 -04:00
Lance Release
5e3167da83 [python] Bump version: 0.1.11 → 0.1.12 2023-07-19 01:18:28 +00:00
Lei Xu
f09db4a6d6 [Python] Do not return Table count for every add operation (#328)
`Table::count()` will be linearly slower with more fragments ingested.
2023-07-18 17:11:17 -07:00
Lei Xu
1d343edbd4 [Node] implement remote db.TableNames (#334) 2023-07-18 16:56:47 -07:00
Lei Xu
980f910f50 [Node] initial support of nodejs remote sdk (#333) 2023-07-18 16:15:27 -07:00
Will Jones
fb97b03a51 feat: pass AWS_ENDPOINT environment variable down (#330)
Tested locally against minio.
2023-07-18 15:07:26 -07:00
Lei Xu
141b6647a8 [Python] Fix bumpversion.cfg (#327) 2023-07-18 09:18:14 -07:00
gsilvestrin
b45ac4608f feat(node): Explicitly set registry url when publishing package (#324) 2023-07-18 08:55:56 -07:00
Lei Xu
a86bc05131 [Bug] Fix dataset path check in Table::open (#326)
Fixed a bug that prevents to open remote tables.
2023-07-18 08:45:10 -07:00
Will Jones
3537afb2c3 docs: show how to delete rows in user guide (#309)
Closes #265
2023-07-18 08:19:48 -07:00
Lei Xu
23f5dddc7c [Rust] Checkout a version of dataset. (#321)
* `Table::open()` from absolute path, and gives the responsibility of
organizing metadata out of Table object
* Fix Clippy warnings
* Add `Table::checkout(version)` API
2023-07-17 17:29:58 -07:00
gsilvestrin
9748406cba Updating package-lock.json (#322) 2023-07-17 16:48:22 -07:00
gsilvestrin
6271949d38 feat(node): Update package-lock.json on each release (#302) 2023-07-17 16:33:43 -07:00
Lance Release
131ad09ab3 Bump version: 0.1.13 → 0.1.14 2023-07-17 20:06:58 +00:00
Lei Xu
030f07e7f0 Bump minimal lance version to 0.5.8 (#318) 2023-07-17 12:41:29 -07:00
gsilvestrin
72afa06b7a feat(node): Add Windows support (#294) 2023-07-17 08:48:24 -07:00
Lei Xu
088e745e1d [Python] Create table with Iterator[RecordBatch] and add docs (#316) 2023-07-16 21:45:55 -07:00
Lei Xu
7a57cddb2c [Python] Add records to remote (#315) 2023-07-16 13:24:38 -07:00
Lei Xu
8ff5f88916 [Python] Bug fixes in remote API (#314) 2023-07-16 11:09:19 -07:00
Lei Xu
028a6e433d [Python] Get table schema (#313) 2023-07-15 17:39:37 -07:00
Lei Xu
04c6814fb1 [Rust] Expose Table schema and version in Rust (#312) 2023-07-14 22:01:23 -07:00
Lei Xu
c62e4ca1eb Bump lance version to 0.5.7 (#311) 2023-07-14 17:17:31 -07:00
gsilvestrin
aecc5fc42b feat(node): Fix npm publish task (#298) 2023-07-14 13:39:15 -07:00
Chang She
2fdcb307eb [python] Fix a few minor bugs (#304) 2023-07-15 03:47:42 +08:00
Tevin Wang
ad18826579 [Documentation Code Testing] build node sdk in release (#307) 2023-07-14 12:46:48 -07:00
Leon Yee
a8a50591d7 [docs] small fixes (#308)
Closes #288 and #287
2023-07-14 12:46:31 -07:00
gsilvestrin
6dfe7fabc2 pin half (#310) 2023-07-14 12:45:05 -07:00
gsilvestrin
2b108e1c80 Updating package-lock.json file (#301) 2023-07-13 17:50:01 -07:00
Lei Xu
8c9edafccc [Doc] Add more Python integrations documents (#299) 2023-07-13 17:09:39 -07:00
Leon Yee
0590413b96 Added transformersJS example to docs and node/examples (#297) 2023-07-13 17:01:36 -07:00
Lance Release
bd2d40a927 Bump version: 0.1.12 → 0.1.13 2023-07-13 21:17:35 +00:00
Lei Xu
08944bf4fd [Python] Convert Pydantic Model to Arrow Schema (#291)
Provide utility to automatically convert Pydantic model to Arrow Schema

Closes #256
2023-07-13 11:16:37 -07:00
gsilvestrin
826dc90151 feat(node): add option object to connect method (#286) 2023-07-13 11:03:48 -07:00
Lei Xu
08cc483ec9 [Doc] Describe the difference between ANN and KNN, and how to create indices. (#293) 2023-07-13 08:52:58 -07:00
Lei Xu
ff1d206182 [Doc] Split the python integration into different topics (#292) 2023-07-12 21:26:59 -07:00
gsilvestrin
c385c55629 feat(node): pull node binaries into separate packages (3) (#285) 2023-07-12 16:52:04 -07:00
Lance Release
0a03f7ca5a Bump version: 0.1.11 → 0.1.12 2023-07-12 04:20:34 +00:00
Rob Meng
88be978e87 allow logging in JS (#283)
tested with `RUST_LOG=info npm test`
2023-07-11 22:50:36 -04:00
Rob Meng
98b12caa06 export create table with aws credentials (#282) 2023-07-11 17:21:10 -04:00
Lance Release
091dffb171 Bump version: 0.1.10 → 0.1.11 2023-07-11 20:42:15 +00:00
Rob Meng
ace6aa883a Upgrade lance to 0.5.5, and plumb thru new features from the upgrade (#279)
* upgrade
* fixes for the upgrade
* allow JS users to pass custom AWS credentials
2023-07-11 16:33:39 -04:00
Tevin Wang
80c25f9896 [Docs] uncomment cosine metric (#271)
- Change k value to `10` for js search to keep it consistent with python
docs
- Uncomment now that cosine metrix is fixed in lance:
https://github.com/lancedb/lance/pull/1035
2023-07-11 12:30:11 -07:00
gsilvestrin
caf22fdb71 Run rust tests when Cargo.toml changes (#276) 2023-07-11 11:19:06 -07:00
Lei Xu
0e7ae5dfbf [Python] Fix list type conversion to JSON and temporal types (#274) 2023-07-11 11:05:51 -07:00
gsilvestrin
b261e27222 Pin lance version (#275)
we shouldn't auto-upgrade lance
2023-07-11 10:58:15 -07:00
Lei Xu
9f603f73a9 [Python] Schema to JSON (#272) 2023-07-10 18:11:24 -07:00
Lei Xu
9ef846929b [Python] List tables from remote service (#262) 2023-07-09 23:58:03 -07:00
Lei Xu
97364a2514 Bump to v0.1.10-python 2023-07-09 21:52:11 -07:00
Lei Xu
e6c6da6104 [Python] Initial support of cloud API (#260)
Support connect with remote database, and implement Search API
2023-07-07 15:41:15 -07:00
Leon Yee
a5eb665b7d [docs] dynamic docs generation and deployment (#253)
Solves #245 , edited docs.yml to run the generation of docs before
deployment. Tested on a test repository
2023-07-06 21:10:36 -07:00
Chang She
e2325c634b Allow creation of an empty table (#254)
It's inconvenient to always require data at table creation time.
Here we enable you to create an empty table and add data and set schema
later.

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-07-06 20:44:58 -07:00
Chang She
507eeae9c8 Set default to error instead of drop (#259)
when encountering bad input data, we can default to principle of least
surprise and raise an exception.

Co-authored-by: Chang She <chang@lancedb.com>
2023-07-05 22:44:18 -07:00
Lance Release
bb3df62dce Bump version: 0.1.9 → 0.1.10 2023-07-06 03:05:32 +00:00
Lei Xu
dc7146b2cb [Node] Expose IVF PQ config (#258) 2023-07-05 19:54:21 -07:00
Lei Xu
d701947f0b [Rust] Re-export WriteMode from lancedb instead of lance (#257)
`Table::add(.., mode: WriteMode)`, which is a public API, currently uses
the WriteMode exported from `lance`. Re-export it to lancedb so that the
pub API looks better.
2023-07-05 18:20:31 -07:00
Chang She
3c46d7f268 Handle NaN input data (#241)
Sometimes LangChain would insert a single `[np.nan]` as a placeholder if
the embedding function failed. This causes a problem for Lance format
because then the array can't be stored as a FixedSizedListArray.

Instead:
1. By default we remove rows with embedding lengths less than the
maximum length in the batch
2. If `strict=True` kwargs is set to True, then a `ValueError` is raised
if the embeddings aren't all the same length

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-07-04 20:00:46 -07:00
Leon Yee
9600a38ff0 [docs] fixed javascript docs for overloaded functions (#247)
Solves #244 :


![image](https://github.com/lancedb/lancedb/assets/43097991/d1fd9d2a-0d6a-4c16-a0ab-f460cc709349)

Problem was function overloading in the interface caused some weird
`typedoc` formatting, so breaking it apart into methods fixed the issue.

Also regenerated and updated javascript docs

---------

Co-authored-by: Tevin Wang <tevin@cmu.edu>
2023-07-04 13:07:34 -07:00
Lei Xu
148ed82607 Bump Lance version to 0.5.3 (#250) 2023-07-04 08:34:41 -07:00
Lei Xu
fc725c99f0 [Node] Create Table with WriteMode (#246)
Support `createTable(name, data, mode?)`  to be consistent with Python.

Closes #242
2023-07-03 17:04:21 -07:00
Rob Meng
a6bdffd75b bump lance to 0.5.2, make object store construction hook public (#237)
* bump to 0.5.2 to pick up S3 auth fixes
* make `open_table_params` a public attribute
* add `open_table_with_params` on `Database`
2023-06-29 18:50:02 -04:00
Lei Xu
051c03c3c9 Add dot product support (#239)
Closes #207
2023-06-29 10:32:01 -07:00
Tevin Wang
39479dcf8e fix sha error in npm (#236)
Currently getting a `npm ERR! code EINTEGRITY` on merge, need to fix
asap.


https://stackoverflow.com/questions/75905223/github-action-npm-install-give-code-eintegrity-integrity-checksum-failed
2023-06-29 09:31:23 -07:00
Tevin Wang
b731a6aed9 Add docs code testing & documentation syntax changes (#196)
- Creates testing files `md_testing.py` and `md_testing.js` for testing
python and nodejs code in markdown files in the documentation
This listens for HTML tags as well: `<!--[language] code code
code...-->` will create a set-up file to create some mock tables or to
fulfill some assumptions in the documentation.
- Creates a github action workflow that triggers every push/pr to
`docs/**`
- Modifies documentation so tests run (mostly indentation, some small
syntax errors and some missing imports)

A list of excluded files that we need to take a closer look at later on:
```javascript
const excludedFiles = [
  "../src/fts.md",
  "../src/embedding.md",
  "../src/examples/serverless_lancedb_with_s3_and_lambda.md",
  "../src/examples/serverless_qa_bot_with_modal_and_langchain.md",
  "../src/examples/youtube_transcript_bot_with_nodejs.md",
];
```
Many of them can't be done because we need the OpenAI API key :(.
`fts.md` has some issues with the library, I believe this is still
experimental?

Closes #170

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2023-06-28 11:07:26 -07:00
Rob Meng
0f58bd7af2 allow passing ReadParams to dataset when opening a table (#234)
Plumb thru object store construction hook from
[lance/pull/1014](https://github.com/lancedb/lance/pull/1014)
2023-06-28 11:20:09 -04:00
Rob Meng
01abf82808 Refactor TS client to use interface + implementation pattern (#226)
## What?
* Changed `Connection` and `Table` to interfaces
* Renamed original `Connection` and `Table` to `LocalConnection` and
`LocalTable`
2023-06-27 21:45:01 -04:00
Leon Yee
eb5bcda337 Error implementations (#232)
Solves #216 by adding a check on table open for existence of the
`.lance` file. Does not check for it for remote connections.
2023-06-27 16:48:31 -07:00
Lei Xu
4bc676e26a [Python] Support replace during create_index (#233)
Closes #214
2023-06-27 16:02:07 -07:00
Lei Xu
c68c236f17 [Js] Create index with replace flag (#229) 2023-06-26 18:38:20 -07:00
Philip Kung
313e66c4c5 Specify and Index Column for Vector Search (#217) 2023-06-26 16:11:08 -07:00
Lei Xu
e850df56f1 fix requirements 2023-06-26 12:25:29 -07:00
Lei Xu
8c5507075c Sql filter document (#228) 2023-06-26 12:22:22 -07:00
Will Jones
0e4c52b8a6 bump python module version 2023-06-26 11:25:39 -07:00
Lance Release
c8bebf4776 Bump version: 0.1.8 → 0.1.9 2023-06-26 18:12:38 +00:00
Lei Xu
c14ad91df0 [Node] drop table api (#221)
Provide `drop_table` in rust and node. Closes #86
2023-06-23 19:58:37 -07:00
Will Jones
ad48242ffb feat: support for deletion (#219)
Also upgrades Arrow and Lance.
2023-06-23 18:09:07 -07:00
Leon Yee
1a9a392e20 [docs] CTA for discord + twitter (#218)
![image](https://github.com/lancedb/lancedb/assets/43097991/33eb893c-3baf-4166-8291-47d2f4bde23a)

Includes discord and twitter links in documentation

[#1001](https://github.com/lancedb/sophon/issues/1001)
2023-06-22 16:52:34 -07:00
Ayush Chaurasia
b489edc576 Add favicon in docs (#209) 2023-06-19 20:30:46 -07:00
gsilvestrin
8708fde3ef Revert "feat(node): pull node binaries into separate packages (2) (#1… (#206)
…97)"

This reverts commit 0724d41c4b.
2023-06-16 18:15:49 -07:00
185 changed files with 18640 additions and 6075 deletions

View File

@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.1.8
current_version = 0.3.8
commit = True
message = Bump version: {current_version} → {new_version}
tag = True

View File

@@ -39,6 +39,28 @@ jobs:
run: |
python -m pip install -e .
python -m pip install -r ../docs/requirements.txt
- name: Set up node
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
cache-dependency-path: node/package-lock.json
- uses: Swatinem/rust-cache@v2
- name: Install node dependencies
working-directory: node
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Build node
working-directory: node
run: |
npm ci
npm run build
npm run tsc
- name: Create markdown files
working-directory: node
run: |
npx typedoc --plugin typedoc-plugin-markdown --out ../docs/src/javascript src/index.ts
- name: Build docs
run: |
PYTHONPATH=. mkdocs build -f docs/mkdocs.yml

93
.github/workflows/docs_test.yml vendored Normal file
View File

@@ -0,0 +1,93 @@
name: Documentation Code Testing
on:
push:
branches:
- main
paths:
- docs/**
- .github/workflows/docs_test.yml
pull_request:
paths:
- docs/**
- .github/workflows/docs_test.yml
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
env:
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
RUSTFLAGS: "-C debuginfo=1"
RUST_BACKTRACE: "1"
jobs:
test-python:
name: Test doc python code
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-minor-version: [ "11" ]
os: ["ubuntu-22.04"]
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.${{ matrix.python-minor-version }}
cache: "pip"
cache-dependency-path: "docs/test/requirements.txt"
- name: Build Python
working-directory: docs/test
run:
python -m pip install -r requirements.txt
- name: Create test files
run: |
cd docs/test
python md_testing.py
- name: Test
run: |
cd docs/test/python
for d in *; do cd "$d"; echo "$d".py; python "$d".py; cd ..; done
test-node:
name: Test doc nodejs code
runs-on: ${{ matrix.os }}
strategy:
matrix:
node-version: [ "18" ]
os: ["ubuntu-22.04"]
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
lfs: true
- name: Set up Node
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
- name: Install dependecies needed for ubuntu
if: ${{ matrix.os == 'ubuntu-22.04' }}
run: |
sudo apt install -y protobuf-compiler libssl-dev
- name: Install node dependencies
run: |
cd docs/test
npm install
- name: Rust cache
uses: swatinem/rust-cache@v2
- name: Install LanceDB
run: |
cd docs/test/node_modules/vectordb
npm ci
npm run build-release
npm run tsc
- name: Create test files
run: |
cd docs/test
node md_testing.js
- name: Test
run: |
cd docs/test/node
for d in *; do cd "$d"; echo "$d".js; node "$d".js; cd ..; done

View File

@@ -52,4 +52,8 @@ jobs:
github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}
branch: main
tags: true
- uses: ./.github/workflows/update_package_lock
if: ${{ inputs.dry_run }} == "false"
with:
github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}

View File

@@ -9,6 +9,11 @@ on:
- node/**
- rust/ffi/node/**
- .github/workflows/node.yml
- docker-compose.yml
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
# Disable full debug symbol generation to speed up CI build and keep memory down
@@ -70,7 +75,7 @@ jobs:
npm run tsc
npm run build
npm run pack-build
npm install --no-save ./dist/vectordb-*.tgz
npm install --no-save ./dist/lancedb-vectordb-*.tgz
# Remove index.node to test with dependency installed
rm index.node
- name: Test
@@ -101,9 +106,62 @@ jobs:
npm run tsc
npm run build
npm run pack-build
npm install --no-save ./dist/vectordb-*.tgz
npm install --no-save ./dist/lancedb-vectordb-*.tgz
# Remove index.node to test with dependency installed
rm index.node
- name: Test
run: |
npm run test
aws-integtest:
timeout-minutes: 45
runs-on: "ubuntu-22.04"
defaults:
run:
shell: bash
working-directory: node
env:
AWS_ACCESS_KEY_ID: ACCESSKEY
AWS_SECRET_ACCESS_KEY: SECRETKEY
AWS_DEFAULT_REGION: us-west-2
# this one is for s3
AWS_ENDPOINT: http://localhost:4566
# this one is for dynamodb
DYNAMODB_ENDPOINT: http://localhost:4566
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
with:
node-version: 18
cache: 'npm'
cache-dependency-path: node/package-lock.json
- name: start local stack
run: docker compose -f ../docker-compose.yml up -d --wait
- name: create s3
run: aws s3 mb s3://lancedb-integtest --endpoint $AWS_ENDPOINT
- name: create ddb
run: |
aws dynamodb create-table \
--table-name lancedb-integtest \
--attribute-definitions '[{"AttributeName": "base_uri", "AttributeType": "S"}, {"AttributeName": "version", "AttributeType": "N"}]' \
--key-schema '[{"AttributeName": "base_uri", "KeyType": "HASH"}, {"AttributeName": "version", "KeyType": "RANGE"}]' \
--provisioned-throughput '{"ReadCapacityUnits": 10, "WriteCapacityUnits": 10}' \
--endpoint-url $DYNAMODB_ENDPOINT
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Build
run: |
npm ci
npm run tsc
npm run build
npm run pack-build
npm install --no-save ./dist/lancedb-vectordb-*.tgz
# Remove index.node to test with dependency installed
rm index.node
- name: Test
run: npm run integration-test

View File

@@ -38,7 +38,7 @@ jobs:
node/vectordb-*.tgz
node-macos:
runs-on: macos-12
runs-on: macos-13
# Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v')
strategy:
@@ -62,62 +62,71 @@ jobs:
- name: Upload Darwin Artifacts
uses: actions/upload-artifact@v3
with:
name: darwin-native
name: native-darwin
path: |
node/dist/vectordb-darwin*.tgz
node/dist/lancedb-vectordb-darwin*.tgz
node-linux:
name: node-linux (${{ matrix.arch}}-unknown-linux-${{ matrix.libc }})
runs-on: ubuntu-latest
name: node-linux (${{ matrix.config.arch}}-unknown-linux-gnu
runs-on: ${{ matrix.config.runner }}
# Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v')
strategy:
fail-fast: false
matrix:
libc:
- gnu
# TODO: re-enable musl once we have refactored to pre-built containers
# Right now we have to build node from source which is too expensive.
# - musl
arch:
- x86_64
# Building on aarch64 is too slow for now
# - aarch64
config:
- arch: x86_64
runner: ubuntu-latest
- arch: aarch64
runner: buildjet-4vcpu-ubuntu-2204-arm
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Change owner to root (for npm)
# The docker container is run as root, so we need the files to be owned by root
# Otherwise npm is a nightmare: https://github.com/npm/cli/issues/3773
run: sudo chown -R root:root .
- name: Set up QEMU
if: ${{ matrix.arch == 'aarch64' }}
uses: docker/setup-qemu-action@v2
with:
platforms: arm64
- name: Build Linux GNU native node modules
if: ${{ matrix.libc == 'gnu' }}
- name: Build Linux Artifacts
run: |
docker run \
-v $(pwd):/io -w /io \
quay.io/pypa/manylinux2014_${{ matrix.arch }} \
bash ci/build_linux_artifacts.sh ${{ matrix.arch }}-unknown-linux-gnu
- name: Build musl Linux native node modules
if: ${{ matrix.libc == 'musl' }}
run: |
docker run --platform linux/arm64/v8 \
-v $(pwd):/io -w /io \
quay.io/pypa/musllinux_1_1_${{ matrix.arch }} \
bash ci/build_linux_artifacts.sh ${{ matrix.arch }}-unknown-linux-musl
bash ci/build_linux_artifacts.sh ${{ matrix.config.arch }}
- name: Upload Linux Artifacts
uses: actions/upload-artifact@v3
with:
name: linux-native
name: native-linux
path: |
node/dist/vectordb-linux*.tgz
node/dist/lancedb-vectordb-linux*.tgz
node-windows:
runs-on: windows-2022
# Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v')
strategy:
fail-fast: false
matrix:
target: [x86_64-pc-windows-msvc]
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Install Protoc v21.12
working-directory: C:\
run: |
New-Item -Path 'C:\protoc' -ItemType Directory
Set-Location C:\protoc
Invoke-WebRequest https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-win64.zip -OutFile C:\protoc\protoc.zip
7z x protoc.zip
Add-Content $env:GITHUB_PATH "C:\protoc\bin"
shell: powershell
- name: Install npm dependencies
run: |
cd node
npm ci
- name: Build Windows native node modules
run: .\ci\build_windows_artifacts.ps1 ${{ matrix.target }}
- name: Upload Windows Artifacts
uses: actions/upload-artifact@v3
with:
name: native-windows
path: |
node/dist/lancedb-vectordb-win32*.tgz
release:
needs: [node, node-macos, node-linux]
needs: [node, node-macos, node-linux, node-windows]
runs-on: ubuntu-latest
# Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v')
@@ -128,10 +137,27 @@ jobs:
- uses: actions/setup-node@v3
with:
node-version: 20
registry-url: 'https://registry.npmjs.org'
- name: Publish to NPM
env:
NODE_AUTH_TOKEN: ${{ secrets.LANCEDB_NPM_REGISTRY_TOKEN }}
run: |
for filename in */*.tgz; do
mv */*.tgz .
for filename in *.tgz; do
npm publish $filename
done
update-package-lock:
needs: [release]
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
with:
ref: main
persist-credentials: false
fetch-depth: 0
lfs: true
- uses: ./.github/workflows/update_package_lock
with:
github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}

View File

@@ -8,6 +8,11 @@ on:
paths:
- python/**
- .github/workflows/python.yml
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
linux:
timeout-minutes: 30
@@ -30,20 +35,21 @@ jobs:
python-version: 3.${{ matrix.python-minor-version }}
- name: Install lancedb
run: |
pip install -e .
pip install -e .[tests]
pip install tantivy@git+https://github.com/quickwit-oss/tantivy-py#164adc87e1a033117001cf70e38c82a53014d985
pip install pytest pytest-mock black isort
- name: Black
run: black --check --diff --no-color --quiet .
- name: isort
run: isort --check --diff --quiet .
pip install pytest pytest-mock ruff
- name: Lint
run: ruff format --check .
- name: Run tests
run: pytest -x -v --durations=30 tests
run: pytest -m "not slow" -x -v --durations=30 tests
- name: doctest
run: pytest --doctest-modules lancedb
mac:
timeout-minutes: 30
runs-on: "macos-12"
strategy:
matrix:
mac-runner: [ "macos-13", "macos-13-xlarge" ]
runs-on: "${{ matrix.mac-runner }}"
defaults:
run:
shell: bash
@@ -59,8 +65,38 @@ jobs:
python-version: "3.11"
- name: Install lancedb
run: |
pip install -e .
pip install -e .[tests]
pip install tantivy@git+https://github.com/quickwit-oss/tantivy-py#164adc87e1a033117001cf70e38c82a53014d985
pip install pytest pytest-mock
pip install pytest pytest-mock black
- name: Run tests
run: pytest -x -v --durations=30 tests
run: pytest -m "not slow" -x -v --durations=30 tests
pydantic1x:
timeout-minutes: 30
runs-on: "ubuntu-22.04"
defaults:
run:
shell: bash
working-directory: python
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
lfs: true
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.9
- name: Install lancedb
run: |
pip install "pydantic<2"
pip install -e .[tests]
pip install tantivy@git+https://github.com/quickwit-oss/tantivy-py#164adc87e1a033117001cf70e38c82a53014d985
pip install pytest pytest-mock black isort
- name: Black
run: black --check --diff --no-color --quiet .
- name: isort
run: isort --check --diff --quiet .
- name: Run tests
run: pytest -m "not slow" -x -v --durations=30 tests
- name: doctest
run: pytest --doctest-modules lancedb

View File

@@ -6,9 +6,14 @@ on:
- main
pull_request:
paths:
- Cargo.toml
- rust/**
- .github/workflows/rust.yml
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
# This env var is used by Swatinem/rust-cache@v2 for the cache
# key, so we set it to make sure it is always consistent.
@@ -43,8 +48,11 @@ jobs:
- name: Run tests
run: cargo test --all-features
macos:
runs-on: macos-12
timeout-minutes: 30
strategy:
matrix:
mac-runner: [ "macos-13", "macos-13-xlarge" ]
runs-on: "${{ matrix.mac-runner }}"
defaults:
run:
shell: bash
@@ -65,3 +73,24 @@ jobs:
run: cargo build --all-features
- name: Run tests
run: cargo test --all-features
windows:
runs-on: windows-2022
steps:
- uses: actions/checkout@v3
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install Protoc v21.12
working-directory: C:\
run: |
New-Item -Path 'C:\protoc' -ItemType Directory
Set-Location C:\protoc
Invoke-WebRequest https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-win64.zip -OutFile C:\protoc\protoc.zip
7z x protoc.zip
Add-Content $env:GITHUB_PATH "C:\protoc\bin"
shell: powershell
- name: Run tests
run: |
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo build
cargo test

View File

@@ -0,0 +1,26 @@
name: Trigger vectordb-recipers workflow
on:
push:
branches: [ main ]
pull_request:
paths:
- .github/workflows/trigger-vectordb-recipes.yml
workflow_dispatch:
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Trigger vectordb-recipes workflow
uses: actions/github-script@v6
with:
github-token: ${{ secrets.VECTORDB_RECIPES_ACTION_TOKEN }}
script: |
const result = await github.rest.actions.createWorkflowDispatch({
owner: 'lancedb',
repo: 'vectordb-recipes',
workflow_id: 'examples-test.yml',
ref: 'main'
});
console.log(result);

View File

@@ -0,0 +1,33 @@
name: update_package_lock
description: "Update node's package.lock"
inputs:
github_token:
required: true
description: "github token for the repo"
runs:
using: "composite"
steps:
- uses: actions/setup-node@v3
with:
node-version: 20
- name: Set git configs
shell: bash
run: |
git config user.name 'Lance Release'
git config user.email 'lance-dev@lancedb.com'
- name: Update package-lock.json file
working-directory: ./node
run: |
npm install
git add package-lock.json
git commit -m "Updating package-lock.json"
shell: bash
- name: Push changes
if: ${{ inputs.dry_run }} == "false"
uses: ad-m/github-push-action@master
with:
github_token: ${{ inputs.github_token }}
branch: main
tags: true

View File

@@ -0,0 +1,19 @@
name: Update package-lock.json
on:
workflow_dispatch:
jobs:
publish:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
with:
ref: main
persist-credentials: false
fetch-depth: 0
lfs: true
- uses: ./.github/workflows/update_package_lock
with:
github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}

2
.gitignore vendored
View File

@@ -3,6 +3,7 @@
*.egg-info
**/__pycache__
.DS_Store
venv
.vscode
@@ -32,3 +33,4 @@ node/examples/**/dist
## Rust
target
Cargo.lock

View File

@@ -9,3 +9,13 @@ repos:
rev: 22.12.0
hooks:
- id: black
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.0.277
hooks:
- id: ruff
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
name: isort (python)

3825
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,6 +1,28 @@
[workspace]
members = [
"rust/vectordb",
"rust/ffi/node"
]
members = ["rust/ffi/node", "rust/vectordb"]
# Python package needs to be built by maturin.
exclude = ["python"]
resolver = "2"
[workspace.dependencies]
lance = { "version" = "=0.8.17", "features" = ["dynamodb"] }
lance-index = { "version" = "=0.8.17" }
lance-linalg = { "version" = "=0.8.17" }
lance-testing = { "version" = "=0.8.17" }
# Note that this one does not include pyarrow
arrow = { version = "47.0.0", optional = false }
arrow-array = "47.0"
arrow-data = "47.0"
arrow-ipc = "47.0"
arrow-ord = "47.0"
arrow-schema = "47.0"
arrow-arith = "47.0"
arrow-cast = "47.0"
chrono = "0.4.23"
half = { "version" = "=2.3.1", default-features = false, features = [
"num-traits",
] }
log = "0.4"
object_store = "0.7.1"
snafu = "0.7.4"
url = "2"

View File

@@ -33,6 +33,8 @@ The key features of LanceDB include:
* Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure.
* GPU support in building vector index(*).
* Ecosystem integrations with [LangChain 🦜️🔗](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/lanecdb.html), [LlamaIndex 🦙](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html), Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
LanceDB's core is written in Rust 🦀 and is built using <a href="https://github.com/lancedb/lance">Lance</a>, an open-source columnar format designed for performant ML workloads.
@@ -52,8 +54,7 @@ const table = await db.createTable('vectors',
[{ id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
{ id: 2, vector: [1.1, 1.2], item: "bar", price: 50 }])
const query = table.search([0.1, 0.3]);
query.limit = 20;
const query = table.search([0.1, 0.3]).limit(2);
const results = await query.execute();
```
@@ -65,12 +66,12 @@ pip install lancedb
```python
import lancedb
uri = "/tmp/lancedb"
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_df()
result = table.search([100, 100]).limit(2).to_pandas()
```
## Blogs, Tutorials & Videos

102
ci/build_linux_artifacts.sh Normal file → Executable file
View File

@@ -1,91 +1,19 @@
#!/bin/bash
# Builds the Linux artifacts (node binaries).
# Usage: ./build_linux_artifacts.sh [target]
# Targets supported:
# - x86_64-unknown-linux-gnu:centos
# - aarch64-unknown-linux-gnu:centos
# - aarch64-unknown-linux-musl
# - x86_64-unknown-linux-musl
# TODO: refactor this into a Docker container we can pull
set -e
ARCH=${1:-x86_64}
setup_dependencies() {
echo "Installing system dependencies..."
if [[ $1 == *musl ]]; then
# musllinux
apk add openssl-dev
else
# manylinux2014
yum install -y openssl-devel unzip
fi
# We pass down the current user so that when we later mount the local files
# into the container, the files are accessible by the current user.
pushd ci/manylinux_node
docker build \
-t lancedb-node-manylinux \
--build-arg="ARCH=$ARCH" \
--build-arg="DOCKER_USER=$(id -u)" \
--progress=plain \
.
popd
if [[ $1 == x86_64* ]]; then
ARCH=x86_64
else
# gnu target
ARCH=aarch_64
fi
# Install new enough protobuf (yum-provided is old)
PB_REL=https://github.com/protocolbuffers/protobuf/releases
PB_VERSION=23.1
curl -LO $PB_REL/download/v$PB_VERSION/protoc-$PB_VERSION-linux-$ARCH.zip
unzip protoc-$PB_VERSION-linux-$ARCH.zip -d /usr/local
}
install_node() {
echo "Installing node..."
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.34.0/install.sh | bash
source "$HOME"/.bashrc
if [[ $1 == *musl ]]; then
# This node version is 15, we need 16 or higher:
# apk add nodejs-current npm
# So instead we install from source (nvm doesn't provide binaries for musl):
nvm install -s --no-progress 17
else
nvm install --no-progress 17 # latest that supports glibc 2.17
fi
}
install_rust() {
echo "Installing rust..."
curl https://sh.rustup.rs -sSf | bash -s -- -y
export PATH="$PATH:/root/.cargo/bin"
}
build_node_binary() {
echo "Building node library for $1..."
pushd node
npm ci
if [[ $1 == *musl ]]; then
# This is needed for cargo to allow build cdylibs with musl
export RUSTFLAGS="-C target-feature=-crt-static"
fi
# Cargo can run out of memory while pulling dependencies, espcially when running
# in QEMU. This is a workaround for that.
export CARGO_NET_GIT_FETCH_WITH_CLI=true
# We don't pass in target, since the native target here already matches
# and openblas-src doesn't do well with cross-compilation.
npm run build-release
npm run pack-build
popd
}
TARGET=${1:-x86_64-unknown-linux-gnu}
# Others:
# aarch64-unknown-linux-gnu
# x86_64-unknown-linux-musl
# aarch64-unknown-linux-musl
setup_dependencies $TARGET
install_node $TARGET
install_rust
build_node_binary $TARGET
docker run \
-v $(pwd):/io -w /io \
lancedb-node-manylinux \
bash ci/manylinux_node/build.sh $ARCH

View File

@@ -0,0 +1,41 @@
# Builds the Windows artifacts (node binaries).
# Usage: .\ci\build_windows_artifacts.ps1 [target]
# Targets supported:
# - x86_64-pc-windows-msvc
# - i686-pc-windows-msvc
function Prebuild-Rust {
param (
[string]$target
)
# Building here for the sake of easier debugging.
Push-Location -Path "rust/ffi/node"
Write-Host "Building rust library for $target"
$env:RUST_BACKTRACE=1
cargo build --release --target $target
Pop-Location
}
function Build-NodeBinaries {
param (
[string]$target
)
Push-Location -Path "node"
Write-Host "Building node library for $target"
npm run build-release -- --target $target
npm run pack-build -- --target $target
Pop-Location
}
$targets = $args[0]
if (-not $targets) {
$targets = "x86_64-pc-windows-msvc"
}
Write-Host "Building artifacts for targets: $targets"
foreach ($target in $targets) {
Prebuild-Rust $target
Build-NodeBinaries $target
}

View File

@@ -0,0 +1,31 @@
# Many linux dockerfile with Rust, Node, and Lance dependencies installed.
# This container allows building the node modules native libraries in an
# environment with a very old glibc, so that we are compatible with a wide
# range of linux distributions.
ARG ARCH=x86_64
FROM quay.io/pypa/manylinux2014_${ARCH}
ARG ARCH=x86_64
ARG DOCKER_USER=default_user
# Install static openssl
COPY install_openssl.sh install_openssl.sh
RUN ./install_openssl.sh ${ARCH} > /dev/null
# Protobuf is also installed as root.
COPY install_protobuf.sh install_protobuf.sh
RUN ./install_protobuf.sh ${ARCH}
ENV DOCKER_USER=${DOCKER_USER}
# Create a group and user
RUN echo ${ARCH} && adduser --user-group --create-home --uid ${DOCKER_USER} build_user
# We switch to the user to install Rust and Node, since those like to be
# installed at the user level.
USER ${DOCKER_USER}
COPY prepare_manylinux_node.sh prepare_manylinux_node.sh
RUN cp /prepare_manylinux_node.sh $HOME/ && \
cd $HOME && \
./prepare_manylinux_node.sh ${ARCH}

19
ci/manylinux_node/build.sh Executable file
View File

@@ -0,0 +1,19 @@
#!/bin/bash
# Builds the node module for manylinux. Invoked by ci/build_linux_artifacts.sh.
set -e
ARCH=${1:-x86_64}
if [ "$ARCH" = "x86_64" ]; then
export OPENSSL_LIB_DIR=/usr/local/lib64/
else
export OPENSSL_LIB_DIR=/usr/local/lib/
fi
export OPENSSL_STATIC=1
export OPENSSL_INCLUDE_DIR=/usr/local/include/openssl
source $HOME/.bashrc
cd node
npm ci
npm run build-release
npm run pack-build

View File

@@ -0,0 +1,26 @@
#!/bin/bash
# Builds openssl from source so we can statically link to it
# this is to avoid the error we get with the system installation:
# /usr/bin/ld: <library>: version node not found for symbol SSLeay@@OPENSSL_1.0.1
# /usr/bin/ld: failed to set dynamic section sizes: Bad value
set -e
git clone -b OpenSSL_1_1_1u \
--single-branch \
https://github.com/openssl/openssl.git
pushd openssl
if [[ $1 == x86_64* ]]; then
ARCH=linux-x86_64
else
# gnu target
ARCH=linux-aarch64
fi
./Configure no-shared $ARCH
make
make install

View File

@@ -0,0 +1,15 @@
#!/bin/bash
# Installs protobuf compiler. Should be run as root.
set -e
if [[ $1 == x86_64* ]]; then
ARCH=x86_64
else
# gnu target
ARCH=aarch_64
fi
PB_REL=https://github.com/protocolbuffers/protobuf/releases
PB_VERSION=23.1
curl -LO $PB_REL/download/v$PB_VERSION/protoc-$PB_VERSION-linux-$ARCH.zip
unzip protoc-$PB_VERSION-linux-$ARCH.zip -d /usr/local

View File

@@ -0,0 +1,21 @@
#!/bin/bash
set -e
install_node() {
echo "Installing node..."
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.34.0/install.sh | bash
source "$HOME"/.bashrc
nvm install --no-progress 16
}
install_rust() {
echo "Installing rust..."
curl https://sh.rustup.rs -sSf | bash -s -- -y
export PATH="$PATH:/root/.cargo/bin"
}
install_node
install_rust

18
docker-compose.yml Normal file
View File

@@ -0,0 +1,18 @@
version: "3.9"
services:
localstack:
image: localstack/localstack:0.14
ports:
- 4566:4566
environment:
- SERVICES=s3,dynamodb
- DEBUG=1
- LS_LOG=trace
- DOCKER_HOST=unix:///var/run/docker.sock
- AWS_ACCESS_KEY_ID=ACCESSKEY
- AWS_SECRET_ACCESS_KEY=SECRETKEY
healthcheck:
test: [ "CMD", "curl", "-f", "http://localhost:4566/health" ]
interval: 5s
retries: 3
start_period: 10s

26
docs/README.md Normal file
View File

@@ -0,0 +1,26 @@
# LanceDB Documentation
LanceDB docs are deployed to https://lancedb.github.io/lancedb/.
Docs is built and deployed automatically by [Github Actions](.github/workflows/docs.yml)
whenever a commit is pushed to the `main` branch. So it is possible for the docs to show
unreleased features.
## Building the docs
### Setup
1. Install LanceDB. From LanceDB repo root: `pip install -e python`
2. Install dependencies. From LanceDB repo root: `pip install -r docs/requirements.txt`
3. Make sure you have node and npm setup
4. Make sure protobuf and libssl are installed
### Building node module and create markdown files
See [Javascript docs README](docs/src/javascript/README.md)
### Build docs
From LanceDB repo root:
Run: `PYTHONPATH=. mkdocs build -f docs/mkdocs.yml`
If successful, you should see a `docs/site` directory that you can verify locally.

View File

@@ -1,16 +1,31 @@
site_name: LanceDB Docs
site_url: https://lancedb.github.io/lancedb/
repo_url: https://github.com/lancedb/lancedb
edit_uri: https://github.com/lancedb/lancedb/tree/main/docs/src
repo_name: lancedb/lancedb
docs_dir: src
theme:
name: "material"
logo: assets/logo.png
favicon: assets/logo.png
features:
- content.code.copy
- content.tabs.link
- content.action.edit
- toc.follow
- toc.integrate
- navigation.top
- navigation.tabs
- navigation.tabs.sticky
- navigation.footer
- navigation.tracking
- navigation.instant
- navigation.indexes
- navigation.expand
icon:
repo: fontawesome/brands/github
custom_dir: overrides
plugins:
- search
@@ -23,7 +38,7 @@ plugins:
docstring_style: numpy
rendering:
heading_level: 4
show_source: false
show_source: true
show_symbol_type_in_heading: true
show_signature_annotations: true
show_root_heading: true
@@ -36,6 +51,7 @@ plugins:
markdown_extensions:
- admonition
- footnotes
- pymdownx.superfences
- pymdownx.details
- pymdownx.highlight:
@@ -47,27 +63,96 @@ markdown_extensions:
- pymdownx.superfences
- pymdownx.tabbed:
alternate_style: true
- md_in_html
nav:
- Home: index.md
- Home:
- 🏢 Home: index.md
- 💡 Basics: basic.md
- 📚 Guides:
- Create Ingest Update Delete: guides/tables.md
- Vector Search: search.md
- SQL filters: sql.md
- Indexing: ann_indexes.md
- Versioning & Reproducibility: notebooks/reproducibility.ipynb
- 🧬 Embeddings:
- embeddings/index.md
- Ingest Embedding Functions: embeddings/embedding_functions.md
- Available Functions: embeddings/default_embedding_functions.md
- Create Custom Embedding Functions: embeddings/api.md
- Example - Multi-lingual semantic search: notebooks/multi_lingual_example.ipynb
- Example - MultiModal CLIP Embeddings: notebooks/DisappearingEmbeddingFunction.ipynb
- 🔍 Python full-text search: fts.md
- 🔌 Integrations:
- integrations/index.md
- Pandas and PyArrow: python/arrow.md
- DuckDB: python/duckdb.md
- LangChain 🔗: https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/lancedb.html
- LangChain JS/TS 🔗: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/lancedb
- LlamaIndex 🦙: https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
- Pydantic: python/pydantic.md
- Voxel51: integrations/voxel51.md
- PromptTools: integrations/prompttools.md
- 🐍 Python examples:
- examples/index.md
- YouTube Transcript Search: notebooks/youtube_transcript_search.ipynb
- Documentation QA Bot using LangChain: notebooks/code_qa_bot.ipynb
- Multimodal search using CLIP: notebooks/multimodal_search.ipynb
- Serverless QA Bot with S3 and Lambda: examples/serverless_lancedb_with_s3_and_lambda.md
- Serverless QA Bot with Modal: examples/serverless_qa_bot_with_modal_and_langchain.md
- 🌐 Javascript examples:
- Examples: examples/index_js.md
- Serverless Website Chatbot: examples/serverless_website_chatbot.md
- YouTube Transcript Search: examples/youtube_transcript_bot_with_nodejs.md
- TransformersJS Embedding Search: examples/transformerjs_embedding_search_nodejs.md
- ⚙️ CLI & Config: cli_config.md
- Basics: basic.md
- Embeddings: embedding.md
- Guides:
- Create Ingest Update Delete: guides/tables.md
- Vector Search: search.md
- SQL filters: sql.md
- Indexing: ann_indexes.md
- Versioning & Reproducibility: notebooks/reproducibility.ipynb
- Embeddings:
- embeddings/index.md
- Ingest Embedding Functions: embeddings/embedding_functions.md
- Available Functions: embeddings/default_embedding_functions.md
- Create Custom Embedding Functions: embeddings/api.md
- Example - Multi-lingual semantic search: notebooks/multi_lingual_example.ipynb
- Example - MultiModal CLIP Embeddings: notebooks/DisappearingEmbeddingFunction.ipynb
- Python full-text search: fts.md
- Python integrations: integrations.md
- Integrations:
- integrations/index.md
- Pandas and PyArrow: python/arrow.md
- DuckDB: python/duckdb.md
- LangChain 🦜️🔗: https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/lancedb.html
- LangChain JS/TS 🦜️🔗: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/lancedb
- LlamaIndex 🦙: https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
- Pydantic: python/pydantic.md
- Voxel51: integrations/voxel51.md
- PromptTools: integrations/prompttools.md
- Python examples:
- examples/index.md
- YouTube Transcript Search: notebooks/youtube_transcript_search.ipynb
- Documentation QA Bot using LangChain: notebooks/code_qa_bot.ipynb
- Multimodal search using CLIP: notebooks/multimodal_search.ipynb
- Serverless QA Bot with S3 and Lambda: examples/serverless_lancedb_with_s3_and_lambda.md
- Serverless QA Bot with Modal: examples/serverless_qa_bot_with_modal_and_langchain.md
- Javascript examples:
- examples/index_js.md
- YouTube Transcript Search: examples/youtube_transcript_bot_with_nodejs.md
- References:
- Vector Search: search.md
- Indexing: ann_indexes.md
- Serverless Chatbot from any website: examples/serverless_website_chatbot.md
- TransformersJS Embedding Search: examples/transformerjs_embedding_search_nodejs.md
- API references:
- Python API: python/python.md
- Javascript API: javascript/modules.md
- LanceDB Cloud↗: https://noteforms.com/forms/lancedb-mailing-list-cloud-kty1o5?notionforms=1&utm_source=notionforms
extra_css:
- styles/global.css
extra:
analytics:
provider: google
property: G-B7NFM40W74

View File

@@ -0,0 +1,176 @@
<!--
Copyright (c) 2016-2023 Martin Donath <martin.donath@squidfunk.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
-->
{% set class = "md-header" %}
{% if "navigation.tabs.sticky" in features %}
{% set class = class ~ " md-header--shadow md-header--lifted" %}
{% elif "navigation.tabs" not in features %}
{% set class = class ~ " md-header--shadow" %}
{% endif %}
<!-- Header -->
<header class="{{ class }}" data-md-component="header">
<nav
class="md-header__inner md-grid"
aria-label="{{ lang.t('header') }}"
>
<!-- Link to home -->
<a
href="{{ config.extra.homepage | d(nav.homepage.url, true) | url }}"
title="{{ config.site_name | e }}"
class="md-header__button md-logo"
aria-label="{{ config.site_name }}"
data-md-component="logo"
>
{% include "partials/logo.html" %}
</a>
<!-- Button to open drawer -->
<label class="md-header__button md-icon" for="__drawer">
{% include ".icons/material/menu" ~ ".svg" %}
</label>
<!-- Header title -->
<div class="md-header__title" style="width: auto !important;" data-md-component="header-title">
<div class="md-header__ellipsis">
<div class="md-header__topic">
<span class="md-ellipsis">
{{ config.site_name }}
</span>
</div>
<div class="md-header__topic" data-md-component="header-topic">
<span class="md-ellipsis">
{% if page.meta and page.meta.title %}
{{ page.meta.title }}
{% else %}
{{ page.title }}
{% endif %}
</span>
</div>
</div>
</div>
<!-- Color palette -->
{% if config.theme.palette %}
{% if not config.theme.palette is mapping %}
<form class="md-header__option" data-md-component="palette">
{% for option in config.theme.palette %}
{% set scheme = option.scheme | d("default", true) %}
{% set primary = option.primary | d("indigo", true) %}
{% set accent = option.accent | d("indigo", true) %}
<input
class="md-option"
data-md-color-media="{{ option.media }}"
data-md-color-scheme="{{ scheme | replace(' ', '-') }}"
data-md-color-primary="{{ primary | replace(' ', '-') }}"
data-md-color-accent="{{ accent | replace(' ', '-') }}"
{% if option.toggle %}
aria-label="{{ option.toggle.name }}"
{% else %}
aria-hidden="true"
{% endif %}
type="radio"
name="__palette"
id="__palette_{{ loop.index }}"
/>
{% if option.toggle %}
<label
class="md-header__button md-icon"
title="{{ option.toggle.name }}"
for="__palette_{{ loop.index0 or loop.length }}"
hidden
>
{% include ".icons/" ~ option.toggle.icon ~ ".svg" %}
</label>
{% endif %}
{% endfor %}
</form>
{% endif %}
{% endif %}
<!-- Site language selector -->
{% if config.extra.alternate %}
<div class="md-header__option">
<div class="md-select">
{% set icon = config.theme.icon.alternate or "material/translate" %}
<button
class="md-header__button md-icon"
aria-label="{{ lang.t('select.language') }}"
>
{% include ".icons/" ~ icon ~ ".svg" %}
</button>
<div class="md-select__inner">
<ul class="md-select__list">
{% for alt in config.extra.alternate %}
<li class="md-select__item">
<a
href="{{ alt.link | url }}"
hreflang="{{ alt.lang }}"
class="md-select__link"
>
{{ alt.name }}
</a>
</li>
{% endfor %}
</ul>
</div>
</div>
</div>
{% endif %}
<!-- Button to open search modal -->
{% if "material/search" in config.plugins %}
<label class="md-header__button md-icon" for="__search">
{% include ".icons/material/magnify.svg" %}
</label>
<!-- Search interface -->
{% include "partials/search.html" %}
{% endif %}
<div style="margin-left: 10px; margin-right: 5px;">
<a href="https://discord.com/invite/zMM32dvNtd" target="_blank" rel="noopener noreferrer">
<svg fill="#FFFFFF" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 50 50" width="25px" height="25px"><path d="M 41.625 10.769531 C 37.644531 7.566406 31.347656 7.023438 31.078125 7.003906 C 30.660156 6.96875 30.261719 7.203125 30.089844 7.589844 C 30.074219 7.613281 29.9375 7.929688 29.785156 8.421875 C 32.417969 8.867188 35.652344 9.761719 38.578125 11.578125 C 39.046875 11.867188 39.191406 12.484375 38.902344 12.953125 C 38.710938 13.261719 38.386719 13.429688 38.050781 13.429688 C 37.871094 13.429688 37.6875 13.378906 37.523438 13.277344 C 32.492188 10.15625 26.210938 10 25 10 C 23.789063 10 17.503906 10.15625 12.476563 13.277344 C 12.007813 13.570313 11.390625 13.425781 11.101563 12.957031 C 10.808594 12.484375 10.953125 11.871094 11.421875 11.578125 C 14.347656 9.765625 17.582031 8.867188 20.214844 8.425781 C 20.0625 7.929688 19.925781 7.617188 19.914063 7.589844 C 19.738281 7.203125 19.34375 6.960938 18.921875 7.003906 C 18.652344 7.023438 12.355469 7.566406 8.320313 10.8125 C 6.214844 12.761719 2 24.152344 2 34 C 2 34.175781 2.046875 34.34375 2.132813 34.496094 C 5.039063 39.605469 12.972656 40.941406 14.78125 41 C 14.789063 41 14.800781 41 14.8125 41 C 15.132813 41 15.433594 40.847656 15.621094 40.589844 L 17.449219 38.074219 C 12.515625 36.800781 9.996094 34.636719 9.851563 34.507813 C 9.4375 34.144531 9.398438 33.511719 9.765625 33.097656 C 10.128906 32.683594 10.761719 32.644531 11.175781 33.007813 C 11.234375 33.0625 15.875 37 25 37 C 34.140625 37 38.78125 33.046875 38.828125 33.007813 C 39.242188 32.648438 39.871094 32.683594 40.238281 33.101563 C 40.601563 33.515625 40.5625 34.144531 40.148438 34.507813 C 40.003906 34.636719 37.484375 36.800781 32.550781 38.074219 L 34.378906 40.589844 C 34.566406 40.847656 34.867188 41 35.1875 41 C 35.199219 41 35.210938 41 35.21875 41 C 37.027344 40.941406 44.960938 39.605469 47.867188 34.496094 C 47.953125 34.34375 48 34.175781 48 34 C 48 24.152344 43.785156 12.761719 41.625 10.769531 Z M 18.5 30 C 16.566406 30 15 28.210938 15 26 C 15 23.789063 16.566406 22 18.5 22 C 20.433594 22 22 23.789063 22 26 C 22 28.210938 20.433594 30 18.5 30 Z M 31.5 30 C 29.566406 30 28 28.210938 28 26 C 28 23.789063 29.566406 22 31.5 22 C 33.433594 22 35 23.789063 35 26 C 35 28.210938 33.433594 30 31.5 30 Z"/></svg>
</a>
</div>
<div style="margin-left: 5px; margin-right: 5px;">
<a href="https://twitter.com/lancedb" target="_blank" rel="noopener noreferrer">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0,0,256,256" width="25px" height="25px" fill-rule="nonzero"><g fill-opacity="0" fill="#ffffff" fill-rule="nonzero" stroke="none" stroke-width="1" stroke-linecap="butt" stroke-linejoin="miter" stroke-miterlimit="10" stroke-dasharray="" stroke-dashoffset="0" font-family="none" font-weight="none" font-size="none" text-anchor="none" style="mix-blend-mode: normal"><path d="M0,256v-256h256v256z" id="bgRectangle"></path></g><g fill="#ffffff" fill-rule="nonzero" stroke="none" stroke-width="1" stroke-linecap="butt" stroke-linejoin="miter" stroke-miterlimit="10" stroke-dasharray="" stroke-dashoffset="0" font-family="none" font-weight="none" font-size="none" text-anchor="none" style="mix-blend-mode: normal"><g transform="scale(4,4)"><path d="M57,17.114c-1.32,1.973 -2.991,3.707 -4.916,5.097c0.018,0.423 0.028,0.847 0.028,1.274c0,13.013 -9.902,28.018 -28.016,28.018c-5.562,0 -12.81,-1.948 -15.095,-4.423c0.772,0.092 1.556,0.138 2.35,0.138c4.615,0 8.861,-1.575 12.23,-4.216c-4.309,-0.079 -7.946,-2.928 -9.199,-6.84c1.96,0.308 4.447,-0.17 4.447,-0.17c0,0 -7.7,-1.322 -7.899,-9.779c2.226,1.291 4.46,1.231 4.46,1.231c0,0 -4.441,-2.734 -4.379,-8.195c0.037,-3.221 1.331,-4.953 1.331,-4.953c8.414,10.361 20.298,10.29 20.298,10.29c0,0 -0.255,-1.471 -0.255,-2.243c0,-5.437 4.408,-9.847 9.847,-9.847c2.832,0 5.391,1.196 7.187,3.111c2.245,-0.443 4.353,-1.263 6.255,-2.391c-0.859,3.44 -4.329,5.448 -4.329,5.448c0,0 2.969,-0.329 5.655,-1.55z"></path></g></g></svg>
</a>
</div>
<!-- Repository information -->
{% if config.repo_url %}
<div class="md-header__source" style="margin-left: -5px !important;">
{% include "partials/source.html" %}
</div>
{% endif %}
</nav>
<!-- Navigation tabs (sticky) -->
{% if "navigation.tabs.sticky" in features %}
{% if "navigation.tabs" in features %}
{% include "partials/tabs.html" %}
{% endif %}
{% endif %}
</header>

View File

@@ -1,16 +1,27 @@
# ANN (Approximate Nearest Neighbor) Indexes
You can create an index over your vector data to make search faster.
Vector indexes are faster but less accurate than exhaustive search.
Vector indexes are faster but less accurate than exhaustive search (KNN or Flat Search).
LanceDB provides many parameters to fine-tune the index's size, the speed of queries, and the accuracy of results.
Currently, LanceDB does *not* automatically create the ANN index.
LanceDB has optimized code for KNN as well. For many use-cases, datasets under 100K vectors won't require index creation at all.
If you can live with <100ms latency, skipping index creation is a simpler workflow while guaranteeing 100% recall.
If you can live with < 100ms latency, skipping index creation is a simpler workflow while guaranteeing 100% recall.
In the future we will look to automatically create and configure the ANN index.
## Creating an ANN Index
## Types of Index
Lance can support multiple index types, the most widely used one is `IVF_PQ`.
* `IVF_PQ`: use **Inverted File Index (IVF)** to first divide the dataset into `N` partitions,
and then use **Product Quantization** to compress vectors in each partition.
* `DISKANN` (**Experimental**): organize the vector as a on-disk graph, where the vertices approximately
represent the nearest neighbors of each vector.
## Creating an IVF_PQ Index
Lance supports `IVF_PQ` index type by default.
=== "Python"
Creating indexes is done via the [create_index](https://lancedb.github.io/lancedb/python/#lancedb.table.LanceTable.create_index) method.
@@ -23,7 +34,7 @@ In the future we will look to automatically create and configure the ANN index.
# Create 10,000 sample vectors
data = [{"vector": row, "item": f"item {i}"}
for i, row in enumerate(np.random.random((10_000, 768)).astype('float32'))]
for i, row in enumerate(np.random.random((10_000, 1536)).astype('float32'))]
# Add the vectors to a table
tbl = db.create_table("my_vectors", data=data)
@@ -41,19 +52,60 @@ In the future we will look to automatically create and configure the ANN index.
for (let i = 0; i < 10_000; i++) {
data.push({vector: Array(1536).fill(i), id: `${i}`, content: "", longId: `${i}`},)
}
const table = await db.createTable('vectors', data)
await table.create_index({ type: 'ivf_pq', column: 'vector', num_partitions: 256, num_sub_vectors: 96 })
const table = await db.createTable('my_vectors', data)
await table.createIndex({ type: 'ivf_pq', column: 'vector', num_partitions: 256, num_sub_vectors: 96 })
```
Since `create_index` has a training step, it can take a few minutes to finish for large tables. You can control the index
creation by providing the following parameters:
- **metric** (default: "L2"): The distance metric to use. By default it uses euclidean distance "`L2`".
We also support "cosine" and "dot" distance as well.
- **num_partitions** (default: 256): The number of partitions of the index.
- **num_sub_vectors** (default: 96): The number of sub-vectors (M) that will be created during Product Quantization (PQ).
For D dimensional vector, it will be divided into `M` of `D/M` sub-vectors, each of which is presented by
a single PQ code.
<figure markdown>
![IVF PQ](./assets/ivf_pq.png)
<figcaption>IVF_PQ index with <code>num_partitions=2, num_sub_vectors=4</code></figcaption>
</figure>
### Use GPU to build vector index
Lance Python SDK has experimental GPU support for creating IVF index.
Using GPU for index creation requires [PyTorch>2.0](https://pytorch.org/) being installed.
You can specify the GPU device to train IVF partitions via
- **accelerator**: Specify to ``cuda`` or ``mps`` (on Apple Silicon) to enable GPU training.
=== "Linux"
<!-- skip-test -->
``` { .python .copy }
# Create index using CUDA on Nvidia GPUs.
tbl.create_index(
num_partitions=256,
num_sub_vectors=96,
accelerator="cuda"
)
```
=== "Macos"
<!-- skip-test -->
```python
# Create index using MPS on Apple Silicon.
tbl.create_index(
num_partitions=256,
num_sub_vectors=96,
accelerator="mps"
)
```
Trouble shootings:
If you see ``AssertionError: Torch not compiled with CUDA enabled``, you need to [install
PyTorch with CUDA support](https://pytorch.org/get-started/locally/).
- **metric** (default: "L2"): The distance metric to use. By default we use euclidean distance. We also support "cosine" distance.
- **num_partitions** (default: 256): The number of partitions of the index. The number of partitions should be configured so each partition has 3-5K vectors. For example, a table
with ~1M vectors should use 256 partitions. You can specify arbitrary number of partitions but powers of 2 is most conventional.
A higher number leads to faster queries, but it makes index generation slower.
- **num_sub_vectors** (default: 96): The number of subvectors (M) that will be created during Product Quantization (PQ). A larger number makes
search more accurate, but also makes the index larger and slower to build.
## Querying an ANN Index
@@ -73,30 +125,30 @@ There are a couple of parameters that can be used to fine-tune the search:
=== "Python"
```python
tbl.search(np.random.random((768))) \
tbl.search(np.random.random((1536))) \
.limit(2) \
.nprobes(20) \
.refine_factor(10) \
.to_df()
vector item score
.to_pandas()
```
```
vector item _distance
0 [0.44949695, 0.8444449, 0.06281311, 0.23338133... item 1141 103.575333
1 [0.48587373, 0.269207, 0.15095535, 0.65531915,... item 3953 108.393867
```
=== "Javascript"
```javascript
const results = await table
.search(Array(768).fill(1.2))
const results_1 = await table
.search(Array(1536).fill(1.2))
.limit(2)
.nprobes(20)
.refineFactor(10)
.execute()
```
The search will return the data requested in addition to the score of each item.
The search will return the data requested in addition to the distance of each item.
**Note:** The score is the distance between the query vector and the element. A lower number means that the result is more relevant.
### Filtering (where clause)
@@ -104,14 +156,14 @@ You can further filter the elements returned by a search using a where clause.
=== "Python"
```python
tbl.search(np.random.random((768))).where("item != 'item 1141'").to_df()
tbl.search(np.random.random((1536))).where("item != 'item 1141'").to_pandas()
```
=== "Javascript"
```javascript
const results = await table
const results_2 = await table
.search(Array(1536).fill(1.2))
.where("item != 'item 1141'")
.where("id != '1141'")
.execute()
```
@@ -121,8 +173,10 @@ You can select the columns returned by the query using a select clause.
=== "Python"
```python
tbl.search(np.random.random((768))).select(["vector"]).to_df()
vector score
tbl.search(np.random.random((1536))).select(["vector"]).to_pandas()
```
```
vector _distance
0 [0.30928212, 0.022668175, 0.1756372, 0.4911822... 93.971092
1 [0.2525465, 0.01723831, 0.261568, 0.002007689,... 95.173485
...
@@ -130,8 +184,36 @@ You can select the columns returned by the query using a select clause.
=== "Javascript"
```javascript
const results = await table
const results_3 = await table
.search(Array(1536).fill(1.2))
.select(["id"])
.execute()
```
## FAQ
### When is it necessary to create an ANN vector index?
`LanceDB` has manually-tuned SIMD code for computing vector distances.
In our benchmarks, computing 100K pairs of 1K dimension vectors takes **less than 20ms**.
For small datasets (< 100K rows) or applications that can accept 100ms latency, vector indices are usually not necessary.
For large-scale or higher dimension vectors, it is beneficial to create vector index.
### How big is my index, and how many memory will it take?
In LanceDB, all vector indices are **disk-based**, meaning that when responding to a vector query, only the relevant pages from the index file are loaded from disk and cached in memory. Additionally, each sub-vector is usually encoded into 1 byte PQ code.
For example, with a 1024-dimension dataset, if we choose `num_sub_vectors=64`, each sub-vector has `1024 / 64 = 16` float32 numbers.
Product quantization can lead to approximately `16 * sizeof(float32) / 1 = 64` times of space reduction.
### How to choose `num_partitions` and `num_sub_vectors` for `IVF_PQ` index?
`num_partitions` is used to decide how many partitions the first level `IVF` index uses.
Higher number of partitions could lead to more efficient I/O during queries and better accuracy, but it takes much more time to train.
On `SIFT-1M` dataset, our benchmark shows that keeping each partition 1K-4K rows lead to a good latency / recall.
`num_sub_vectors` specifies how many Product Quantization (PQ) short codes to generate on each vector. Because
PQ is a lossy compression of the original vector, a higher `num_sub_vectors` usually results in
less space distortion, and thus yields better accuracy. However, a higher `num_sub_vectors` also causes heavier I/O and
more PQ computation, and thus, higher latency. `dimension / num_sub_vectors` should be a multiple of 8 for optimum SIMD efficiency.

Binary file not shown.

After

Width:  |  Height:  |  Size: 342 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 104 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 245 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 83 KiB

BIN
docs/src/assets/ivf_pq.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 266 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 170 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.7 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 205 KiB

BIN
docs/src/assets/voxel.gif Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 953 KiB

View File

@@ -23,7 +23,7 @@ We'll cover the basics of using LanceDB on your local machine in this section.
=== "Python"
```python
import lancedb
uri = "~/.lancedb"
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
```
@@ -35,7 +35,7 @@ We'll cover the basics of using LanceDB on your local machine in this section.
```javascript
const lancedb = require("vectordb");
const uri = "~./lancedb";
const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);
```
@@ -79,6 +79,18 @@ We'll cover the basics of using LanceDB on your local machine in this section.
??? info "Under the hood, LanceDB is converting the input data into an Apache Arrow table and persisting it to disk in [Lance format](https://www.github.com/lancedb/lance)."
### Creating an empty table
Sometimes you may not have the data to insert into the table at creation time.
In this case, you can create an empty table and specify the schema.
=== "Python"
```python
import pyarrow as pa
schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), list_size=2))])
tbl = db.create_table("empty_table", schema=schema)
```
## How to open an existing table
Once created, you can open a table using the following code:
@@ -102,7 +114,7 @@ Once created, you can open a table using the following code:
If you forget the name of your table, you can always get a listing of all table names:
```javascript
console.log(db.tableNames());
console.log(await db.tableNames());
```
## How to add data to a table
@@ -111,14 +123,20 @@ After a table has been created, you can always add more data to it using
=== "Python"
```python
df = pd.DataFrame([{"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
{"vector": [9.5, 56.2], "item": "buzz", "price": 200.0}])
tbl.add(df)
# Option 1: Add a list of dicts to a table
data = [{"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
{"vector": [9.5, 56.2], "item": "buzz", "price": 200.0}]
tbl.add(data)
# Option 2: Add a pandas DataFrame to a table
df = pd.DataFrame(data)
tbl.add(data)
```
=== "Javascript"
```javascript
await tbl.add([vector: [1.3, 1.4], item: "fizz", price: 100.0},
await tbl.add([{vector: [1.3, 1.4], item: "fizz", price: 100.0},
{vector: [9.5, 56.2], item: "buzz", price: 200.0}])
```
@@ -128,7 +146,7 @@ Once you've embedded the query, you can find its nearest neighbors using the fol
=== "Python"
```python
tbl.search([100, 100]).limit(2).to_df()
tbl.search([100, 100]).limit(2).to_pandas()
```
This returns a pandas DataFrame with the results.
@@ -138,8 +156,63 @@ Once you've embedded the query, you can find its nearest neighbors using the fol
const query = await tbl.search([100, 100]).limit(2).execute();
```
## How to delete rows from a table
Use the `delete()` method on tables to delete rows from a table. To choose
which rows to delete, provide a filter that matches on the metadata columns.
This can delete any number of rows that match the filter.
=== "Python"
```python
tbl.delete('item = "fizz"')
```
=== "Javascript"
```javascript
await tbl.delete('item = "fizz"')
```
The deletion predicate is a SQL expression that supports the same expressions
as the `where()` clause on a search. They can be as simple or complex as needed.
To see what expressions are supported, see the [SQL filters](sql.md) section.
=== "Python"
Read more: [lancedb.table.Table.delete][]
=== "Javascript"
Read more: [vectordb.Table.delete](javascript/interfaces/Table.md#delete)
## How to remove a table
Use the `drop_table()` method on the database to remove a table.
=== "Python"
```python
db.drop_table("my_table")
```
This permanently removes the table and is not recoverable, unlike deleting rows.
By default, if the table does not exist an exception is raised. To suppress this,
you can pass in `ignore_missing=True`.
## What's next
This section covered the very basics of the LanceDB API.
LanceDB supports many additional features when creating indices to speed up search and options for search.
These are contained in the next section of the documentation.
## Note: Bundling vectorDB apps with webpack
Since LanceDB contains a prebuilt Node binary, you must configure `next.config.js` to exclude it from webpack. This is required for both using Next.js and deploying on Vercel.
```javascript
/** @type {import('next').NextConfig} */
module.exports = ({
webpack(config) {
config.externals.push({ vectordb: 'vectordb' })
return config;
}
})
```

37
docs/src/cli_config.md Normal file
View File

@@ -0,0 +1,37 @@
## LanceDB CLI
Once lanceDB is installed, you can access the CLI using `lancedb` command on the console
```
lancedb
```
This lists out all the various command-line options available. You can get the usage or help for a particular command
```
lancedb {command} --help
```
## LanceDB config
LanceDB uses a global config file to store certain settings. These settings are configurable using the lanceDB cli.
To view your config settings, you can use:
```
lancedb config
```
These config parameters can be tuned using the cli.
```
lancedb {config_name} --{argument}
```
## LanceDB Opt-in Diagnostics
When enabled, LanceDB will send anonymous events to help us improve LanceDB. These diagnostics are used only for error reporting and no data is collected. Error & stats allow us to automate certain aspects of bug reporting, prioritization of fixes and feature requests.
These diagnostics are opt-in and can be enabled or disabled using the `lancedb diagnostics` command. These are enabled by default.
Get usage help.
```
lancedb diagnostics --help
```
Disable diagnostics
```
lancedb diagnostics --disabled
```
Enable diagnostics
```
lancedb diagnostics --enabled
```

213
docs/src/embeddings/api.md Normal file
View File

@@ -0,0 +1,213 @@
To use your own custom embedding function, you need to follow these 2 simple steps.
1. Create your embedding function by implementing the `EmbeddingFunction` interface
2. Register your embedding function in the global `EmbeddingFunctionRegistry`.
Let us see how this looks like in action.
![](../assets/embeddings_api.png)
`EmbeddingFunction` & `EmbeddingFunctionRegistry` handle low-level details for serializing schema and model information as metadata. To build a custom embdding function, you don't need to worry about those details and simply focus on setting up the model.
## `TextEmbeddingFunction` Interface
There is another optional layer of abstraction provided in form of `TextEmbeddingFunction`. You can use this if your model isn't multi-modal in nature and only operates on text. In such case both source and vector fields will have the same pathway for vectorization, so you simply just need to setup the model and rest is handled by `TextEmbeddingFunction`. You can read more about the class and its attributes in the class reference.
Let's implement `SentenceTransformerEmbeddings` class. All you need to do is implement the `generate_embeddings()` and `ndims` function to handle the input types you expect and register the class in the global `EmbeddingFunctionRegistry`
```python
from lancedb.embeddings import register
@register("sentence-transformers")
class SentenceTransformerEmbeddings(TextEmbeddingFunction):
name: str = "all-MiniLM-L6-v2"
# set more default instance vars like device, etc.
def __init__(self, **kwargs):
super().__init__(**kwargs)
self._ndims = None
def generate_embeddings(self, texts):
return self._embedding_model().encode(list(texts), ...).tolist()
def ndims(self):
if self._ndims is None:
self._ndims = len(self.generate_embeddings("foo")[0])
return self._ndims
@cached(cache={})
def _embedding_model(self):
return sentence_transformers.SentenceTransformer(name)
```
This is a stripped down version of our implementation of `SentenceTransformerEmbeddings` that removes certain optimizations and defaul settings.
Now you can use this embedding function to create your table schema and that's it! you can then ingest data and run queries without manually vectorizing the inputs.
```python
from lancedb.pydantic import LanceModel, Vector
registry = EmbeddingFunctionRegistry.get_instance()
stransformer = registry.get("sentence-transformers").create()
class TextModelSchema(LanceModel):
vector: Vector(stransformer.ndims) = stransformer.VectorField()
text: str = stransformer.SourceField()
tbl = db.create_table("table", schema=TextModelSchema)
tbl.add(pd.DataFrame({"text": ["halo", "world"]}))
result = tbl.search("world").limit(5)
```
NOTE:
You can always implement the `EmbeddingFunction` interface directly if you want or need to, `TextEmbeddingFunction` just makes it much simpler and faster for you to do so, by setting up the boiler plat for text-specific use case
## Multi-modal embedding function example
You can also use the `EmbeddingFunction` interface to implement more complex workflows such as multi-modal embedding function support. LanceDB implements `OpenClipEmeddingFunction` class that suppports multi-modal seach. Here's the implementation that you can use as a reference to build your own multi-modal embedding functions.
```python
@register("open-clip")
class OpenClipEmbeddings(EmbeddingFunction):
name: str = "ViT-B-32"
pretrained: str = "laion2b_s34b_b79k"
device: str = "cpu"
batch_size: int = 64
normalize: bool = True
_model = PrivateAttr()
_preprocess = PrivateAttr()
_tokenizer = PrivateAttr()
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
open_clip = self.safe_import("open_clip", "open-clip") # EmbeddingFunction util to import external libs and raise if not found
model, _, preprocess = open_clip.create_model_and_transforms(
self.name, pretrained=self.pretrained
)
model.to(self.device)
self._model, self._preprocess = model, preprocess
self._tokenizer = open_clip.get_tokenizer(self.name)
self._ndims = None
def ndims(self):
if self._ndims is None:
self._ndims = self.generate_text_embeddings("foo").shape[0]
return self._ndims
def compute_query_embeddings(
self, query: Union[str, "PIL.Image.Image"], *args, **kwargs
) -> List[np.ndarray]:
"""
Compute the embeddings for a given user query
Parameters
----------
query : Union[str, PIL.Image.Image]
The query to embed. A query can be either text or an image.
"""
if isinstance(query, str):
return [self.generate_text_embeddings(query)]
else:
PIL = self.safe_import("PIL", "pillow")
if isinstance(query, PIL.Image.Image):
return [self.generate_image_embedding(query)]
else:
raise TypeError("OpenClip supports str or PIL Image as query")
def generate_text_embeddings(self, text: str) -> np.ndarray:
torch = self.safe_import("torch")
text = self.sanitize_input(text)
text = self._tokenizer(text)
text.to(self.device)
with torch.no_grad():
text_features = self._model.encode_text(text.to(self.device))
if self.normalize:
text_features /= text_features.norm(dim=-1, keepdim=True)
return text_features.cpu().numpy().squeeze()
def sanitize_input(self, images: IMAGES) -> Union[List[bytes], np.ndarray]:
"""
Sanitize the input to the embedding function.
"""
if isinstance(images, (str, bytes)):
images = [images]
elif isinstance(images, pa.Array):
images = images.to_pylist()
elif isinstance(images, pa.ChunkedArray):
images = images.combine_chunks().to_pylist()
return images
def compute_source_embeddings(
self, images: IMAGES, *args, **kwargs
) -> List[np.array]:
"""
Get the embeddings for the given images
"""
images = self.sanitize_input(images)
embeddings = []
for i in range(0, len(images), self.batch_size):
j = min(i + self.batch_size, len(images))
batch = images[i:j]
embeddings.extend(self._parallel_get(batch))
return embeddings
def _parallel_get(self, images: Union[List[str], List[bytes]]) -> List[np.ndarray]:
"""
Issue concurrent requests to retrieve the image data
"""
with concurrent.futures.ThreadPoolExecutor() as executor:
futures = [
executor.submit(self.generate_image_embedding, image)
for image in images
]
return [future.result() for future in futures]
def generate_image_embedding(
self, image: Union[str, bytes, "PIL.Image.Image"]
) -> np.ndarray:
"""
Generate the embedding for a single image
Parameters
----------
image : Union[str, bytes, PIL.Image.Image]
The image to embed. If the image is a str, it is treated as a uri.
If the image is bytes, it is treated as the raw image bytes.
"""
torch = self.safe_import("torch")
# TODO handle retry and errors for https
image = self._to_pil(image)
image = self._preprocess(image).unsqueeze(0)
with torch.no_grad():
return self._encode_and_normalize_image(image)
def _to_pil(self, image: Union[str, bytes]):
PIL = self.safe_import("PIL", "pillow")
if isinstance(image, bytes):
return PIL.Image.open(io.BytesIO(image))
if isinstance(image, PIL.Image.Image):
return image
elif isinstance(image, str):
parsed = urlparse.urlparse(image)
# TODO handle drive letter on windows.
if parsed.scheme == "file":
return PIL.Image.open(parsed.path)
elif parsed.scheme == "":
return PIL.Image.open(image if os.name == "nt" else parsed.path)
elif parsed.scheme.startswith("http"):
return PIL.Image.open(io.BytesIO(url_retrieve(image)))
else:
raise NotImplementedError("Only local and http(s) urls are supported")
def _encode_and_normalize_image(self, image_tensor: "torch.Tensor"):
"""
encode a single image tensor and optionally normalize the output
"""
image_features = self._model.encode_image(image_tensor)
if self.normalize:
image_features /= image_features.norm(dim=-1, keepdim=True)
return image_features.cpu().numpy().squeeze()
```

View File

@@ -0,0 +1,208 @@
There are various Embedding functions available out of the box with lancedb. We're working on supporting other popular embedding APIs.
## Text Embedding Functions
Here are the text embedding functions registered by default.
Embedding functions have inbuilt rate limit handler wrapper for source and query embedding function calls that retry with exponential standoff.
Each `EmbeddingFunction` implementation automatically takes `max_retries` as an argument which has the deafult value of 7.
### Sentence Transformers
Here are the parameters that you can set when registering a `sentence-transformers` object, and their default values:
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| `name` | `str` | `"all-MiniLM-L6-v2"` | The name of the model. |
| `device` | `str` | `"cpu"` | The device to run the model on. Can be `"cpu"` or `"gpu"`. |
| `normalize` | `bool` | `True` | Whether to normalize the input text before feeding it to the model. |
```python
db = lancedb.connect("/tmp/db")
registry = EmbeddingFunctionRegistry.get_instance()
func = registry.get("sentence-transformers").create(device="cpu")
class Words(LanceModel):
text: str = func.SourceField()
vector: Vector(func.ndims()) = func.VectorField()
table = db.create_table("words", schema=Words)
table.add(
[
{"text": "hello world"}
{"text": "goodbye world"}
]
)
query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)
```
### OpenAIEmbeddings
LanceDB has OpenAI embeddings function in the registry by default. It is registered as `openai` and here are the parameters that you can customize when creating the instances
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| `name` | `str` | `"text-embedding-ada-002"` | The name of the model. |
```python
db = lancedb.connect("/tmp/db")
registry = EmbeddingFunctionRegistry.get_instance()
func = registry.get("openai").create()
class Words(LanceModel):
text: str = func.SourceField()
vector: Vector(func.ndims()) = func.VectorField()
table = db.create_table("words", schema=Words)
table.add(
[
{"text": "hello world"}
{"text": "goodbye world"}
]
)
query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)
```
### Instructor Embeddings
Instructor is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domains (e.g., science, finance, etc.) by simply providing the task instruction, without any finetuning
If you want to calculate customized embeddings for specific sentences, you may follow the unified template to write instructions:
Represent the `domain` `text_type` for `task_objective`:
* `domain` is optional, and it specifies the domain of the text, e.g., science, finance, medicine, etc.
* `text_type` is required, and it specifies the encoding unit, e.g., sentence, document, paragraph, etc.
* `task_objective` is optional, and it specifies the objective of embedding, e.g., retrieve a document, classify the sentence, etc.
More information about the model can be found here - https://github.com/xlang-ai/instructor-embedding
| Argument | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | "hkunlp/instructor-base" | The name of the model to use |
| `batch_size` | `int` | `32` | The batch size to use when generating embeddings |
| `device` | `str` | `"cpu"` | The device to use when generating embeddings |
| `show_progress_bar` | `bool` | `True` | Whether to show a progress bar when generating embeddings |
| `normalize_embeddings` | `bool` | `True` | Whether to normalize the embeddings |
| `quantize` | `bool` | `False` | Whether to quantize the model |
| `source_instruction` | `str` | `"represent the docuement for retreival"` | The instruction for the source column |
| `query_instruction` | `str` | `"represent the document for retreiving the most similar documents"` | The instruction for the query |
```python
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry, InstuctorEmbeddingFunction
instructor = get_registry().get("instructor").create(
source_instruction="represent the docuement for retreival",
query_instruction="represent the document for retreiving the most similar documents"
)
class Schema(LanceModel):
vector: Vector(instructor.ndims()) = instructor.VectorField()
text: str = instructor.SourceField()
db = lancedb.connect("~/.lancedb")
tbl = db.create_table("test", schema=Schema, mode="overwrite")
texts = [{"text": "Capitalism has been dominant in the Western world since the end of feudalism, but most feel[who?] that..."},
{"text": "The disparate impact theory is especially controversial under the Fair Housing Act because the Act..."},
{"text": "Disparate impact in United States labor law refers to practices in employment, housing, and other areas that.."}]
tbl.add(texts)
```
## Multi-modal embedding functions
Multi-modal embedding functions allow you query your table using both images and text.
### OpenClipEmbeddings
We support CLIP model embeddings using the open souce alternbative, open-clip which support various customizations. It is registered as `open-clip` and supports following customizations.
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| `name` | `str` | `"ViT-B-32"` | The name of the model. |
| `pretrained` | `str` | `"laion2b_s34b_b79k"` | The name of the pretrained model to load. |
| `device` | `str` | `"cpu"` | The device to run the model on. Can be `"cpu"` or `"gpu"`. |
| `batch_size` | `int` | `64` | The number of images to process in a batch. |
| `normalize` | `bool` | `True` | Whether to normalize the input images before feeding them to the model. |
This embedding function supports ingesting images as both bytes and urls. You can query them using both test and other images.
NOTE:
LanceDB supports ingesting images directly from accessible links.
```python
db = lancedb.connect(tmp_path)
registry = EmbeddingFunctionRegistry.get_instance()
func = registry.get("open-clip").create()
class Images(LanceModel):
label: str
image_uri: str = func.SourceField() # image uri as the source
image_bytes: bytes = func.SourceField() # image bytes as the source
vector: Vector(func.ndims()) = func.VectorField() # vector column
vec_from_bytes: Vector(func.ndims()) = func.VectorField() # Another vector column
table = db.create_table("images", schema=Images)
labels = ["cat", "cat", "dog", "dog", "horse", "horse"]
uris = [
"http://farm1.staticflickr.com/53/167798175_7c7845bbbd_z.jpg",
"http://farm1.staticflickr.com/134/332220238_da527d8140_z.jpg",
"http://farm9.staticflickr.com/8387/8602747737_2e5c2a45d4_z.jpg",
"http://farm5.staticflickr.com/4092/5017326486_1f46057f5f_z.jpg",
"http://farm9.staticflickr.com/8216/8434969557_d37882c42d_z.jpg",
"http://farm6.staticflickr.com/5142/5835678453_4f3a4edb45_z.jpg",
]
# get each uri as bytes
image_bytes = [requests.get(uri).content for uri in uris]
table.add(
[{"label": labels, "image_uri": uris, "image_bytes": image_bytes}]
)
```
Now we can search using text from both the default vector column and the custom vector column
```python
# text search
actual = table.search("man's best friend").limit(1).to_pydantic(Images)[0]
print(actual.label) # prints "dog"
frombytes = (
table.search("man's best friend", vector_column_name="vec_from_bytes")
.limit(1)
.to_pydantic(Images)[0]
)
print(frombytes.label)
```
Because we're using a multi-modal embedding function, we can also search using images
```python
# image search
query_image_uri = "http://farm1.staticflickr.com/200/467715466_ed4a31801f_z.jpg"
image_bytes = requests.get(query_image_uri).content
query_image = Image.open(io.BytesIO(image_bytes))
actual = table.search(query_image).limit(1).to_pydantic(Images)[0]
print(actual.label == "dog")
# image search using a custom vector column
other = (
table.search(query_image, vector_column_name="vec_from_bytes")
.limit(1)
.to_pydantic(Images)[0]
)
print(actual.label)
```
If you have any questions about the embeddings API, supported models, or see a relevant model missing, please raise an issue.

View File

@@ -0,0 +1,95 @@
Representing multi-modal data as vector embeddings is becoming a standard practice. Embedding functions themselves be thought of as a part of the processing pipeline that each request(input) has to be passed through. After initial setup these components are not expected to change for a particular project.
This is main motivation behind our new embedding functions API, that allow you simply set it up once and the table remembers it, effectively making the **embedding functions disappear in the background** so you don't have to worry about modelling and simply focus on the DB aspects of VectorDB.
You can simply follow these steps and forget about the details of your embedding functions as long as you don't intend to change it.
### Step 1 - Define the embedding function
We have some pre-defined embedding functions in the global registry with more coming soon. Here's let's an implementation of CLIP as example.
```
registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()
```
You can also define your own embedding function by implementing the `EmbeddingFunction` abstract base interface. It subclasses PyDantic Model which can be utilized to write complex schemas simply as we'll see next!
### Step 2 - Define the Data Model or Schema
Our embedding function from the previous section abstracts away all the details about the models and dimensions required to define the schema. You can simply set a feild as **source** or **vector** column. Here's how
```python
class Pets(LanceModel):
vector: Vector(clip.ndims) = clip.VectorField()
image_uri: str = clip.SourceField()
```
`VectorField` tells LanceDB to use the clip embedding function to generate query embeddings for `vector` column & `SourceField` tells that when adding data, automatically use the embedding function to encode `image_uri`.
### Step 3 - Create LanceDB Table
Now that we have chosen/defined our embedding function and the schema, we can create the table
```python
db = lancedb.connect("~/lancedb")
table = db.create_table("pets", schema=Pets)
```
That's it! We have ingested all the information needed to embed source and query inputs. We can now forget about the model and dimension details and start to build or VectorDB
### Step 4 - Ingest lots of data and run vector search!
Now you can just add the data and it'll be vectorized automatically
```python
table.add([{"image_uri": u} for u in uris])
```
Our OpenCLIP query embedding function support querying via both text and images.
```python
result = table.search("dog")
```
Let's query an image
```python
p = Path("path/to/images/samoyed_100.jpg")
query_image = Image.open(p)
table.search(query_image)
```
### Rate limit Handling
`EmbeddingFunction` class wraps the calls for source and query embedding generation inside a rate limit handler that retries the requests with exponential backoff after successive failures. By default the maximum retires is set to 7. You can tune it by setting it to a different number or disable it by setting it to 0.
Example
----
```python
clip = registry.get("open-clip").create() # Defaults to 7 max retries
clip = registry.get("open-clip").create(max_retries=10) # Increase max retries to 10
clip = registry.get("open-clip").create(max_retries=0) # Retries disabled
````
NOTE:
Embedding functions can also fail due to other errors that have nothing to do with rate limits. This is why the error is also logged.
### A little fun with PyDantic
LanceDB is integrated with PyDantic. Infact we've used the integration in the above example to define the schema. It is also being used behing the scene by the embdding function API to ingest useful information as table metadata.
You can also use it for adding utility operations in the schema. For example, in our multi-modal example, you can search images using text or another image. Let us define a utility function to plot the image.
```python
class Pets(LanceModel):
vector: Vector(clip.ndims) = clip.VectorField()
image_uri: str = clip.SourceField()
@property
def image(self):
return Image.open(self.image_uri)
```
Now, you can covert your search results to pydantic model and use this property.
```python
rs = table.search(query_image).limit(3).to_pydantic(Pets)
rs[2].image
```
![](../assets/dog_clip_output.png)
Now that you've the basic idea about LanceDB embedding function, let us now dive deeper into the API that you can use to implement your own embedding functions!

View File

@@ -1,13 +1,20 @@
# Embedding Functions
# Embedding
Embeddings are high dimensional floating-point vector representations of your data or query.
Anything can be embedded using some embedding model or function.
For a given embedding function, the output will always have the same number of dimensions.
Embeddings are high dimensional floating-point vector representations of your data or query. Anything can be embedded using some embedding model or function. Position of embedding in a high dimensional vector space has semantic significance to a degree that depends on the type of modal and training. These embeddings when projected in a 2-D space generally group similar entities close-by forming groups.
## Creating an embedding function
![](../assets/embedding_intro.png)
Any function that takes as input a batch (list) of data and outputs a batch (list) of embeddings
can be used by LanceDB as an embedding function. The input and output batch sizes should be the same.
# Creating an embedding function
LanceDB supports 2 major ways of vectorizing your data, explicit and implicit.
1. By manually embedding the data before ingesting in the table
2. By automatically embedding the data and query as they come, by ingesting embedding function information in the table itself! Covered in [Next Section](embedding_functions.md)
Whatever workflow you prefer, we have the tools to support you.
## Explicit Vectorization
In this workflow, you can create your embedding function and vectorize your data using lancedb's `with_embedding` function. Let's look at some examples.
### HuggingFace example
@@ -66,7 +73,7 @@ You can also use an external API like OpenAI to generate embeddings
to generate embeddings for each row.
Say if you have a pandas DataFrame with a `text` column that you want to be embedded,
you can use the [with_embeddings](https://lancedb.github.io/lancedb/python/#lancedb.embeddings.with_embeddings)
you can use the [with_embeddings](https://lancedb.github.io/lancedb/python/python/#lancedb.embeddings.with_embeddings)
function to generate embeddings and add create a combined pyarrow table:
@@ -98,7 +105,7 @@ You can also use an external API like OpenAI to generate embeddings
embededings for your data.
```javascript
const db = await lancedb.connect("/tmp/lancedb");
const db = await lancedb.connect("data/sample-lancedb");
const data = [
{ text: 'pepperoni' },
{ text: 'pineapple' }
@@ -118,7 +125,7 @@ belong in the same latent space and your results will be nonsensical.
```python
query = "What's the best pizza topping?"
query_vector = embed_func([query])[0]
tbl.search(query_vector).limit(10).to_df()
tbl.search(query_vector).limit(10).to_pandas()
```
The above snippet returns a pandas DataFrame with the 10 closest vectors to the query.
@@ -126,7 +133,7 @@ belong in the same latent space and your results will be nonsensical.
=== "Javascript"
```javascript
const results = await table
.search('What's the best pizza topping?')
.search("What's the best pizza topping?")
.limit(10)
.execute()
```
@@ -134,9 +141,9 @@ belong in the same latent space and your results will be nonsensical.
The above snippet returns an array of records with the 10 closest vectors to the query.
## Roadmap
## Implicit vectorization / Ingesting embedding functions
Representing multi-modal data as vector embeddings is becoming a standard practice. Embedding functions themselves be thought of as a part of the processing pipeline that each request(input) has to be passed through. After initial setup these components are not expected to change for a particular project.
In the near future, we'll be integrating the embedding functions deeper into LanceDB<br/>.
The goal is that you just have to configure the function once when you create the table,
and then you'll never have to deal with embeddings / vectors after that unless you want to.
We'll also integrate more popular models and APIs.
This is main motivation behind our new embedding functions API, that allow you simply set it up once and the table remembers it, effectively making the **embedding functions disappear in the background** so you don't have to worry about modelling and simply focus on the DB aspects of VectorDB.
Learn more in the Next Section

View File

@@ -0,0 +1,23 @@
# Examples
Here are some of the examples, projects and applications using LanceDB python library. Some examples are covered in detail in the next sections. You can find more on [VectorDB Recipes](https://github.com/lancedb/vectordb-recipes)
| Example | Interactive Envs | Scripts |
|-------- | ---------------- | ------ |
| | | |
| [Youtube transcript search bot](https://github.com/lancedb/vectordb-recipes/tree/main/examples/youtube_bot/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/youtube_bot/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>| [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/youtube_bot/main.py)|
| [Langchain: Code Docs QA bot](https://github.com/lancedb/vectordb-recipes/tree/main/examples/Code-Documentation-QA-Bot/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Code-Documentation-QA-Bot/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>| [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/Code-Documentation-QA-Bot/main.py) |
| [AI Agents: Reducing Hallucination](https://github.com/lancedb/vectordb-recipes/tree/main/examples/reducing_hallucinations_ai_agents/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/reducing_hallucinations_ai_agents/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>| [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/reducing_hallucinations_ai_agents/main.py)|
| [Multimodal CLIP: DiffusionDB](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multimodal_clip/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multimodal_clip/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>| [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multimodal_clip/main.py) |
| [Multimodal CLIP: Youtube videos](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multimodal_video_search/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multimodal_video_search/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>| [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multimodal_video_search/main.py) |
| [Movie Recommender](https://github.com/lancedb/vectordb-recipes/tree/main/examples/movie-recommender/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/movie-recommender/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> | [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/movie-recommender/main.py) |
| [Audio Search](https://github.com/lancedb/vectordb-recipes/tree/main/examples/audio_search/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/audio_search/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> | [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/audio_search/main.py) |
| [Multimodal Image + Text Search](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multimodal_search/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multimodal_search/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> | [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multimodal_search/main.py) |
| [Evaluating Prompts with Prompttools](https://github.com/lancedb/vectordb-recipes/tree/main/examples/prompttools-eval-prompts/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/prompttools-eval-prompts/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> | |
## Projects & Applications powered by LanceDB
| Project Name | Description | Screenshot |
|-----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|-------------------------------------------|
| [YOLOExplorer](https://github.com/lancedb/yoloexplorer) | Iterate on your YOLO / CV datasets using SQL, Vector semantic search, and more within seconds | ![YOLOExplorer](https://github.com/lancedb/vectordb-recipes/assets/15766192/ae513a29-8f15-4e0b-99a1-ccd8272b6131) |
| [Website Chatbot (Deployable Vercel Template)](https://github.com/lancedb/lancedb-vercel-chatbot) | Create a chatbot from the sitemap of any website/docs of your choice. Built using vectorDB serverless native javascript package. | ![Chatbot](../assets/vercel-template.gif) |

View File

@@ -0,0 +1,19 @@
# Examples
Here are some of the examples, projects and applications using vectordb native javascript library.
Some examples are covered in detail in the next sections. You can find more on [VectorDB Recipes](https://github.com/lancedb/vectordb-recipes)
| Example | Scripts |
|-------- | ------ |
| | |
| [Youtube transcript search bot](https://github.com/lancedb/vectordb-recipes/tree/main/examples/youtube_bot/) | [![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/youtube_bot/index.js)|
| [Langchain: Code Docs QA bot](https://github.com/lancedb/vectordb-recipes/tree/main/examples/Code-Documentation-QA-Bot/) | [![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/Code-Documentation-QA-Bot/index.js)|
| [AI Agents: Reducing Hallucination](https://github.com/lancedb/vectordb-recipes/tree/main/examples/reducing_hallucinations_ai_agents/) | [![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/reducing_hallucinations_ai_agents/index.js)|
| [TransformersJS Embedding example](https://github.com/lancedb/vectordb-recipes/tree/main/examples/js-transformers/) | [![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](https://github.com/lancedb/vectordb-recipes/tree/main/examples/js-transformers/index.js) |
## Projects & Applications
| Project Name | Description | Screenshot |
|-----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|-------------------------------------------|
| [YOLOExplorer](https://github.com/lancedb/yoloexplorer) | Iterate on your YOLO / CV datasets using SQL, Vector semantic search, and more within seconds | ![YOLOExplorer](https://github.com/lancedb/vectordb-recipes/assets/15766192/ae513a29-8f15-4e0b-99a1-ccd8272b6131) |
| [Website Chatbot (Deployable Vercel Template)](https://github.com/lancedb/lancedb-vercel-chatbot) | Create a chatbot from the sitemap of any website/docs of your choice. Built using vectorDB serverless native javascript package. | ![Chatbot](../assets/vercel-template.gif) |

View File

@@ -79,10 +79,7 @@ def qanda_langchain(query):
download_docs()
docs = store_docs()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200,)
documents = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()

View File

@@ -80,14 +80,14 @@ def handler(event, context):
# Shape of SIFT is (128,1M), d=float32
query_vector = np.array(event['query_vector'], dtype=np.float32)
rs = table.search(query_vector).limit(2).to_df()
rs = table.search(query_vector).limit(2).to_list()
return {
"statusCode": status_code,
"headers": {
"Content-Type": "application/json"
},
"body": rs.to_json()
"body": json.dumps(rs)
}
```

View File

@@ -0,0 +1,61 @@
# LanceDB Chatbot - Vercel Next.js Template
Use an AI chatbot with website context retrieved from a vector store like LanceDB. LanceDB is lightweight and can be embedded directly into Next.js, with data stored on-prem.
## One click deploy on Vercel
[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Flancedb%2Flancedb-vercel-chatbot&env=OPENAI_API_KEY&envDescription=OpenAI%20API%20Key%20for%20chat%20completion.&project-name=lancedb-vercel-chatbot&repository-name=lancedb-vercel-chatbot&demo-title=LanceDB%20Chatbot%20Demo&demo-description=Demo%20website%20chatbot%20with%20LanceDB.&demo-url=https%3A%2F%2Flancedb.vercel.app&demo-image=https%3A%2F%2Fi.imgur.com%2FazVJtvr.png)
![Demo website landing page](../assets/vercel-template.gif)
## Development
First, rename `.env.example` to `.env.local`, and fill out `OPENAI_API_KEY` with your OpenAI API key. You can get one [here](https://openai.com/blog/openai-api).
Run the development server:
```bash
npm run dev
# or
yarn dev
# or
pnpm dev
```
Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
This project uses [`next/font`](https://nextjs.org/docs/basic-features/font-optimization) to automatically optimize and load Inter, a custom Google Font.
## Learn More
To learn more about LanceDB or Next.js, take a look at the following resources:
- [LanceDB Documentation](https://lancedb.github.io/lancedb/) - learn about LanceDB, the developer-friendly serverless vector database.
- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.
## LanceDB on Next.js and Vercel
FYI: these configurations have been pre-implemented in this template.
Since LanceDB contains a prebuilt Node binary, you must configure `next.config.js` to exclude it from webpack. This is required for both using Next.js and deploying on Vercel.
```js
/** @type {import('next').NextConfig} */
module.exports = ({
webpack(config) {
config.externals.push({ vectordb: 'vectordb' })
return config;
}
})
```
To deploy on Vercel, we need to make sure that the NodeJS runtime static file analysis for Vercel can find the binary, since LanceDB uses dynamic imports by default. We can do this by modifying `package.json` in the `scripts` section.
```json
{
...
"scripts": {
...
"vercel-build": "sed -i 's/nativeLib = require(`@lancedb\\/vectordb-\\${currentTarget()}`);/nativeLib = require(`@lancedb\\/vectordb-linux-x64-gnu`);/' node_modules/vectordb/native.js && next build",
...
},
...
}
```

View File

@@ -0,0 +1,121 @@
# Vector embedding search using TransformersJS
## Embed and query data from LanceDB using TransformersJS
<img id="splash" width="400" alt="transformersjs" src="https://github.com/lancedb/lancedb/assets/43097991/88a31e30-3d6f-4eef-9216-4b7c688f1b4f">
This example shows how to use the [transformers.js](https://github.com/xenova/transformers.js) library to perform vector embedding search using LanceDB's Javascript API.
### Setting up
First, install the dependencies:
```bash
npm install vectordb
npm i @xenova/transformers
```
We will also be using the [all-MiniLM-L6-v2](https://huggingface.co/Xenova/all-MiniLM-L6-v2) model to make it compatible with Transformers.js
Within our `index.js` file we will import the necessary libraries and define our model and database:
```javascript
const lancedb = require('vectordb')
const { pipeline } = await import('@xenova/transformers')
const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
```
### Creating the embedding function
Next, we will create a function that will take in a string and return the vector embedding of that string. We will use the `pipe` function we defined earlier to get the vector embedding of the string.
```javascript
// Define the function. `sourceColumn` is required for LanceDB to know
// which column to use as input.
const embed_fun = {}
embed_fun.sourceColumn = 'text'
embed_fun.embed = async function (batch) {
let result = []
// Given a batch of strings, we will use the `pipe` function to get
// the vector embedding of each string.
for (let text of batch) {
// 'mean' pooling and normalizing allows the embeddings to share the
// same length.
const res = await pipe(text, { pooling: 'mean', normalize: true })
result.push(Array.from(res['data']))
}
return (result)
}
```
### Creating the database
Now, we will create the LanceDB database and add the embedding function we defined earlier.
```javascript
// Link a folder and create a table with data
const db = await lancedb.connect('data/sample-lancedb')
// You can also import any other data, but make sure that you have a column
// for the embedding function to use.
const data = [
{ id: 1, text: 'Cherry', type: 'fruit' },
{ id: 2, text: 'Carrot', type: 'vegetable' },
{ id: 3, text: 'Potato', type: 'vegetable' },
{ id: 4, text: 'Apple', type: 'fruit' },
{ id: 5, text: 'Banana', type: 'fruit' }
]
// Create the table with the embedding function
const table = await db.createTable('food_table', data, "create", embed_fun)
```
### Performing the search
Now, we can perform the search using the `search` function. LanceDB automatically uses the embedding function we defined earlier to get the vector embedding of the query string.
```javascript
// Query the table
const results = await table
.search("a sweet fruit to eat")
.metricType("cosine")
.limit(2)
.execute()
console.log(results.map(r => r.text))
```
```bash
[ 'Banana', 'Cherry' ]
```
Output of `results`:
```bash
[
{
vector: Float32Array(384) [
-0.057455405592918396,
0.03617725893855095,
-0.0367760956287384,
... 381 more items
],
id: 5,
text: 'Banana',
type: 'fruit',
_distance: 0.4919965863227844
},
{
vector: Float32Array(384) [
0.0009714411571621895,
0.008223623037338257,
0.009571489877998829,
... 381 more items
],
id: 1,
text: 'Cherry',
type: 'fruit',
_distance: 0.5540297031402588
}
]
```
### Wrapping it up
In this example, we showed how to use the `transformers.js` library to perform vector embedding search using LanceDB's Javascript API. You can find the full code for this example on [Github](https://github.com/lancedb/lancedb/blob/main/node/examples/js-transformers/index.js)!

View File

@@ -4,4 +4,10 @@
<img id="splash" width="400" alt="youtube transcript search" src="https://user-images.githubusercontent.com/917119/236965568-def7394d-171c-45f2-939d-8edfeaadd88c.png">
<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/youtube_bot/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
Scripts - [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/lancedb/vectordb-recipesexamples/youtube_bot/main.py) [![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](https://github.com/lancedb/vectordb-recipes/examples/youtube_bot/index.js)
This example is in a [notebook](https://github.com/lancedb/lancedb/blob/main/docs/src/notebooks/youtube_transcript_search.ipynb)

View File

@@ -6,17 +6,33 @@ to make this available for JS as well.
## Installation
To use full text search, you must install optional dependency tantivy-py:
To use full text search, you must install the dependency `tantivy-py`:
# tantivy 0.19.2
pip install tantivy@git+https://github.com/quickwit-oss/tantivy-py#164adc87e1a033117001cf70e38c82a53014d985
# tantivy 0.20.1
```sh
pip install tantivy==0.20.1
```
## Quickstart
Assume:
1. `table` is a LanceDB Table
2. `text` is the name of the Table column that we want to index
2. `text` is the name of the `Table` column that we want to index
For example,
```python
import lancedb
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
data=[{"vector": [3.1, 4.1], "text": "Frodo was a happy puppy"},
{"vector": [5.9, 26.5], "text": "There are several kittens playing"}])
```
To create the index:
@@ -27,7 +43,13 @@ table.create_fts_index("text")
To search:
```python
df = table.search("puppy").limit(10).select(["text"]).to_df()
table.search("puppy").limit(10).select(["text"]).to_list()
```
Which returns a list of dictionaries:
```python
[{'text': 'Frodo was a happy puppy', 'score': 0.6931471824645996}]
```
LanceDB automatically looks for an FTS index if the input is str.

408
docs/src/guides/tables.md Normal file
View File

@@ -0,0 +1,408 @@
<a href="https://colab.research.google.com/github/lancedb/lancedb/blob/main/docs/src/notebooks/tables_guide.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a><br/>
A Table is a collection of Records in a LanceDB Database. You can follow along on colab!
## Creating a LanceDB Table
=== "Python"
### LanceDB Connection
```python
import lancedb
db = lancedb.connect("./.lancedb")
```
LanceDB allows ingesting data from various sources - `dict`, `list[dict]`, `pd.DataFrame`, `pa.Table` or a `Iterator[pa.RecordBatch]`. Let's take a look at some of the these.
### From list of tuples or dictionaries
```python
import lancedb
db = lancedb.connect("./.lancedb")
data = [{"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
{"vector": [0.2, 1.8], "lat": 40.1, "long": -74.1}]
db.create_table("my_table", data)
db["my_table"].head()
```
!!! info "Note"
If the table already exists, LanceDB will raise an error by default. If you want to overwrite the table, you can pass in mode="overwrite" to the createTable function.
```python
db.create_table("name", data, mode="overwrite")
```
### From pandas DataFrame
```python
import pandas as pd
data = pd.DataFrame({
"vector": [[1.1, 1.2, 1.3, 1.4], [0.2, 1.8, 0.4, 3.6]],
"lat": [45.5, 40.1],
"long": [-122.7, -74.1]
})
db.create_table("table2", data)
db["table2"].head()
```
!!! info "Note"
Data is converted to Arrow before being written to disk. For maximum control over how data is saved, either provide the PyArrow schema to convert to or else provide a PyArrow Table directly.
```python
custom_schema = pa.schema([
pa.field("vector", pa.list_(pa.float32(), 4)),
pa.field("lat", pa.float32()),
pa.field("long", pa.float32())
])
table = db.create_table("table3", data, schema=custom_schema)
```
### From PyArrow Tables
You can also create LanceDB tables directly from pyarrow tables
```python
table = pa.Table.from_arrays(
[
pa.array([[3.1, 4.1, 5.1, 6.1], [5.9, 26.5, 4.7, 32.8]],
pa.list_(pa.float32(), 4)),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"],
)
db = lancedb.connect("db")
tbl = db.create_table("test1", table)
```
### From Pydantic Models
When you create an empty table without data, you must specify the table schema.
LanceDB supports creating tables by specifying a pyarrow schema or a specialized
pydantic model called `LanceModel`.
For example, the following Content model specifies a table with 5 columns:
movie_id, vector, genres, title, and imdb_id. When you create a table, you can
pass the class as the value of the `schema` parameter to `create_table`.
The `vector` column is a `Vector` type, which is a specialized pydantic type that
can be configured with the vector dimensions. It is also important to note that
LanceDB only understands subclasses of `lancedb.pydantic.LanceModel`
(which itself derives from `pydantic.BaseModel`).
```python
from lancedb.pydantic import Vector, LanceModel
class Content(LanceModel):
movie_id: int
vector: Vector(128)
genres: str
title: str
imdb_id: int
@property
def imdb_url(self) -> str:
return f"https://www.imdb.com/title/tt{self.imdb_id}"
import pyarrow as pa
db = lancedb.connect("~/.lancedb")
table_name = "movielens_small"
table = db.create_table(table_name, schema=Content)
```
### Using Iterators / Writing Large Datasets
It is recommended to use itertators to add large datasets in batches when creating your table in one go. This does not create multiple versions of your dataset unlike manually adding batches using `table.add()`
LanceDB additionally supports pyarrow's `RecordBatch` Iterators or other generators producing supported data types.
Here's an example using using `RecordBatch` iterator for creating tables.
```python
import pyarrow as pa
def make_batches():
for i in range(5):
yield pa.RecordBatch.from_arrays(
[
pa.array([[3.1, 4.1, 5.1, 6.1], [5.9, 26.5, 4.7, 32.8]],
pa.list_(pa.float32(), 4)),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"],
)
schema = pa.schema([
pa.field("vector", pa.list_(pa.float32(), 4)),
pa.field("item", pa.utf8()),
pa.field("price", pa.float32()),
])
db.create_table("table4", make_batches(), schema=schema)
```
You can also use iterators of other types like Pandas dataframe or Pylists directly in the above example.
## Creating Empty Table
You can also create empty tables in python. Initialize it with schema and later ingest data into it.
```python
import lancedb
import pyarrow as pa
schema = pa.schema(
[
pa.field("vector", pa.list_(pa.float32(), 2)),
pa.field("item", pa.string()),
pa.field("price", pa.float32()),
])
tbl = db.create_table("table5", schema=schema)
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]
tbl.add(data=data)
```
You can also use Pydantic to specify the schema
```python
import lancedb
from lancedb.pydantic import LanceModel, vector
class Model(LanceModel):
vector: Vector(2)
tbl = db.create_table("table5", schema=Model.to_arrow_schema())
```
=== "Javascript/Typescript"
### VectorDB Connection
```javascript
const lancedb = require("vectordb");
const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);
```
### Creating a Table
You can create a LanceDB table in javascript using an array of records.
```javascript
data
const tb = await db.createTable("my_table",
data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
```
!!! info "Note"
If the table already exists, LanceDB will raise an error by default. If you want to overwrite the table, you need to specify the `WriteMode` in the createTable function.
```javascript
const table = await con.createTable(tableName, data, { writeMode: WriteMode.Overwrite })
```
## Open existing tables
If you forget the name of your table, you can always get a listing of all table names:
=== "Python"
### Get a list of existing Tables
```python
print(db.table_names())
```
=== "Javascript/Typescript"
```javascript
console.log(await db.tableNames());
```
Then, you can open any existing tables
=== "Python"
```python
tbl = db.open_table("my_table")
```
=== "Javascript/Typescript"
```javascript
const tbl = await db.openTable("my_table");
```
## Adding to a Table
After a table has been created, you can always add more data to it using
=== "Python"
You can add any of the valid data structures accepted by LanceDB table, i.e, `dict`, `list[dict]`, `pd.DataFrame`, or a `Iterator[pa.RecordBatch]`. Here are some examples.
### Adding Pandas DataFrame
```python
df = pd.DataFrame({
"vector": [[1.3, 1.4], [9.5, 56.2]], "item": ["fizz", "buzz"], "price": [100.0, 200.0]
})
tbl.add(df)
```
You can also add a large dataset batch in one go using Iterator of any supported data types.
### Adding to table using Iterator
```python
def make_batches():
for i in range(5):
yield [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
tbl.add(make_batches())
```
The other arguments accepted:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DATA | The data to insert into the table. | required |
| mode | str | The mode to use when writing the data. Valid values are "append" and "overwrite". | append |
| on_bad_vectors | str | What to do if any of the vectors are not the same size or contains NaNs. One of "error", "drop", "fill". | drop |
| fill value | float | The value to use when filling vectors: Only used if on_bad_vectors="fill". | 0.0 |
=== "Javascript/Typescript"
```javascript
await tbl.add([{vector: [1.3, 1.4], item: "fizz", price: 100.0},
{vector: [9.5, 56.2], item: "buzz", price: 200.0}])
```
## Deleting from a Table
Use the `delete()` method on tables to delete rows from a table. To choose which rows to delete, provide a filter that matches on the metadata columns. This can delete any number of rows that match the filter.
=== "Python"
```python
tbl.delete('item = "fizz"')
```
### Deleting row with specific column value
```python
import lancedb
data = [{"x": 1, "vector": [1, 2]},
{"x": 2, "vector": [3, 4]},
{"x": 3, "vector": [5, 6]}]
db = lancedb.connect("./.lancedb")
table = db.create_table("my_table", data)
table.to_pandas()
# x vector
# 0 1 [1.0, 2.0]
# 1 2 [3.0, 4.0]
# 2 3 [5.0, 6.0]
table.delete("x = 2")
table.to_pandas()
# x vector
# 0 1 [1.0, 2.0]
# 1 3 [5.0, 6.0]
```
### Delete from a list of values
```python
to_remove = [1, 5]
to_remove = ", ".join(str(v) for v in to_remove)
table.delete(f"x IN ({to_remove})")
table.to_pandas()
# x vector
# 0 3 [5.0, 6.0]
```
=== "Javascript/Typescript"
```javascript
await tbl.delete('item = "fizz"')
```
### Deleting row with specific column value
```javascript
const con = await lancedb.connect("./.lancedb")
const data = [
{id: 1, vector: [1, 2]},
{id: 2, vector: [3, 4]},
{id: 3, vector: [5, 6]},
];
const tbl = await con.createTable("my_table", data)
await tbl.delete("id = 2")
await tbl.countRows() // Returns 2
```
### Delete from a list of values
```javascript
const to_remove = [1, 5];
await tbl.delete(`id IN (${to_remove.join(",")})`)
await tbl.countRows() // Returns 1
```
### Updating a Table [Experimental]
EXPERIMENTAL: Update rows in the table (not threadsafe).
This can be used to update zero to all rows depending on how many rows match the where clause.
| Parameter | Type | Description |
|---|---|---|
| `where` | `str` | The SQL where clause to use when updating rows. For example, `'x = 2'` or `'x IN (1, 2, 3)'`. The filter must not be empty, or it will error. |
| `values` | `dict` | The values to update. The keys are the column names and the values are the values to set. |
=== "Python"
```python
import lancedb
import pandas as pd
# Create a lancedb connection
db = lancedb.connect("./.lancedb")
# Create a table from a pandas DataFrame
data = pd.DataFrame({"x": [1, 2, 3], "vector": [[1, 2], [3, 4], [5, 6]]})
table = db.create_table("my_table", data)
# Update the table where x = 2
table.update(where="x = 2", values={"vector": [10, 10]})
# Get the updated table as a pandas DataFrame
df = table.to_pandas()
# Print the DataFrame
print(df)
```
Output
```shell
x vector
0 1 [1.0, 2.0]
1 3 [5.0, 6.0]
2 2 [10.0, 10.0]
```
## What's Next?
Learn how to Query your tables and create indices

View File

@@ -1,20 +1,23 @@
# Welcome to LanceDB's Documentation
# LanceDB
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrevial, filtering and management of embeddings.
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
![Illustration](/lancedb/assets/ecosystem-illustration.png)
The key features of LanceDB include:
* Production-scale vector search with no servers to manage.
* Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
* Support for vector similarity search, full-text search and SQL.
* Support for production-scale vector similarity search, full-text search and SQL, with no servers to manage.
* Native Python and Javascript/Typescript support.
* Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure.
* Ecosystem integrations with [LangChain 🦜️🔗](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/lancedb.html), [LlamaIndex 🦙](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html), Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
* Persisted on HDD, allowing scalability without breaking the bank.
* Ingest your favorite data formats directly, like pandas DataFrames, Pydantic objects and more.
LanceDB's core is written in Rust 🦀 and is built using <a href="https://github.com/lancedb/lance">Lance</a>, an open-source columnar format designed for performant ML workloads.
@@ -28,12 +31,12 @@ LanceDB's core is written in Rust 🦀 and is built using <a href="https://githu
```python
import lancedb
uri = "/tmp/lancedb"
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_df()
result = table.search([100, 100]).limit(2).to_list()
```
=== "Javascript"
@@ -44,7 +47,7 @@ LanceDB's core is written in Rust 🦀 and is built using <a href="https://githu
```javascript
const lancedb = require("vectordb");
const uri = "/tmp/lancedb";
const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);
const table = await db.createTable("my_table",
[{ id: 1, vector: [3.1, 4.1], item: "foo", price: 10.0 },
@@ -64,9 +67,9 @@ LanceDB's core is written in Rust 🦀 and is built using <a href="https://githu
## Documentation Quick Links
* [`Basic Operations`](basic.md) - basic functionality of LanceDB.
* [`Embedding Functions`](embedding.md) - functions for working with embeddings.
* [`Embedding Functions`](embeddings/index.md) - functions for working with embeddings.
* [`Indexing`](ann_indexes.md) - create vector indexes to speed up queries.
* [`Full text search`](fts.md) - [EXPERIMENTAL] full-text search API
* [`Ecosystem Integrations`](integrations.md) - integrating LanceDB with python data tooling ecosystem.
* [`Ecosystem Integrations`](python/integration.md) - integrating LanceDB with python data tooling ecosystem.
* [`Python API Reference`](python/python.md) - detailed documentation for the LanceDB Python SDK.
* [`Node API Reference`](javascript/modules.md) - detailed documentation for the LanceDB Python SDK.
* [`Node API Reference`](javascript/modules.md) - detailed documentation for the LanceDB Node SDK.

View File

@@ -1,108 +0,0 @@
# Integrations
Built on top of Apache Arrow, `LanceDB` is easy to integrate with the Python ecosystem, including Pandas, PyArrow and DuckDB.
## Pandas and PyArrow
First, we need to connect to a `LanceDB` database.
``` py
import lancedb
db = lancedb.connect("/tmp/lancedb")
```
And write a `Pandas DataFrame` to LanceDB directly.
```py
import pandas as pd
data = pd.DataFrame({
"vector": [[3.1, 4.1], [5.9, 26.5]],
"item": ["foo", "bar"],
"price": [10.0, 20.0]
})
table = db.create_table("pd_table", data=data)
```
You will find detailed instructions of creating dataset and index in [Basic Operations](basic.md) and [Indexing](indexing.md)
sections.
We can now perform similarity searches via `LanceDB`.
```py
# Open the table previously created.
table = db.open_table("pd_table")
query_vector = [100, 100]
# Pandas DataFrame
df = table.search(query_vector).limit(1).to_df()
print(df)
```
```
vector item price score
0 [5.9, 26.5] bar 20.0 14257.05957
```
If you have a simple filter, it's faster to provide a where clause to `LanceDB`'s search query.
If you have more complex criteria, you can always apply the filter to the resulting pandas `DataFrame` from the search query.
```python
# Apply the filter via LanceDB
results = table.search([100, 100]).where("price < 15").to_df()
assert len(results) == 1
assert results["item"].iloc[0] == "foo"
# Apply the filter via Pandas
df = results = table.search([100, 100]).to_df()
results = df[df.price < 15]
assert len(results) == 1
assert results["item"].iloc[0] == "foo"
```
## DuckDB
`LanceDB` works with `DuckDB` via [PyArrow integration](https://duckdb.org/docs/guides/python/sql_on_arrow).
Let us start with installing `duckdb` and `lancedb`.
```shell
pip install duckdb lancedb
```
We will re-use the dataset created previously
```python
import lancedb
db = lancedb.connect("/tmp/lancedb")
table = db.open_table("pd_table")
arrow_table = table.to_arrow()
```
`DuckDB` can directly query the `arrow_table`:
```python
In [15]: duckdb.query("SELECT * FROM t")
Out[15]:
┌─────────────┬─────────┬────────┐
│ vector │ item │ price │
│ float[] │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1] │ foo │ 10.0 │
│ [5.9, 26.5] │ bar │ 20.0 │
└─────────────┴─────────┴────────┘
In [16]: duckdb.query("SELECT mean(price) FROM t")
Out[16]:
┌─────────────┐
│ mean(price) │
│ double │
├─────────────┤
│ 15.0 │
└─────────────┘
```

View File

@@ -0,0 +1,21 @@
# Integrations
## Data Formats
LanceDB supports ingesting from your favorite data tools.
![Illustration](/lancedb/assets/ecosystem-illustration.png)
## Tools
LanceDB is integrated with most of the popular AI tools, with more coming soon.
Get started using these examples and quick links.
| Integrations | |
|---|---:|
| <h3> LlamaIndex </h3>LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. Llama index integrates with LanceDB as the serverless VectorDB. <h3>[Lean More](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html) </h3> |<img src="../assets/llama-index.jpg" alt="image" width="150" height="auto">|
| <h3>Langchain</h3>Langchain allows building applications with LLMs through composability <h3>[Lean More](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/lancedb.html) | <img src="../assets/langchain.png" alt="image" width="150" height="auto">|
| <h3>Langchain TS</h3> Javascript bindings for Langchain. It integrates with LanceDB's serverless vectordb allowing you to build powerful AI applications through composibility using only serverless functions. <h3>[Learn More]( https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/lancedb) | <img src="../assets/langchain.png" alt="image" width="150" height="auto">|
| <h3>Voxel51</h3> It is an open source toolkit that enables you to build better computer vision workflows by improving the quality of your datasets and delivering insights about your models.<h3>[Learn More](./voxel51.md) | <img src="../assets/voxel.gif" alt="image" width="150" height="auto">|
| <h3>PromptTools</h3> Offers a set of free, open-source tools for testing and experimenting with models, prompts, and configurations. The core idea is to enable developers to evaluate prompts using familiar interfaces like code and notebooks. You can use it to experiment with different configurations of LanceDB, and test how LanceDB integrates with the LLM of your choice.<h3>[Learn More](./prompttools.md) | <img src="../assets/prompttools.jpeg" alt="image" width="150" height="auto">|

View File

@@ -0,0 +1,7 @@
[PromptTools](https://github.com/hegelai/prompttools) offers a set of free, open-source tools for testing and experimenting with models, prompts, and configurations. The core idea is to enable developers to evaluate prompts using familiar interfaces like code and notebooks. You can use it to experiment with different configurations of LanceDB, and test how LanceDB integrates with the LLM of your choice.
[Evaluating Prompts with PromptTools](./examples/prompttools-eval-prompts/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/prompttools-eval-prompts/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
![Alt text](https://prompttools.readthedocs.io/en/latest/_images/demo.gif "a title")

View File

@@ -0,0 +1,71 @@
![example](/assets/voxel.gif)
Basic recipe
____________
The basic workflow to use LanceDB to create a similarity index on your FiftyOne
datasets and use this to query your data is as follows:
1) Load a dataset into FiftyOne
2) Compute embedding vectors for samples or patches in your dataset, or select
a model to use to generate embeddings
3) Use the `compute_similarity()`
method to generate a LanceDB table for the samples or object
patches embeddings in a dataset by setting the parameter `backend="lancedb"` and
specifying a `brain_key` of your choice
4) Use this LanceDB table to query your data with
`sort_by_similarity()`
5) If desired, delete the table
The example below demonstrates this workflow.
!!! Note
You must install the LanceDB Python client to run this
```
pip install lancedb
```
```python
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
# Step 1: Load your data into FiftyOne
dataset = foz.load_zoo_dataset("quickstart")
# Steps 2 and 3: Compute embeddings and create a similarity index
lancedb_index = fob.compute_similarity(
dataset,
model="clip-vit-base32-torch",
brain_key="lancedb_index",
backend="lancedb",
)
```
Once the similarity index has been generated, we can query our data in FiftyOne
by specifying the `brain_key`:
```python
# Step 4: Query your data
query = dataset.first().id # query by sample ID
view = dataset.sort_by_similarity(
query,
brain_key="lancedb_index",
k=10, # limit to 10 most similar samples
)
# Step 5 (optional): Cleanup
# Delete the LanceDB table
lancedb_index.cleanup()
# Delete run record from FiftyOne
dataset.delete_brain_run("lancedb_index")
```
More in depth walkthrough of the integration, visit the LanceDB guide on Voxel51 - [LaceDB x Voxel51](https://docs.voxel51.com/integrations/lancedb.html)

View File

@@ -10,15 +10,21 @@ A JavaScript / Node.js library for [LanceDB](https://github.com/lancedb/lancedb)
npm install vectordb
```
This will download the appropriate native library for your platform. We currently
support x86_64 Linux, aarch64 Linux, Intel MacOS, and ARM (M1/M2) MacOS. We do not
yet support Windows or musl-based Linux (such as Alpine Linux).
## Usage
### Basic Example
```javascript
const lancedb = require('vectordb');
const db = lancedb.connect('<PATH_TO_LANCEDB_DATASET>');
const table = await db.openTable('my_table');
const query = await table.search([0.1, 0.3]).setLimit(20).execute();
const db = await lancedb.connect('data/sample-lancedb');
const table = await db.createTable("my_table",
[{ id: 1, vector: [0.1, 1.0], item: "foo", price: 10.0 },
{ id: 2, vector: [3.9, 0.5], item: "bar", price: 20.0 }])
const results = await table.search([0.1, 0.3]).limit(20).execute();
console.log(results);
```
@@ -26,17 +32,33 @@ The [examples](./examples) folder contains complete examples.
## Development
The LanceDB javascript is built with npm:
To build everything fresh:
```bash
npm install
npm run tsc
npm run build
```
Then you should be able to run the tests with:
```bash
npm test
```
### Rebuilding Rust library
```bash
npm run build
```
### Rebuilding Typescript
```bash
npm run tsc
```
Run the tests with
```bash
npm test
```
### Fix lints
To run the linter and have it automatically fix all errors

View File

@@ -1,211 +0,0 @@
[vectordb](../README.md) / [Exports](../modules.md) / Connection
# Class: Connection
A connection to a LanceDB database.
## Table of contents
### Constructors
- [constructor](Connection.md#constructor)
### Properties
- [\_db](Connection.md#_db)
- [\_uri](Connection.md#_uri)
### Accessors
- [uri](Connection.md#uri)
### Methods
- [createTable](Connection.md#createtable)
- [createTableArrow](Connection.md#createtablearrow)
- [openTable](Connection.md#opentable)
- [tableNames](Connection.md#tablenames)
## Constructors
### constructor
**new Connection**(`db`, `uri`)
#### Parameters
| Name | Type |
| :------ | :------ |
| `db` | `any` |
| `uri` | `string` |
#### Defined in
[index.ts:46](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L46)
## Properties
### \_db
`Private` `Readonly` **\_db**: `any`
#### Defined in
[index.ts:44](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L44)
___
### \_uri
`Private` `Readonly` **\_uri**: `string`
#### Defined in
[index.ts:43](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L43)
## Accessors
### uri
`get` **uri**(): `string`
#### Returns
`string`
#### Defined in
[index.ts:51](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L51)
## Methods
### createTable
**createTable**(`name`, `data`): `Promise`<[`Table`](Table.md)<`number`[]\>\>
Creates a new Table and initialize it with new data.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the Table |
#### Returns
`Promise`<[`Table`](Table.md)<`number`[]\>\>
#### Defined in
[index.ts:91](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L91)
**createTable**<`T`\>(`name`, `data`, `embeddings`): `Promise`<[`Table`](Table.md)<`T`\>\>
Creates a new Table and initialize it with new data.
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the Table |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use on this Table |
#### Returns
`Promise`<[`Table`](Table.md)<`T`\>\>
#### Defined in
[index.ts:99](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L99)
___
### createTableArrow
**createTableArrow**(`name`, `table`): `Promise`<[`Table`](Table.md)<`number`[]\>\>
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `table` | `Table`<`any`\> |
#### Returns
`Promise`<[`Table`](Table.md)<`number`[]\>\>
#### Defined in
[index.ts:109](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L109)
___
### openTable
**openTable**(`name`): `Promise`<[`Table`](Table.md)<`number`[]\>\>
Open a table in the database.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
#### Returns
`Promise`<[`Table`](Table.md)<`number`[]\>\>
#### Defined in
[index.ts:67](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L67)
**openTable**<`T`\>(`name`, `embeddings`): `Promise`<[`Table`](Table.md)<`T`\>\>
Open a table in the database.
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use on this Table |
#### Returns
`Promise`<[`Table`](Table.md)<`T`\>\>
#### Defined in
[index.ts:74](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L74)
___
### tableNames
**tableNames**(): `Promise`<`string`[]\>
Get the names of all tables in the database.
#### Returns
`Promise`<`string`[]\>
#### Defined in
[index.ts:58](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L58)

View File

@@ -0,0 +1,350 @@
[vectordb](../README.md) / [Exports](../modules.md) / LocalConnection
# Class: LocalConnection
A connection to a LanceDB database.
## Implements
- [`Connection`](../interfaces/Connection.md)
## Table of contents
### Constructors
- [constructor](LocalConnection.md#constructor)
### Properties
- [\_db](LocalConnection.md#_db)
- [\_options](LocalConnection.md#_options)
### Accessors
- [uri](LocalConnection.md#uri)
### Methods
- [createTable](LocalConnection.md#createtable)
- [createTableArrow](LocalConnection.md#createtablearrow)
- [dropTable](LocalConnection.md#droptable)
- [openTable](LocalConnection.md#opentable)
- [tableNames](LocalConnection.md#tablenames)
## Constructors
### constructor
**new LocalConnection**(`db`, `options`)
#### Parameters
| Name | Type |
| :------ | :------ |
| `db` | `any` |
| `options` | [`ConnectionOptions`](../interfaces/ConnectionOptions.md) |
#### Defined in
[index.ts:184](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L184)
## Properties
### \_db
`Private` `Readonly` **\_db**: `any`
#### Defined in
[index.ts:182](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L182)
___
### \_options
`Private` `Readonly` **\_options**: [`ConnectionOptions`](../interfaces/ConnectionOptions.md)
#### Defined in
[index.ts:181](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L181)
## Accessors
### uri
`get` **uri**(): `string`
#### Returns
`string`
#### Implementation of
[Connection](../interfaces/Connection.md).[uri](../interfaces/Connection.md#uri)
#### Defined in
[index.ts:189](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L189)
## Methods
### createTable
**createTable**(`name`, `data`, `mode?`): `Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
Creates a new Table and initialize it with new data.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the Table |
| `mode?` | [`WriteMode`](../enums/WriteMode.md) | The write mode to use when creating the table. |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Implementation of
[Connection](../interfaces/Connection.md).[createTable](../interfaces/Connection.md#createtable)
#### Defined in
[index.ts:230](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L230)
**createTable**(`name`, `data`, `mode`): `Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `data` | `Record`<`string`, `unknown`\>[] |
| `mode` | [`WriteMode`](../enums/WriteMode.md) |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Implementation of
Connection.createTable
#### Defined in
[index.ts:231](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L231)
**createTable**<`T`\>(`name`, `data`, `mode`, `embeddings`): `Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
Creates a new Table and initialize it with new data.
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the Table |
| `mode` | [`WriteMode`](../enums/WriteMode.md) | The write mode to use when creating the table. |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use on this Table |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Implementation of
Connection.createTable
#### Defined in
[index.ts:241](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L241)
**createTable**<`T`\>(`name`, `data`, `mode`, `embeddings?`): `Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `data` | `Record`<`string`, `unknown`\>[] |
| `mode` | [`WriteMode`](../enums/WriteMode.md) |
| `embeddings?` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Implementation of
Connection.createTable
#### Defined in
[index.ts:242](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L242)
___
### createTableArrow
**createTableArrow**(`name`, `table`): `Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `table` | `Table`<`any`\> |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Implementation of
[Connection](../interfaces/Connection.md).[createTableArrow](../interfaces/Connection.md#createtablearrow)
#### Defined in
[index.ts:266](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L266)
___
### dropTable
**dropTable**(`name`): `Promise`<`void`\>
Drop an existing table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table to drop. |
#### Returns
`Promise`<`void`\>
#### Implementation of
[Connection](../interfaces/Connection.md).[dropTable](../interfaces/Connection.md#droptable)
#### Defined in
[index.ts:276](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L276)
___
### openTable
**openTable**(`name`): `Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
Open a table in the database.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`number`[]\>\>
#### Implementation of
[Connection](../interfaces/Connection.md).[openTable](../interfaces/Connection.md#opentable)
#### Defined in
[index.ts:205](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L205)
**openTable**<`T`\>(`name`, `embeddings`): `Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
Open a table in the database.
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use on this Table |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Implementation of
Connection.openTable
#### Defined in
[index.ts:212](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L212)
**openTable**<`T`\>(`name`, `embeddings?`): `Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `embeddings?` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> |
#### Returns
`Promise`<[`Table`](../interfaces/Table.md)<`T`\>\>
#### Implementation of
Connection.openTable
#### Defined in
[index.ts:213](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L213)
___
### tableNames
**tableNames**(): `Promise`<`string`[]\>
Get the names of all tables in the database.
#### Returns
`Promise`<`string`[]\>
#### Implementation of
[Connection](../interfaces/Connection.md).[tableNames](../interfaces/Connection.md#tablenames)
#### Defined in
[index.ts:196](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L196)

View File

@@ -0,0 +1,302 @@
[vectordb](../README.md) / [Exports](../modules.md) / LocalTable
# Class: LocalTable<T\>
A LanceDB Table is the collection of Records. Each Record has one or more vector fields.
## Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
## Implements
- [`Table`](../interfaces/Table.md)<`T`\>
## Table of contents
### Constructors
- [constructor](LocalTable.md#constructor)
### Properties
- [\_embeddings](LocalTable.md#_embeddings)
- [\_name](LocalTable.md#_name)
- [\_options](LocalTable.md#_options)
- [\_tbl](LocalTable.md#_tbl)
### Accessors
- [name](LocalTable.md#name)
### Methods
- [add](LocalTable.md#add)
- [countRows](LocalTable.md#countrows)
- [createIndex](LocalTable.md#createindex)
- [delete](LocalTable.md#delete)
- [overwrite](LocalTable.md#overwrite)
- [search](LocalTable.md#search)
## Constructors
### constructor
**new LocalTable**<`T`\>(`tbl`, `name`, `options`)
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
#### Parameters
| Name | Type |
| :------ | :------ |
| `tbl` | `any` |
| `name` | `string` |
| `options` | [`ConnectionOptions`](../interfaces/ConnectionOptions.md) |
#### Defined in
[index.ts:287](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L287)
**new LocalTable**<`T`\>(`tbl`, `name`, `options`, `embeddings`)
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `tbl` | `any` | |
| `name` | `string` | |
| `options` | [`ConnectionOptions`](../interfaces/ConnectionOptions.md) | |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use when interacting with this table |
#### Defined in
[index.ts:294](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L294)
## Properties
### \_embeddings
`Private` `Optional` `Readonly` **\_embeddings**: [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\>
#### Defined in
[index.ts:284](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L284)
___
### \_name
`Private` `Readonly` **\_name**: `string`
#### Defined in
[index.ts:283](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L283)
___
### \_options
`Private` `Readonly` **\_options**: [`ConnectionOptions`](../interfaces/ConnectionOptions.md)
#### Defined in
[index.ts:285](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L285)
___
### \_tbl
`Private` `Readonly` **\_tbl**: `any`
#### Defined in
[index.ts:282](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L282)
## Accessors
### name
`get` **name**(): `string`
#### Returns
`string`
#### Implementation of
[Table](../interfaces/Table.md).[name](../interfaces/Table.md#name)
#### Defined in
[index.ts:302](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L302)
## Methods
### add
**add**(`data`): `Promise`<`number`\>
Insert records into this Table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Implementation of
[Table](../interfaces/Table.md).[add](../interfaces/Table.md#add)
#### Defined in
[index.ts:320](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L320)
___
### countRows
**countRows**(): `Promise`<`number`\>
Returns the number of rows in this table.
#### Returns
`Promise`<`number`\>
#### Implementation of
[Table](../interfaces/Table.md).[countRows](../interfaces/Table.md#countrows)
#### Defined in
[index.ts:362](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L362)
___
### createIndex
**createIndex**(`indexParams`): `Promise`<`any`\>
Create an ANN index on this Table vector index.
**`See`**
VectorIndexParams.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `indexParams` | [`IvfPQIndexConfig`](../interfaces/IvfPQIndexConfig.md) | The parameters of this Index, |
#### Returns
`Promise`<`any`\>
#### Implementation of
[Table](../interfaces/Table.md).[createIndex](../interfaces/Table.md#createindex)
#### Defined in
[index.ts:355](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L355)
___
### delete
**delete**(`filter`): `Promise`<`void`\>
Delete rows from this table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `filter` | `string` | A filter in the same format used by a sql WHERE clause. |
#### Returns
`Promise`<`void`\>
#### Implementation of
[Table](../interfaces/Table.md).[delete](../interfaces/Table.md#delete)
#### Defined in
[index.ts:371](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L371)
___
### overwrite
**overwrite**(`data`): `Promise`<`number`\>
Insert records into this Table, replacing its contents.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Implementation of
[Table](../interfaces/Table.md).[overwrite](../interfaces/Table.md#overwrite)
#### Defined in
[index.ts:338](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L338)
___
### search
**search**(`query`): [`Query`](Query.md)<`T`\>
Creates a search query to find the nearest neighbors of the given search term
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `query` | `T` | The query search term |
#### Returns
[`Query`](Query.md)<`T`\>
#### Implementation of
[Table](../interfaces/Table.md).[search](../interfaces/Table.md#search)
#### Defined in
[index.ts:310](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L310)

View File

@@ -40,7 +40,7 @@ An embedding function that automatically creates vector representation for a giv
#### Defined in
[embedding/openai.ts:21](https://github.com/lancedb/lancedb/blob/31dab97/node/src/embedding/openai.ts#L21)
[embedding/openai.ts:21](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/embedding/openai.ts#L21)
## Properties
@@ -50,7 +50,7 @@ An embedding function that automatically creates vector representation for a giv
#### Defined in
[embedding/openai.ts:19](https://github.com/lancedb/lancedb/blob/31dab97/node/src/embedding/openai.ts#L19)
[embedding/openai.ts:19](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/embedding/openai.ts#L19)
___
@@ -60,7 +60,7 @@ ___
#### Defined in
[embedding/openai.ts:18](https://github.com/lancedb/lancedb/blob/31dab97/node/src/embedding/openai.ts#L18)
[embedding/openai.ts:18](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/embedding/openai.ts#L18)
___
@@ -76,7 +76,7 @@ The name of the column that will be used as input for the Embedding Function.
#### Defined in
[embedding/openai.ts:50](https://github.com/lancedb/lancedb/blob/31dab97/node/src/embedding/openai.ts#L50)
[embedding/openai.ts:50](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/embedding/openai.ts#L50)
## Methods
@@ -102,4 +102,4 @@ Creates a vector representation for the given values.
#### Defined in
[embedding/openai.ts:38](https://github.com/lancedb/lancedb/blob/31dab97/node/src/embedding/openai.ts#L38)
[embedding/openai.ts:38](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/embedding/openai.ts#L38)

View File

@@ -18,7 +18,6 @@ A builder for nearest neighbor queries for LanceDB.
### Properties
- [\_columns](Query.md#_columns)
- [\_embeddings](Query.md#_embeddings)
- [\_filter](Query.md#_filter)
- [\_limit](Query.md#_limit)
@@ -27,7 +26,9 @@ A builder for nearest neighbor queries for LanceDB.
- [\_query](Query.md#_query)
- [\_queryVector](Query.md#_queryvector)
- [\_refineFactor](Query.md#_refinefactor)
- [\_select](Query.md#_select)
- [\_tbl](Query.md#_tbl)
- [where](Query.md#where)
### Methods
@@ -37,6 +38,7 @@ A builder for nearest neighbor queries for LanceDB.
- [metricType](Query.md#metrictype)
- [nprobes](Query.md#nprobes)
- [refineFactor](Query.md#refinefactor)
- [select](Query.md#select)
## Constructors
@@ -60,27 +62,17 @@ A builder for nearest neighbor queries for LanceDB.
#### Defined in
[index.ts:241](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L241)
[index.ts:448](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L448)
## Properties
### \_columns
`Private` `Optional` `Readonly` **\_columns**: `string`[]
#### Defined in
[index.ts:236](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L236)
___
### \_embeddings
`Private` `Optional` `Readonly` **\_embeddings**: [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\>
#### Defined in
[index.ts:239](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L239)
[index.ts:446](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L446)
___
@@ -90,7 +82,7 @@ ___
#### Defined in
[index.ts:237](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L237)
[index.ts:444](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L444)
___
@@ -100,7 +92,7 @@ ___
#### Defined in
[index.ts:233](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L233)
[index.ts:440](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L440)
___
@@ -110,7 +102,7 @@ ___
#### Defined in
[index.ts:238](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L238)
[index.ts:445](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L445)
___
@@ -120,7 +112,7 @@ ___
#### Defined in
[index.ts:235](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L235)
[index.ts:442](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L442)
___
@@ -130,7 +122,7 @@ ___
#### Defined in
[index.ts:231](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L231)
[index.ts:438](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L438)
___
@@ -140,7 +132,7 @@ ___
#### Defined in
[index.ts:232](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L232)
[index.ts:439](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L439)
___
@@ -150,7 +142,17 @@ ___
#### Defined in
[index.ts:234](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L234)
[index.ts:441](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L441)
___
### \_select
`Private` `Optional` **\_select**: `string`[]
#### Defined in
[index.ts:443](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L443)
___
@@ -160,7 +162,33 @@ ___
#### Defined in
[index.ts:230](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L230)
[index.ts:437](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L437)
___
### where
**where**: (`value`: `string`) => [`Query`](Query.md)<`T`\>
#### Type declaration
▸ (`value`): [`Query`](Query.md)<`T`\>
A filter statement to be applied to this query.
##### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `value` | `string` | A filter in the same format used by a sql WHERE clause. |
##### Returns
[`Query`](Query.md)<`T`\>
#### Defined in
[index.ts:496](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L496)
## Methods
@@ -182,7 +210,7 @@ Execute the query and return the results as an Array of Objects
#### Defined in
[index.ts:301](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L301)
[index.ts:519](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L519)
___
@@ -204,7 +232,7 @@ A filter statement to be applied to this query.
#### Defined in
[index.ts:284](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L284)
[index.ts:491](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L491)
___
@@ -226,7 +254,7 @@ Sets the number of results that will be returned
#### Defined in
[index.ts:257](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L257)
[index.ts:464](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L464)
___
@@ -252,7 +280,7 @@ MetricType for the different options
#### Defined in
[index.ts:293](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L293)
[index.ts:511](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L511)
___
@@ -274,7 +302,7 @@ The number of probes used. A higher number makes search more accurate but also s
#### Defined in
[index.ts:275](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L275)
[index.ts:482](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L482)
___
@@ -296,4 +324,26 @@ Refine the results by reading extra elements and re-ranking them in memory.
#### Defined in
[index.ts:266](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L266)
[index.ts:473](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L473)
___
### select
**select**(`value`): [`Query`](Query.md)<`T`\>
Return only the specified columns.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `value` | `string`[] | Only select the specified columns. If not specified, all columns will be returned. |
#### Returns
[`Query`](Query.md)<`T`\>
#### Defined in
[index.ts:502](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L502)

View File

@@ -1,215 +0,0 @@
[vectordb](../README.md) / [Exports](../modules.md) / Table
# Class: Table<T\>
## Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
## Table of contents
### Constructors
- [constructor](Table.md#constructor)
### Properties
- [\_embeddings](Table.md#_embeddings)
- [\_name](Table.md#_name)
- [\_tbl](Table.md#_tbl)
### Accessors
- [name](Table.md#name)
### Methods
- [add](Table.md#add)
- [create\_index](Table.md#create_index)
- [overwrite](Table.md#overwrite)
- [search](Table.md#search)
## Constructors
### constructor
**new Table**<`T`\>(`tbl`, `name`)
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
#### Parameters
| Name | Type |
| :------ | :------ |
| `tbl` | `any` |
| `name` | `string` |
#### Defined in
[index.ts:121](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L121)
**new Table**<`T`\>(`tbl`, `name`, `embeddings`)
#### Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `tbl` | `any` | |
| `name` | `string` | |
| `embeddings` | [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\> | An embedding function to use when interacting with this table |
#### Defined in
[index.ts:127](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L127)
## Properties
### \_embeddings
`Private` `Optional` `Readonly` **\_embeddings**: [`EmbeddingFunction`](../interfaces/EmbeddingFunction.md)<`T`\>
#### Defined in
[index.ts:119](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L119)
___
### \_name
`Private` `Readonly` **\_name**: `string`
#### Defined in
[index.ts:118](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L118)
___
### \_tbl
`Private` `Readonly` **\_tbl**: `any`
#### Defined in
[index.ts:117](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L117)
## Accessors
### name
`get` **name**(): `string`
#### Returns
`string`
#### Defined in
[index.ts:134](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L134)
## Methods
### add
**add**(`data`): `Promise`<`number`\>
Insert records into this Table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Defined in
[index.ts:152](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L152)
___
### create\_index
**create_index**(`indexParams`): `Promise`<`any`\>
Create an ANN index on this Table vector index.
**`See`**
VectorIndexParams.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `indexParams` | `IvfPQIndexConfig` | The parameters of this Index, |
#### Returns
`Promise`<`any`\>
#### Defined in
[index.ts:171](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L171)
___
### overwrite
**overwrite**(`data`): `Promise`<`number`\>
Insert records into this Table, replacing its contents.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
#### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Defined in
[index.ts:162](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L162)
___
### search
**search**(`query`): [`Query`](Query.md)<`T`\>
Creates a search query to find the nearest neighbors of the given search term
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `query` | `T` | The query search term |
#### Returns
[`Query`](Query.md)<`T`\>
#### Defined in
[index.ts:142](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L142)

View File

@@ -9,6 +9,7 @@ Distance metrics type.
### Enumeration Members
- [Cosine](MetricType.md#cosine)
- [Dot](MetricType.md#dot)
- [L2](MetricType.md#l2)
## Enumeration Members
@@ -21,7 +22,19 @@ Cosine distance
#### Defined in
[index.ts:341](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L341)
[index.ts:567](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L567)
___
### Dot
• **Dot** = ``"dot"``
Dot product
#### Defined in
[index.ts:572](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L572)
___
@@ -33,4 +46,4 @@ Euclidean distance
#### Defined in
[index.ts:336](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L336)
[index.ts:562](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L562)

View File

@@ -2,11 +2,14 @@
# Enumeration: WriteMode
Write mode for writing a table.
## Table of contents
### Enumeration Members
- [Append](WriteMode.md#append)
- [Create](WriteMode.md#create)
- [Overwrite](WriteMode.md#overwrite)
## Enumeration Members
@@ -15,9 +18,23 @@
**Append** = ``"append"``
Append new data to the table.
#### Defined in
[index.ts:326](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L326)
[index.ts:552](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L552)
___
### Create
• **Create** = ``"create"``
Create a new [Table](../interfaces/Table.md).
#### Defined in
[index.ts:548](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L548)
___
@@ -25,6 +42,8 @@ ___
• **Overwrite** = ``"overwrite"``
Overwrite the existing [Table](../interfaces/Table.md) if presented.
#### Defined in
[index.ts:325](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L325)
[index.ts:550](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L550)

View File

@@ -0,0 +1,41 @@
[vectordb](../README.md) / [Exports](../modules.md) / AwsCredentials
# Interface: AwsCredentials
## Table of contents
### Properties
- [accessKeyId](AwsCredentials.md#accesskeyid)
- [secretKey](AwsCredentials.md#secretkey)
- [sessionToken](AwsCredentials.md#sessiontoken)
## Properties
### accessKeyId
**accessKeyId**: `string`
#### Defined in
[index.ts:31](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L31)
___
### secretKey
**secretKey**: `string`
#### Defined in
[index.ts:33](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L33)
___
### sessionToken
`Optional` **sessionToken**: `string`
#### Defined in
[index.ts:35](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L35)

View File

@@ -0,0 +1,152 @@
[vectordb](../README.md) / [Exports](../modules.md) / Connection
# Interface: Connection
A LanceDB Connection that allows you to open tables and create new ones.
Connection could be local against filesystem or remote against a server.
## Implemented by
- [`LocalConnection`](../classes/LocalConnection.md)
## Table of contents
### Properties
- [uri](Connection.md#uri)
### Methods
- [createTable](Connection.md#createtable)
- [createTableArrow](Connection.md#createtablearrow)
- [dropTable](Connection.md#droptable)
- [openTable](Connection.md#opentable)
- [tableNames](Connection.md#tablenames)
## Properties
### uri
**uri**: `string`
#### Defined in
[index.ts:70](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L70)
## Methods
### createTable
**createTable**<`T`\>(`name`, `data`, `mode?`, `embeddings?`): `Promise`<[`Table`](Table.md)<`T`\>\>
Creates a new Table and initialize it with new data.
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `data` | `Record`<`string`, `unknown`\>[] | Non-empty Array of Records to be inserted into the table |
| `mode?` | [`WriteMode`](../enums/WriteMode.md) | The write mode to use when creating the table. |
| `embeddings?` | [`EmbeddingFunction`](EmbeddingFunction.md)<`T`\> | An embedding function to use on this table |
#### Returns
`Promise`<[`Table`](Table.md)<`T`\>\>
#### Defined in
[index.ts:90](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L90)
___
### createTableArrow
**createTableArrow**(`name`, `table`): `Promise`<[`Table`](Table.md)<`number`[]\>\>
#### Parameters
| Name | Type |
| :------ | :------ |
| `name` | `string` |
| `table` | `Table`<`any`\> |
#### Returns
`Promise`<[`Table`](Table.md)<`number`[]\>\>
#### Defined in
[index.ts:92](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L92)
___
### dropTable
**dropTable**(`name`): `Promise`<`void`\>
Drop an existing table.
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table to drop. |
#### Returns
`Promise`<`void`\>
#### Defined in
[index.ts:98](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L98)
___
### openTable
**openTable**<`T`\>(`name`, `embeddings?`): `Promise`<[`Table`](Table.md)<`T`\>\>
Open a table in the database.
#### Type parameters
| Name |
| :------ |
| `T` |
#### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `name` | `string` | The name of the table. |
| `embeddings?` | [`EmbeddingFunction`](EmbeddingFunction.md)<`T`\> | An embedding function to use on this table |
#### Returns
`Promise`<[`Table`](Table.md)<`T`\>\>
#### Defined in
[index.ts:80](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L80)
___
### tableNames
**tableNames**(): `Promise`<`string`[]\>
#### Returns
`Promise`<`string`[]\>
#### Defined in
[index.ts:72](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L72)

View File

@@ -0,0 +1,30 @@
[vectordb](../README.md) / [Exports](../modules.md) / ConnectionOptions
# Interface: ConnectionOptions
## Table of contents
### Properties
- [awsCredentials](ConnectionOptions.md#awscredentials)
- [uri](ConnectionOptions.md#uri)
## Properties
### awsCredentials
`Optional` **awsCredentials**: [`AwsCredentials`](AwsCredentials.md)
#### Defined in
[index.ts:40](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L40)
___
### uri
**uri**: `string`
#### Defined in
[index.ts:39](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L39)

View File

@@ -45,7 +45,7 @@ Creates a vector representation for the given values.
#### Defined in
[embedding/embedding_function.ts:27](https://github.com/lancedb/lancedb/blob/31dab97/node/src/embedding/embedding_function.ts#L27)
[embedding/embedding_function.ts:27](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/embedding/embedding_function.ts#L27)
___
@@ -57,4 +57,4 @@ The name of the column that will be used as input for the Embedding Function.
#### Defined in
[embedding/embedding_function.ts:22](https://github.com/lancedb/lancedb/blob/31dab97/node/src/embedding/embedding_function.ts#L22)
[embedding/embedding_function.ts:22](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/embedding/embedding_function.ts#L22)

View File

@@ -0,0 +1,149 @@
[vectordb](../README.md) / [Exports](../modules.md) / IvfPQIndexConfig
# Interface: IvfPQIndexConfig
## Table of contents
### Properties
- [column](IvfPQIndexConfig.md#column)
- [index\_name](IvfPQIndexConfig.md#index_name)
- [max\_iters](IvfPQIndexConfig.md#max_iters)
- [max\_opq\_iters](IvfPQIndexConfig.md#max_opq_iters)
- [metric\_type](IvfPQIndexConfig.md#metric_type)
- [num\_bits](IvfPQIndexConfig.md#num_bits)
- [num\_partitions](IvfPQIndexConfig.md#num_partitions)
- [num\_sub\_vectors](IvfPQIndexConfig.md#num_sub_vectors)
- [replace](IvfPQIndexConfig.md#replace)
- [type](IvfPQIndexConfig.md#type)
- [use\_opq](IvfPQIndexConfig.md#use_opq)
## Properties
### column
`Optional` **column**: `string`
The column to be indexed
#### Defined in
[index.ts:382](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L382)
___
### index\_name
`Optional` **index\_name**: `string`
A unique name for the index
#### Defined in
[index.ts:387](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L387)
___
### max\_iters
`Optional` **max\_iters**: `number`
The max number of iterations for kmeans training.
#### Defined in
[index.ts:402](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L402)
___
### max\_opq\_iters
`Optional` **max\_opq\_iters**: `number`
Max number of iterations to train OPQ, if `use_opq` is true.
#### Defined in
[index.ts:421](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L421)
___
### metric\_type
`Optional` **metric\_type**: [`MetricType`](../enums/MetricType.md)
Metric type, L2 or Cosine
#### Defined in
[index.ts:392](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L392)
___
### num\_bits
`Optional` **num\_bits**: `number`
The number of bits to present one PQ centroid.
#### Defined in
[index.ts:416](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L416)
___
### num\_partitions
`Optional` **num\_partitions**: `number`
The number of partitions this index
#### Defined in
[index.ts:397](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L397)
___
### num\_sub\_vectors
`Optional` **num\_sub\_vectors**: `number`
Number of subvectors to build PQ code
#### Defined in
[index.ts:412](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L412)
___
### replace
`Optional` **replace**: `boolean`
Replace an existing index with the same name if it exists.
#### Defined in
[index.ts:426](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L426)
___
### type
**type**: ``"ivf_pq"``
#### Defined in
[index.ts:428](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L428)
___
### use\_opq
• `Optional` **use\_opq**: `boolean`
Train as optimized product quantization.
#### Defined in
[index.ts:407](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L407)

View File

@@ -0,0 +1,221 @@
[vectordb](../README.md) / [Exports](../modules.md) / Table
# Interface: Table<T\>
A LanceDB Table is the collection of Records. Each Record has one or more vector fields.
## Type parameters
| Name | Type |
| :------ | :------ |
| `T` | `number`[] |
## Implemented by
- [`LocalTable`](../classes/LocalTable.md)
## Table of contents
### Properties
- [add](Table.md#add)
- [countRows](Table.md#countrows)
- [createIndex](Table.md#createindex)
- [delete](Table.md#delete)
- [name](Table.md#name)
- [overwrite](Table.md#overwrite)
- [search](Table.md#search)
## Properties
### add
**add**: (`data`: `Record`<`string`, `unknown`\>[]) => `Promise`<`number`\>
#### Type declaration
▸ (`data`): `Promise`<`number`\>
Insert records into this Table.
##### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
##### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Defined in
[index.ts:120](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L120)
___
### countRows
**countRows**: () => `Promise`<`number`\>
#### Type declaration
▸ (): `Promise`<`number`\>
Returns the number of rows in this table.
##### Returns
`Promise`<`number`\>
#### Defined in
[index.ts:140](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L140)
___
### createIndex
**createIndex**: (`indexParams`: [`IvfPQIndexConfig`](IvfPQIndexConfig.md)) => `Promise`<`any`\>
#### Type declaration
▸ (`indexParams`): `Promise`<`any`\>
Create an ANN index on this Table vector index.
**`See`**
VectorIndexParams.
##### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `indexParams` | [`IvfPQIndexConfig`](IvfPQIndexConfig.md) | The parameters of this Index, |
##### Returns
`Promise`<`any`\>
#### Defined in
[index.ts:135](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L135)
___
### delete
**delete**: (`filter`: `string`) => `Promise`<`void`\>
#### Type declaration
▸ (`filter`): `Promise`<`void`\>
Delete rows from this table.
This can be used to delete a single row, many rows, all rows, or
sometimes no rows (if your predicate matches nothing).
**`Examples`**
```ts
const con = await lancedb.connect("./.lancedb")
const data = [
{id: 1, vector: [1, 2]},
{id: 2, vector: [3, 4]},
{id: 3, vector: [5, 6]},
];
const tbl = await con.createTable("my_table", data)
await tbl.delete("id = 2")
await tbl.countRows() // Returns 2
```
If you have a list of values to delete, you can combine them into a
stringified list and use the `IN` operator:
```ts
const to_remove = [1, 5];
await tbl.delete(`id IN (${to_remove.join(",")})`)
await tbl.countRows() // Returns 1
```
##### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `filter` | `string` | A filter in the same format used by a sql WHERE clause. The filter must not be empty. |
##### Returns
`Promise`<`void`\>
#### Defined in
[index.ts:174](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L174)
___
### name
**name**: `string`
#### Defined in
[index.ts:106](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L106)
___
### overwrite
**overwrite**: (`data`: `Record`<`string`, `unknown`\>[]) => `Promise`<`number`\>
#### Type declaration
▸ (`data`): `Promise`<`number`\>
Insert records into this Table, replacing its contents.
##### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `data` | `Record`<`string`, `unknown`\>[] | Records to be inserted into the Table |
##### Returns
`Promise`<`number`\>
The number of rows added to the table
#### Defined in
[index.ts:128](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L128)
___
### search
**search**: (`query`: `T`) => [`Query`](../classes/Query.md)<`T`\>
#### Type declaration
▸ (`query`): [`Query`](../classes/Query.md)<`T`\>
Creates a search query to find the nearest neighbors of the given search term
##### Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| `query` | `T` | The query search term |
##### Returns
[`Query`](../classes/Query.md)<`T`\>
#### Defined in
[index.ts:112](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L112)

View File

@@ -11,14 +11,19 @@
### Classes
- [Connection](classes/Connection.md)
- [LocalConnection](classes/LocalConnection.md)
- [LocalTable](classes/LocalTable.md)
- [OpenAIEmbeddingFunction](classes/OpenAIEmbeddingFunction.md)
- [Query](classes/Query.md)
- [Table](classes/Table.md)
### Interfaces
- [AwsCredentials](interfaces/AwsCredentials.md)
- [Connection](interfaces/Connection.md)
- [ConnectionOptions](interfaces/ConnectionOptions.md)
- [EmbeddingFunction](interfaces/EmbeddingFunction.md)
- [IvfPQIndexConfig](interfaces/IvfPQIndexConfig.md)
- [Table](interfaces/Table.md)
### Type Aliases
@@ -32,17 +37,17 @@
### VectorIndexParams
Ƭ **VectorIndexParams**: `IvfPQIndexConfig`
Ƭ **VectorIndexParams**: [`IvfPQIndexConfig`](interfaces/IvfPQIndexConfig.md)
#### Defined in
[index.ts:224](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L224)
[index.ts:431](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L431)
## Functions
### connect
**connect**(`uri`): `Promise`<[`Connection`](classes/Connection.md)\>
**connect**(`uri`): `Promise`<[`Connection`](interfaces/Connection.md)\>
Connect to a LanceDB instance at the given URI
@@ -54,8 +59,24 @@ Connect to a LanceDB instance at the given URI
#### Returns
`Promise`<[`Connection`](classes/Connection.md)\>
`Promise`<[`Connection`](interfaces/Connection.md)\>
#### Defined in
[index.ts:34](https://github.com/lancedb/lancedb/blob/31dab97/node/src/index.ts#L34)
[index.ts:47](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L47)
**connect**(`opts`): `Promise`<[`Connection`](interfaces/Connection.md)\>
#### Parameters
| Name | Type |
| :------ | :------ |
| `opts` | `Partial`<[`ConnectionOptions`](interfaces/ConnectionOptions.md)\> |
#### Returns
`Promise`<[`Connection`](interfaces/Connection.md)\>
#### Defined in
[index.ts:48](https://github.com/lancedb/lancedb/blob/b1eeb90/node/src/index.ts#L48)

File diff suppressed because one or more lines are too long

View File

@@ -10,7 +10,11 @@
"\n",
"This Q&A bot will allow you to query your own documentation easily using questions. We'll also demonstrate the use of LangChain and LanceDB using the OpenAI API. \n",
"\n",
"In this example we'll use Pandas 2.0 documentation, but, this could be replaced for your own docs as well"
"In this example we'll use Pandas 2.0 documentation, but, this could be replaced for your own docs as well\n",
"\n",
"<a href=\"https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Code-Documentation-QA-Bot/main.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Scripts - [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/Code-Documentation-QA-Bot/main.py) [![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](./examples/Code-Documentation-QA-Bot/index.js)"
]
},
{
@@ -140,7 +144,7 @@
"source": [
"# Pre-processing and loading the documentation\n",
"\n",
"Next, let's pre-process and load the documentation. To make sure we don't need to do this repeatedly if we were updating code, we're caching it using pickle so we can retrieve it again (this could take a few minutes to run the first time yyou do it). We'll also add some more metadata to the docs here such as the title and version of the code:"
"Next, let's pre-process and load the documentation. To make sure we don't need to do this repeatedly if we were updating code, we're caching it using pickle so we can retrieve it again (this could take a few minutes to run the first time you do it). We'll also add some more metadata to the docs here such as the title and version of the code:"
]
},
{
@@ -181,7 +185,7 @@
"id": "c3852dd3",
"metadata": {},
"source": [
"# Generating emebeddings from our docs\n",
"# Generating embeddings from our docs\n",
"\n",
"Now that we have our raw documents loaded, we need to pre-process them to generate embeddings:"
]
@@ -251,7 +255,7 @@
"id": "28d93b85",
"metadata": {},
"source": [
"And thats it! We're all setup. The next step is to run some queries, let's try a few:"
"And that's it! We're all set up. The next step is to run some queries, let's try a few:"
]
},
{

File diff suppressed because one or more lines are too long

View File

@@ -1,5 +1,14 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![example](https://github.com/lancedb/vectordb-recipes/assets/15766192/799f94a1-a01d-4a5b-a627-2a733bbb4227)\n",
"\n",
" <a href=\"https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multimodal_clip/main.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>| [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/multimodal_clip/main.py) |"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -10,11 +19,11 @@
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip available: \u001B[0m\u001B[31;49m22.3.1\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.1.2\u001B[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip available: \u001B[0m\u001B[31;49m22.3.1\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.1.2\u001B[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n"
]
}
],
@@ -30,6 +39,7 @@
"outputs": [],
"source": [
"import io\n",
"\n",
"import PIL\n",
"import duckdb\n",
"import lancedb"
@@ -42,6 +52,19 @@
"## First run setup: Download data and pre-process"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Get dataset\n",
"\n",
"!wget https://eto-public.s3.us-west-2.amazonaws.com/datasets/diffusiondb_lance.tar.gz\n",
"!tar -xvf diffusiondb_lance.tar.gz\n",
"!mv diffusiondb_test rawdata.lance\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
@@ -136,18 +159,18 @@
" \"db = lancedb.connect('~/datasets/demo')\\n\"\n",
" \"tbl = db.open_table('diffusiondb')\\n\\n\"\n",
" f\"embedding = embed_func('{query}')\\n\"\n",
" \"tbl.search(embedding).limit(9).to_df()\"\n",
" \"tbl.search(embedding).limit(9).to_pandas()\"\n",
" )\n",
" return (_extract(tbl.search(emb).limit(9).to_df()), code)\n",
" return (_extract(tbl.search(emb).limit(9).to_pandas()), code)\n",
"\n",
"def find_image_keywords(query):\n",
" code = (\n",
" \"import lancedb\\n\"\n",
" \"db = lancedb.connect('~/datasets/demo')\\n\"\n",
" \"tbl = db.open_table('diffusiondb')\\n\\n\"\n",
" f\"tbl.search('{query}').limit(9).to_df()\"\n",
" f\"tbl.search('{query}').limit(9).to_pandas()\"\n",
" )\n",
" return (_extract(tbl.search(query).limit(9).to_df()), code)\n",
" return (_extract(tbl.search(query).limit(9).to_pandas()), code)\n",
"\n",
"def find_image_sql(query):\n",
" code = (\n",
@@ -247,7 +270,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3.11.4 64-bit",
"language": "python",
"name": "python3"
},
@@ -261,7 +284,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.11.4"
},
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,825 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d24eb4c6-e246-44ca-ba7c-6eae7923bd4c",
"metadata": {},
"source": [
"## LanceDB Tables\n",
"A Table is a collection of Records in a LanceDB Database.\n",
"\n",
"![illustration](../assets/ecosystem-illustration.png)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "c1b4e34b-a49c-471d-a343-a5940bb5138a",
"metadata": {},
"outputs": [],
"source": [
"!pip install lancedb -qq"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4e5a8d07-d9a1-48c1-913a-8e0629289579",
"metadata": {},
"outputs": [],
"source": [
"import lancedb\n",
"db = lancedb.connect(\"./.lancedb\")"
]
},
{
"cell_type": "markdown",
"id": "66fb93d5-3551-406b-99b2-488442d61d06",
"metadata": {},
"source": [
"LanceDB allows ingesting data from various sources - `dict`, `list[dict]`, `pd.DataFrame`, `pa.Table` or a `Iterator[pa.RecordBatch]`. Let's take a look at some of the these.\n",
"\n",
" ### From list of tuples or dictionaries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5df12f66-8d99-43ad-8d0b-22189ec0a6b9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pyarrow.Table\n",
"vector: fixed_size_list<item: float>[2]\n",
" child 0, item: float\n",
"lat: double\n",
"long: double\n",
"----\n",
"vector: [[[1.1,1.2],[0.2,1.8]]]\n",
"lat: [[45.5,40.1]]\n",
"long: [[-122.7,-74.1]]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import lancedb\n",
"\n",
"db = lancedb.connect(\"./.lancedb\")\n",
"\n",
"data = [{\"vector\": [1.1, 1.2], \"lat\": 45.5, \"long\": -122.7},\n",
" {\"vector\": [0.2, 1.8], \"lat\": 40.1, \"long\": -74.1}]\n",
"\n",
"db.create_table(\"my_table\", data)\n",
"\n",
"db[\"my_table\"].head()"
]
},
{
"cell_type": "markdown",
"id": "10ce802f-1a10-49ee-8ee3-a9bfb302d86c",
"metadata": {},
"source": [
"## From pandas DataFrame\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f4d87ae9-0ccb-48eb-b31d-bb8f2370e47e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pyarrow.Table\n",
"vector: fixed_size_list<item: float>[2]\n",
" child 0, item: float\n",
"lat: double\n",
"long: double\n",
"----\n",
"vector: [[[1.1,1.2],[0.2,1.8]]]\n",
"lat: [[45.5,40.1]]\n",
"long: [[-122.7,-74.1]]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = [\n",
" {\"vector\": [1.1, 1.2], \"lat\": 45.5, \"long\": -122.7},\n",
" {\"vector\": [0.2, 1.8], \"lat\": 40.1, \"long\": -74.1},\n",
"]\n",
"\n",
"db.create_table(\"table2\", data)\n",
"\n",
"db[\"table2\"].head() "
]
},
{
"cell_type": "markdown",
"id": "4be81469-5b57-4f78-9c72-3938c0378d9d",
"metadata": {},
"source": [
"Data is converted to Arrow before being written to disk. For maximum control over how data is saved, either provide the PyArrow schema to convert to or else provide a PyArrow Table directly.\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "25f34bcf-fca0-4431-8601-eac95d1bd347",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"vector: fixed_size_list<item: float>[2]\n",
" child 0, item: float\n",
"lat: float\n",
"long: float"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pyarrow as pa\n",
"\n",
"custom_schema = pa.schema([\n",
"pa.field(\"vector\", pa.list_(pa.float32(), 2)),\n",
"pa.field(\"lat\", pa.float32()),\n",
"pa.field(\"long\", pa.float32())\n",
"])\n",
"\n",
"table = db.create_table(\"table3\", data, schema=custom_schema, mode=\"overwrite\")\n",
"table.schema"
]
},
{
"cell_type": "markdown",
"id": "4df51925-7ca2-4005-9c72-38b3d26240c6",
"metadata": {},
"source": [
"### From PyArrow Tables\n",
"\n",
"You can also create LanceDB tables directly from pyarrow tables"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "90a880f6-be43-4c9d-ba65-0b05197c0f6f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"vector: fixed_size_list<item: float>[2]\n",
" child 0, item: float\n",
"item: string\n",
"price: double"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = pa.Table.from_arrays(\n",
" [\n",
" pa.array([[3.1, 4.1], [5.9, 26.5]],\n",
" pa.list_(pa.float32(), 2)),\n",
" pa.array([\"foo\", \"bar\"]),\n",
" pa.array([10.0, 20.0]),\n",
" ],\n",
" [\"vector\", \"item\", \"price\"],\n",
" )\n",
"\n",
"db = lancedb.connect(\"db\")\n",
"\n",
"tbl = db.create_table(\"test1\", table, mode=\"overwrite\")\n",
"tbl.schema"
]
},
{
"cell_type": "markdown",
"id": "0f36c51c-d902-449d-8292-700e53990c32",
"metadata": {},
"source": [
"### From Pydantic Models\n",
"\n",
"LanceDB supports to create Apache Arrow Schema from a Pydantic BaseModel."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d81121d7-e4b7-447c-a48c-974b6ebb464a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"movie_id: int64 not null\n",
"vector: fixed_size_list<item: float>[128] not null\n",
" child 0, item: float\n",
"genres: string not null\n",
"title: string not null\n",
"imdb_id: int64 not null"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from lancedb.pydantic import Vector, LanceModel\n",
"\n",
"class Content(LanceModel):\n",
" movie_id: int\n",
" vector: Vector(128)\n",
" genres: str\n",
" title: str\n",
" imdb_id: int\n",
" \n",
" @property\n",
" def imdb_url(self) -> str:\n",
" return f\"https://www.imdb.com/title/tt{self.imdb_id}\"\n",
"\n",
"import pyarrow as pa\n",
"db = lancedb.connect(\"~/.lancedb\")\n",
"table_name = \"movielens_small\"\n",
"table = db.create_table(table_name, schema=Content)\n",
"table.schema"
]
},
{
"cell_type": "markdown",
"id": "860e1f77-e860-46a9-98b7-b2979092ccd6",
"metadata": {},
"source": [
"### Using Iterators / Writing Large Datasets\n",
"\n",
"It is recommended to use itertators to add large datasets in batches when creating your table in one go. This does not create multiple versions of your dataset unlike manually adding batches using `table.add()`\n",
"\n",
"LanceDB additionally supports pyarrow's `RecordBatch` Iterators or other generators producing supported data types.\n",
"\n",
"## Here's an example using using `RecordBatch` iterator for creating tables."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "bc247142-4e3c-41a2-b94c-8e00d2c2a508",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LanceTable(table4)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pyarrow as pa\n",
"\n",
"def make_batches():\n",
" for i in range(5):\n",
" yield pa.RecordBatch.from_arrays(\n",
" [\n",
" pa.array([[3.1, 4.1], [5.9, 26.5]],\n",
" pa.list_(pa.float32(), 2)),\n",
" pa.array([\"foo\", \"bar\"]),\n",
" pa.array([10.0, 20.0]),\n",
" ],\n",
" [\"vector\", \"item\", \"price\"],\n",
" )\n",
"\n",
"schema = pa.schema([\n",
" pa.field(\"vector\", pa.list_(pa.float32(), 2)),\n",
" pa.field(\"item\", pa.utf8()),\n",
" pa.field(\"price\", pa.float32()),\n",
"])\n",
"\n",
"db.create_table(\"table4\", make_batches(), schema=schema)"
]
},
{
"cell_type": "markdown",
"id": "94f7dd2b-bae4-4bdf-8534-201437c31027",
"metadata": {},
"source": [
"### Using pandas `DataFrame` Iterator and Pydantic Schema\n",
"\n",
"You can set the schema via pyarrow schema object or using Pydantic object"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "25ad3523-e0c9-4c28-b3df-38189c4e0e5f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"vector: fixed_size_list<item: float>[2] not null\n",
" child 0, item: float\n",
"item: string not null\n",
"price: double not null"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pyarrow as pa\n",
"import pandas as pd\n",
"\n",
"class PydanticSchema(LanceModel):\n",
" vector: Vector(2)\n",
" item: str\n",
" price: float\n",
"\n",
"def make_batches():\n",
" for i in range(5):\n",
" yield pd.DataFrame(\n",
" {\n",
" \"vector\": [[3.1, 4.1], [1, 1]],\n",
" \"item\": [\"foo\", \"bar\"],\n",
" \"price\": [10.0, 20.0],\n",
" })\n",
"\n",
"tbl = db.create_table(\"table5\", make_batches(), schema=PydanticSchema)\n",
"tbl.schema"
]
},
{
"cell_type": "markdown",
"id": "4aa955e9-fcd0-4c99-b644-f218f3bb3f1a",
"metadata": {},
"source": [
"## Creating Empty Table\n",
"\n",
"You can create an empty table by just passing the schema and later add to it using `table.add()`"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "2814173a-eacc-4dd8-a64d-6312b44582cc",
"metadata": {},
"outputs": [],
"source": [
"import lancedb\n",
"from lancedb.pydantic import LanceModel, Vector\n",
"\n",
"class Model(LanceModel):\n",
" vector: Vector(2)\n",
"\n",
"tbl = db.create_table(\"table6\", schema=Model.to_arrow_schema())"
]
},
{
"cell_type": "markdown",
"id": "1d1b0f5c-a1d9-459f-8614-8376b6f577e1",
"metadata": {},
"source": [
"## Open Existing Tables\n",
"\n",
"If you forget the name of your table, you can always get a listing of all table names:\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "df9e13c0-41f6-437f-9dfa-2fd71d3d9c45",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['table6', 'table4', 'table5', 'movielens_small']"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.table_names()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "9343f5ad-6024-42ee-ac2f-6c1471df8679",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>vector</th>\n",
" <th>item</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>[3.1, 4.1]</td>\n",
" <td>foo</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>[5.9, 26.5]</td>\n",
" <td>bar</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>[3.1, 4.1]</td>\n",
" <td>foo</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>[5.9, 26.5]</td>\n",
" <td>bar</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>[3.1, 4.1]</td>\n",
" <td>foo</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>[5.9, 26.5]</td>\n",
" <td>bar</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>[3.1, 4.1]</td>\n",
" <td>foo</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>[5.9, 26.5]</td>\n",
" <td>bar</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>[3.1, 4.1]</td>\n",
" <td>foo</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>[5.9, 26.5]</td>\n",
" <td>bar</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" vector item price\n",
"0 [3.1, 4.1] foo 10.0\n",
"1 [5.9, 26.5] bar 20.0\n",
"2 [3.1, 4.1] foo 10.0\n",
"3 [5.9, 26.5] bar 20.0\n",
"4 [3.1, 4.1] foo 10.0\n",
"5 [5.9, 26.5] bar 20.0\n",
"6 [3.1, 4.1] foo 10.0\n",
"7 [5.9, 26.5] bar 20.0\n",
"8 [3.1, 4.1] foo 10.0\n",
"9 [5.9, 26.5] bar 20.0"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tbl = db.open_table(\"table4\")\n",
"tbl.to_pandas()"
]
},
{
"cell_type": "markdown",
"id": "5019246f-12e3-4f78-88a8-9f4939802c76",
"metadata": {},
"source": [
"## Adding to table\n",
"After a table has been created, you can always add more data to it using\n",
"\n",
"You can add any of the valid data structures accepted by LanceDB table, i.e, `dict`, `list[dict]`, `pd.DataFrame`, or a `Iterator[pa.RecordBatch]`. Here are some examples."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "8a56250f-73a1-4c26-a6ad-5c7a0ce3a9ab",
"metadata": {},
"outputs": [],
"source": [
"data = [\n",
" {\"vector\": [1.3, 1.4], \"item\": \"fizz\", \"price\": 100.0},\n",
" {\"vector\": [9.5, 56.2], \"item\": \"buzz\", \"price\": 200.0}\n",
"]\n",
"tbl.add(data)"
]
},
{
"cell_type": "markdown",
"id": "9985f6ee-67e1-45a9-b233-94e3d121ecbf",
"metadata": {},
"source": [
"You can also add a large dataset batch in one go using Iterator of supported data types\n",
"\n",
"### Adding via Iterator\n",
"\n",
"here, we'll use pandas DataFrame Iterator"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "030c7057-b98e-4e2f-be14-b8c1f927f83c",
"metadata": {},
"outputs": [],
"source": [
"def make_batches():\n",
" for i in range(5):\n",
" yield [\n",
" {\"vector\": [3.1, 4.1], \"item\": \"foo\", \"price\": 10.0},\n",
" {\"vector\": [1, 1], \"item\": \"bar\", \"price\": 20.0},\n",
" ]\n",
"tbl.add(make_batches())"
]
},
{
"cell_type": "markdown",
"id": "b8316d5d-0a23-4675-b0ee-178711db873a",
"metadata": {},
"source": [
"## Deleting from a Table\n",
"\n",
"Use the `delete()` method on tables to delete rows from a table. To choose which rows to delete, provide a filter that matches on the metadata columns. This can delete any number of rows that match the filter, like:\n",
"\n",
"\n",
"```python\n",
"tbl.delete('item = \"fizz\"')\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "e7a17de2-08d2-41b7-bd05-f63d1045ab1f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"32\n"
]
},
{
"data": {
"text/plain": [
"17"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(len(tbl))\n",
" \n",
"tbl.delete(\"price = 20.0\")\n",
" \n",
"len(tbl)"
]
},
{
"cell_type": "markdown",
"id": "74ac180b-5432-4c14-b1a8-22c35ac83af8",
"metadata": {},
"source": [
"### Delete from a list of values"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "fe3310bd-08f4-4a22-a63b-b3127d22f9f7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" vector item price\n",
"0 [3.1, 4.1] foo 10.0\n",
"1 [3.1, 4.1] foo 10.0\n",
"2 [3.1, 4.1] foo 10.0\n",
"3 [3.1, 4.1] foo 10.0\n",
"4 [3.1, 4.1] foo 10.0\n",
"5 [1.3, 1.4] fizz 100.0\n",
"6 [9.5, 56.2] buzz 200.0\n",
"7 [3.1, 4.1] foo 10.0\n",
"8 [3.1, 4.1] foo 10.0\n",
"9 [3.1, 4.1] foo 10.0\n",
"10 [3.1, 4.1] foo 10.0\n",
"11 [3.1, 4.1] foo 10.0\n",
"12 [3.1, 4.1] foo 10.0\n",
"13 [3.1, 4.1] foo 10.0\n",
"14 [3.1, 4.1] foo 10.0\n",
"15 [3.1, 4.1] foo 10.0\n",
"16 [3.1, 4.1] foo 10.0\n"
]
},
{
"ename": "OSError",
"evalue": "LanceError(IO): Error during planning: column foo does not exist",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mOSError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[30], line 4\u001b[0m\n\u001b[1;32m 2\u001b[0m to_remove \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(\u001b[38;5;28mstr\u001b[39m(v) \u001b[38;5;28;01mfor\u001b[39;00m v \u001b[38;5;129;01min\u001b[39;00m to_remove)\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28mprint\u001b[39m(tbl\u001b[38;5;241m.\u001b[39mto_pandas())\n\u001b[0;32m----> 4\u001b[0m \u001b[43mtbl\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdelete\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43mf\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mitem IN (\u001b[39;49m\u001b[38;5;132;43;01m{\u001b[39;49;00m\u001b[43mto_remove\u001b[49m\u001b[38;5;132;43;01m}\u001b[39;49;00m\u001b[38;5;124;43m)\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 5\u001b[0m tbl\u001b[38;5;241m.\u001b[39mto_pandas()\n",
"File \u001b[0;32m~/Documents/lancedb/lancedb/python/lancedb/table.py:610\u001b[0m, in \u001b[0;36mLanceTable.delete\u001b[0;34m(self, where)\u001b[0m\n\u001b[1;32m 609\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mdelete\u001b[39m(\u001b[38;5;28mself\u001b[39m, where: \u001b[38;5;28mstr\u001b[39m):\n\u001b[0;32m--> 610\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_dataset\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdelete\u001b[49m\u001b[43m(\u001b[49m\u001b[43mwhere\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/Documents/lancedb/lancedb/env/lib/python3.11/site-packages/lance/dataset.py:489\u001b[0m, in \u001b[0;36mLanceDataset.delete\u001b[0;34m(self, predicate)\u001b[0m\n\u001b[1;32m 487\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(predicate, pa\u001b[38;5;241m.\u001b[39mcompute\u001b[38;5;241m.\u001b[39mExpression):\n\u001b[1;32m 488\u001b[0m predicate \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mstr\u001b[39m(predicate)\n\u001b[0;32m--> 489\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_ds\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdelete\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpredicate\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[0;31mOSError\u001b[0m: LanceError(IO): Error during planning: column foo does not exist"
]
}
],
"source": [
"to_remove = [\"foo\", \"buzz\"]\n",
"to_remove = \", \".join(str(v) for v in to_remove)\n",
"print(tbl.to_pandas())\n",
"tbl.delete(f\"item IN ({to_remove})\")\n"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "87d5bc21-847f-4c81-b56e-f6dbe5d05aac",
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(\n",
" {\n",
" \"vector\": [[3.1, 4.1], [1, 1]],\n",
" \"item\": [\"foo\", \"bar\"],\n",
" \"price\": [10.0, 20.0],\n",
" })\n",
"\n",
"tbl = db.create_table(\"table7\", data=df, mode=\"overwrite\")"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "9cba4519-eb3a-4941-ab7e-873d762e750f",
"metadata": {},
"outputs": [],
"source": [
"to_remove = [10.0, 20.0]\n",
"to_remove = \", \".join(str(v) for v in to_remove)\n",
"\n",
"tbl.delete(f\"price IN ({to_remove})\")"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "5bdc9801-d5ed-4871-92d0-88b27108e788",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>vector</th>\n",
" <th>item</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [vector, item, price]\n",
"Index: []"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tbl.to_pandas()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "752d33d4-ce1c-48e5-90d2-c85f0982182d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -8,7 +8,12 @@
"source": [
"# Youtube Transcript Search QA Bot\n",
"\n",
"This Q&A bot will allow you to search through youtube transcripts using natural language! By going through this notebook, we'll introduce how you can use LanceDB to store and manage your data easily."
"This Q&A bot will allow you to search through youtube transcripts using natural language! By going through this notebook, we'll introduce how you can use LanceDB to store and manage your data easily.\n",
"\n",
"\n",
"<a href=\"https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/youtube_bot/main.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\">\n",
"\n",
"Scripts - [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/youtube_bot/main.py) [![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](./examples/youtube_bot/index.js)\n"
]
},
{
@@ -22,11 +27,11 @@
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.0\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.1.1\u001B[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.0\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.1.1\u001B[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n"
]
}
],
@@ -179,7 +184,7 @@
"df = (contextualize(data.to_pandas())\n",
" .groupby(\"title\").text_col(\"text\")\n",
" .window(20).stride(4)\n",
" .to_df())\n",
" .to_pandas())\n",
"df.head(1)"
]
},
@@ -598,7 +603,7 @@
"outputs": [],
"source": [
"# Use LanceDB to get top 3 most relevant context\n",
"context = tbl.search(emb).limit(3).to_df()"
"context = tbl.search(emb).limit(3).to_pandas()"
]
},
{

100
docs/src/python/arrow.md Normal file
View File

@@ -0,0 +1,100 @@
# Pandas and PyArrow
Built on top of [Apache Arrow](https://arrow.apache.org/),
`LanceDB` is easy to integrate with the Python ecosystem, including [Pandas](https://pandas.pydata.org/)
and PyArrow.
## Create dataset
First, we need to connect to a `LanceDB` database.
```py
import lancedb
db = lancedb.connect("data/sample-lancedb")
```
Afterwards, we write a `Pandas DataFrame` to LanceDB directly.
```py
import pandas as pd
data = pd.DataFrame({
"vector": [[3.1, 4.1], [5.9, 26.5]],
"item": ["foo", "bar"],
"price": [10.0, 20.0]
})
table = db.create_table("pd_table", data=data)
```
Similar to [`pyarrow.write_dataset()`](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html),
[db.create_table()](../python/#lancedb.db.DBConnection.create_table) accepts a wide-range of forms of data.
For example, if you have a dataset that is larger than memory size, you can create table with `Iterator[pyarrow.RecordBatch]`,
to lazily generate data:
```py
from typing import Iterable
import pyarrow as pa
def make_batches() -> Iterable[pa.RecordBatch]:
for i in range(5):
yield pa.RecordBatch.from_arrays(
[
pa.array([[3.1, 4.1], [5.9, 26.5]]),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"])
schema=pa.schema([
pa.field("vector", pa.list_(pa.float32())),
pa.field("item", pa.utf8()),
pa.field("price", pa.float32()),
])
table = db.create_table("iterable_table", data=make_batches(), schema=schema)
```
You will find detailed instructions of creating dataset in
[Basic Operations](../basic.md) and [API](../python/#lancedb.db.DBConnection.create_table)
sections.
## Vector Search
We can now perform similarity search via `LanceDB` Python API.
```py
# Open the table previously created.
table = db.open_table("pd_table")
query_vector = [100, 100]
# Pandas DataFrame
df = table.search(query_vector).limit(1).to_pandas()
print(df)
```
```
vector item price _distance
0 [5.9, 26.5] bar 20.0 14257.05957
```
If you have a simple filter, it's faster to provide a `where clause` to `LanceDB`'s search query.
If you have more complex criteria, you can always apply the filter to the resulting Pandas `DataFrame`.
```python
# Apply the filter via LanceDB
results = table.search([100, 100]).where("price < 15").to_pandas()
assert len(results) == 1
assert results["item"].iloc[0] == "foo"
# Apply the filter via Pandas
df = results = table.search([100, 100]).to_pandas()
results = df[df.price < 15]
assert len(results) == 1
assert results["item"].iloc[0] == "foo"
```

54
docs/src/python/duckdb.md Normal file
View File

@@ -0,0 +1,54 @@
# DuckDB
`LanceDB` works with `DuckDB` via [PyArrow integration](https://duckdb.org/docs/guides/python/sql_on_arrow).
Let us start with installing `duckdb` and `lancedb`.
```shell
pip install duckdb lancedb
```
We will re-use [the dataset created previously](./arrow.md):
```python
import lancedb
db = lancedb.connect("data/sample-lancedb")
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
table = db.create_table("pd_table", data=data)
arrow_table = table.to_arrow()
```
`DuckDB` can directly query the `arrow_table`:
```python
import duckdb
duckdb.query("SELECT * FROM arrow_table")
```
```
┌─────────────┬─────────┬────────┐
│ vector │ item │ price │
│ float[] │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1] │ foo │ 10.0 │
│ [5.9, 26.5] │ bar │ 20.0 │
└─────────────┴─────────┴────────┘
```
```py
duckdb.query("SELECT mean(price) FROM arrow_table")
```
```
┌─────────────┐
│ mean(price) │
│ double │
├─────────────┤
│ 15.0 │
└─────────────┘
```

View File

@@ -0,0 +1,7 @@
# Integration
Built on top of [Apache Arrow](https://arrow.apache.org/),
`LanceDB` is very easy to be integrate with Python ecosystems.
* [Pandas and Arrow Integration](./arrow.md)
* [DuckDB Integration](./duckdb.md)

View File

@@ -0,0 +1,36 @@
# Pydantic
[Pydantic](https://docs.pydantic.dev/latest/) is a data validation library in Python.
LanceDB integrates with Pydantic for schema inference, data ingestion, and query result casting.
## Schema
LanceDB supports to create Apache Arrow Schema from a
[Pydantic BaseModel](https://docs.pydantic.dev/latest/api/main/#pydantic.main.BaseModel)
via [pydantic_to_schema()](python.md##lancedb.pydantic.pydantic_to_schema) method.
::: lancedb.pydantic.pydantic_to_schema
## Vector Field
LanceDB provides a [`Vector(dim)`](python.md#lancedb.pydantic.Vector) method to define a
vector Field in a Pydantic Model.
::: lancedb.pydantic.Vector
## Type Conversion
LanceDB automatically convert Pydantic fields to
[Apache Arrow DataType](https://arrow.apache.org/docs/python/generated/pyarrow.DataType.html#pyarrow.DataType).
Current supported type conversions:
| Pydantic Field Type | PyArrow Data Type |
| ------------------- | ----------------- |
| `int` | `pyarrow.int64` |
| `float` | `pyarrow.float64` |
| `bool` | `pyarrow.bool` |
| `str` | `pyarrow.utf8()` |
| `list` | `pyarrow.List` |
| `BaseModel` | `pyarrow.Struct` |
| `Vector(n)` | `pyarrow.FixedSizeList(float32, n)` |

View File

@@ -10,23 +10,33 @@ pip install lancedb
::: lancedb.connect
::: lancedb.LanceDBConnection
::: lancedb.db.DBConnection
## Table
::: lancedb.table.LanceTable
::: lancedb.table.Table
## Querying
::: lancedb.query.LanceQueryBuilder
::: lancedb.query.Query
::: lancedb.query.LanceFtsQueryBuilder
::: lancedb.query.LanceQueryBuilder
## Embeddings
::: lancedb.embeddings.with_embeddings
::: lancedb.embeddings.registry.EmbeddingFunctionRegistry
::: lancedb.embeddings.EmbeddingFunction
::: lancedb.embeddings.base.EmbeddingFunction
::: lancedb.embeddings.base.TextEmbeddingFunction
::: lancedb.embeddings.sentence_transformers.SentenceTransformerEmbeddings
::: lancedb.embeddings.openai.OpenAIEmbeddings
::: lancedb.embeddings.open_clip.OpenClipEmbeddings
::: lancedb.embeddings.with_embeddings
## Context
@@ -41,3 +51,17 @@ pip install lancedb
::: lancedb.fts.populate_index
::: lancedb.fts.search_index
## Utilities
::: lancedb.schema.vector
## Integrations
### Pydantic
::: lancedb.pydantic.pydantic_to_schema
::: lancedb.pydantic.vector
::: lancedb.pydantic.LanceModel

1
docs/src/robots.txt Normal file
View File

@@ -0,0 +1 @@
User-agent: *

View File

@@ -0,0 +1,4 @@
window.addEventListener("DOMContentLoaded", (event) => {
!function(t,e){var o,n,p,r;e.__SV||(window.posthog=e,e._i=[],e.init=function(i,s,a){function g(t,e){var o=e.split(".");2==o.length&&(t=t[o[0]],e=o[1]),t[e]=function(){t.push([e].concat(Array.prototype.slice.call(arguments,0)))}}(p=t.createElement("script")).type="text/javascript",p.async=!0,p.src=s.api_host+"/static/array.js",(r=t.getElementsByTagName("script")[0]).parentNode.insertBefore(p,r);var u=e;for(void 0!==a?u=e[a]=[]:a="posthog",u.people=u.people||[],u.toString=function(t){var e="posthog";return"posthog"!==a&&(e+="."+a),t||(e+=" (stub)"),e},u.people.toString=function(){return u.toString(1)+".people (stub)"},o="capture identify alias people.set people.set_once set_config register register_once unregister opt_out_capturing has_opted_out_capturing opt_in_capturing reset isFeatureEnabled onFeatureFlags getFeatureFlag getFeatureFlagPayload reloadFeatureFlags group updateEarlyAccessFeatureEnrollment getEarlyAccessFeatures getActiveMatchingSurveys getSurveys".split(" "),n=0;n<o.length;n++)g(u,o[n]);e._i.push([i,s,a])},e.__SV=1)}(document,window.posthog||[]);
posthog.init('phc_oENDjGgHtmIDrV6puUiFem2RB4JA8gGWulfdulmMdZP',{api_host:'https://app.posthog.com'})
});

View File

@@ -4,7 +4,7 @@
In a recommendation system or search engine, you can find similar products from
the one you searched.
In LLM and other AI applications,
each data point can be [presented by the embeddings generated from some models](embedding.md),
each data point can be [presented by the embeddings generated from some models](embeddings/index.md),
it returns the most relevant features.
A search in high-dimensional vector space, is to find `K-Nearest-Neighbors (KNN)` of the query vector.
@@ -18,27 +18,56 @@ Currently, we support the following metrics:
| ----------- | ------------------------------------ |
| `L2` | [Euclidean / L2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) |
| `Cosine` | [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity)|
| `Dot` | [Dot Production](https://en.wikipedia.org/wiki/Dot_product) |
## Search
### Flat Search
If you do not create a vector index, LanceDB would need to exhaustively scan the entire vector column (via `Flat Search`)
and compute the distance for *every* vector in order to find the closest matches. This is effectively a KNN search.
If there is no [vector index is created](ann_indexes.md), LanceDB will just brute-force scan
the vector column and compute the distance.
<!-- Setup Code
```python
import lancedb
import numpy as np
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
data = [{"vector": row, "item": f"item {i}"}
for i, row in enumerate(np.random.random((10_000, 1536)).astype('float32'))]
db.create_table("my_vectors", data=data)
```
-->
<!-- Setup Code
```javascript
const vectordb_setup = require('vectordb')
const db_setup = await vectordb_setup.connect('data/sample-lancedb')
let data = []
for (let i = 0; i < 10_000; i++) {
data.push({vector: Array(1536).fill(i), id: `${i}`, content: "", longId: `${i}`},)
}
await db_setup.createTable('my_vectors', data)
```
-->
=== "Python"
```python
import lancedb
import numpy as np
db = lancedb.connect("data/sample-lancedb")
tbl = db.open_table("my_vectors")
df = tbl.search(np.random.random((768)))
.limit(10)
.to_df()
df = tbl.search(np.random.random((1536))) \
.limit(10) \
.to_list()
```
=== "JavaScript"
@@ -47,10 +76,10 @@ the vector column and compute the distance.
const vectordb = require('vectordb')
const db = await vectordb.connect('data/sample-lancedb')
tbl = db.open_table("my_vectors")
const tbl = await db.openTable("my_vectors")
const results = await tbl.search(Array(768))
.limit(20)
const results_1 = await tbl.search(Array(1536).fill(1.2))
.limit(10)
.execute()
```
@@ -60,26 +89,33 @@ as well.
=== "Python"
```python
df = tbl.search(np.random.random((768)))
.metric("cosine")
.limit(10)
.to_df()
df = tbl.search(np.random.random((1536))) \
.metric("cosine") \
.limit(10) \
.to_list()
```
=== "JavaScript"
```javascript
const vectordb = require('vectordb')
const db = await vectordb.connect('data/sample-lancedb')
tbl = db.open_table("my_vectors")
const results = await tbl.search(Array(768))
.metric("cosine")
.limit(20)
const results_2 = await tbl.search(Array(1536).fill(1.2))
.metricType("cosine")
.limit(10)
.execute()
```
### Search with Vector Index.
### Approximate Nearest Neighbor (ANN) Search with Vector Index.
To accelerate vector retrievals, it is common to build vector indices.
A vector index is a data structure specifically designed to efficiently organize and
search vector data based on their similarity via the chosen distance metric.
By constructing a vector index, you can reduce the search space and avoid the need
for brute-force scanning of the entire vector column.
However, fast vector search using indices often entails making a trade-off with accuracy to some extent.
This is why it is often called **Approximate Nearest Neighbors (ANN)** search, while the Flat Search (KNN)
always returns 100% recall.
See [ANN Index](ann_indexes.md) for more details.

120
docs/src/sql.md Normal file
View File

@@ -0,0 +1,120 @@
# SQL filters
LanceDB embraces the utilization of standard SQL expressions as predicates for hybrid
filters. It can be used during hybrid vector search and deletion operations.
Currently, Lance supports a growing list of expressions.
* ``>``, ``>=``, ``<``, ``<=``, ``=``
* ``AND``, ``OR``, ``NOT``
* ``IS NULL``, ``IS NOT NULL``
* ``IS TRUE``, ``IS NOT TRUE``, ``IS FALSE``, ``IS NOT FALSE``
* ``IN``
* ``LIKE``, ``NOT LIKE``
* ``CAST``
* ``regexp_match(column, pattern)``
For example, the following filter string is acceptable:
<!-- Setup Code
```python
import lancedb
import numpy as np
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
data = [{"vector": row, "item": f"item {i}"}
for i, row in enumerate(np.random.random((10_000, 2)).astype('int'))]
tbl = db.create_table("my_vectors", data=data)
```
-->
<!-- Setup Code
```javascript
const vectordb = require('vectordb')
const db = await vectordb.connect('data/sample-lancedb')
let data = []
for (let i = 0; i < 10_000; i++) {
data.push({vector: Array(1536).fill(i), id: `${i}`, content: "", longId: `${i}`},)
}
const tbl = await db.createTable('my_vectors', data)
```
-->
=== "Python"
```python
tbl.search([100, 102]) \
.where("""(
(label IN [10, 20])
AND
(note.email IS NOT NULL)
) OR NOT note.created
""")
```
=== "Javascript"
```javascript
tbl.search([100, 102])
.where(`(
(label IN [10, 20])
AND
(note.email IS NOT NULL)
) OR NOT note.created
`)
```
If your column name contains special characters or is a [SQL Keyword](https://docs.rs/sqlparser/latest/sqlparser/keywords/index.html),
you can use backtick (`` ` ``) to escape it. For nested fields, each segment of the
path must be wrapped in backticks.
=== "SQL"
```sql
`CUBE` = 10 AND `column name with space` IS NOT NULL
AND `nested with space`.`inner with space` < 2
```
!!! warning
Field names containing periods (``.``) are not supported.
Literals for dates, timestamps, and decimals can be written by writing the string
value after the type name. For example
=== "SQL"
```sql
date_col = date '2021-01-01'
and timestamp_col = timestamp '2021-01-01 00:00:00'
and decimal_col = decimal(8,3) '1.000'
```
For timestamp columns, the precision can be specified as a number in the type
parameter. Microsecond precision (6) is the default.
| SQL | Time unit |
|------------------|--------------|
| ``timestamp(0)`` | Seconds |
| ``timestamp(3)`` | Milliseconds |
| ``timestamp(6)`` | Microseconds |
| ``timestamp(9)`` | Nanoseconds |
LanceDB internally stores data in [Apache Arrow](https://arrow.apache.org/) format.
The mapping from SQL types to Arrow types is:
| SQL type | Arrow type |
|----------|------------|
| ``boolean`` | ``Boolean`` |
| ``tinyint`` / ``tinyint unsigned`` | ``Int8`` / ``UInt8`` |
| ``smallint`` / ``smallint unsigned`` | ``Int16`` / ``UInt16`` |
| ``int`` or ``integer`` / ``int unsigned`` or ``integer unsigned`` | ``Int32`` / ``UInt32`` |
| ``bigint`` / ``bigint unsigned`` | ``Int64`` / ``UInt64`` |
| ``float`` | ``Float32`` |
| ``double`` | ``Float64`` |
| ``decimal(precision, scale)`` | ``Decimal128`` |
| ``date`` | ``Date32`` |
| ``timestamp`` | ``Timestamp`` [^1] |
| ``string`` | ``Utf8`` |
| ``binary`` | ``Binary`` |
[^1]: See precision mapping in previous table.

View File

@@ -4,3 +4,12 @@
--md-text-font: ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, "Noto Sans", sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--md-code-font: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;
}
.md-nav__item, .md-tabs__item {
font-size: large;
}
/* Maximum space for text block */
.md-grid {
max-width: 90%;
}

53
docs/test/md_testing.js Normal file
View File

@@ -0,0 +1,53 @@
const glob = require("glob");
const fs = require("fs");
const path = require("path");
const globString = "../src/**/*.md";
const excludedGlobs = [
"../src/fts.md",
"../src/embedding.md",
"../src/examples/*.md",
"../src/guides/tables.md",
"../src/embeddings/*.md",
];
const nodePrefix = "javascript";
const nodeFile = ".js";
const nodeFolder = "node";
const asyncPrefix = "(async () => {\n";
const asyncSuffix = "})();";
function* yieldLines(lines, prefix, suffix) {
let inCodeBlock = false;
for (const line of lines) {
if (line.trim().startsWith(prefix + nodePrefix)) {
inCodeBlock = true;
} else if (inCodeBlock && line.trim().startsWith(suffix)) {
inCodeBlock = false;
yield "\n";
} else if (inCodeBlock) {
yield line;
}
}
}
const files = glob.sync(globString, { recursive: true });
const excludedFiles = glob.sync(excludedGlobs, { recursive: true });
for (const file of files.filter((file) => !excludedFiles.includes(file))) {
const lines = [];
const data = fs.readFileSync(file, "utf-8");
const fileLines = data.split("\n");
for (const line of yieldLines(fileLines, "```", "```")) {
lines.push(line);
}
if (lines.length > 0) {
const fileName = path.basename(file, ".md");
const outPath = path.join(nodeFolder, fileName, `${fileName}${nodeFile}`);
console.log(outPath)
fs.mkdirSync(path.dirname(outPath), { recursive: true });
fs.writeFileSync(outPath, asyncPrefix + "\n" + lines.join("\n") + asyncSuffix);
}
}

62
docs/test/md_testing.py Normal file
View File

@@ -0,0 +1,62 @@
import glob
from typing import Iterator
from pathlib import Path
glob_string = "../src/**/*.md"
excluded_globs = [
"../src/fts.md",
"../src/embedding.md",
"../src/examples/*.md",
"../src/integrations/voxel51.md",
"../src/guides/tables.md",
"../src/python/duckdb.md",
"../src/embeddings/*.md",
]
python_prefix = "py"
python_file = ".py"
python_folder = "python"
files = glob.glob(glob_string, recursive=True)
excluded_files = [
f
for excluded_glob in excluded_globs
for f in glob.glob(excluded_glob, recursive=True)
]
def yield_lines(lines: Iterator[str], prefix: str, suffix: str):
in_code_block = False
# Python code has strict indentation
strip_length = 0
skip_test = False
for line in lines:
if "skip-test" in line:
skip_test = True
if line.strip().startswith(prefix + python_prefix):
in_code_block = True
strip_length = len(line) - len(line.lstrip())
elif in_code_block and line.strip().startswith(suffix):
in_code_block = False
if not skip_test:
yield "\n"
skip_test = False
elif in_code_block:
if not skip_test:
yield line[strip_length:]
for file in filter(lambda file: file not in excluded_files, files):
with open(file, "r") as f:
lines = list(yield_lines(iter(f), "```", "```"))
if len(lines) > 0:
print(lines)
out_path = (
Path(python_folder)
/ Path(file).name.strip(".md")
/ (Path(file).name.strip(".md") + python_file)
)
print(out_path)
out_path.parent.mkdir(exist_ok=True, parents=True)
with open(out_path, "w") as out:
out.writelines(lines)

13
docs/test/package.json Normal file
View File

@@ -0,0 +1,13 @@
{
"name": "lancedb-docs-test",
"version": "1.0.0",
"description": "",
"author": "",
"license": "ISC",
"dependencies": {
"fs": "^0.0.1-security",
"glob": "^10.2.7",
"path": "^0.12.7",
"vectordb": "https://gitpkg.now.sh/lancedb/lancedb/node?main"
}
}

View File

@@ -0,0 +1,8 @@
-e ../../python
numpy
pandas
pylance
duckdb
--extra-index-url https://download.pytorch.org/whl/cpu
torch

View File

@@ -12,5 +12,6 @@ module.exports = {
sourceType: 'module'
},
rules: {
"@typescript-eslint/method-signature-style": "off",
}
}

View File

@@ -10,7 +10,7 @@ npm install vectordb
This will download the appropriate native library for your platform. We currently
support x86_64 Linux, aarch64 Linux, Intel MacOS, and ARM (M1/M2) MacOS. We do not
yet support Windows or musl-based Linux (such as Alpine Linux).
yet support musl-based Linux (such as Alpine Linux).
## Usage
@@ -18,9 +18,11 @@ yet support Windows or musl-based Linux (such as Alpine Linux).
```javascript
const lancedb = require('vectordb');
const db = lancedb.connect('<PATH_TO_LANCEDB_DATASET>');
const table = await db.openTable('my_table');
const query = await table.search([0.1, 0.3]).setLimit(20).execute();
const db = await lancedb.connect('data/sample-lancedb');
const table = await db.createTable("my_table",
[{ id: 1, vector: [0.1, 1.0], item: "foo", price: 10.0 },
{ id: 2, vector: [3.9, 0.5], item: "bar", price: 20.0 }])
const results = await table.search([0.1, 0.3]).limit(20).execute();
console.log(results);
```

View File

@@ -0,0 +1,66 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
'use strict'
async function example() {
const lancedb = require('vectordb')
// Import transformers and the all-MiniLM-L6-v2 model (https://huggingface.co/Xenova/all-MiniLM-L6-v2)
const { pipeline } = await import('@xenova/transformers')
const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
// Create embedding function from pipeline which returns a list of vectors from batch
// sourceColumn is the name of the column in the data to be embedded
//
// Output of pipe is a Tensor { data: Float32Array(384) }, so filter for the vector
const embed_fun = {}
embed_fun.sourceColumn = 'text'
embed_fun.embed = async function (batch) {
let result = []
for (let text of batch) {
const res = await pipe(text, { pooling: 'mean', normalize: true })
result.push(Array.from(res['data']))
}
return (result)
}
// Link a folder and create a table with data
const db = await lancedb.connect('data/sample-lancedb')
const data = [
{ id: 1, text: 'Cherry', type: 'fruit' },
{ id: 2, text: 'Carrot', type: 'vegetable' },
{ id: 3, text: 'Potato', type: 'vegetable' },
{ id: 4, text: 'Apple', type: 'fruit' },
{ id: 5, text: 'Banana', type: 'fruit' }
]
const table = await db.createTable('food_table', data, embed_fun)
// Query the table
const results = await table
.search("a sweet fruit to eat")
.metricType("cosine")
.limit(2)
.execute()
console.log(results.map(r => r.text))
}
example().then(_ => { console.log("Done!") })

View File

@@ -0,0 +1,16 @@
{
"name": "vectordb-example-js-transformers",
"version": "1.0.0",
"description": "Example for using transformers.js with lancedb",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "Lance Devs",
"license": "Apache-2.0",
"dependencies": {
"@xenova/transformers": "^2.4.1",
"vectordb": "file:../.."
}
}

View File

@@ -12,26 +12,25 @@
// See the License for the specific language governing permissions and
// limitations under the License.
const { currentTarget } = require('@neon-rs/load');
const { currentTarget } = require('@neon-rs/load')
let nativeLib;
let nativeLib
try {
nativeLib = require(`vectordb-${currentTarget()}`);
} catch (e) {
// When developing locally, give preference to the local built library
nativeLib = require('./index.node')
} catch {
try {
// Might be developing locally, so try that. But don't expose that error
// to the user.
nativeLib = require("./index.node");
} catch {
nativeLib = require(`@lancedb/vectordb-${currentTarget()}`)
} catch (e) {
throw new Error(`vectordb: failed to load native library.
You may need to run \`npm install vectordb-${currentTarget()}\`.
You may need to run \`npm install @lancedb/vectordb-${currentTarget()}\`.
If that does not work, please file a bug report at https://github.com/lancedb/lancedb/issues
Source error: ${e}`);
Source error: ${e}`)
}
}
// Dynamic require for runtime.
module.exports = nativeLib;
module.exports = nativeLib

Some files were not shown because too many files have changed in this diff Show More