* `table.add` requires a `data` parameter; updated the docs page on using
embedding models from HF accordingly
* Also changed the name of the example class from `TextModel` to `Words`,
since that is what is passed as a parameter in the `db.create_table` call
* Per
https://lancedb.github.io/lancedb/python/python/#lancedb.table.Table.add
This allows users to specify URIs like:
```
s3+ddb://my_bucket/path?ddbTableName=myCommitTable
```
and it will support concurrent writes in S3.
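
As a rough illustration of how such a URI breaks down, here is a minimal sketch using only the Python standard library. The helper name `parse_commit_store_uri` and the returned dictionary keys are hypothetical and are not part of the LanceDB API; the actual parsing inside LanceDB may differ.

```python
from urllib.parse import urlsplit, parse_qs


def parse_commit_store_uri(uri: str) -> dict:
    """Split an s3+ddb URI into its parts (illustrative helper, not LanceDB code)."""
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,            # "s3+ddb" selects the DynamoDB commit store
        "bucket": parts.netloc,            # S3 bucket name
        "path": parts.path.lstrip("/"),    # key prefix inside the bucket
        # Name of the DynamoDB table used to coordinate commits, if given
        "ddb_table": query.get("ddbTableName", [None])[0],
    }


info = parse_commit_store_uri("s3+ddb://my_bucket/path?ddbTableName=myCommitTable")
print(info)
```

The `ddbTableName` query parameter is the only piece that distinguishes this from a plain `s3://` URI.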
* [x] Add dynamodb integration tests
* [x] Add modifications to get it working in Python sync API
* [x] Add a section in the documentation describing how to configure it.
Closes #534
---------
Co-authored-by: universalmind303 <cory.grinstead@gmail.com>
Added the ability to specify `tokenizer_name` when creating a full-text
search index using tantivy. This enables language-specific stemming.
Also updated the [guide on full text
search](https://lancedb.github.io/lancedb/fts/) with a short section on
choosing a tokenizer.
Fixes #1315
- Fix some clippy errors caused by CI running a different toolchain.
- Add some safety notes about some unsafe blocks.
- Lock the toolchain so that it is consistent across dev and CI.
- Tried to address some of the onboarding feedback listed in
https://github.com/lancedb/lancedb/issues/1224
- Improve visibility of the pydantic integration and embedding API. (Based
on onboarding feedback: there are many ways to ingest data and define a
schema, but it is unclear which to use for a specific use case.)
- Add a guide that takes users through testing and improving retriever
performance using built-in utilities like hybrid-search and reranking
- Add some benchmarks for the above
- Add missing cohere docs
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Most of the time we don't need to reload. Taking the write lock and
performing IO while holding it is not an ideal pattern.
This PR tries to make the critical section of `.write()` happen less
frequently.
This isn't the ideal solution. Ideally we would not lock until the new
dataset has been loaded, but that would require too much refactoring.
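
The general idea can be sketched as follows. This is a Python stand-in under stated assumptions, not the PR's Rust code: the class `DatasetHandle`, its callbacks, and the `reloads` counter are all hypothetical, and because the Python standard library has no read-write lock, a plain `threading.Lock` with an unlocked fast path is used in its place. As in the description above, the reload IO still happens while the lock is held.

```python
import threading


class DatasetHandle:
    """Hypothetical cached dataset that reloads only when it is stale."""

    def __init__(self, load_fn, latest_version_fn):
        self._load = load_fn                      # loads a dataset for a version (does IO)
        self._latest_version = latest_version_fn  # cheap check for the newest version
        self._lock = threading.Lock()
        self._version = latest_version_fn()
        self._dataset = load_fn(self._version)
        self.reloads = 0                          # counts how often the slow path reloaded

    def get(self):
        # Fast path: the cached dataset is current, so no exclusive lock is taken.
        if self._version == self._latest_version():
            return self._dataset
        # Slow path: take the exclusive lock only when a reload looks necessary.
        with self._lock:
            latest = self._latest_version()
            # Re-check under the lock; another thread may have reloaded already.
            if self._version != latest:
                self._dataset = self._load(latest)
                self._version = latest
                self.reloads += 1
            return self._dataset


current = {"v": 1}
handle = DatasetHandle(lambda v: f"dataset@v{v}", lambda: current["v"])
handle.get()
handle.get()          # still version 1: no lock, no reload
current["v"] = 2      # simulate a new commit
result = handle.get()
print(result, handle.reloads)
```

Repeated reads of an up-to-date dataset never enter the critical section; only the read that first observes a newer version pays for the reload.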
- Changed the error message for `table.search` when the query vector has
the wrong dimension
- Added missing fields for `listIndices` and `indexStats` to be consistent
with the Python API; will make changes in the node integration test
Part of https://github.com/lancedb/lancedb/issues/994.
Adds the ability to use the OpenAI embedding functions.
The example can be run with the following:
```sh
> export OPENAI_API_KEY="sk-..."
> cargo run --example openai --features=openai
```
which should output
```
Closest match: Winter Parka
```
This doesn't actually block a Python-only release, since this step runs
after the version bump has been pushed, but it would still be nice for
the git job to finish successfully.