It's inconvenient to always require data at table creation time.
This change lets you create an empty table, then add data and set the
schema later.
---------
Co-authored-by: Chang She <chang@lancedb.com>
`Table::add(.., mode: WriteMode)`, which is a public API, currently uses
the `WriteMode` exported from `lance`. Re-export it from `lancedb` so that
callers of the public API do not have to import it from `lance` directly.
LangChain sometimes inserts a single `[np.nan]` as a placeholder if
the embedding function failed. This causes a problem for the Lance format
because the array can then no longer be stored as a `FixedSizeListArray`.
Instead:
1. By default we remove rows with embedding lengths less than the
maximum length in the batch
2. If `strict=True` is passed, a `ValueError` is raised
if the embeddings aren't all the same length
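
The two behaviors above can be illustrated with a minimal sketch. This is not the code from this change: the function name `filter_embeddings` and its signature are hypothetical, and only the drop-short-rows default and the `strict` flag come from the description.

```python
from typing import List, Sequence, Tuple


def filter_embeddings(
    texts: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    strict: bool = False,
) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical helper: keep only rows whose embedding has the batch's maximum length."""
    if not embeddings:
        return [], []
    max_len = max(len(e) for e in embeddings)
    if strict and any(len(e) != max_len for e in embeddings):
        # strict mode: refuse the whole batch instead of silently dropping rows
        raise ValueError("embeddings must all have the same length")
    # default mode: drop rows shorter than the longest embedding (e.g. a lone [nan])
    kept = [(t, list(e)) for t, e in zip(texts, embeddings) if len(e) == max_len]
    return [t for t, _ in kept], [e for _, e in kept]
```

With the default settings a placeholder row such as `[float("nan")]` is dropped from a batch of two-dimensional embeddings; with `strict=True` the same batch raises `ValueError`.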
---------
Co-authored-by: Chang She <chang@lancedb.com>
- Creates testing files `md_testing.py` and `md_testing.js` for testing
Python and Node.js code in markdown files in the documentation.
The scripts also pick up HTML comments: `<!--[language] code code
code...-->` will create a set-up file that creates some mock tables or
fulfills other assumptions made in the documentation.
- Creates a GitHub Actions workflow that triggers on every push or PR to
`docs/**`
- Modifies documentation so tests run (mostly indentation, some small
syntax errors and some missing imports)
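
As a rough illustration of the set-up comment convention, here is a sketch of how such comments could be pulled out of a markdown file. It is not the code in `md_testing.py`: the function name `extract_setup` is hypothetical, and the regex assumes the language name directly follows `<!--`, optionally wrapped in square brackets.

```python
import re
from typing import List

# Assumption: a set-up comment looks like <!--[python] ...code...--> (brackets optional).
SETUP_COMMENT = re.compile(r"<!--\s*\[?(\w+)\]?\s(.*?)-->", re.DOTALL)


def extract_setup(markdown: str, language: str) -> List[str]:
    """Return the bodies of all set-up comments tagged with `language`."""
    return [
        body.strip()
        for lang, body in SETUP_COMMENT.findall(markdown)
        if lang == language
    ]
```

The returned snippets could then be written to a set-up file that runs before the code blocks extracted from the same markdown file.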
A list of excluded files that we need to take a closer look at later on:
```javascript
const excludedFiles = [
"../src/fts.md",
"../src/embedding.md",
"../src/examples/serverless_lancedb_with_s3_and_lambda.md",
"../src/examples/serverless_qa_bot_with_modal_and_langchain.md",
"../src/examples/youtube_transcript_bot_with_nodejs.md",
];
```
Many of them can't be tested yet because they need an OpenAI API key :(.
`fts.md` has some issues with the library; I believe this is still
experimental?
Closes #170
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
* Refactors the Node module to load the shared library from a separate
package. When a user does `npm install vectordb`, the correct optional
dependency is automatically downloaded by npm.
* Add scripts and instructions to build Linux and macOS node artifacts
locally.
* Add instructions for publishing the npm module and crates.
Co-authored-by: Will Jones <willjones127@gmail.com>
* `to_df()` is now async; added `to_df_blocking` for convenience
* add the remote lancedb client to the public lancedb package
* make the lancedb connection class understand the URL scheme
`lancedb+<connection_type>://<host>:<port>`.
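
A URL of this shape can be split with the standard library alone. The sketch below is only an illustration: `parse_lancedb_url` is a hypothetical name, and `http` in the usage note is an assumed example of a connection type, not a value taken from this change.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_lancedb_url(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Split lancedb+<connection_type>://<host>:<port> into its three parts."""
    parsed = urlparse(url)
    prefix, sep, connection_type = parsed.scheme.partition("+")
    if prefix != "lancedb" or not sep or not connection_type:
        raise ValueError(f"not a lancedb+<connection_type> URL: {url!r}")
    # hostname and port are None when the URL omits them
    return connection_type, parsed.hostname, parsed.port
```

For example, `parse_lancedb_url("lancedb+http://localhost:10024")` yields the connection type `http`, the host `localhost`, and the port `10024`.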
Changes include:
- Contexts smaller than the `window` parameter are now included as well
- Added an optional threshold parameter to `to_df` in `Contextualizer`
This should close #165
- If maintainers are satisfied with the implementation, I will add more
examples and test cases and update the documentation as well.
---------
Co-authored-by: Nithin PS <47279496+Nithinps021@users.noreply.github.com>
Co-authored-by: Will Jones <willjones127@gmail.com>