- Change k value to `10` for js search to keep it consistent with python
docs
- Uncomment now that cosine metric is fixed in lance:
https://github.com/lancedb/lance/pull/1035
- Creates testing files `md_testing.py` and `md_testing.js` for testing
python and nodejs code in markdown files in the documentation
The test scripts also pick up HTML comments:
`<!--[language] code code code...-->` is written to a set-up file that
creates mock tables or fulfills other assumptions made in the
documentation.
- Creates a GitHub Actions workflow that triggers on every push/PR to
`docs/**`
- Modifies documentation so tests run (mostly indentation, some small
syntax errors and some missing imports)
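As a rough illustration of the extraction step described above, here is a minimal sketch (not the actual `md_testing.py`) that pulls fenced snippets and `<!--[language] ...-->` set-up comments out of a markdown string. The function names `extract_snippets` and `build_test_script` are hypothetical, and placing all set-up code before the visible snippets is an assumption.

```python
import re

# Build the fence marker programmatically so this snippet can itself be
# embedded in a markdown document.
FENCE = "`" * 3

# Fenced code blocks with a language tag; the language and body are captured.
FENCE_RE = re.compile(rf"^{FENCE}(\w+)[^\n]*\n(.*?)^{FENCE}", re.DOTALL | re.MULTILINE)
# Hidden set-up comments of the form <!--[language] code... -->.
SETUP_RE = re.compile(r"<!--\[(\w+)\](.*?)-->", re.DOTALL)


def extract_snippets(markdown: str, language: str) -> tuple[list[str], list[str]]:
    """Return (setup_snippets, code_snippets) for one language."""
    setup = [body.strip() for lang, body in SETUP_RE.findall(markdown) if lang == language]
    code = [body.rstrip() for lang, body in FENCE_RE.findall(markdown) if lang == language]
    return setup, code


def build_test_script(markdown: str, language: str = "python") -> str:
    """Join set-up code first (an assumption), then visible snippets in order."""
    setup, code = extract_snippets(markdown, language)
    return "\n\n".join(setup + code) + "\n"
```

The resulting script could then be written to a temporary file and executed by the workflow; how the real scripts do this is not shown here.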
A list of excluded files that we need to take a closer look at later on:
```javascript
const excludedFiles = [
"../src/fts.md",
"../src/embedding.md",
"../src/examples/serverless_lancedb_with_s3_and_lambda.md",
"../src/examples/serverless_qa_bot_with_modal_and_langchain.md",
"../src/examples/youtube_transcript_bot_with_nodejs.md",
];
```
Many of them can't be tested because they need an OpenAI API key :(.
`fts.md` has some issues with the library; I believe this is still
experimental?
Closes #170
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
Adds:
* Make `mkdocstrings` aware we are using numpy-style docstrings
* Fixes broken link on `index.md` to Python API docs (and added link to
node ones)
* Added examples to various classes.
* Added doctest to verify examples work.
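For reference, a numpy-style docstring with a doctest-checked `Examples` section might look like the sketch below. The function `cosine_similarity` is a hypothetical stand-in written for illustration, not one of the classes touched by this change.

```python
import doctest


def cosine_similarity(a, b):
    """Compute the cosine similarity between two vectors.

    Parameters
    ----------
    a, b : sequence of float
        Vectors of equal length with non-zero norm.

    Returns
    -------
    float
        The cosine of the angle between `a` and `b`.

    Examples
    --------
    >>> cosine_similarity([1.0, 0.0], [1.0, 0.0])
    1.0
    >>> cosine_similarity([1.0, 0.0], [0.0, 1.0])
    0.0
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Runs every example found in this module's docstrings.
    doctest.testmod()
```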
Changed the link to the YouTube Transcripts Jupyter notebook in the
documentation.
Previously it pointed to docs/notebooks (which does not exist). I've
modified it to point to the notebooks folder instead.
PyPI does not allow uploading packages whose dependencies use a direct
reference, so for now we'll just ask the user to install tantivy separately
---------
Co-authored-by: Chang She <chang@lancedb.com>
This is v1 of integrating full text search index into LanceDB.
# API
The query API is roughly the same as before, except that if the input is
text instead of a vector, we assume that it's a full-text search (FTS).
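A minimal sketch of that dispatch rule, assuming the decision is made purely on the Python type of the query. The helper `infer_query_type` is hypothetical and is not part of the LanceDB API.

```python
from typing import Sequence, Union

Query = Union[str, Sequence[float]]


def infer_query_type(query: Query) -> str:
    """Return "fts" for text input and "vector" for anything else."""
    if isinstance(query, str):
        return "fts"
    return "vector"
```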
## Example
If `table` is a LanceDB LanceTable, then:
Build index: `table.create_fts_index("text")`
Query: `df = table.search("puppy").limit(10).select(["text"]).to_df()`
# Implementation
Here we use the tantivy-py package to build the index. We use the row
ids as the full-text-search index's doc ids, and then do a Take
operation to fetch the matching rows.
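The idea can be sketched with a toy inverted index in plain Python. This is an illustration only: it does not use tantivy-py, it does no relevance ranking, and the names `ToyFtsIndex` and `take` are hypothetical stand-ins for the real index and the Take operation.

```python
from collections import defaultdict


class ToyFtsIndex:
    """Toy inverted index mapping each term to the row ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_id: int, text: str) -> None:
        # The row id doubles as the index's doc id.
        for term in text.lower().split():
            self._postings[term].add(row_id)

    def search(self, query: str, limit: int = 10) -> list[int]:
        """Return up to `limit` row ids whose text contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        matches = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(matches)[:limit]


def take(rows: list[dict], row_ids: list[int]) -> list[dict]:
    """Fetch rows by position, standing in for the Take operation."""
    return [rows[i] for i in row_ids]


rows = [
    {"text": "a puppy plays in the park"},
    {"text": "the cat sleeps"},
    {"text": "puppy training tips"},
]
index = ToyFtsIndex()
for row_id, row in enumerate(rows):
    index.add(row_id, row["text"])

# Search returns row ids; take() turns them back into rows.
hits = take(rows, index.search("puppy", limit=10))
```

Because the index stores only row ids, rows appended after the index is built are invisible to search until the index is rebuilt, which is the first limitation listed below.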
# Limitations
1. Incremental row appends are not supported yet. New data won't show up
in search
2. Local filesystem only
3. Requires building tantivy explicitly
---------
Co-authored-by: Chang She <chang@lancedb.com>