This PR makes incremental changes to the documentation.
* Closes#697
* Closes#698
- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.
- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes#716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes#746
- [x] Add simple polars example to the integrations section. Closes#756
and closes#153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes#781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes#782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes#783 and closes#784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes#785
- [ ] Add a benchmark section that also discusses some best practices.
Closes#787
---------
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Fix broken link to embedding functions
testing: broken link was verified after local docs build to have been
repaired
---------
Co-authored-by: Chang She <chang@lancedb.com>
Add `to_list` to return query results as list of python dict (so we're
not too pandas-centric). Closes#555
Add `to_pandas` API and add deprecation warning on `to_df`. Closes#545
Co-authored-by: Chang She <chang@lancedb.com>
- Creates testing files `md_testing.py` and `md_testing.js` for testing
python and nodejs code in markdown files in the documentation
This listens for HTML tags as well: `<!--[language] code code
code...-->` will create a set-up file to create some mock tables or to
fulfill some assumptions in the documentation.
- Creates a github action workflow that triggers every push/pr to
`docs/**`
- Modifies documentation so tests run (mostly indentation, some small
syntax errors and some missing imports)
A list of excluded files that we need to take a closer look at later on:
```javascript
const excludedFiles = [
"../src/fts.md",
"../src/embedding.md",
"../src/examples/serverless_lancedb_with_s3_and_lambda.md",
"../src/examples/serverless_qa_bot_with_modal_and_langchain.md",
"../src/examples/youtube_transcript_bot_with_nodejs.md",
];
```
Many of them can't be done because we need the OpenAI API key :(.
`fts.md` has some issues with the library, I believe this is still
experimental?
Closes#170
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
Adds:
* Make `mkdocstrings` aware we are using numpy-style docstrings
* Fixes broken link on `index.md` to Python API docs (and added link to
node ones)
* Added examples to various classes.
* Added doctest to verify examples work.
Changed the link to the Youtube Transcripts jupyter notebook path on the
documentation.
Previously it went inside docs/notebooks (which does not exist). I've
modified it to go inside the notebooks folder instead.
This is v1 of integrating full text search index into LanceDB.
# API
The query API is roughly the same as before, except if the input is text
instead of a vector we assume that its fts search.
## Example
If `table` is a LanceDB LanceTable, then:
Build index: `table.create_fts_index("text")`
Query: `df = table.search("puppy").limit(10).select(["text"]).to_df()`
# Implementation
Here we use the tantivy-py package to build the index. We then use the
row id's as the full-text-search index's doc id then we just do a Take
operation to fetch the rows.
# Limitations
1. don't support incremental row appends yet. New data won't show up in
search
2. local filesystem only
3. requires building tantivy explicitly
---------
Co-authored-by: Chang She <chang@lancedb.com>