Mirror of https://github.com/lancedb/lancedb.git (synced 2026-01-07 12:22:59 +00:00)
docs: Updates and refactor (#683)
This PR makes incremental changes to the documentation.

* Closes #697
* Closes #698

- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and improve clarity and readability
- [x] Ensure that all images in the docs have a white background (not transparent) so they are viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and serverless (Cloud)
- [x] Fix [Creating an empty table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table) section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to provide more context, use cases, and explanations so it reads less like reference documentation. This is especially important for CRUD and search sections since those are so central to the user experience.
- [x] The section for [Adding data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table) only shows examples for pandas and iterables. We should include pydantic models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that these are easier to find
- [x] Add docs on refine factor to explain its importance for recall. Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756 and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud (S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices. Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Committed by Weston Pace
Parent: 33ab68c790
Commit: 4d5d748acd
@@ -1,20 +1,12 @@
# Filtering

## Pre and post-filtering

LanceDB supports filtering of query results based on metadata fields. By default, post-filtering is
performed on the top-k results returned by the vector search: rows that fail the filter are dropped
after the search, so a query can return fewer than k results. Pre-filtering is also an option: it
applies the filter before the vector search, which narrows down the search space on a very large
dataset and can reduce query latency.

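To make the difference concrete, here is a toy sketch in plain Python. It runs a brute-force nearest-neighbour search over an in-memory list instead of a LanceDB index, and the data and function names (`knn`, `post_filter_search`, `pre_filter_search`, `in_category_a`) are hypothetical, chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical toy data: row i has vector (i, 0); every 4th row is in category "a".
rows = [
    {"id": i, "category": "a" if i % 4 == 0 else "b", "vector": (float(i), 0.0)}
    for i in range(20)
]


def knn(candidates, query, k):
    """Brute-force k-nearest-neighbour search by Euclidean distance."""
    return sorted(candidates, key=lambda row: dist(row["vector"], query))[:k]


def post_filter_search(rows, query, k, predicate):
    # Search all rows first, then drop the top-k hits that fail the predicate.
    return [row for row in knn(rows, query, k) if predicate(row)]


def pre_filter_search(rows, query, k, predicate):
    # Apply the predicate first, then search only among the rows that pass.
    return knn([row for row in rows if predicate(row)], query, k)


def in_category_a(row):
    return row["category"] == "a"


query = (0.0, 0.0)
post = post_filter_search(rows, query, 5, in_category_a)
pre = pre_filter_search(rows, query, 5, in_category_a)
print([row["id"] for row in post])  # [0, 4]
print([row["id"] for row in pre])   # [0, 4, 8, 12, 16]
```

With k = 5, post-filtering keeps only the 2 matching rows among the top 5 hits, while pre-filtering returns 5 matching rows. In LanceDB the same choice is made by passing `prefilter=True` to `where()` in Python, or by calling `.prefilter(true)` in JavaScript.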
<!-- Setup Code
```python
import lancedb
@@ -40,6 +32,45 @@ for (let i = 0; i < 10_000; i++) {
const tbl = await db.createTable('myVectors', data)
```
-->

=== "Python"
    ```py
    result = (
        tbl.search([0.5, 0.2])
        .where("id = 10", prefilter=True)
        .limit(1)
        .to_arrow()
    )
    ```

=== "JavaScript"
    ```javascript
    let result = await tbl.search(Array(1536).fill(0.5))
      .limit(1)
      .filter("id = 10")
      .prefilter(true)
      .execute()
    ```

## SQL filters

Because it's built on top of [DataFusion](https://github.com/apache/arrow-datafusion), LanceDB
uses standard SQL expressions as predicates for filtering operations. The same predicate syntax
works in vector search, update, and deletion operations.

Currently, Lance supports a growing list of SQL expressions:

* ``>``, ``>=``, ``<``, ``<=``, ``=``
* ``AND``, ``OR``, ``NOT``
* ``IS NULL``, ``IS NOT NULL``
* ``IS TRUE``, ``IS NOT TRUE``, ``IS FALSE``, ``IS NOT FALSE``
* ``IN``
* ``LIKE``, ``NOT LIKE``
* ``CAST``
* ``regexp_match(column, pattern)``

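Because a filter is passed as a plain SQL string, values that come from user input need to be quoted before they are interpolated into it. Below is a minimal sketch of that, assuming standard SQL literal rules (strings in single quotes, an embedded single quote doubled); `sql_literal`, `in_list` and `all_of` are hypothetical helpers, not part of the LanceDB API.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal for use in a filter string."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def in_list(column, values):
    """Build a `column IN (v1, v2, ...)` clause."""
    return f"{column} IN ({', '.join(sql_literal(v) for v in values)})"


def all_of(*clauses):
    """Join clauses with AND, parenthesising each one."""
    return " AND ".join(f"({c})" for c in clauses)


where = all_of(in_list("item", ["item 0", "item 2"]), "id > 10")
print(where)                          # (item IN ('item 0', 'item 2')) AND (id > 10)
print(in_list("name", ["O'Brien"]))   # name IN ('O''Brien')
```

The resulting string can then be passed wherever a filter is accepted, for example to `where()` in Python.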
For example, the following filter string is acceptable:

=== "Python"

    ```python
@@ -117,12 +148,12 @@ You can also filter your data without search.

=== "Python"
    ```python
    tbl.search().where("id = 10").limit(10).to_arrow()
    ```

=== "JavaScript"
    ```javascript
    await tbl.where('id = 10').limit(10).execute()
    ```

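Conceptually, a filter without a query vector is a scan that keeps matching rows until the limit is reached. The toy model below (plain Python; `filter_scan` and the generated rows are hypothetical) shows why the limit matters: rows are consumed lazily and the scan stops after `limit` matches, whereas without a limit every matching row is collected. It is an illustration only, not a description of Lance's execution engine.

```python
from itertools import islice


def filter_scan(rows, predicate, limit=None):
    """Return rows matching `predicate`, stopping after `limit` matches.

    Rows are read lazily, so the scan ends as soon as `limit` matches are
    found; with limit=None every matching row is materialised.
    """
    matches = (row for row in rows if predicate(row))
    return list(islice(matches, limit))


def is_even_id(row):
    return row["id"] % 2 == 0


# A generator stands in for a large table that is never fully loaded.
table = ({"id": i} for i in range(1_000_000))
first_ten = filter_scan(table, is_even_id, limit=10)
print([row["id"] for row in first_ten])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```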
!!! warning