# Filtering ## Pre and post-filtering LanceDB supports filtering of query results based on metadata fields. By default, post-filtering is performed on the top-k results returned by the vector search. However, pre-filtering is also an option that performs the filter prior to vector search. This can be useful to narrow down on the search space on a very large dataset to reduce query latency. Note that both pre-filtering and post-filtering can yield false positives. For pre-filtering, if the filter is too selective, it might eliminate relevant items that the vector search would have otherwise identified as a good match. In this case, increasing `nprobes` parameter will help reduce such false positives. It is recommended to set `use_index=false` if you know that the filter is highly selective. Similarly, a highly selective post-filter can lead to false positives. Increasing both `nprobes` and `refine_factor` can mitigate this issue. When deciding between pre-filtering and post-filtering, pre-filtering is generally the safer choice if you're uncertain. === "Python" ```py result = ( tbl.search([0.5, 0.2]) .where("id = 10", prefilter=True) .limit(1) .to_arrow() ) ``` === "TypeScript" === "@lancedb/lancedb" ```ts --8<-- "nodejs/examples/filtering.test.ts:search" ``` === "vectordb (deprecated)" ```ts --8<-- "docs/src/sql_legacy.ts:search" ``` !!! note Creating a [scalar index](guides/scalar_index.md) accelerates filtering ## SQL filters Because it's built on top of [DataFusion](https://github.com/apache/arrow-datafusion), LanceDB embraces the utilization of standard SQL expressions as predicates for filtering operations. It can be used during vector search, update, and deletion operations. Currently, Lance supports a growing list of SQL expressions. - `>`, `>=`, `<`, `<=`, `=` - `AND`, `OR`, `NOT` - `IS NULL`, `IS NOT NULL` - `IS TRUE`, `IS NOT TRUE`, `IS FALSE`, `IS NOT FALSE` - `IN` - `LIKE`, `NOT LIKE` - `CAST` - `regexp_match(column, pattern)` - [DataFusion Functions](https://arrow.apache.org/datafusion/user-guide/sql/scalar_functions.html) For example, the following filter string is acceptable: === "Python" ```python tbl.search([100, 102]) \ .where("(item IN ('item 0', 'item 2')) AND (id > 10)") \ .to_arrow() ``` === "TypeScript" === "@lancedb/lancedb" ```ts --8<-- "nodejs/examples/filtering.test.ts:vec_search" ``` === "vectordb (deprecated)" ```ts --8<-- "docs/src/sql_legacy.ts:vec_search" ``` If your column name contains special characters or is a [SQL Keyword](https://docs.rs/sqlparser/latest/sqlparser/keywords/index.html), you can use backtick (`` ` ``) to escape it. For nested fields, each segment of the path must be wrapped in backticks. === "SQL" ```sql `CUBE` = 10 AND `column name with space` IS NOT NULL AND `nested with space`.`inner with space` < 2 ``` !!!warning "Field names containing periods (`.`) are not supported." Literals for dates, timestamps, and decimals can be written by writing the string value after the type name. For example === "SQL" ```sql date_col = date '2021-01-01' and timestamp_col = timestamp '2021-01-01 00:00:00' and decimal_col = decimal(8,3) '1.000' ``` For timestamp columns, the precision can be specified as a number in the type parameter. Microsecond precision (6) is the default. | SQL | Time unit | | -------------- | ------------ | | `timestamp(0)` | Seconds | | `timestamp(3)` | Milliseconds | | `timestamp(6)` | Microseconds | | `timestamp(9)` | Nanoseconds | LanceDB internally stores data in [Apache Arrow](https://arrow.apache.org/) format. The mapping from SQL types to Arrow types is: | SQL type | Arrow type | | --------------------------------------------------------- | ------------------ | | `boolean` | `Boolean` | | `tinyint` / `tinyint unsigned` | `Int8` / `UInt8` | | `smallint` / `smallint unsigned` | `Int16` / `UInt16` | | `int` or `integer` / `int unsigned` or `integer unsigned` | `Int32` / `UInt32` | | `bigint` / `bigint unsigned` | `Int64` / `UInt64` | | `float` | `Float32` | | `double` | `Float64` | | `decimal(precision, scale)` | `Decimal128` | | `date` | `Date32` | | `timestamp` | `Timestamp` [^1] | | `string` | `Utf8` | | `binary` | `Binary` | [^1]: See precision mapping in previous table. ## Filtering without Vector Search You can also filter your data without search. === "Python" ```python tbl.search().where("id = 10").limit(10).to_arrow() ``` === "TypeScript" === "@lancedb/lancedb" ```ts --8<-- "nodejs/examples/filtering.test.ts:sql_search" ``` === "vectordb (deprecated)" ```ts --8<---- "docs/src/sql_legacy.ts:sql_search" ``` !!!warning "If your table is large, this could potentially return a very large amount of data. Please be sure to use a `limit` clause unless you're sure you want to return the whole result set."