# Filtering ## Pre and post-filtering LanceDB supports filtering of query results based on metadata fields. By default, post-filtering is performed on the top-k results returned by the vector search. However, pre-filtering is also an option that performs the filter prior to vector search. This can be useful to narrow down on the search space on a very large dataset to reduce query latency. === "Python" ```py result = ( tbl.search([0.5, 0.2]) .where("id = 10", prefilter=True) .limit(1) .to_arrow() ) ``` === "JavaScript" ```javascript let result = await tbl.search(Array(1536).fill(0.5)) .limit(1) .filter("id = 10") .prefilter(true) .execute() ``` ## SQL filters Because it's built on top of [DataFusion](https://github.com/apache/arrow-datafusion), LanceDB embraces the utilization of standard SQL expressions as predicates for filtering operations. It can be used during vector search, update, and deletion operations. Currently, Lance supports a growing list of SQL expressions. * ``>``, ``>=``, ``<``, ``<=``, ``=`` * ``AND``, ``OR``, ``NOT`` * ``IS NULL``, ``IS NOT NULL`` * ``IS TRUE``, ``IS NOT TRUE``, ``IS FALSE``, ``IS NOT FALSE`` * ``IN`` * ``LIKE``, ``NOT LIKE`` * ``CAST`` * ``regexp_match(column, pattern)`` For example, the following filter string is acceptable: === "Python" ```python tbl.search([100, 102]) \ .where("(item IN ('item 0', 'item 2')) AND (id > 10)") \ .to_arrow() ``` === "Javascript" ```javascript await tbl.search(Array(1536).fill(0)) .where("(item IN ('item 0', 'item 2')) AND (id > 10)") .execute() ``` If your column name contains special characters or is a [SQL Keyword](https://docs.rs/sqlparser/latest/sqlparser/keywords/index.html), you can use backtick (`` ` ``) to escape it. For nested fields, each segment of the path must be wrapped in backticks. === "SQL" ```sql `CUBE` = 10 AND `column name with space` IS NOT NULL AND `nested with space`.`inner with space` < 2 ``` !!! warning Field names containing periods (``.``) are not supported. Literals for dates, timestamps, and decimals can be written by writing the string value after the type name. For example === "SQL" ```sql date_col = date '2021-01-01' and timestamp_col = timestamp '2021-01-01 00:00:00' and decimal_col = decimal(8,3) '1.000' ``` For timestamp columns, the precision can be specified as a number in the type parameter. Microsecond precision (6) is the default. | SQL | Time unit | |------------------|--------------| | ``timestamp(0)`` | Seconds | | ``timestamp(3)`` | Milliseconds | | ``timestamp(6)`` | Microseconds | | ``timestamp(9)`` | Nanoseconds | LanceDB internally stores data in [Apache Arrow](https://arrow.apache.org/) format. The mapping from SQL types to Arrow types is: | SQL type | Arrow type | |----------|------------| | ``boolean`` | ``Boolean`` | | ``tinyint`` / ``tinyint unsigned`` | ``Int8`` / ``UInt8`` | | ``smallint`` / ``smallint unsigned`` | ``Int16`` / ``UInt16`` | | ``int`` or ``integer`` / ``int unsigned`` or ``integer unsigned`` | ``Int32`` / ``UInt32`` | | ``bigint`` / ``bigint unsigned`` | ``Int64`` / ``UInt64`` | | ``float`` | ``Float32`` | | ``double`` | ``Float64`` | | ``decimal(precision, scale)`` | ``Decimal128`` | | ``date`` | ``Date32`` | | ``timestamp`` | ``Timestamp`` [^1] | | ``string`` | ``Utf8`` | | ``binary`` | ``Binary`` | [^1]: See precision mapping in previous table. ## Filtering without Vector Search You can also filter your data without search. === "Python" ```python tbl.search().where("id = 10").limit(10).to_arrow() ``` === "JavaScript" ```javascript await tbl.where('id = 10').limit(10).execute() ``` !!! warning If your table is large, this could potentially return a very large amount of data. Please be sure to use a `limit` clause unless you're sure you want to return the whole result set.