feat: add take_offsets and take_row_ids (#2584)

These operations have existed in lance for a long while and many users
need to drop down to lance for this capability. This PR adds the API and
implements it using filters (e.g. `_rowid IN (...)`) so that it doesn't
currently add any load to `BaseTable`. I'm not sure that is sustainable
as base table implementations may want to specialize how they handle
this method. However, I figure it is a good starting point.
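
As a rough illustration of the filter-based fallback described above, the sketch below builds a `_rowid IN (...)` predicate from a list of row ids. The helper name `rowIdFilter` is hypothetical and is not part of this PR. Using plain `number` ids and returning `false` for an empty list are assumptions made to keep the example small; real row ids are 64-bit and may not fit in a JavaScript number.

```typescript
// Hypothetical helper: NOT the code added by this PR, only a sketch of the
// "take as a filter" idea from the commit message.
function rowIdFilter(rowIds: number[]): string {
  // Assumption: an empty IN list is not valid SQL, so match nothing instead.
  if (rowIds.length === 0) {
    return "false";
  }
  for (const id of rowIds) {
    // Reject anything that is not a non-negative integer so that the
    // generated predicate only ever contains numeric literals.
    if (id < 0 || Math.floor(id) !== id) {
      throw new Error("invalid row id: " + String(id));
    }
  }
  // Ids are emitted in the order given; the filter itself does not imply
  // any ordering of the rows that come back.
  return "_rowid IN (" + rowIds.map((id) => String(id)).join(", ") + ")";
}
```

A predicate like this could then be passed to a query's `where` method, which is how a take can be expressed without any new support in `BaseTable`.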

In addition, unlike Lance, this API does not currently guarantee
anything about the order of the take results. This is necessary for the
fallback filter approach to work (SQL filters cannot guarantee result
order).
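
Because result order is not guaranteed, a caller who needs rows in the requested order has to restore it on the client. The sketch below shows one way to do that. It assumes each returned row carries a numeric `_rowid` field (for example because the row id was requested alongside the data). The helper name `reorderByRowId` is hypothetical.

```typescript
// Hypothetical helper: restores the caller's requested order after an
// unordered take. Assumes every row exposes its id as `_rowid`.
function reorderByRowId<T extends { _rowid: number }>(
  rows: T[],
  requested: number[],
): T[] {
  // Index the returned rows by id for constant-time lookup.
  const byId: { [id: string]: T } = {};
  for (const row of rows) {
    byId[String(row._rowid)] = row;
  }
  // Walk the ids in the order the caller asked for them, skipping any id
  // that did not come back (for example, a row that no longer exists).
  const ordered: T[] = [];
  for (const id of requested) {
    const row = byId[String(id)];
    if (row !== undefined) {
      ordered.push(row);
    }
  }
  return ordered;
}
```

Skipping missing ids silently is a design choice of this sketch; a stricter caller might prefer to throw instead.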
Weston Pace
2025-08-15 06:48:24 -07:00
committed by GitHub
parent 296205ef96
commit ed640a76d9
24 changed files with 1488 additions and 381 deletions

@@ -14,7 +14,7 @@ A builder for LanceDB queries.
## Extends
-- [`QueryBase`](QueryBase.md)<`NativeQuery`>
+- `StandardQueryBase`<`NativeQuery`>
## Properties
@@ -26,7 +26,7 @@ protected inner: Query | Promise<Query>;
#### Inherited from
-[`QueryBase`](QueryBase.md).[`inner`](QueryBase.md#inner)
+`StandardQueryBase.inner`
## Methods
@@ -73,7 +73,7 @@ AnalyzeExec verbose=true, metrics=[]
#### Inherited from
-[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
+`StandardQueryBase.analyzePlan`
***
@@ -107,7 +107,7 @@ single query)
#### Inherited from
-[`QueryBase`](QueryBase.md).[`execute`](QueryBase.md#execute)
+`StandardQueryBase.execute`
***
@@ -143,7 +143,7 @@ const plan = await table.query().nearestTo([0.5, 0.2]).explainPlan();
#### Inherited from
-[`QueryBase`](QueryBase.md).[`explainPlan`](QueryBase.md#explainplan)
+`StandardQueryBase.explainPlan`
***
@@ -164,7 +164,7 @@ Use [Table#optimize](Table.md#optimize) to index all un-indexed data.
#### Inherited from
-[`QueryBase`](QueryBase.md).[`fastSearch`](QueryBase.md#fastsearch)
+`StandardQueryBase.fastSearch`
***
@@ -194,7 +194,7 @@ Use `where` instead
#### Inherited from
-[`QueryBase`](QueryBase.md).[`filter`](QueryBase.md#filter)
+`StandardQueryBase.filter`
***
@@ -216,7 +216,7 @@ fullTextSearch(query, options?): this
#### Inherited from
-[`QueryBase`](QueryBase.md).[`fullTextSearch`](QueryBase.md#fulltextsearch)
+`StandardQueryBase.fullTextSearch`
***
@@ -241,7 +241,7 @@ called then every valid row from the table will be returned.
#### Inherited from
-[`QueryBase`](QueryBase.md).[`limit`](QueryBase.md#limit)
+`StandardQueryBase.limit`
***
@@ -325,6 +325,10 @@ nearestToText(query, columns?): Query
offset(offset): this
```
+Set the number of rows to skip before returning results.
+This is useful for pagination.
#### Parameters
* **offset**: `number`
@@ -335,7 +339,7 @@ offset(offset): this
#### Inherited from
-[`QueryBase`](QueryBase.md).[`offset`](QueryBase.md#offset)
+`StandardQueryBase.offset`
***
@@ -388,7 +392,7 @@ object insertion order is easy to get wrong and `Map` is more foolproof.
#### Inherited from
-[`QueryBase`](QueryBase.md).[`select`](QueryBase.md#select)
+`StandardQueryBase.select`
***
@@ -410,7 +414,7 @@ Collect the results as an array of objects.
#### Inherited from
-[`QueryBase`](QueryBase.md).[`toArray`](QueryBase.md#toarray)
+`StandardQueryBase.toArray`
***
@@ -436,7 +440,7 @@ ArrowTable.
#### Inherited from
-[`QueryBase`](QueryBase.md).[`toArrow`](QueryBase.md#toarrow)
+`StandardQueryBase.toArrow`
***
@@ -471,7 +475,7 @@ on the filter column(s).
#### Inherited from
-[`QueryBase`](QueryBase.md).[`where`](QueryBase.md#where)
+`StandardQueryBase.where`
***
@@ -493,4 +497,4 @@ order to perform hybrid search.
#### Inherited from
-[`QueryBase`](QueryBase.md).[`withRowId`](QueryBase.md#withrowid)
+`StandardQueryBase.withRowId`