feat: add take_offsets and take_row_ids (#2584)

These operations have existed in Lance for a long while, and many users need to drop down to Lance for this capability. This PR adds the API and implements it using filters (e.g. `_rowid IN (...)`) so that it doesn't currently add any burden to `BaseTable` implementations. I'm not sure that is sustainable, as `BaseTable` implementations may want to specialize how they handle this method. However, I figure it is a good starting point. In addition, unlike Lance, this API does not currently guarantee anything about the order of the take results. This is necessary for the fallback filter approach to work, since SQL filters cannot guarantee result order.
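A rough TypeScript sketch of the filter fallback and its ordering caveat follows. It is an illustration, not the code in this PR. Both helpers, `rowIdFilter` and `reorderByRequest`, are hypothetical and are not part of the LanceDB API. Row ids are modelled as plain numbers rather than 64-bit integers, and returning `false` for an empty id list is an assumption about how to express "match nothing".

```typescript
// Sketch only: hypothetical helpers, not part of the LanceDB API.
// Row ids are modelled as plain numbers for simplicity, although
// Lance row ids are 64-bit unsigned integers.

interface Row {
  _rowid: number;
  [column: string]: unknown;
}

// Build the fallback filter that a take-by-row-id could be rewritten into.
function rowIdFilter(rowIds: number[]): string {
  if (rowIds.length === 0) {
    // An empty IN list is not valid SQL, so match nothing instead (assumption).
    return "false";
  }
  for (const id of rowIds) {
    if (id < 0 || Math.floor(id) !== id) {
      throw new Error(`invalid row id: ${id}`);
    }
  }
  return `_rowid IN (${rowIds.join(", ")})`;
}

// A filtered scan returns matches in scan order, not request order.
// Callers that need Lance-style ordering can restore it client-side.
function reorderByRequest(rows: Row[], requested: number[]): Row[] {
  // Index the scanned rows by row id.
  const byId: { [id: string]: Row } = {};
  for (const row of rows) {
    byId[String(row._rowid)] = row;
  }
  // Emit rows in request order, skipping ids that matched nothing.
  return requested
    .map((id) => byId[String(id)])
    .filter((row): row is Row => row !== undefined);
}

// Usage: the filter string for three requested ids.
const filter = rowIdFilter([42, 7, 1000]); // "_rowid IN (42, 7, 1000)"

// Rows as a scan might return them, in scan order.
const scanned: Row[] = [{ _rowid: 7 }, { _rowid: 42 }, { _rowid: 1000 }];
const ordered = reorderByRequest(scanned, [42, 7, 1000]);
console.log(filter, ordered.map((r) => r._rowid));
```

In this sketch, an id requested twice is emitted once per request, and ids with no matching row are dropped; both are choices made for the illustration, not documented behavior of the API.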
2026-07-09 14:00:44 +00:00 · 2025-08-15 06:48:24 -07:00
parent 296205ef96
commit ed640a76d9
24 changed files with 1488 additions and 381 deletions
--- a/docs/src/js/classes/Session.md
+++ b/docs/src/js/classes/Session.md
@@ -9,7 +9,8 @@
 A session for managing caches and object stores across LanceDB operations.

 Sessions allow you to configure cache sizes for index and metadata caches,
-which can significantly impact performance for large datasets.
+which can significantly impact memory use and performance. They can
+also be reused across multiple connections to share the same cache state.

 ## Constructors

@@ -24,8 +25,11 @@ Create a new session with custom cache sizes.
 # Parameters

 - `index_cache_size_bytes`: The size of the index cache in bytes.
+  Index data is stored in memory in this cache to speed up queries.
  Defaults to 6GB if not specified.
 - `metadata_cache_size_bytes`: The size of the metadata cache in bytes.
+  The metadata cache stores file metadata and schema information in memory.
+  This cache improves scan and write performance.
  Defaults to 1GB if not specified.

 #### Parameters