Fixed the issue on the lance-namespace side to avoid pinning to a
specific lance version. This should resolve the increased release
artifact size and build time.
Add a new test feature that allows running the lancedb tests
against a remote server. Convert a few tests in src/connection.rs
as a proof of concept.
To make local development easier, the remote tests can be run locally
via a Makefile. The same Makefile can also run the feature tests with a
single invocation of `make`. (The feature tests require bringing up a
docker compose environment.)
This PR adds support for namespace-backed databases through
lance-namespace integration, enabling centralized table management
through namespace APIs.
---------
Co-authored-by: Claude <noreply@anthropic.com>
### Bug Fix: Undefined Values in Nullable Fields
**Issue**: When inserting data with `undefined` values into nullable
fields, LanceDB was incorrectly coercing them to default values (`false`
for booleans, `NaN` for numbers, `""` for strings) instead of `null`.
**Fix**: Modified the `makeVector()` function in `arrow.ts` to properly
convert `undefined` values to `null` for nullable fields before passing
data to Apache Arrow.
fixes: #2645
**Result**: Now `{ text: undefined, number: undefined, bool: undefined
}` correctly becomes `{ text: null, number: null, bool: null }` when
fields are marked as nullable in the schema.
**Files Changed**:
- `nodejs/lancedb/arrow.ts` (core fix)
- `nodejs/__test__/arrow.test.ts` (test coverage)
This ensures proper null handling for nullable fields, as expected by
users.
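The fix can be sketched as a coercion step applied before values reach Apache Arrow. This is an illustrative sketch, not the actual `arrow.ts` code; `coerceUndefined` and the field shape are hypothetical names:

```typescript
// Hypothetical sketch of the coercion applied before building Arrow
// vectors. For nullable fields, `undefined` must become `null` so Arrow
// stores a null bit instead of a zero-value default (false, NaN, "").
interface FieldLike {
  name: string;
  nullable: boolean;
}

function coerceUndefined(values: unknown[], field: FieldLike): unknown[] {
  if (!field.nullable) return values;
  return values.map((v) => (v === undefined ? null : v));
}

const field: FieldLike = { name: "text", nullable: true };
console.log(coerceUndefined(["a", undefined, "b"], field));
// → [ 'a', null, 'b' ]
```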
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
### Solution
Added special handling in the `makeVector` function for boolean arrays
where all values are null. The fix creates a proper null bitmap using
`makeData` and `arrowMakeVector` instead of relying on Apache Arrow's
`vectorFromArray`, which doesn't handle this edge case correctly.
fixes: #2644
### Changes
- Added null value detection for boolean types in `makeVector` function
- Creates proper Arrow data structure with null bitmap when all boolean
values are null
- Preserves existing behavior for non-null boolean values and other data
types
- Fixes the boolean null value bug while maintaining backward
compatibility.
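The core of the fix is the null detection plus a validity bitmap. A toy sketch of both (standalone, not the real `makeVector` code; Arrow's validity bitmaps are little-endian bit-packed, 1 = valid, 0 = null):

```typescript
// Build the validity bitmap Arrow expects for a boolean column:
// bit i set means "value i is valid"; an unset bit means null.
function buildValidityBitmap(values: (boolean | null)[]): Uint8Array {
  const bitmap = new Uint8Array(Math.ceil(values.length / 8));
  values.forEach((v, i) => {
    if (v !== null) bitmap[i >> 3] |= 1 << (i % 8);
  });
  return bitmap;
}

const allNull: (boolean | null)[] = [null, null, null];
// Null detection: when every value is null, take the makeData path
// instead of vectorFromArray.
console.log(allNull.every((v) => v === null)); // → true
console.log(buildValidityBitmap(allNull));     // all bits 0 → all null
```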
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
The basic idea of MRR (Mean Reciprocal Rank) is described here:
https://www.evidentlyai.com/ranking-metrics/mean-reciprocal-rank-mrr
I've implemented a weighted version that allows the user to set the
weighting between vector and FTS results.
The gist is something like this:
### Scenario A: Document at rank 1 in one set, absent from another
```
# Assuming equal weights: weight_vector = 0.5, weight_fts = 0.5
vector_rr = 1.0 # rank 1 → 1/1 = 1.0
fts_rr = 0.0 # absent → 0.0
weighted_mrr = 0.5 × 1.0 + 0.5 × 0.0 = 0.5
```
### Scenario B: Document at rank 1 in one set, rank 2 in another
```
# Same weights: weight_vector = 0.5, weight_fts = 0.5
vector_rr = 1.0 # rank 1 → 1/1 = 1.0
fts_rr = 0.5 # rank 2 → 1/2 = 0.5
weighted_mrr = 0.5 × 1.0 + 0.5 × 0.5 = 0.5 + 0.25 = 0.75
```
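The two scenarios above reduce to one small function. A sketch of the combination (hypothetical helper names, not the reranker's actual API; ranks are 1-based and an absent document contributes 0):

```typescript
// Reciprocal rank: 1/rank for a ranked document, 0 if absent.
function reciprocalRank(rank: number | null): number {
  return rank === null ? 0.0 : 1.0 / rank;
}

// Weighted MRR combination of the vector and FTS result sets.
function weightedMrr(
  vectorRank: number | null,
  ftsRank: number | null,
  weightVector = 0.5,
  weightFts = 0.5
): number {
  return (
    weightVector * reciprocalRank(vectorRank) +
    weightFts * reciprocalRank(ftsRank)
  );
}

console.log(weightedMrr(1, null)); // Scenario A → 0.5
console.log(weightedMrr(1, 2));    // Scenario B → 0.75
```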
With `return_score="all"` the result looks something like this
(taken from the reranker tests).
Because this is a weighted, rank-based reranker, some results may have
the same score:
```
text vector _distance _rowid _score _relevance_score
0 I am your father [-0.010703234, 0.069315575, 0.030076642, 0.002... 8.149148e-13 8589934598 10.978719 1.000000
1 the ground beneath my feet [-0.09500901, 0.00092102867, 0.0755851, 0.0372... 1.376896e+00 8589934604 NaN 0.250000
2 I find your lack of faith disturbing [0.07525753, -0.0100010475, 0.09990541, 0.0209... NaN 8589934595 3.483394 0.250000
3 but I don't wanna die [0.033476487, -0.011235877, -0.057625435, -0.0... 1.538222e+00 8589934610 1.130355 0.238095
4 if you strike me down I shall become more powe... [0.00432201, 0.030120496, 5.3317923e-05, 0.033... 1.381086e+00 8589934594 0.715157 0.216667
5 I see a salty message written in the eves [-0.04213107, 0.0016004723, 0.061052393, -0.02... 1.638301e+00 8589934603 1.043785 0.133333
6 but his son was mortal [0.012462767, 0.049041674, -0.057339743, -0.04... 1.421566e+00 8589934620 NaN 0.125000
7 I've got a bad feeling about this [-0.06973199, -0.029960092, 0.02641632, -0.031... NaN 8589934596 1.043785 0.125000
8 now that's a name I haven't heard in a long time [-0.014374257, -0.013588792, -0.07487557, 0.03... 1.597573e+00 8589934593 0.848772 0.118056
9 he was a god [-0.0258895, 0.11925236, -0.029397793, 0.05888... 1.423147e+00 8589934618 NaN 0.100000
10 I wish they would make another one [-0.14737535, -0.015304729, 0.04318139, -0.061... NaN 8589934622 1.043785 0.100000
11 Kratos had a son [-0.057455737, 0.13734367, -0.03537109, -0.000... 1.488075e+00 8589934617 NaN 0.083333
12 I don't wanna live like this [-0.0028891307, 0.015214227, 0.025183653, 0.08... NaN 8589934609 1.043785 0.071429
13 I see a mansard roof through the trees [0.052383978, 0.087759204, 0.014739997, 0.0239... NaN 8589934602 1.043785 0.062500
14 great kid don't get cocky [-0.047043696, 0.054648954, -0.008509666, -0.0... 1.618125e+00 8589934592 NaN 0.055556
```
Support shallow cloning a dataset at a specific location to create a new
dataset, using the shallow_clone feature in Lance. Also introduce a
remote `clone` API so remote tables get the same functionality.
- Fixes issue where passing `{ vector: undefined }` with an embedding
function threw "Found field not in schema" error instead of calling the
embedding function like `null` or omitted fields.
**Changes:**
- Modified `rowPathsAndValues` to skip undefined values during schema
inference
- Added test case verifying undefined, null, and omitted vector fields
all work correctly
**Before:** `{ vector: undefined }` → Error
**After:** `{ vector: undefined }` → Calls embedding function
Closes #2647
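The schema-inference change can be sketched as a filter over row entries. This is an illustrative stand-in, not the real `rowPathsAndValues` implementation:

```typescript
// Hypothetical sketch of the fix: when inferring a schema from row
// values, skip keys whose value is `undefined` so they behave exactly
// like omitted fields (letting the embedding function fill them in).
function inferredKeys(row: Record<string, unknown>): string[] {
  return Object.entries(row)
    .filter(([, value]) => value !== undefined)
    .map(([key]) => key);
}

console.log(inferredKeys({ text: "hi", vector: undefined }));
// → [ 'text' ] — `vector` no longer triggers "Found field not in schema"
```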
We had previously prototyped a `Catalog` trait anticipating a
three-tiered Catalog-Database-Table structure. Now that we have
namespaces in the `Database`, we can support any tiering scheme, and
the `Catalog` trait is no longer needed.
## Summary
This PR introduces a `HeaderProvider`, which is called on every remote
HTTP request to get the latest headers to inject. This is useful for
features like auth token refresh: the header provider can refresh
tokens internally so that each request always sends the current token.
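The shape of the idea, sketched in TypeScript (interface and class names here are illustrative, not the exact LanceDB trait):

```typescript
// A provider consulted on every request, so refreshed tokens are
// always picked up without rebuilding the client.
interface HeaderProvider {
  getHeaders(): Promise<Record<string, string>>;
}

class RefreshingTokenProvider implements HeaderProvider {
  constructor(private fetchToken: () => Promise<string>) {}

  async getHeaders(): Promise<Record<string, string>> {
    // Re-fetch (or refresh from an internal cache) on each call.
    const token = await this.fetchToken();
    return { Authorization: `Bearer ${token}` };
  }
}

// Usage with a stubbed token source:
const provider = new RefreshingTokenProvider(async () => "token-123");
provider.getHeaders().then((h) => console.log(h.Authorization));
// → Bearer token-123
```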
---------
Co-authored-by: Claude <noreply@anthropic.com>
Updates lance to 0.35.0-beta4, which also incurs a datafusion update.
This brings in a fix for a memory leak in index caching, resulting from
a cyclical reference.
This PR adds mTLS (mutual TLS) configuration support for the LanceDB
remote HTTP client, allowing users to authenticate with client
certificates and configure custom CA certificates for server
verification.
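The configuration surface might look like the following. All field names and paths here are hypothetical, for illustration only; consult the client config docs for the real option names:

```typescript
// Hypothetical mTLS option shape: a client cert/key pair for mutual
// authentication plus an optional custom CA bundle for verifying the
// server. Paths are illustrative.
interface TlsConfig {
  certFile: string;  // client certificate (PEM)
  keyFile: string;   // client private key (PEM)
  caFile?: string;   // custom CA bundle for server verification
}

const tls: TlsConfig = {
  certFile: "/etc/lancedb/client.crt",
  keyFile: "/etc/lancedb/client.key",
  caFile: "/etc/lancedb/ca.pem",
};
console.log(tls.certFile);
```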
---------
Co-authored-by: Claude <noreply@anthropic.com>
Some of the DataFusion optimizers optimize based on data statistics
(e.g. total bytes, number of rows).
If those statistics are not supplied, those optimizers cannot kick in.
One example is the anti hash join, which can be rewritten from LeftAnti
(left: big table, right: small table) to RightAnti (left: small table,
right: big table). LeftAnti requires reading both the big and small
tables in full, while RightAnti only requires reading the small table
in full and supports limit push-down, so only part of the big table
needs to be read.