Fixes the broken Node.js CI, which was failing on the documentation of
the new Permutation API in TypeScript.
- Expanded the generated typings in `nodejs/lancedb/native.d.ts` to
include `SplitCalculatedOptions`, `splitNames` fields, and the
persist/options-based `splitCalculated` methods so the permutation
exports match the native API.
- The previous block comment had an inconsistency.
`splitCalculated` takes an options object (`SplitCalculatedOptions`) in
our bindings, not a bare string. The previous example showed
`builder.splitCalculated("user_id % 3");`, which doesn’t match the
actual signature and would fail TS typecheck. I updated the comment to
`builder.splitCalculated({ calculation: "user_id % 3" });` so the
example is now correct.
- Updated the `splitCalculated` example in
`nodejs/lancedb/permutation.ts` to use the options object.
- Ran `npm run docs` to ensure the docs build correctly.
> [!NOTE]
> **Disclaimer**: I used GPT-5.1-Codex-Max to make these updates, but I
> have read the code and run `npm run docs` to verify that they work and
> are correct to the best of my knowledge.
Did a full scan of all URLs that used to point to the old mkdocs pages
and updated them to link to the appropriate pages on lancedb.com/docs or
the lance.org docs.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pipes the `num_attempts` field from Lance's merge insert result
through LanceDB. This allows callers of `merge_insert` to get a better
idea of whether transaction conflicts are occurring.
* Add `ci` profile for smaller build caches. This had a meaningful
impact in Lance, and I expect a similar impact here.
https://github.com/lancedb/lance/pull/5236
* Get caching working in Rust. Previously was not working due to
`workspaces: rust`.
* Get caching working in NodeJs lint job. Previously wasn't working
because we installed the toolchain **after** we called `- uses:
Swatinem/rust-cache@v2`, which invalidates the cache locally.
* Fix broken pytest from async io transition
(`pytest.PytestRemovedIn9Warning`)
* Altered `get_num_sub_vectors` to handle a bug in the 4-bit PQ case. This
was the cause of `rust future panicked: unknown error`. Raised an issue
upstream to change the panic to an error:
https://github.com/lancedb/lance/issues/5257
* Call `npm run docs` to fix doc issue.
* Disable a flaky Windows test for consistency. It's just an OS-specific
timer issue, not our fault.
* Fix Windows absolute path handling in namespaces. Was causing CI
failure `OSError: [WinError 123] The filename, directory name, or volume
label syntax is incorrect: `
Uses a native JS async generator, which gives more efficient asynchronous
iteration, fewer synthetic promises, and the ability to handle a `catch`
or `break` in the parent loop from a `finally` block.
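The cleanup behavior mentioned above can be illustrated with a minimal sketch. `fakeBatches` and the `events` log are hypothetical stand-ins for illustration only, not LanceDB code.

```typescript
// Hypothetical event log used only to observe when cleanup runs.
const events: string[] = [];

// Hypothetical stand-in for a batch source; not part of the LanceDB API.
async function* fakeBatches(count: number): AsyncGenerator<number> {
  try {
    for (let i = 0; i < count; i++) {
      // Each `yield` hands one item to the consumer's `for await` loop.
      yield i;
    }
  } finally {
    // Runs when the consumer finishes, breaks out early, or throws.
    events.push("cleanup");
  }
}

async function takeFirstTwo(): Promise<number[]> {
  const seen: number[] = [];
  for await (const batch of fakeBatches(10)) {
    seen.push(batch);
    if (seen.length === 2) {
      // Leaving the loop early calls the generator's return(),
      // which resumes it inside the `finally` block.
      break;
    }
  }
  return seen;
}
```

Because the generator owns its own `try`/`finally`, the consumer does not need to call a separate close method when it stops iterating early.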
I'm working on a lancedb version of pytorch data loading (and hopefully
addressing https://github.com/lancedb/lance/issues/3727).
However, rather than rely on pytorch for everything I'm moving some of
the things that pytorch does into rust. This gives us more control over
data loading (e.g. using shards or a hash-based split) and it allows
permutations to be persistent. In particular I hope to be able to:
* Create a persistent permutation
* This permutation can handle splits, filtering, shuffling, and sharding
* Create a rust data loader that can read a permutation (one or more
splits), or a subset of a permutation (for DDP)
* Create a python data loader that delegates to the rust data loader
Eventually create integrations for other data loading libraries,
including rust & node
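As a rough illustration of what a hash-based split means, the sketch below assigns each row key to a split deterministically. The hash function (FNV-1a over UTF-16 code units) and the names `fnv1a` and `assignSplit` are assumptions made for this example; they do not describe the Rust implementation.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
// The choice of hash is an assumption made for illustration.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    // Math.imul performs 32-bit integer multiplication.
    hash = Math.imul(hash, 0x01000193);
  }
  // Convert to an unsigned 32-bit value.
  return hash >>> 0;
}

// Hypothetical helper: map a row key to one of `numSplits` splits.
function assignSplit(key: string, numSplits: number): number {
  if (!Number.isInteger(numSplits) || numSplits <= 0) {
    throw new RangeError("numSplits must be a positive integer");
  }
  return fnv1a(key) % numSplits;
}
```

The same key always lands in the same split, so the assignment stays stable across runs without storing any shuffle state.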
The [`FieldLike` type in
arrow.ts](5ec12c9971/nodejs/lancedb/arrow.ts (L71-L78))
can have a `type: string` property, but before this change, actually
trying to create a table that has a schema that specifies field types by
name results in an error:
```
Error: Expected a Type but object was null/undefined
```
This change adds support for mapping some type name strings to arrow
`DataType`s, so that passing `FieldLike`s with a `type: string` property
to `sanitizeField` does not throw an error.
The type names that can be passed are upper/lowercase variations of the
keys of the `constructorsByTypeName` object. This does not support
mapping types that need parameters, such as timestamps which need
timezones.
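The case-insensitive lookup described above might look roughly like the following sketch. To stay dependency-free it uses a placeholder `FakeType` instead of real arrow `DataType`s, and both the table contents and the `typeFromName` helper are hypothetical, not the actual code in `arrow.ts`.

```typescript
// Placeholder for an arrow DataType so the sketch has no dependencies.
type FakeType = { typeName: string };

// Hypothetical subset of a name-to-constructor table, keyed by lowercase name.
const constructorsByTypeName = new Map<string, () => FakeType>([
  ["int64", () => ({ typeName: "Int64" })],
  ["float64", () => ({ typeName: "Float64" })],
]);

// Hypothetical helper: resolve a type name regardless of letter case.
function typeFromName(name: string): FakeType {
  const make = constructorsByTypeName.get(name.toLowerCase());
  if (make === undefined) {
    // Parameterized types (e.g. timestamps) are not in the table.
    throw new Error(`Unsupported type name: ${name}`);
  }
  return make();
}
```

Only parameterless types fit this pattern, which matches the limitation noted above for types such as timestamps.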
With this, it is possible to create empty tables from `SchemaLike`
objects without instantiating arrow types, e.g.:
```
import { SchemaLike } from "../lancedb/arrow";
// ...
const schemaLike = {
  fields: [
    {
      name: "id",
      type: "int64",
      nullable: true,
    },
    {
      name: "vector",
      type: "float64",
      nullable: true,
    },
  ],
  // ...
} satisfies SchemaLike;
const table = await con.createEmptyTable("test", schemaLike);
```
This change also makes `FieldLike.nullable` required since the `sanitizeField` function throws if it is undefined.
**Problem**: When a vector field is marked as nullable, users should be
able to omit it or pass `undefined`, but this was throwing an error:
"Table has embeddings: 'vector', but no embedding function was provided"
fixes: #2646
**Solution**: Modified `validateSchemaEmbeddings` to check
`field.nullable` before treating `undefined` values as missing embedding
fields.
**Changes**:
- Fixed validation logic in `nodejs/lancedb/arrow.ts`
- Enabled previously skipped test for nullable fields
- Added reproduction test case
**Behavior**:
- ✅ `{ vector: undefined }` now works for nullable fields
- ✅ `{}` (omitted field) now works for nullable fields
- ✅ `{ vector: null }` still works (unchanged)
- ✅ Non-nullable fields still properly throw errors (unchanged)
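The rule behind the behavior listed above can be sketched as follows. This is a simplified model that covers only the `undefined` and omitted cases; `FieldInfo` and `missingRequiredFields` are hypothetical names, not the real `validateSchemaEmbeddings` signature.

```typescript
// Hypothetical, minimal view of a schema field.
interface FieldInfo {
  name: string;
  nullable: boolean;
}

// Returns the names of fields that count as "missing" for a row.
// An undefined or omitted value is only a problem when the field is not nullable.
function missingRequiredFields(
  row: Record<string, unknown>,
  fields: FieldInfo[],
): string[] {
  const missing: string[] = [];
  for (const field of fields) {
    const value = row[field.name];
    if (value === undefined && !field.nullable) {
      missing.push(field.name);
    }
  }
  return missing;
}
```

With a nullable `vector` field, `{ vector: undefined }`, `{}`, and `{ vector: null }` all yield an empty result, while a non-nullable `vector` field is still reported when omitted.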
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: neha <neha@posthog.com>
### Bug Fix: Undefined Values in Nullable Fields
**Issue**: When inserting data with `undefined` values into nullable
fields, LanceDB was incorrectly coercing them to default values (`false`
for booleans, `NaN` for numbers, `""` for strings) instead of `null`.
**Fix**: Modified the `makeVector()` function in `arrow.ts` to properly
convert `undefined` values to `null` for nullable fields before passing
data to Apache Arrow.
fixes: #2645
**Result**: Now `{ text: undefined, number: undefined, bool: undefined
}` correctly becomes `{ text: null, number: null, bool: null }` when
fields are marked as nullable in the schema.
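A minimal sketch of the conversion, assuming a per-column list of values and a `nullable` flag; `normalizeUndefined` is a hypothetical helper, not the actual `makeVector()` code.

```typescript
// Hypothetical helper: replace undefined with null for nullable columns,
// so the values are treated as nulls rather than coerced to defaults.
function normalizeUndefined<T>(
  values: Array<T | null | undefined>,
  nullable: boolean,
): Array<T | null | undefined> {
  if (!nullable) {
    // Non-nullable columns are left untouched in this sketch.
    return values;
  }
  return values.map((value) => (value === undefined ? null : value));
}
```

For a nullable column, `["a", undefined, null]` becomes `["a", null, null]`; for a non-nullable column the input is returned unchanged.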
**Files Changed**:
- `nodejs/lancedb/arrow.ts` (core fix)
- `nodejs/__test__/arrow.test.ts` (test coverage)

This ensures proper null handling for nullable fields, as users expect.
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
### Solution
Added special handling in the `makeVector` function for boolean arrays where
all values are null. The fix creates a proper null bitmap using
`makeData` and `arrowMakeVector` instead of relying on Apache Arrow's
`vectorFromArray`, which doesn't handle this edge case correctly.
fixes: #2644
### Changes
- Added null value detection for boolean types in `makeVector` function
- Creates proper Arrow data structure with null bitmap when all boolean
values are null
- Preserves existing behavior for non-null boolean values and other data
types
- Fixes the boolean null value bug while maintaining backward
compatibility.
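For readers unfamiliar with null bitmaps, the sketch below builds one for a boolean column in plain TypeScript. It follows the Arrow format's convention (one bit per value, least-significant bit first, a set bit meaning "valid"), but `buildValidityBitmap` is a hypothetical helper and is not how `makeData` is called in the fix.

```typescript
// Hypothetical helper: build an Arrow-style validity bitmap.
// Bit i (LSB-first within each byte) is 1 when values[i] is not null.
function buildValidityBitmap(values: Array<boolean | null>): {
  bitmap: Uint8Array;
  nullCount: number;
} {
  const bitmap = new Uint8Array(Math.ceil(values.length / 8));
  let nullCount = 0;
  values.forEach((value, i) => {
    if (value === null) {
      nullCount++;
    } else {
      const byteIndex = i >> 3;
      bitmap[byteIndex] = (bitmap[byteIndex] ?? 0) | (1 << (i & 7));
    }
  });
  return { bitmap, nullCount };
}
```

When every value is null, the bitmap is all zeros and `nullCount` equals the column length, which is the case the fix handles explicitly.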
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
Support shallow cloning a dataset at a specific location to create a new
dataset, using the shallow_clone feature in Lance. Also introduces a
`clone` API for remote tables that exposes this functionality.
- Fixes an issue where passing `{ vector: undefined }` with an embedding
function threw a "Found field not in schema" error instead of calling the
embedding function, as happens for `null` or omitted fields.
**Changes:**
- Modified `rowPathsAndValues` to skip undefined values during schema
inference
- Added test case verifying undefined, null, and omitted vector fields
all work correctly
**Before:** `{ vector: undefined }` → Error
**After:** `{ vector: undefined }` → Calls embedding function
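The idea of skipping `undefined` values while walking a row can be sketched like this. It is a simplified, hypothetical version written for illustration; the real `rowPathsAndValues` may differ in signature and in how it treats nested values.

```typescript
type PathAndValue = [string[], unknown];

// True for plain `{}`-style objects, false for arrays, null, and class instances.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.getPrototypeOf(value) === Object.prototype
  );
}

// Hypothetical walker: yields [path, value] pairs for every leaf in a row,
// skipping undefined values so they never contribute to the inferred schema.
function* pathsAndValues(
  row: Record<string, unknown>,
  prefix: string[] = [],
): Generator<PathAndValue> {
  for (const [key, value] of Object.entries(row)) {
    if (value === undefined) {
      continue; // treated the same as an omitted field
    }
    const path = [...prefix, key];
    if (isPlainObject(value)) {
      yield* pathsAndValues(value, path);
    } else {
      yield [path, value];
    }
  }
}
```

Under this sketch, `{ id: 1, vector: undefined }` produces a single entry for `id`, so the `vector` column is left for the embedding function to fill in.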
Closes #2647