Compare commits

...

16 Commits

Author SHA1 Message Date
Lance Release
5cbbaa2e4a Bump version: 0.25.2-beta.3 → 0.25.2 2025-10-08 18:11:45 +00:00
Lance Release
1b6bd2498e Bump version: 0.25.2-beta.2 → 0.25.2-beta.3 2025-10-08 18:11:45 +00:00
Jack Ye
285da9db1d feat: upgrade lance to 0.38.2 (#2705) 2025-10-08 09:59:28 -07:00
Ayush Chaurasia
ad8306c96b docs: add custom redirect for storage page (#2706)
Expands the custom redirect links list to include the storage page.
2025-10-08 21:35:48 +05:30
Wyatt Alt
3594538509 fix: add name to index config and fix create_index typing (#2660)
Co-authored-by: Mark McCaskey <markm@harvey.ai>
2025-10-08 04:41:30 -07:00
Tom LaMarre
917aabd077 fix(node): support specifying arrow field types by name (#2704)
The [`FieldLike` type in
arrow.ts](5ec12c9971/nodejs/lancedb/arrow.ts (L71-L78))
can have a `type: string` property, but before this change, actually
trying to create a table that has a schema that specifies field types by
name results in an error:

```
Error: Expected a Type but object was null/undefined
```

This change adds support for mapping some type name strings to arrow
`DataType`s, so that passing `FieldLike`s with a `type: string` property
to `sanitizeField` does not throw an error.

The type names that can be passed are upper/lowercase variations of the
keys of the `constructorsByTypeName` object. This does not support
mapping types that need parameters, such as timestamps which need
timezones.
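As a sketch of that lookup: the `constructorsByTypeName` name comes from the description above, but the stand-in type classes and the `typeFromName` helper below are hypothetical illustrations, not the actual arrow.ts code.

```typescript
// Hypothetical stand-ins for Arrow type classes; the real code uses
// apache-arrow's DataType constructors.
class Int64Type {}
class Float64Type {}

// Lowercased type names mapped to their constructors.
const constructorsByTypeName: Record<string, new () => object> = {
  int64: Int64Type,
  float64: Float64Type,
};

// Resolve a type name case-insensitively, throwing on unknown names
// (parameterized types such as timestamps are deliberately unsupported).
function typeFromName(name: string): object {
  const ctor = constructorsByTypeName[name.toLowerCase()];
  if (ctor === undefined) {
    throw new Error(`Unrecognized type name in schema: ${name}`);
  }
  return new ctor();
}
```
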

With this, it is possible to create empty tables from `SchemaLike`
objects without instantiating arrow types, e.g.:

```ts
import { SchemaLike } from "../lancedb/arrow";
// ...
const schemaLike = {
  fields: [
    {
      name: "id",
      type: "int64",
      nullable: true,
    },
    {
      name: "vector",
      type: "float64",
      nullable: true,
    },
  ],
  // ...
} satisfies SchemaLike;
const table = await con.createEmptyTable("test", schemaLike);
```

This change also makes `FieldLike.nullable` required since the `sanitizeField` function throws if it is undefined.
2025-10-08 04:40:06 -07:00
Jack Ye
5ec12c9971 fix: federated database should not pass namespace to listing database (#2702)
Fixes an error where, when converting a federated database operation to a
listing database operation, the namespace parameter was incorrectly carried
over; it should be dropped.

Note that with the testing infra we have today, we don't have a good way
to test these changes. I will do a quick follow up on
https://github.com/lancedb/lancedb/issues/2701 but would be great to get
this in first to resolve the related issues.
2025-10-06 14:12:41 -07:00
Ed Rogers
d0ce489b21 fix: use stdlib override when possible (#2699)
## Description of changes

Fixes #2698  

This PR uses
[`typing.override`](https://docs.python.org/3/library/typing.html#typing.override)
in favor of the [`overrides`](https://pypi.org/project/overrides/)
dependency when possible. As of Python 3.12, the standard library offers
`typing.override` to perform a static check on overridden methods.

### Motivation

Currently, `overrides` is incompatible with Python 3.14. As a result,
any package that attempts to import `overrides` using Python 3.14+ will
raise an `AttributeError`. An
[issue](https://github.com/mkorpela/overrides/issues/127) has been
raised and a [pull
request](https://github.com/mkorpela/overrides/pull/133) has been
submitted to the GitHub repo for the `overrides` project. But the
maintainer has been unresponsive.

To ensure readiness for Python 3.14, this package (and any other package
directly depending on `overrides`) should consider using
`typing.override` instead.

### Impact

The standard library added `typing.override` as of 3.12. As a result,
this change will affect only users of Python 3.12+. Previous versions
will continue to rely on `overrides`. Notably, the standard library
implementation is slightly different than that of `overrides`. A
thorough discussion of those differences is shown in [PEP
698](https://peps.python.org/pep-0698/), and it is also summarized
nicely by the maintainer of `overrides`
[here](https://github.com/mkorpela/overrides/issues/126#issuecomment-2401327116).

There are two main ways that switching from `overrides` to
`typing.override` will affect developers of this repo.
1. `typing.override` does not implement any runtime checking. Instead,
it provides information to type checkers.
2. The stdlib does not provide a mixin class to enforce override
decorators on child classes. (Their reasoning for this is explained in
[the PEP](https://peps.python.org/pep-0698/).) This PR disables that
behavior entirely by replacing the `EnforceOverrides` mixin.
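The version gate described above is commonly wired up like the following sketch. This is not the PR's actual code; the no-op fallback is an extra safety net added here for illustration only.

```python
import sys

if sys.version_info >= (3, 12):
    # Stdlib decorator: informs type checkers only, no runtime enforcement.
    from typing import override
else:
    try:
        # Third-party fallback for older interpreters.
        from overrides import override  # type: ignore[no-redef]
    except ImportError:
        # Illustrative no-op so the decorator is always defined.
        def override(func):
            return func


class Base:
    def greet(self) -> str:
        return "base"


class Child(Base):
    @override
    def greet(self) -> str:
        return "child"
```
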
2025-10-06 11:23:20 -07:00
Lance Release
d7e02c8181 Bump version: 0.22.2-beta.1 → 0.22.2-beta.2 2025-10-06 18:10:40 +00:00
Lance Release
70958f6366 Bump version: 0.25.2-beta.1 → 0.25.2-beta.2 2025-10-06 18:09:24 +00:00
Will Jones
1ac745eb18 ci: fix Python and Node CI on main (#2700)
Example failure:
https://github.com/lancedb/lancedb/actions/runs/18237024283/job/51932651993
2025-10-06 09:40:08 -07:00
Will Jones
1357fe8aa1 ci: run remote tests on PRs only if they aren't a fork (#2697) 2025-10-03 17:38:40 -07:00
LuQQiu
0d78929893 feat: upgrade lance to 0.38.0 (#2695)
https://github.com/lancedb/lance/releases/tag/v0.38.0

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-10-03 16:47:05 -07:00
Neha Prasad
9e2a68541e fix(node): allow undefined/omitted values for nullable vector fields (#2656)
**Problem**: When a vector field is marked as nullable, users should be
able to omit it or pass `undefined`, but this was throwing an error:
"Table has embeddings: 'vector', but no embedding function was provided"

fixes: #2646

**Solution**: Modified `validateSchemaEmbeddings` to check
`field.nullable` before treating `undefined` values as missing embedding
fields.

**Changes**:
- Fixed validation logic in `nodejs/lancedb/arrow.ts`
- Enabled previously skipped test for nullable fields
- Added reproduction test case

**Behavior**:
- `{ vector: undefined }` now works for nullable fields
- `{}` (omitted field) now works for nullable fields
- `{ vector: null }` still works (unchanged)
- Non-nullable fields still properly throw errors (unchanged)
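The behavior table above reduces to a single predicate. A minimal sketch under assumed names — the real check lives inside `validateSchemaEmbeddings` in `nodejs/lancedb/arrow.ts`; `missingVectorIsError` is a hypothetical helper for illustration:

```typescript
interface VectorFieldInfo {
  name: string;
  nullable: boolean;
}

// An undefined (or omitted) vector value is only an error when the field is
// non-nullable or an embedding function is registered to fill it in.
function missingVectorIsError(
  field: VectorFieldInfo,
  row: Record<string, unknown>,
  hasEmbeddingFunction: boolean,
): boolean {
  const value = row[field.name];
  if (value !== undefined) {
    return false; // a value was supplied, including an explicit null
  }
  return !field.nullable || hasEmbeddingFunction;
}
```
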

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: neha <neha@posthog.com>
2025-10-02 10:53:05 -07:00
Will Jones
1aa0fd16e7 ci: automatic issue creation for failed publish workflows (#2694)
## Summary
- Created custom GitHub Action that creates issues when workflow jobs
fail
- Added report-failure jobs to cargo-publish.yml, java-publish.yml,
npm-publish.yml, and pypi-publish.yml
- Issues are created automatically with workflow name, failed job names,
and run URL

## Test plan
- Workflows will only create issues on actual release or
workflow_dispatch events
- Can be tested by triggering workflow_dispatch on a publish workflow

Based on lancedb/lance#4873

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-02 08:24:16 -07:00
Lance Release
fec2a05629 Bump version: 0.22.2-beta.0 → 0.22.2-beta.1 2025-09-30 19:31:44 +00:00
48 changed files with 1032 additions and 463 deletions

```diff
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "0.22.2-beta.0"
+current_version = "0.22.2-beta.2"
 parse = """(?x)
     (?P<major>0|[1-9]\\d*)\\.
     (?P<minor>0|[1-9]\\d*)\\.
```

@@ -0,0 +1,45 @@

```yaml
name: Create Failure Issue
description: Creates a GitHub issue if any jobs in the workflow failed
inputs:
  job-results:
    description: 'JSON string of job results from needs context'
    required: true
  workflow-name:
    description: 'Name of the workflow'
    required: true
runs:
  using: composite
  steps:
    - name: Check for failures and create issue
      shell: bash
      env:
        JOB_RESULTS: ${{ inputs.job-results }}
        WORKFLOW_NAME: ${{ inputs.workflow-name }}
        RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        GH_TOKEN: ${{ github.token }}
      run: |
        # Check if any job failed
        if echo "$JOB_RESULTS" | jq -e 'to_entries | any(.value.result == "failure")' > /dev/null; then
          echo "Detected job failures, creating issue..."
          # Extract failed job names
          FAILED_JOBS=$(echo "$JOB_RESULTS" | jq -r 'to_entries | map(select(.value.result == "failure")) | map(.key) | join(", ")')
          # Create issue with workflow name, failed jobs, and run URL
          gh issue create \
            --title "$WORKFLOW_NAME Failed ($FAILED_JOBS)" \
            --body "The workflow **$WORKFLOW_NAME** failed during execution.
        **Failed jobs:** $FAILED_JOBS
        **Run URL:** $RUN_URL
        Please investigate the failed jobs and address any issues." \
            --label "ci"
          echo "Issue created successfully"
        else
          echo "No job failures detected, skipping issue creation"
        fi
```

```diff
@@ -38,3 +38,17 @@ jobs:
       - name: Publish the package
         run: |
           cargo publish -p lancedb --all-features --token ${{ steps.auth.outputs.token }}
+  report-failure:
+    name: Report Workflow Failure
+    runs-on: ubuntu-latest
+    needs: [build]
+    if: always() && (github.event_name == 'release' || github.event_name == 'workflow_dispatch')
+    permissions:
+      contents: read
+      issues: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: ./.github/actions/create-failure-issue
+        with:
+          job-results: ${{ toJSON(needs) }}
+          workflow-name: ${{ github.workflow }}
```

```diff
@@ -43,7 +43,6 @@
       - uses: Swatinem/rust-cache@v2
       - uses: actions-rust-lang/setup-rust-toolchain@v1
         with:
-          toolchain: "1.81.0"
           cache-workspaces: "./java/core/lancedb-jni"
     # Disable full debug symbol generation to speed up CI build and keep memory down
     # "1" means line tables only, which is useful for panic tracebacks.
@@ -112,3 +111,17 @@
         env:
           SONATYPE_USER: ${{ secrets.SONATYPE_USER }}
           SONATYPE_TOKEN: ${{ secrets.SONATYPE_TOKEN }}
+  report-failure:
+    name: Report Workflow Failure
+    runs-on: ubuntu-latest
+    needs: [linux-arm64, linux-x86, macos-arm64]
+    if: always() && (github.event_name == 'release' || github.event_name == 'workflow_dispatch')
+    permissions:
+      contents: read
+      issues: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: ./.github/actions/create-failure-issue
+        with:
+          job-results: ${{ toJSON(needs) }}
+          workflow-name: ${{ github.workflow }}
```

```diff
@@ -6,6 +6,7 @@ on:
       - main
   pull_request:
     paths:
+      - Cargo.toml
       - nodejs/**
       - .github/workflows/nodejs.yml
       - docker-compose.yml
```

```diff
@@ -365,3 +365,17 @@
           ARGS="$ARGS --tag preview"
         fi
         npm publish $ARGS
+  report-failure:
+    name: Report Workflow Failure
+    runs-on: ubuntu-latest
+    needs: [build-lancedb, test-lancedb, publish]
+    if: always() && (github.event_name == 'release' || github.event_name == 'workflow_dispatch')
+    permissions:
+      contents: read
+      issues: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: ./.github/actions/create-failure-issue
+        with:
+          job-results: ${{ toJSON(needs) }}
+          workflow-name: ${{ github.workflow }}
```

```diff
@@ -173,3 +173,17 @@
         generate_release_notes: false
         name: Python LanceDB v${{ steps.extract_version.outputs.version }}
         body: ${{ steps.python_release_notes.outputs.changelog }}
+  report-failure:
+    name: Report Workflow Failure
+    runs-on: ubuntu-latest
+    needs: [linux, mac, windows]
+    permissions:
+      contents: read
+      issues: write
+    if: always() && (github.event_name == 'release' || github.event_name == 'workflow_dispatch')
+    steps:
+      - uses: actions/checkout@v4
+      - uses: ./.github/actions/create-failure-issue
+        with:
+          job-results: ${{ toJSON(needs) }}
+          workflow-name: ${{ github.workflow }}
```

```diff
@@ -6,6 +6,7 @@ on:
       - main
   pull_request:
     paths:
+      - Cargo.toml
       - python/**
       - .github/workflows/python.yml
```

```diff
@@ -125,6 +125,9 @@
       - name: Run examples
         run: cargo run --example simple --locked
       - name: Run remote tests
+        # Running this requires access to secrets, so skip if this is
+        # a PR from a fork.
+        if: github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork
         run: make -C ./lancedb remote-tests
   macos:
```

Cargo.lock (generated, 751 changed lines): file diff suppressed because it is too large.

```diff
@@ -15,30 +15,30 @@ categories = ["database-implementations"]
 rust-version = "1.78.0"

 [workspace.dependencies]
-lance = { "version" = "=0.37.0", default-features = false, "features" = ["dynamodb"], "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance = { "version" = "=0.38.2", default-features = false, "features" = ["dynamodb"] }
-lance-io = { "version" = "=0.37.0", default-features = false, "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance-io = { "version" = "=0.38.2", default-features = false }
-lance-index = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance-index = "=0.38.2"
-lance-linalg = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance-linalg = "=0.38.2"
-lance-table = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance-table = "=0.38.2"
-lance-testing = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance-testing = "=0.38.2"
-lance-datafusion = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance-datafusion = "=0.38.2"
-lance-encoding = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+lance-encoding = "=0.38.2"
-lance-namespace = "0.0.16"
+lance-namespace = "0.0.18"
 # Note that this one does not include pyarrow
-arrow = { version = "55.1", optional = false }
+arrow = { version = "56.2", optional = false }
-arrow-array = "55.1"
+arrow-array = "56.2"
-arrow-data = "55.1"
+arrow-data = "56.2"
-arrow-ipc = "55.1"
+arrow-ipc = "56.2"
-arrow-ord = "55.1"
+arrow-ord = "56.2"
-arrow-schema = "55.1"
+arrow-schema = "56.2"
-arrow-cast = "55.1"
+arrow-cast = "56.2"
 async-trait = "0"
-datafusion = { version = "49.0", default-features = false }
+datafusion = { version = "50.1", default-features = false }
-datafusion-catalog = "49.0"
+datafusion-catalog = "50.1"
-datafusion-common = { version = "49.0", default-features = false }
+datafusion-common = { version = "50.1", default-features = false }
-datafusion-execution = "49.0"
+datafusion-execution = "50.1"
-datafusion-expr = "49.0"
+datafusion-expr = "50.1"
-datafusion-physical-plan = "49.0"
+datafusion-physical-plan = "50.1"
 env_logger = "0.11"
 half = { "version" = "2.6.0", default-features = false, features = [
     "num-traits",
@@ -61,13 +61,14 @@ chrono = "=0.4.41"
 # Workaround for: https://github.com/Lokathor/bytemuck/issues/306
 bytemuck_derive = ">=1.8.1, <1.9.0"

-[patch.crates-io]
-# Force to use the same lance version as the rest of the project to avoid duplicate dependencies
-lance = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
-lance-io = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
-lance-index = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
-lance-linalg = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
-lance-table = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
-lance-testing = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
-lance-datafusion = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
-lance-encoding = { "version" = "=0.37.0", "tag" = "v0.37.1-beta.1", "git" = "https://github.com/lancedb/lance.git" }
+# This is only needed when we reference preview releases of lance
+# [patch.crates-io]
+# # Force to use the same lance version as the rest of the project to avoid duplicate dependencies
+# lance = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
+# lance-io = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
+# lance-index = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
+# lance-linalg = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
+# lance-table = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
+# lance-testing = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
+# lance-datafusion = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
+# lance-encoding = { "version" = "=0.38.0", "tag" = "v0.38.0", "git" = "https://github.com/lancedb/lance.git" }
```

```diff
@@ -84,6 +84,7 @@ plugins:
         'examples.md': 'https://lancedb.com/docs/tutorials/'
         'concepts/vector_search.md': 'https://lancedb.com/docs/search/vector-search/'
         'troubleshooting.md': 'https://lancedb.com/docs/troubleshooting/'
+        'guides/storage.md': 'https://lancedb.com/docs/storage/integrations'
```

````diff
@@ -52,6 +52,30 @@ the merge result
 ***

+### useIndex()
+
+```ts
+useIndex(useIndex): MergeInsertBuilder
+```
+
+Controls whether to use indexes for the merge operation.
+When set to `true` (the default), the operation will use an index if available
+on the join key for improved performance. When set to `false`, it forces a full
+table scan even if an index exists. This can be useful for benchmarking or when
+the query optimizer chooses a suboptimal path.
+
+#### Parameters
+
+* **useIndex**: `boolean`
+
+Whether to use indices for the merge operation. Defaults to `true`.
+
+#### Returns
+
+[`MergeInsertBuilder`](MergeInsertBuilder.md)
+
+***
+
 ### whenMatchedUpdateAll()

 ```ts
````

```diff
@@ -8,7 +8,7 @@
   <parent>
     <groupId>com.lancedb</groupId>
     <artifactId>lancedb-parent</artifactId>
-    <version>0.22.2-beta.0</version>
+    <version>0.22.2-beta.2</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
```

```diff
@@ -8,7 +8,7 @@
   <parent>
     <groupId>com.lancedb</groupId>
     <artifactId>lancedb-parent</artifactId>
-    <version>0.22.2-beta.0</version>
+    <version>0.22.2-beta.2</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
```

```diff
@@ -6,7 +6,7 @@
   <groupId>com.lancedb</groupId>
   <artifactId>lancedb-parent</artifactId>
-  <version>0.22.2-beta.0</version>
+  <version>0.22.2-beta.2</version>
   <packaging>pom</packaging>
   <name>${project.artifactId}</name>
   <description>LanceDB Java SDK Parent POM</description>
```

```diff
@@ -1,7 +1,7 @@
 [package]
 name = "lancedb-nodejs"
 edition.workspace = true
-version = "0.22.2-beta.0"
+version = "0.22.2-beta.2"
 license.workspace = true
 description.workspace = true
 repository.workspace = true
```

@@ -0,0 +1,184 @@

```ts
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors

import * as arrow from "../lancedb/arrow";
import { sanitizeField, sanitizeType } from "../lancedb/sanitize";

describe("sanitize", function () {
  describe("sanitizeType function", function () {
    it("should handle type objects", function () {
      const type = new arrow.Int32();
      const result = sanitizeType(type);
      expect(result.typeId).toBe(arrow.Type.Int);
      expect((result as arrow.Int).bitWidth).toBe(32);
      expect((result as arrow.Int).isSigned).toBe(true);

      const floatType = {
        typeId: 3, // Type.Float = 3
        precision: 2,
        toString: () => "Float",
        isFloat: true,
        isFixedWidth: true,
      };
      const floatResult = sanitizeType(floatType);
      expect(floatResult).toBeInstanceOf(arrow.DataType);
      expect(floatResult.typeId).toBe(arrow.Type.Float);

      const floatResult2 = sanitizeType({ ...floatType, typeId: () => 3 });
      expect(floatResult2).toBeInstanceOf(arrow.DataType);
      expect(floatResult2.typeId).toBe(arrow.Type.Float);
    });

    const allTypeNameTestCases = [
      ["null", new arrow.Null()],
      ["binary", new arrow.Binary()],
      ["utf8", new arrow.Utf8()],
      ["bool", new arrow.Bool()],
      ["int8", new arrow.Int8()],
      ["int16", new arrow.Int16()],
      ["int32", new arrow.Int32()],
      ["int64", new arrow.Int64()],
      ["uint8", new arrow.Uint8()],
      ["uint16", new arrow.Uint16()],
      ["uint32", new arrow.Uint32()],
      ["uint64", new arrow.Uint64()],
      ["float16", new arrow.Float16()],
      ["float32", new arrow.Float32()],
      ["float64", new arrow.Float64()],
      ["datemillisecond", new arrow.DateMillisecond()],
      ["dateday", new arrow.DateDay()],
      ["timenanosecond", new arrow.TimeNanosecond()],
      ["timemicrosecond", new arrow.TimeMicrosecond()],
      ["timemillisecond", new arrow.TimeMillisecond()],
      ["timesecond", new arrow.TimeSecond()],
      ["intervaldaytime", new arrow.IntervalDayTime()],
      ["intervalyearmonth", new arrow.IntervalYearMonth()],
      ["durationnanosecond", new arrow.DurationNanosecond()],
      ["durationmicrosecond", new arrow.DurationMicrosecond()],
      ["durationmillisecond", new arrow.DurationMillisecond()],
      ["durationsecond", new arrow.DurationSecond()],
    ] as const;

    it.each(allTypeNameTestCases)(
      'should map type name "%s" to %s',
      function (name, expected) {
        const result = sanitizeType(name);
        expect(result).toBeInstanceOf(expected.constructor);
      },
    );

    const caseVariationTestCases = [
      ["NULL", new arrow.Null()],
      ["Utf8", new arrow.Utf8()],
      ["FLOAT32", new arrow.Float32()],
      ["DaTedAy", new arrow.DateDay()],
    ] as const;

    it.each(caseVariationTestCases)(
      'should be case insensitive for type name "%s" mapped to %s',
      function (name, expected) {
        const result = sanitizeType(name);
        expect(result).toBeInstanceOf(expected.constructor);
      },
    );

    it("should throw error for unrecognized type name", function () {
      expect(() => sanitizeType("invalid_type")).toThrow(
        "Unrecognized type name in schema: invalid_type",
      );
    });
  });

  describe("sanitizeField function", function () {
    it("should handle field with string type name", function () {
      const field = sanitizeField({
        name: "string_field",
        type: "utf8",
        nullable: true,
        metadata: new Map([["key", "value"]]),
      });
      expect(field).toBeInstanceOf(arrow.Field);
      expect(field.name).toBe("string_field");
      expect(field.type).toBeInstanceOf(arrow.Utf8);
      expect(field.nullable).toBe(true);
      expect(field.metadata?.get("key")).toBe("value");
    });

    it("should handle field with type object", function () {
      const floatType = {
        typeId: 3, // Float
        precision: 32,
      };
      const field = sanitizeField({
        name: "float_field",
        type: floatType,
        nullable: false,
      });
      expect(field).toBeInstanceOf(arrow.Field);
      expect(field.name).toBe("float_field");
      expect(field.type).toBeInstanceOf(arrow.DataType);
      expect(field.type.typeId).toBe(arrow.Type.Float);
      expect((field.type as arrow.Float64).precision).toBe(32);
      expect(field.nullable).toBe(false);
    });

    it("should handle field with direct Type instance", function () {
      const field = sanitizeField({
        name: "bool_field",
        type: new arrow.Bool(),
        nullable: true,
      });
      expect(field).toBeInstanceOf(arrow.Field);
      expect(field.name).toBe("bool_field");
      expect(field.type).toBeInstanceOf(arrow.Bool);
      expect(field.nullable).toBe(true);
    });

    it("should throw error for invalid field object", function () {
      expect(() =>
        sanitizeField({
          type: "int32",
          nullable: true,
        }),
      ).toThrow(
        "The field passed in is missing a `type`/`name`/`nullable` property",
      );
      // Invalid type
      expect(() =>
        sanitizeField({
          name: "invalid",
          type: { invalid: true },
          nullable: true,
        }),
      ).toThrow("Expected a Type to have a typeId property");
      // Invalid nullable
      expect(() =>
        sanitizeField({
          name: "invalid_nullable",
          type: "int32",
          nullable: "not a boolean",
        }),
      ).toThrow("The field passed in had a non-boolean `nullable` property");
    });

    it("should report error for invalid type name", function () {
      expect(() =>
        sanitizeField({
          name: "invalid_field",
          type: "invalid_type",
          nullable: true,
        }),
      ).toThrow(
        "Unable to sanitize type for field: invalid_field due to error: Error: Unrecognized type name in schema: invalid_type",
      );
    });
  });
});
```

```diff
@@ -10,7 +10,13 @@ import * as arrow16 from "apache-arrow-16";
 import * as arrow17 from "apache-arrow-17";
 import * as arrow18 from "apache-arrow-18";
-import { MatchQuery, PhraseQuery, Table, connect } from "../lancedb";
+import {
+  Connection,
+  MatchQuery,
+  PhraseQuery,
+  Table,
+  connect,
+} from "../lancedb";
 import {
   Table as ArrowTable,
   Field,
@@ -21,6 +27,8 @@ import {
   Int64,
   List,
   Schema,
+  SchemaLike,
+  Type,
   Uint8,
   Utf8,
   makeArrowTable,
@@ -211,8 +219,7 @@
     },
   );

-    // TODO: https://github.com/lancedb/lancedb/issues/1832
-    it.skip("should be able to omit nullable fields", async () => {
+    it("should be able to omit nullable fields", async () => {
       const db = await connect(tmpDir.name);
       const schema = new arrow.Schema([
         new arrow.Field(
@@ -236,23 +243,36 @@
       await table.add([data3]);

       let res = await table.query().limit(10).toArray();
-      const resVector = res.map((r) => r.get("vector").toArray());
+      const resVector = res.map((r) =>
+        r.vector ? Array.from(r.vector) : null,
+      );
       expect(resVector).toEqual([null, data2.vector, data3.vector]);
-      const resItem = res.map((r) => r.get("item").toArray());
+      const resItem = res.map((r) => r.item);
       expect(resItem).toEqual(["foo", null, "bar"]);
-      const resPrice = res.map((r) => r.get("price").toArray());
+      const resPrice = res.map((r) => r.price);
       expect(resPrice).toEqual([10.0, 2.0, 3.0]);

       const data4 = { item: "foo" };
       // We can't omit a column if it's not nullable
-      await expect(table.add([data4])).rejects.toThrow("Invalid user input");
+      await expect(table.add([data4])).rejects.toThrow(
+        "Append with different schema",
+      );
       // But we can alter columns to make them nullable
       await table.alterColumns([{ path: "price", nullable: true }]);
       await table.add([data4]);
-      res = (await table.query().limit(10).toArray()).map((r) => r.toJSON());
-      expect(res).toEqual([data1, data2, data3, data4]);
+      res = (await table.query().limit(10).toArray()).map((r) => ({
+        ...r.toJSON(),
+        vector: r.vector ? Array.from(r.vector) : null,
+      }));
+      // Rust fills missing nullable fields with null
+      expect(res).toEqual([
+        { ...data1, vector: null },
+        { ...data2, item: null },
+        data3,
+        { ...data4, price: null, vector: null },
+      ]);
     });

     it("should be able to insert nullable data for non-nullable fields", async () => {
@@ -330,6 +350,43 @@
       const table = await db.createTable("my_table", data);
       expect(await table.countRows()).toEqual(2);
     });

+    it("should allow undefined and omitted nullable vector fields", async () => {
+      // Test for the bug: can't pass undefined or omit vector column
+      const db = await connect("memory://");
+      const schema = new arrow.Schema([
+        new arrow.Field("id", new arrow.Int32(), true),
+        new arrow.Field(
+          "vector",
+          new arrow.FixedSizeList(
+            32,
+            new arrow.Field("item", new arrow.Float32(), true),
+          ),
+          true, // nullable = true
+        ),
+      ]);
+      const table = await db.createEmptyTable("test_table", schema);
+
+      // Should not throw error for undefined value
+      await table.add([{ id: 0, vector: undefined }]);
+      // Should not throw error for omitted field
+      await table.add([{ id: 1 }]);
+      // Should still work for null
+      await table.add([{ id: 2, vector: null }]);
+      // Should still work for actual vector
+      const testVector = new Array(32).fill(0.5);
+      await table.add([{ id: 3, vector: testVector }]);
+
+      expect(await table.countRows()).toEqual(4);
+      const res = await table.query().limit(10).toArray();
+      const resVector = res.map((r) =>
+        r.vector ? Array.from(r.vector) : null,
+      );
+      expect(resVector).toEqual([null, null, null, testVector]);
+    });
   },
 );
@@ -1454,7 +1511,9 @@
   it("delete unverified", async () => {
     const version = await table.version();
-    const versionFile = `${tmpDir.name}/${table.name}.lance/_versions/${version - 1}.manifest`;
+    const versionFile = `${tmpDir.name}/${table.name}.lance/_versions/${
+      version - 1
+    }.manifest`;
     fs.rmSync(versionFile);

     let stats = await table.optimize({ deleteUnverified: false });
@@ -1968,3 +2027,52 @@
     expect(results2.length).toBe(10);
   });
 });
+
+describe("when creating an empty table", () => {
+  let con: Connection;
+  beforeEach(async () => {
+    const tmpDir = tmp.dirSync({ unsafeCleanup: true });
+    con = await connect(tmpDir.name);
+  });
+  afterEach(() => {
+    con.close();
+  });
+
+  it("can create an empty table from an arrow Schema", async () => {
+    const schema = new Schema([
+      new Field("id", new Int64()),
+      new Field("vector", new Float64()),
+    ]);
+    const table = await con.createEmptyTable("test", schema);
+    const actualSchema = await table.schema();
+    expect(actualSchema.fields[0].type.typeId).toBe(Type.Int);
+    expect((actualSchema.fields[0].type as Int64).bitWidth).toBe(64);
+    expect(actualSchema.fields[1].type.typeId).toBe(Type.Float);
+    expect((actualSchema.fields[1].type as Float64).precision).toBe(2);
+  });
+
+  it("can create an empty table from schema that specifies field types by name", async () => {
+    const schemaLike = {
+      fields: [
+        {
+          name: "id",
+          type: "int64",
+          nullable: true,
+        },
+        {
+          name: "vector",
+          type: "float64",
+          nullable: true,
+        },
+      ],
+      metadata: new Map(),
+      names: ["id", "vector"],
+    } satisfies SchemaLike;
+    const table = await con.createEmptyTable("test", schemaLike);
+    const actualSchema = await table.schema();
+    expect(actualSchema.fields[0].type.typeId).toBe(Type.Int);
+    expect((actualSchema.fields[0].type as Int64).bitWidth).toBe(64);
+    expect(actualSchema.fields[1].type.typeId).toBe(Type.Float);
+    expect((actualSchema.fields[1].type as Float64).precision).toBe(2);
+  });
+});
```


@@ -73,7 +73,7 @@ export type FieldLike =
  | {
      type: string;
      name: string;
-      nullable?: boolean;
+      nullable: boolean;
      metadata?: Map<string, string>;
    };
@@ -1285,19 +1285,36 @@ function validateSchemaEmbeddings(
      if (isFixedSizeList(field.type)) {
        field = sanitizeField(field);
        if (data.length !== 0 && data?.[0]?.[field.name] === undefined) {
+         // Check if there's an embedding function registered for this field
+         let hasEmbeddingFunction = false;
+         // Check schema metadata for embedding functions
          if (schema.metadata.has("embedding_functions")) {
            const embeddings = JSON.parse(
              schema.metadata.get("embedding_functions")!,
            );
-           if (
-             // biome-ignore lint/suspicious/noExplicitAny: we don't know the type of `f`
-             embeddings.find((f: any) => f["vectorColumn"] === field.name) ===
-               undefined
-           ) {
+           // biome-ignore lint/suspicious/noExplicitAny: we don't know the type of `f`
+           if (embeddings.find((f: any) => f["vectorColumn"] === field.name)) {
+             hasEmbeddingFunction = true;
+           }
+         }
+         // Check passed embedding function parameter
+         if (embeddings && embeddings.vectorColumn === field.name) {
+           hasEmbeddingFunction = true;
+         }
+         // If the field is nullable AND there's no embedding function, allow undefined/omitted values
+         if (field.nullable && !hasEmbeddingFunction) {
+           fields.push(field);
+         } else {
+           // Either not nullable OR has embedding function - require explicit values
+           if (hasEmbeddingFunction) {
+             // Don't add to missingEmbeddingFields since this is expected to be filled by embedding function
+             fields.push(field);
+           } else {
              missingEmbeddingFields.push(field);
            }
-         } else {
-           missingEmbeddingFields.push(field);
          }
        } else {
          fields.push(field);
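The branch added above boils down to a small decision rule. A minimal standalone sketch (assumed names, not the library's API): a vector field may be omitted from input rows only when it is nullable and no embedding function will fill it later.

```typescript
// Hypothetical helper illustrating the validation rule; "FieldInfo" and
// "classifyMissingVector" are illustrative names, not lancedb exports.
interface FieldInfo {
  name: string;
  nullable: boolean;
}

function classifyMissingVector(
  field: FieldInfo,
  hasEmbeddingFunction: boolean,
): "accept" | "missing" {
  // An embedding function fills the column later, so the field is accepted.
  if (hasEmbeddingFunction) return "accept";
  // A nullable field may legitimately hold nulls.
  if (field.nullable) return "accept";
  // Otherwise the caller must supply explicit values.
  return "missing";
}

console.log(classifyMissingVector({ name: "vector", nullable: true }, false)); // "accept"
console.log(classifyMissingVector({ name: "vector", nullable: false }, false)); // "missing"
```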


@@ -326,6 +326,9 @@ export function sanitizeDictionary(typeLike: object) {
// biome-ignore lint/suspicious/noExplicitAny: skip
export function sanitizeType(typeLike: unknown): DataType<any> {
+  if (typeof typeLike === "string") {
+    return dataTypeFromName(typeLike);
+  }
  if (typeof typeLike !== "object" || typeLike === null) {
    throw Error("Expected a Type but object was null/undefined");
  }
@@ -447,7 +450,7 @@ export function sanitizeType(typeLike: unknown): DataType<any> {
    case Type.DurationSecond:
      return new DurationSecond();
    default:
-      throw new Error("Unrecoginized type id in schema: " + typeId);
+      throw new Error("Unrecognized type id in schema: " + typeId);
  }
}
@@ -467,7 +470,15 @@ export function sanitizeField(fieldLike: unknown): Field {
      "The field passed in is missing a `type`/`name`/`nullable` property",
    );
  }
-  const type = sanitizeType(fieldLike.type);
+  let type: DataType;
+  try {
+    type = sanitizeType(fieldLike.type);
+  } catch (error: unknown) {
+    throw Error(
+      `Unable to sanitize type for field: ${fieldLike.name} due to error: ${error}`,
+      { cause: error },
+    );
+  }
  const name = fieldLike.name;
  if (!(typeof name === "string")) {
    throw Error("The field passed in had a non-string `name` property");
@@ -581,3 +592,46 @@ function sanitizeData(
  },
);
}
const constructorsByTypeName = {
null: () => new Null(),
binary: () => new Binary(),
utf8: () => new Utf8(),
bool: () => new Bool(),
int8: () => new Int8(),
int16: () => new Int16(),
int32: () => new Int32(),
int64: () => new Int64(),
uint8: () => new Uint8(),
uint16: () => new Uint16(),
uint32: () => new Uint32(),
uint64: () => new Uint64(),
float16: () => new Float16(),
float32: () => new Float32(),
float64: () => new Float64(),
datemillisecond: () => new DateMillisecond(),
dateday: () => new DateDay(),
timenanosecond: () => new TimeNanosecond(),
timemicrosecond: () => new TimeMicrosecond(),
timemillisecond: () => new TimeMillisecond(),
timesecond: () => new TimeSecond(),
intervaldaytime: () => new IntervalDayTime(),
intervalyearmonth: () => new IntervalYearMonth(),
durationnanosecond: () => new DurationNanosecond(),
durationmicrosecond: () => new DurationMicrosecond(),
durationmillisecond: () => new DurationMillisecond(),
durationsecond: () => new DurationSecond(),
} as const;
type MappableTypeName = keyof typeof constructorsByTypeName;
export function dataTypeFromName(typeName: string): DataType {
const normalizedTypeName = typeName.toLowerCase() as MappableTypeName;
const _constructor = constructorsByTypeName[normalizedTypeName];
if (!_constructor) {
throw new Error("Unrecognized type name in schema: " + typeName);
}
return _constructor();
}
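The lookup above is a lowercase key table of factories with case-insensitive resolution. A self-contained sketch of the same pattern, without the arrow dependency ("TypeTag" is a stand-in for arrow's `DataType`, and only two entries are shown; parameterized types such as timestamps stay unsupported, as the PR notes):

```typescript
// Stand-in for arrow's DataType in this sketch.
type TypeTag = { typeName: string };

// Lowercase-keyed factory table, mirroring constructorsByTypeName.
const factories: Record<string, () => TypeTag> = {
  int64: () => ({ typeName: "int64" }),
  float64: () => ({ typeName: "float64" }),
};

function typeFromName(name: string): TypeTag {
  // Normalize so "Int64", "INT64", and "int64" all resolve to the same entry.
  const factory = factories[name.toLowerCase()];
  if (!factory) {
    throw new Error("Unrecognized type name in schema: " + name);
  }
  return factory();
}

console.log(typeFromName("Int64").typeName); // "int64"
```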


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-darwin-arm64",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": ["darwin"],
  "cpu": ["arm64"],
  "main": "lancedb.darwin-arm64.node",


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-darwin-x64",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": ["darwin"],
  "cpu": ["x64"],
  "main": "lancedb.darwin-x64.node",


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-linux-arm64-gnu",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": ["linux"],
  "cpu": ["arm64"],
  "main": "lancedb.linux-arm64-gnu.node",


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-linux-arm64-musl",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": ["linux"],
  "cpu": ["arm64"],
  "main": "lancedb.linux-arm64-musl.node",


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-linux-x64-gnu",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": ["linux"],
  "cpu": ["x64"],
  "main": "lancedb.linux-x64-gnu.node",


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-linux-x64-musl",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": ["linux"],
  "cpu": ["x64"],
  "main": "lancedb.linux-x64-musl.node",


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-win32-arm64-msvc",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": [
    "win32"
  ],


@@ -1,6 +1,6 @@
{
  "name": "@lancedb/lancedb-win32-x64-msvc",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "os": ["win32"],
  "cpu": ["x64"],
  "main": "lancedb.win32-x64-msvc.node",


@@ -1,12 +1,12 @@
{
  "name": "@lancedb/lancedb",
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "@lancedb/lancedb",
-      "version": "0.22.2-beta.0",
+      "version": "0.22.2-beta.2",
      "cpu": [
        "x64",
        "arm64"


@@ -11,7 +11,7 @@
    "ann"
  ],
  "private": false,
-  "version": "0.22.2-beta.0",
+  "version": "0.22.2-beta.2",
  "main": "dist/index.js",
  "exports": {
    ".": "./dist/index.js",


@@ -1,5 +1,5 @@
[tool.bumpversion]
-current_version = "0.25.2-beta.1"
+current_version = "0.25.2"
parse = """(?x)
    (?P<major>0|[1-9]\\d*)\\.
    (?P<minor>0|[1-9]\\d*)\\.


@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
-version = "0.25.2-beta.1"
+version = "0.25.2"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true
@@ -14,12 +14,12 @@ name = "_lancedb"
crate-type = ["cdylib"]
[dependencies]
-arrow = { version = "55.1", features = ["pyarrow"] }
+arrow = { version = "56.2", features = ["pyarrow"] }
async-trait = "0.1"
lancedb = { path = "../rust/lancedb", default-features = false }
env_logger.workspace = true
-pyo3 = { version = "0.24", features = ["extension-module", "abi3-py39"] }
+pyo3 = { version = "0.25", features = ["extension-module", "abi3-py39"] }
-pyo3-async-runtimes = { version = "0.24", features = [
+pyo3-async-runtimes = { version = "0.25", features = [
    "attributes",
    "tokio-runtime",
] }
@@ -28,7 +28,7 @@ futures.workspace = true
tokio = { version = "1.40", features = ["sync"] }
[build-dependencies]
-pyo3-build-config = { version = "0.24", features = [
+pyo3-build-config = { version = "0.25", features = [
    "extension-module",
    "abi3-py39",
] }


@@ -5,7 +5,7 @@ dynamic = ["version"]
dependencies = [
    "deprecation",
    "numpy",
-    "overrides>=0.7",
+    "overrides>=0.7; python_version<'3.12'",
    "packaging",
    "pyarrow>=16",
    "pydantic>=1.10",


@@ -133,6 +133,7 @@ class Tags:
    async def update(self, tag: str, version: int): ...

class IndexConfig:
+    name: str
    index_type: str
    columns: List[str]


@@ -6,10 +6,18 @@ from __future__ import annotations
from abc import abstractmethod
from pathlib import Path
+import sys
from typing import TYPE_CHECKING, Dict, Iterable, List, Literal, Optional, Union
+if sys.version_info >= (3, 12):
+    from typing import override
+
+    class EnforceOverrides:
+        pass
+else:
+    from overrides import EnforceOverrides, override  # type: ignore
from lancedb.embeddings.registry import EmbeddingFunctionRegistry
-from overrides import EnforceOverrides, override  # type: ignore
from lancedb.common import data_to_reader, sanitize_uri, validate_schema
from lancedb.background_loop import LOOP


@@ -12,13 +12,18 @@ from __future__ import annotations
from typing import Dict, Iterable, List, Optional, Union
import os
+import sys
+if sys.version_info >= (3, 12):
+    from typing import override
+else:
+    from overrides import override
from lancedb.db import DBConnection
from lancedb.table import LanceTable, Table
from lancedb.util import validate_table_name
from lancedb.common import validate_schema
from lancedb.table import sanitize_create_table
-from overrides import override
from lance_namespace import LanceNamespace, connect as namespace_connect
from lance_namespace_urllib3_client.models import (


@@ -5,15 +5,20 @@
from datetime import timedelta
import logging
from concurrent.futures import ThreadPoolExecutor
+import sys
from typing import Any, Dict, Iterable, List, Optional, Union
from urllib.parse import urlparse
import warnings
+if sys.version_info >= (3, 12):
+    from typing import override
+else:
+    from overrides import override
# Remove this import to fix circular dependency
# from lancedb import connect_async
from lancedb.remote import ClientConfig
import pyarrow as pa
-from overrides import override
from ..common import DATA
from ..db import DBConnection, LOOP


@@ -114,7 +114,7 @@ class RemoteTable(Table):
        index_type: Literal["BTREE", "BITMAP", "LABEL_LIST", "scalar"] = "scalar",
        *,
        replace: bool = False,
-        wait_timeout: timedelta = None,
+        wait_timeout: Optional[timedelta] = None,
        name: Optional[str] = None,
    ):
        """Creates a scalar index
@@ -153,7 +153,7 @@ class RemoteTable(Table):
        column: str,
        *,
        replace: bool = False,
-        wait_timeout: timedelta = None,
+        wait_timeout: Optional[timedelta] = None,
        with_position: bool = False,
        # tokenizer configs:
        base_tokenizer: str = "simple",


@@ -1,6 +1,6 @@
[package]
name = "lancedb"
-version = "0.22.2-beta.0"
+version = "0.22.2-beta.2"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true


@@ -52,13 +52,13 @@ pub fn infer_vector_columns(
    for field in reader.schema().fields() {
        match field.data_type() {
            DataType::FixedSizeList(sub_field, _) if sub_field.data_type().is_floating() => {
-                columns.push(field.name().to_string());
+                columns.push(field.name().clone());
            }
            DataType::List(sub_field) if sub_field.data_type().is_floating() && !strict => {
-                columns_to_infer.insert(field.name().to_string(), None);
+                columns_to_infer.insert(field.name().clone(), None);
            }
            DataType::LargeList(sub_field) if sub_field.data_type().is_floating() && !strict => {
-                columns_to_infer.insert(field.name().to_string(), None);
+                columns_to_infer.insert(field.name().clone(), None);
            }
            _ => {}
        }
    }


@@ -718,9 +718,9 @@ impl Database for ListingDatabase {
            .map_err(|e| Error::Lance { source: e })?;
        let version_ref = match (request.source_version, request.source_tag) {
-            (Some(v), None) => Ok(Ref::Version(v)),
+            (Some(v), None) => Ok(Ref::Version(None, Some(v))),
            (None, Some(tag)) => Ok(Ref::Tag(tag)),
-            (None, None) => Ok(Ref::Version(source_dataset.version().version)),
+            (None, None) => Ok(Ref::Version(None, Some(source_dataset.version().version))),
            _ => Err(Error::InvalidInput {
                message: "Cannot specify both source_version and source_tag".to_string(),
            }),


@@ -261,7 +261,7 @@ impl Database for LanceNamespaceDatabase {
            return listing_db
                .open_table(OpenTableRequest {
                    name: request.name.clone(),
-                    namespace: request.namespace.clone(),
+                    namespace: vec![],
                    index_cache_size: None,
                    lance_read_params: None,
                })
@@ -305,7 +305,14 @@ impl Database for LanceNamespaceDatabase {
            )
            .await?;
-        listing_db.create_table(request).await
+        let create_request = DbCreateTableRequest {
+            name: request.name,
+            namespace: vec![],
+            data: request.data,
+            mode: request.mode,
+            write_options: request.write_options,
+        };
+        listing_db.create_table(create_request).await
    }
    async fn open_table(&self, request: OpenTableRequest) -> Result<Arc<dyn BaseTable>> {
@@ -332,7 +339,13 @@ impl Database for LanceNamespaceDatabase {
            .create_listing_database(&request.name, &location, response.storage_options)
            .await?;
-        listing_db.open_table(request).await
+        let open_request = OpenTableRequest {
+            name: request.name.clone(),
+            namespace: vec![],
+            index_cache_size: request.index_cache_size,
+            lance_read_params: request.lance_read_params,
+        };
+        listing_db.open_table(open_request).await
    }
    async fn clone_table(&self, _request: CloneTableRequest) -> Result<Arc<dyn BaseTable>> {


@@ -647,7 +647,7 @@ impl From<StorageOptions> for RemoteOptions {
        let mut filtered = HashMap::new();
        for opt in supported_opts {
            if let Some(v) = options.0.get(opt) {
-                filtered.insert(opt.to_string(), v.to_string());
+                filtered.insert(opt.to_string(), v.clone());
            }
        }
        Self::new(filtered)


@@ -1383,30 +1383,35 @@ impl Table {
}
pub struct NativeTags {
-    inner: LanceTags,
+    dataset: dataset::DatasetConsistencyWrapper,
}
#[async_trait]
impl Tags for NativeTags {
    async fn list(&self) -> Result<HashMap<String, TagContents>> {
-        Ok(self.inner.list().await?)
+        let dataset = self.dataset.get().await?;
+        Ok(dataset.tags().list().await?)
    }
    async fn get_version(&self, tag: &str) -> Result<u64> {
-        Ok(self.inner.get_version(tag).await?)
+        let dataset = self.dataset.get().await?;
+        Ok(dataset.tags().get_version(tag).await?)
    }
    async fn create(&mut self, tag: &str, version: u64) -> Result<()> {
-        self.inner.create(tag, version).await?;
+        let dataset = self.dataset.get().await?;
+        dataset.tags().create(tag, version).await?;
        Ok(())
    }
    async fn delete(&mut self, tag: &str) -> Result<()> {
-        self.inner.delete(tag).await?;
+        let dataset = self.dataset.get().await?;
+        dataset.tags().delete(tag).await?;
        Ok(())
    }
    async fn update(&mut self, tag: &str, version: u64) -> Result<()> {
-        self.inner.update(tag, version).await?;
+        let dataset = self.dataset.get().await?;
+        dataset.tags().update(tag, version).await?;
        Ok(())
    }
}
@@ -1780,13 +1785,13 @@ impl NativeTable {
                    BuiltinIndexType::BTree,
                )))
            } else {
-                return Err(Error::InvalidInput {
+                Err(Error::InvalidInput {
                    message: format!(
                        "there are no indices supported for the field `{}` with the data type {}",
                        field.name(),
                        field.data_type()
                    ),
-                });
+                })?
            }
        }
        Index::BTree(_) => {
@@ -2440,10 +2445,8 @@ impl BaseTable for NativeTable {
    }
    async fn tags(&self) -> Result<Box<dyn Tags + '_>> {
-        let dataset = self.dataset.get().await?;
        Ok(Box::new(NativeTags {
-            inner: dataset.tags.clone(),
+            dataset: self.dataset.clone(),
        }))
    }


@@ -172,7 +172,7 @@ impl TableProvider for BaseTableAdapter {
        if let Some(projection) = projection {
            let field_names = projection
                .iter()
-                .map(|i| self.schema.field(*i).name().to_string())
+                .map(|i| self.schema.field(*i).name().clone())
                .collect();
            query.select = Select::Columns(field_names);
        }


@@ -98,8 +98,9 @@ impl DatasetRef {
            }
            Self::TimeTravel { dataset, version } => {
                let should_checkout = match &target_ref {
-                    refs::Ref::Version(target_ver) => version != target_ver,
+                    refs::Ref::Version(_, Some(target_ver)) => version != target_ver,
+                    refs::Ref::Version(_, None) => true, // No specific version, always checkout
                    refs::Ref::Tag(_) => true, // Always checkout for tags
                };
                if should_checkout {


@@ -174,7 +174,7 @@ pub(crate) fn default_vector_column(schema: &Schema, dim: Option<i32>) -> Result
            ),
        })
    } else {
-        Ok(candidates[0].to_string())
+        Ok(candidates[0].clone())
    }
}
} }