fix: sanitize foreign schemas (#1058)

Arrow-js uses brittle `instanceof` checks throughout its code base.
These fail unless the library instance that produced the object is
exactly the same instance that vectordb is using. At a minimum, this
means that a user on arrow version 15 (or any version that does not
exactly match the version vectordb uses) will get strange errors
when they try to use vectordb.

However, there are even cases where the versions are perfectly
identical and the `instanceof` check still fails. One such example is
when using `vite` (e.g. https://github.com/vitejs/vite/issues/3910).
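
The failure mode can be reproduced without Arrow at all. The sketch
below is only an illustration: `loadLibraryCopy` is a hypothetical
stand-in for a module being loaded twice, and the `Schema` class it
returns is not Arrow's.

```typescript
// Hypothetical stand-in for loading the same library twice: every call
// evaluates the class expression again and yields a distinct constructor.
function loadLibraryCopy() {
  return class Schema {
    fields: string[];
    constructor(fields: string[]) {
      this.fields = fields;
    }
  };
}

const SchemaA = loadLibraryCopy();
const SchemaB = loadLibraryCopy();

const schemaFromA = new SchemaA(["id", "vector"]);

// The object passes the check against the copy that created it...
const sameCopy = schemaFromA instanceof SchemaA;
// ...but fails against the other copy, although the code is identical.
const otherCopy = schemaFromA instanceof SchemaB;

console.log({ sameCopy, otherCopy });
```

`instanceof` compares prototype identity, not structure, so two copies
of the same source never recognize each other's objects.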

This PR solves the problem in a rather brute-force, but workable,
fashion. If we encounter a schema that does not pass the `instanceof`
check, then we attempt to sanitize that schema by traversing the
object and, if it has all the correct properties, constructing an
appropriate `Schema` instance via deep cloning.
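
A minimal sketch of this idea, not the code in this PR: the `Field`
and `Schema` classes below are simplified local stand-ins (real Arrow
types carry much more), and `sanitizeField` / `sanitizeSchema` are
hypothetical names chosen for illustration.

```typescript
// Simplified stand-ins for the classes owned by "our" copy of the library.
class Field {
  readonly name: string;
  readonly typeName: string;
  readonly nullable: boolean;
  constructor(name: string, typeName: string, nullable: boolean) {
    this.name = name;
    this.typeName = typeName;
    this.nullable = nullable;
  }
}

class Schema {
  readonly fields: Field[];
  constructor(fields: Field[]) {
    this.fields = fields;
  }
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Rebuild a Field from any object that has the expected properties.
function sanitizeField(fieldLike: unknown): Field {
  if (fieldLike instanceof Field) return fieldLike;
  if (!isRecord(fieldLike)) {
    throw new Error("Expected a Field-like object");
  }
  const { name, typeName, nullable } = fieldLike;
  if (
    typeof name !== "string" ||
    typeof typeName !== "string" ||
    typeof nullable !== "boolean"
  ) {
    throw new Error("Field-like object is missing name, typeName or nullable");
  }
  return new Field(name, typeName, nullable);
}

// Pass native schemas through; deep-clone foreign ones into native instances.
function sanitizeSchema(schemaLike: unknown): Schema {
  if (schemaLike instanceof Schema) return schemaLike;
  if (!isRecord(schemaLike) || !Array.isArray(schemaLike.fields)) {
    throw new Error("Expected a Schema-like object with a fields array");
  }
  return new Schema(schemaLike.fields.map(sanitizeField));
}

// A "foreign" schema: right shape, wrong (here: no) class.
const foreign = {
  fields: [{ name: "id", typeName: "int32", nullable: false }],
};
const clean = sanitizeSchema(foreign);
console.log(clean instanceof Schema, clean.fields[0] instanceof Field);
```

Objects that lack the expected properties are rejected with an error
rather than guessed at, so a malformed schema still fails early.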
Weston Pace
2024-03-04 13:06:36 -08:00
parent 785ecfa037
commit c60a193767
11 changed files with 1241 additions and 42 deletions

@@ -22,6 +22,7 @@
"@types/tmp": "^0.2.6",
"@typescript-eslint/eslint-plugin": "^6.19.0",
"@typescript-eslint/parser": "^6.19.0",
+"apache-arrow-old": "npm:apache-arrow@13.0.0",
"eslint": "^8.57.0",
"eslint-config-prettier": "^9.1.0",
"jest": "^29.7.0",
@@ -55,7 +56,7 @@
"build": "npm run build:debug && tsc -b",
"chkformat": "prettier . --check",
"docs": "typedoc --plugin typedoc-plugin-markdown lancedb/index.ts",
-"lint": "eslint lancedb",
+"lint": "eslint lancedb && eslint __test__",
"prepublishOnly": "napi prepublish -t npm",
"test": "npm run build && jest --verbose",
"universal": "napi universal",