Compare commits

34 Commits
v0.1 ... v0.1.2

Author SHA1 Message Date
Chang She
59014a01e0 bump version for v0.1.2 2023-05-05 11:27:09 -07:00
Chang She
47ae17ea05 Merge pull request #58 from lancedb/changhiskhan/parse-schema
Add method to get the URI scheme to support cloud storage
2023-05-04 14:36:45 -07:00
Chang She
b6739f3f66 windows paths 2023-05-04 11:41:05 -07:00
Chang She
3a2df0ce45 Add method to get the URI scheme to support cloud storage 2023-05-04 09:47:03 -07:00
Chang She
c0bc65cdfa Merge pull request #55 from lancedb/jaichopra/update-tagline
update tagline
2023-05-03 21:06:41 -07:00
Jai Chopra
298b81f0b0 update tagline 2023-05-03 19:55:10 -07:00
Jai
fe7a3ccd60 Merge pull request #53 from lancedb/jaichopra/update-major-features-readme
also update docs index
2023-05-03 07:51:54 -07:00
Jai Chopra
baf8d7c1a1 also update docs index 2023-05-03 07:50:44 -07:00
Chang She
2021e1bf6d Merge pull request #52 from lancedb/jaichopra/update-major-features-readme 2023-05-03 07:36:09 -07:00
Jai Chopra
2dbe71cf88 add new feature to readme.md 2023-05-03 07:30:46 -07:00
Lei Xu
afe19ade7f Merge pull request #49 from lancedb/lei/rust_core
Rust core directory
2023-04-27 10:40:21 -07:00
Lei Xu
118efdce73 add cargo metadata 2023-04-27 10:36:01 -07:00
Lei Xu
b0426387e7 initialize the rust core 2023-04-27 10:31:50 -07:00
Chang She
afa7fe19e6 bump version for v0.1.1 2023-04-26 16:55:25 -07:00
Chang She
66080d791b Merge pull request #46 from lancedb/changhiskhan/improve-index-docs 2023-04-25 21:13:51 -07:00
Chang She
5554fddd54 Merge branch 'main' into changhiskhan/improve-index-docs 2023-04-25 21:04:01 -07:00
Chang She
f06ea935fe Merge pull request #47 from lancedb/changhiskhan/expose-metric
Make distance metric configurable in LanceDB
2023-04-25 21:02:59 -07:00
Chang She
a8db7f56d2 tolerance 2023-04-25 20:08:18 -07:00
Chang She
7a375185a1 increment lance version to include cosine distance fix 2023-04-25 19:57:58 -07:00
Chang She
6592b4c13b document metric in create_index 2023-04-24 22:46:21 -07:00
Chang She
72a44eb927 specify metric during index creation 2023-04-24 22:45:37 -07:00
Chang She
b0e578c609 add documentation for metric 2023-04-24 22:42:30 -07:00
Chang She
89e6232aeb Make distance metric configurable during search 2023-04-24 22:40:40 -07:00
Chang She
44ea687984 Merge pull request #45 from lancedb/changhiskhan/notebook-fix
Minor notebook fix. Closes #40
2023-04-24 20:12:03 -07:00
Chang She
4f2dae8a0d Add more detailed docs for the ANN index and search features 2023-04-24 19:19:55 -07:00
Chang She
5e748e6e70 Minor notebook fix. Closes #40 2023-04-24 18:46:05 -07:00
Chang She
177192f852 Merge pull request #37 from lancedb/gsilvestrin/ratelimit_3.11
skipping embeddings rate limit when python version > 3.10
2023-04-22 21:03:18 -07:00
Lei Xu
1fb596942f Merge pull request #39 from wilhelmjung/patch-1
Update index.md
2023-04-22 20:32:36 -07:00
YangWeiliang_DeepNova@Deepexi
73d3cb78e6 Update index.md
Just a typo. Fixed.
2023-04-23 09:52:23 +08:00
gsilvestrin
a1583444ec add ann_index to main doc page 2023-04-20 16:07:25 -07:00
gsilvestrin
78e4f4d1a8 add ann_index to main doc page 2023-04-20 13:19:10 -07:00
gsilvestrin
b92eb988b6 add ann_index to main doc page 2023-04-20 11:51:42 -07:00
gsilvestrin
0cd092814d skipping rate limit when python version > 3.10 2023-04-20 10:28:14 -07:00
Jai
a6294925df Update README.md 2023-04-20 10:22:03 -07:00
19 changed files with 238 additions and 57 deletions

.gitignore (2 changed lines)

@@ -15,3 +15,5 @@ python/build
python/dist
notebooks/.ipynb_checkpoints
**/.hypothesis


@@ -3,12 +3,12 @@
<img width="275" alt="LanceDB Logo" src="https://user-images.githubusercontent.com/917119/226205734-6063d87a-1ecc-45fe-85be-1dea6383a3d8.png">
**Serverless, low-latency vector database for AI applications**
**Developer-friendly, serverless vector database for AI applications**
<a href="https://lancedb.github.io/lancedb/">Documentation</a>
<a href="https://blog.eto.ai/">Blog</a>
<a href="https://discord.gg/zMM32dvNtd">Discord</a>
<a href="https://twitter.com/etodotai">Twitter</a>
<a href="https://twitter.com/lancedb">Twitter</a>
</p>
</div>
@@ -21,6 +21,10 @@ The key features of LanceDB include:
* Production-scale vector search with no servers to manage.
* Optimized for multi-modal data (text, images, videos, point clouds and more).
* Native Python and Javascript/Typescript support (coming soon).
* Combine attribute-based information with vectors and store them as a single source-of-truth.
* Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.


@@ -15,6 +15,7 @@ nav:
- Home: index.md
- Basics: basic.md
- Embeddings: embedding.md
- Indexing: ann_indexes.md
- Integrations: integrations.md
- Python API: python.md


@@ -1,12 +1,18 @@
# ANN (Approximate Nearest Neighbor) Indexes
You can create an index over your vector data to make search faster.
Vector indexes are faster but less accurate than exhaustive search.
LanceDB provides many parameters to fine-tune the index's size, the speed of queries, and the accuracy of results.
Currently, LanceDB does *not* automatically create the ANN index.
LanceDB has optimized code for KNN as well. For many use-cases, datasets under 100K vectors won't require index creation at all.
If you can live with <100ms latency, skipping index creation is a simpler workflow while guaranteeing 100% recall.
In the future we will look to automatically create and configure the ANN index.
## Creating an ANN Index
Creating indexes is done via the [create_index](https://lancedb.github.io/lancedb/python/#lancedb.table.LanceTable.create_index) function.
Creating indexes is done via the [create_index](https://lancedb.github.io/lancedb/python/#lancedb.table.LanceTable.create_index) method.
```python
import lancedb
@@ -28,11 +34,12 @@ tbl.create_index(num_partitions=256, num_sub_vectors=96)
Since `create_index` has a training step, it can take a few minutes to finish for large tables. You can control the index
creation by providing the following parameters:
- **metric** (default: "L2"): The distance metric to use. By default we use euclidean distance. We also support cosine distance.
- **num_partitions** (default: 256): The number of partitions of the index. The number of partitions should be configured so each partition has 3-5K vectors. For example, a table
  with ~1M vectors should use 256 partitions. You can specify an arbitrary number of partitions, but powers of 2 are most conventional.
  A higher number leads to faster queries, but it makes index generation slower.
- **num_sub_vectors** (default: 96): The number of subvectors (M) that will be created during Product Quantization (PQ). A larger number makes
  search more accurate, but also makes the index larger and slower to build.
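
Putting the parameters above together, a minimal sketch of index creation with a non-default metric (the connection path and table are placeholders, and `open_table` is assumed from the basic usage docs):

```python
import lancedb

db = lancedb.connect("~/datasets/demo-lancedb")   # placeholder path
tbl = db.open_table("my_vectors")                 # hypothetical table with ~1M 768-dim vectors

# ~1M vectors / 256 partitions keeps each partition in the 3-5K vector range.
tbl.create_index(
    metric="cosine",       # default is "L2" (euclidean)
    num_partitions=256,
    num_sub_vectors=96,
)
```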
## Querying an ANN Index
@@ -41,15 +48,21 @@ Querying vector indexes is done via the [search](https://lancedb.github.io/lance
There are a couple of parameters that can be used to fine-tune the search:
- **limit** (default: 10): The number of results that will be returned
- **nprobes** (default: 20): The number of probes used. A higher number makes search more accurate but also slower.<br/>
Most of the time, setting nprobes to cover 5-10% of the dataset should achieve high recall with low latency.<br/>
e.g., for 1M vectors divided up into 256 partitions, nprobes should be set to ~20-40.<br/>
Note: nprobes is only applicable if an ANN index is present. If specified on a table without an ANN index, it is ignored.
- **refine_factor** (default: None): Refine the results by reading extra elements and re-ranking them in memory.<br/>
A higher number makes search more accurate but also slower. If you find the recall is less than ideal, try refine_factor=10 to start.<br/>
e.g., for 1M vectors divided into 256 partitions, if you're looking for top 20, then refine_factor=200 reranks the whole partition.<br/>
Note: refine_factor is only applicable if an ANN index is present. If specified on a table without an ANN index, it is ignored.
```python
tbl.search(np.random.random((768))) \
.limit(2) \
.nprobes(20) \
.refine_factor(20) \
.refine_factor(10) \
.to_df()
vector item score
@@ -57,7 +70,9 @@ tbl.search(np.random.random((768))) \
1 [0.48587373, 0.269207, 0.15095535, 0.65531915,... item 3953 108.393867
```
The search will return the data requested in addition to the score of each item.
**Note:** The score is the distance between the query vector and the element. A lower number means that the result is more relevant.
### Filtering (where clause)
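
Filtering combines a vector query with a SQL-style predicate; a minimal sketch (the `item_id` column is hypothetical, and `tbl` is the table from the examples above):

```python
import numpy as np

df = (
    tbl.search(np.random.random(768))
    .where("item_id = 2023")   # filter on a metadata column
    .limit(10)
    .to_df()
)
```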


@@ -1,11 +1,15 @@
# Welcome to LanceDB's Documentation
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
The key features of LanceDB include:
* Production-scale vector search with no servers to manage.
* Optimized for multi-modal data (text, images, videos, point clouds and more).
* Native Python and Javascript/Typescript support (coming soon).
* Combine attribute-based information with vectors and store them as a single source-of-truth.
* Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
@@ -42,5 +46,6 @@ We will be adding completed demo apps built using LanceDB.
## Documentation Quick Links
* [`Basic Operations`](basic.md) - basic functionality of LanceDB.
* [`Embedding Functions`](embedding.md) - functions for working with embeddings.
* [`Indexing`](ann_indexes.md) - create vector indexes to speed up queries.
* [`Ecosystem Integrations`](integrations.md) - integrating LanceDB with python data tooling ecosystem.
* [`API Reference`](python.md) - detailed documentation for the LanceDB Python SDK.


@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "42bf01fb",
"metadata": {},
@@ -22,10 +21,10 @@
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.0.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.0.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
]
}
@@ -88,7 +87,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5ac2b6a3",
"metadata": {},
@@ -231,7 +229,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2106b5bb",
"metadata": {},
@@ -251,7 +248,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "39f3161f3ef54a129cd65fb296332b54",
"model_id": "c6f1c76d9567421d88911923388d2530",
"version_major": 2,
"version_minor": 0
},
@@ -574,7 +571,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "559a095b",
"metadata": {},
@@ -631,7 +627,7 @@
" <iframe\n",
" width=\"400\"\n",
" height=\"300\"\n",
" src=\"https://www.youtube.com/embed/pNvujJ1XyeQ?start=289.76\"\n",
" src=\"https://www.youtube.com/embed/pNvujJ1XyeQ?start=289\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
@@ -639,7 +635,7 @@
" "
],
"text/plain": [
"<IPython.lib.display.YouTubeVideo at 0x177fde4d0>"
"<IPython.lib.display.YouTubeVideo at 0x13ec062c0>"
]
},
"execution_count": 15,
@@ -651,7 +647,7 @@
"from IPython.display import YouTubeVideo\n",
"\n",
"top_match = context.iloc[0]\n",
"YouTubeVideo(top_match[\"url\"].split(\"/\")[-1], start=top_match[\"start\"])"
"YouTubeVideo(top_match[\"url\"].split(\"/\")[-1], start=int(top_match[\"start\"]))"
]
},
{


@@ -11,7 +11,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .db import LanceDBConnection, URI
from .db import URI, LanceDBConnection
def connect(uri: URI) -> LanceDBConnection:


@@ -11,7 +11,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from pathlib import Path
from typing import Union, List
from typing import List, Union
import numpy as np
import pandas as pd


@@ -14,10 +14,12 @@
from __future__ import annotations
from pathlib import Path
import pyarrow as pa
from .common import URI, DATA
from .common import DATA, URI
from .table import LanceTable
from .util import get_uri_scheme
class LanceDBConnection:
@@ -26,10 +28,12 @@ class LanceDBConnection:
"""
def __init__(self, uri: URI):
if isinstance(uri, str):
uri = Path(uri)
uri = uri.expanduser().absolute()
Path(uri).mkdir(parents=True, exist_ok=True)
is_local = isinstance(uri, Path) or get_uri_scheme(uri) == "file"
if is_local:
if isinstance(uri, str):
uri = Path(uri)
uri = uri.expanduser().absolute()
Path(uri).mkdir(parents=True, exist_ok=True)
self._uri = str(uri)
@property
@@ -43,7 +47,11 @@ class LanceDBConnection:
-------
A list of table names.
"""
return [p.stem for p in Path(self.uri).glob("*.lance")]
if get_uri_scheme(self.uri) == "file":
return [p.stem for p in Path(self.uri).glob("*.lance")]
raise NotImplementedError(
"List table_names is only supported for local filesystem for now"
)
def __len__(self) -> int:
return len(self.table_names())
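
A short usage sketch of the local-versus-remote handling added above (the S3 bucket is hypothetical):

```python
import lancedb

# Local path: expanded to an absolute path and created on disk if missing.
db = lancedb.connect("~/datasets/demo-lancedb")
print(db.table_names())   # globs *.lance datasets under the directory

# Object-store URI: stored as-is; no local directory is created.
s3_db = lancedb.connect("s3://my-bucket/lancedb")
# s3_db.table_names() raises NotImplementedError at this version.
```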


@@ -12,13 +12,14 @@
# limitations under the License.
import math
from retry import retry
import sys
from typing import Callable, Union
from lance.vector import vec_to_table
import numpy as np
import pandas as pd
import pyarrow as pa
from lance.vector import vec_to_table
from retry import retry
def with_embeddings(
@@ -64,13 +65,19 @@ class EmbeddingFunction:
return self.func(c.tolist())
if len(self.rate_limiter_kwargs) > 0:
import ratelimiter
v = int(sys.version_info.minor)
if v >= 11:
print(
"WARNING: rate limit only support up to 3.10, proceeding without rate limiter"
)
else:
import ratelimiter
max_calls = self.rate_limiter_kwargs["max_calls"]
limiter = ratelimiter.RateLimiter(
max_calls, period=self.rate_limiter_kwargs["period"]
)
embed_func = limiter(embed_func)
max_calls = self.rate_limiter_kwargs["max_calls"]
limiter = ratelimiter.RateLimiter(
max_calls, period=self.rate_limiter_kwargs["period"]
)
embed_func = limiter(embed_func)
batches = self.to_batches(text)
embeds = [emb for c in batches for emb in embed_func(c)]
return embeds
@@ -79,11 +86,6 @@ class EmbeddingFunction:
return f"EmbeddingFunction(func={self.func})"
def rate_limit(self, max_calls=0.9, period=1.0):
import sys
v = int(sys.version_info.minor)
if v >= 11:
raise ValueError("rate limit only support up to 3.10")
self.rate_limiter_kwargs = dict(max_calls=max_calls, period=period)
return self
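
A hedged sketch of how the rate-limited path is typically exercised through `with_embeddings` (the embedding function is a stand-in, and the default wrapping behaviour is assumed from the embeddings docs):

```python
import numpy as np
import pandas as pd
from lancedb.embeddings import with_embeddings

def embed_func(batch):
    # Stand-in embedder: one fixed-size random vector per input string.
    return [np.random.random(128) for _ in batch]

df = pd.DataFrame({"text": ["hello world", "goodbye world"]})

# When rate limiting is enabled, Python <= 3.10 throttles embed_func calls via
# the ratelimiter package; on 3.11+ the warning above is printed and the calls
# proceed without throttling.
data = with_embeddings(embed_func, df)
```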


@@ -24,6 +24,7 @@ class LanceQueryBuilder:
"""
def __init__(self, table: "lancedb.table.LanceTable", query: np.ndarray):
self._metric = "L2"
self._nprobes = 20
self._refine_factor = None
self._table = table
@@ -77,6 +78,21 @@ class LanceQueryBuilder:
self._where = where
return self
def metric(self, metric: str) -> LanceQueryBuilder:
"""Set the distance metric to use.
Parameters
----------
metric: str
The distance metric to use. By default "l2" is used.
Returns
-------
The LanceQueryBuilder object.
"""
self._metric = metric
return self
def nprobes(self, nprobes: int) -> LanceQueryBuilder:
"""Set the number of probes to use.
@@ -108,7 +124,12 @@ class LanceQueryBuilder:
return self
def to_df(self) -> pd.DataFrame:
"""Execute the query and return the results as a pandas DataFrame."""
"""
Execute the query and return the results as a pandas DataFrame.
In addition to the selected columns, LanceDB also returns a vector
and also the "score" column which is the distance between the query
vector and the returned vector.
"""
ds = self._table.to_lance()
# TODO indexed search
tbl = ds.to_table(
@@ -118,6 +139,7 @@ class LanceQueryBuilder:
"column": VECTOR_COLUMN_NAME,
"q": self._query,
"k": self._limit,
"metric": self._metric,
"nprobes": self._nprobes,
"refine_factor": self._refine_factor,
},
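
With the builder changes above, the metric can be chosen per query; a minimal sketch (assuming `tbl` is an existing `LanceTable` with 768-dimensional vectors):

```python
import numpy as np

df = (
    tbl.search(np.random.random(768))
    .metric("cosine")   # defaults to "L2" when not called
    .nprobes(20)
    .limit(5)
    .to_df()
)
# The result includes the selected columns, the vector, and a "score" column
# holding the distance between the query vector and each returned vector.
```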


@@ -19,12 +19,12 @@ from functools import cached_property
import lance
import numpy as np
import pandas as pd
from lance import LanceDataset
import pyarrow as pa
from lance import LanceDataset
from lance.vector import vec_to_table
from .common import DATA, VEC, VECTOR_COLUMN_NAME
from .query import LanceQueryBuilder
from .common import DATA, VECTOR_COLUMN_NAME, VEC
def _sanitize_data(data, schema):
@@ -106,11 +106,14 @@ class LanceTable:
def _dataset_uri(self) -> str:
return os.path.join(self._conn.uri, f"{self.name}.lance")
def create_index(self, num_partitions=256, num_sub_vectors=96):
def create_index(self, metric="L2", num_partitions=256, num_sub_vectors=96):
"""Create an index on the table.
Parameters
----------
metric: str, default "L2"
The distance metric to use when creating the index. Valid values are "L2" or "cosine".
L2 is euclidean distance.
num_partitions: int
The number of IVF partitions to use when creating the index.
Default is 256.
@@ -121,6 +124,7 @@ class LanceTable:
self._dataset.create_index(
column=VECTOR_COLUMN_NAME,
index_type="IVF_PQ",
metric=metric,
num_partitions=num_partitions,
num_sub_vectors=num_sub_vectors,
)
@@ -166,6 +170,9 @@ class LanceTable:
Returns
-------
A LanceQueryBuilder object representing the query.
Once executed, the query returns selected columns, the vector,
and also the "score" column which is the distance between the query
vector and the returned vector.
"""
if isinstance(query, list):
query = np.array(query)

python/lancedb/util.py (new file, 43 lines)

@@ -0,0 +1,43 @@
# Copyright 2023 LanceDB Developers
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from urllib.parse import ParseResult, urlparse
from pyarrow import fs
def get_uri_scheme(uri: str) -> str:
"""
Get the scheme of a URI. If the URI does not have a scheme, assume it is a file URI.
Parameters
----------
uri : str
The URI to parse.
Returns
-------
str: The scheme of the URI.
"""
parsed = urlparse(uri)
scheme = parsed.scheme
if not scheme:
scheme = "file"
elif scheme in ["s3a", "s3n"]:
scheme = "s3"
elif len(scheme) == 1:
# Windows drive names are parsed as the scheme
# e.g. "c:\path" -> ParseResult(scheme="c", netloc="", path="/path", ...)
# So we add special handling here for schemes that are a single character
scheme = "file"
return scheme
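
A quick usage sketch of the scheme normalization above (the expected values follow directly from the function's branches):

```python
from lancedb.util import get_uri_scheme

assert get_uri_scheme("s3a://bucket/path") == "s3"    # Hadoop-style schemes collapse to s3
assert get_uri_scheme("C:\\data\\vectors") == "file"  # single-letter drive parsed as a scheme -> local file
assert get_uri_scheme("relative/path") == "file"      # no scheme defaults to file
```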


@@ -1,10 +1,10 @@
[project]
name = "lancedb"
version = "0.1"
dependencies = ["pylance>=0.4.3", "ratelimiter", "retry", "tqdm"]
version = "0.1.2"
dependencies = ["pylance>=0.4.6", "ratelimiter", "retry", "tqdm"]
description = "lancedb"
authors = [
{ name = "Lance Devs", email = "dev@eto.ai" },
{ name = "LanceDB Devs", email = "dev@lancedb.com" },
]
license = { file = "LICENSE" }
readme = "README.md"


@@ -11,10 +11,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import lancedb
import pandas as pd
import pytest
import lancedb
def test_basic(tmp_path):
db = lancedb.connect(tmp_path)


@@ -12,13 +12,14 @@
# limitations under the License.
import lance
from lancedb.query import LanceQueryBuilder
import numpy as np
import pandas as pd
import pandas.testing as tm
import pyarrow as pa
import pytest
from lancedb.query import LanceQueryBuilder
class MockTable:
def __init__(self, tmp_path):
@@ -60,3 +61,21 @@ def test_query_builder_with_filter(table):
df = LanceQueryBuilder(table, [0, 0]).where("id = 2").to_df()
assert df["id"].values[0] == 2
assert all(df["vector"].values[0] == [3, 4])
def test_query_builder_with_metric(table):
query = [4, 8]
df_default = LanceQueryBuilder(table, query).to_df()
df_l2 = LanceQueryBuilder(table, query).metric("l2").to_df()
tm.assert_frame_equal(df_default, df_l2)
df_cosine = LanceQueryBuilder(table, query).metric("cosine").limit(1).to_df()
assert df_cosine.score[0] == pytest.approx(
cosine_distance(query, df_cosine.vector[0]),
abs=1e-6,
)
assert 0 <= df_cosine.score[0] <= 1
def cosine_distance(vec1, vec2):
return 1 - np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

python/tests/test_util.py (new file, 30 lines)

@@ -0,0 +1,30 @@
# Copyright 2023 LanceDB Developers
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from lancedb.util import get_uri_scheme
def test_normalize_uri():
uris = [
"relative/path",
"/absolute/path",
"file:///absolute/path",
"s3://bucket/path",
"gs://bucket/path",
"c:\\windows\\path",
]
schemes = ["file", "file", "file", "s3", "gs", "file"]
for uri, expected_scheme in zip(uris, schemes):
parsed_scheme = get_uri_scheme(uri)
assert parsed_scheme == expected_scheme

rust/Cargo.toml (new file, 12 lines)

@@ -0,0 +1,12 @@
[package]
name = "vectordb"
version = "0.0.1"
edition = "2021"
description = "Serverless, low-latency vector database for AI applications"
license = "Apache-2.0"
repository = "https://github.com/lancedb/lancedb"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
lance = "0.4.3"

rust/src/lib.rs (new file, 14 lines)

@@ -0,0 +1,14 @@
pub fn add(left: usize, right: usize) -> usize {
left + right
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
let result = add(2, 2);
assert_eq!(result, 4);
}
}