Compare commits

11 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Lance Release | 16a6b9ce8f | Bump version: 0.14.1-beta.3 → 0.14.1-beta.4 | 2024-12-13 05:34:01 +00:00 |
| Lance Release | e3c6213333 | Bump version: 0.17.1-beta.3 → 0.17.1-beta.4 | 2024-12-13 05:33:34 +00:00 |
| Weston Pace | 00552439d9 | feat: upgrade lance to 0.21.0b3 (#1936) | 2024-12-12 21:32:59 -08:00 |
| QianZhu | c0ee370f83 | docs: improve schema evolution api examples (#1929) | 2024-12-12 10:52:06 -08:00 |
| QianZhu | 17e4022045 | docs: add faq to cloud doc (#1907). Co-authored-by: Will Jones <willjones127@gmail.com> | 2024-12-12 10:07:03 -08:00 |
| BubbleCal | c3ebac1a92 | feat(node): support FTS options in nodejs (#1934). Closes #1790. Signed-off-by: BubbleCal <bubble-cal@outlook.com> | 2024-12-12 08:19:04 -08:00 |
| Lance Release | 10f919a0a9 | Updating package-lock.json | 2024-12-11 19:18:36 +00:00 |
| Lance Release | 8af5476395 | Bump version: 0.14.1-beta.2 → 0.14.1-beta.3 | 2024-12-11 19:18:17 +00:00 |
| Lance Release | bcbbeb7a00 | Bump version: 0.17.1-beta.2 → 0.17.1-beta.3 | 2024-12-11 19:17:54 +00:00 |
| Weston Pace | d6c0f75078 | feat: upgrade to lance prerelease 0.21.0b2 (#1933) | 2024-12-11 11:17:10 -08:00 |
| Lance Release | e820e356a0 | Updating package-lock.json | 2024-12-11 17:58:05 +00:00 |
29 changed files with 219 additions and 54 deletions

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.14.1-beta.2"
current_version = "0.14.1-beta.4"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.
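The commits above bump prerelease versions such as `0.14.1-beta.3 → 0.14.1-beta.4`. The `parse` pattern is truncated in this diff, so the regex below is a hypothetical reconstruction in the same verbose style, limited to `-beta.N` prereleases. It only illustrates how such a bump increments the trailing number; it is not the project's bumpversion configuration.

```python
import re

# Hypothetical pattern for versions such as "0.14.1-beta.4"; the project's
# real `parse` regex is truncated in this diff and may differ.
VERSION_RE = re.compile(
    r"""(?x)
    (?P<major>0|[1-9]\d*)\.
    (?P<minor>0|[1-9]\d*)\.
    (?P<patch>0|[1-9]\d*)
    (?:-(?P<pre_label>beta)\.(?P<pre_n>0|[1-9]\d*))?
    """
)


def bump_prerelease(version: str) -> str:
    """Increment the beta number, e.g. "0.14.1-beta.3" -> "0.14.1-beta.4"."""
    m = VERSION_RE.fullmatch(version)
    if m is None or m["pre_n"] is None:
        raise ValueError(f"not a beta prerelease version: {version!r}")
    return f"{m['major']}.{m['minor']}.{m['patch']}-{m['pre_label']}.{int(m['pre_n']) + 1}"
```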

@@ -21,16 +21,16 @@ categories = ["database-implementations"]
rust-version = "1.80.0" # TODO: lower this once we upgrade Lance again.
[workspace.dependencies]
lance = { "version" = "=0.20.0", "features" = [
lance = { "version" = "=0.21.0", "features" = [
"dynamodb",
] }
lance-io = "0.20.0"
lance-index = "0.20.0"
lance-linalg = "0.20.0"
lance-table = "0.20.0"
lance-testing = "0.20.0"
lance-datafusion = "0.20.0"
lance-encoding = "0.20.0"
], git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
lance-io = { version = "=0.21.0", git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
lance-index = { version = "=0.21.0", git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
lance-linalg = { version = "=0.21.0", git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
lance-table = { version = "=0.21.0", git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
lance-testing = { version = "=0.21.0", git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
lance-datafusion = { version = "=0.21.0", git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
lance-encoding = { version = "=0.21.0", git = "https://github.com/lancedb/lance.git", tag = "v0.21.0-beta.3" }
# Note that this one does not include pyarrow
arrow = { version = "53.2", optional = false }
arrow-array = "53.2"

@@ -231,6 +231,7 @@ nav:
- 🐍 Python: python/saas-python.md
- 👾 JavaScript: javascript/modules.md
- REST API: cloud/rest.md
- FAQs: cloud/cloud_faq.md
- Quick start: basic.md
- Concepts:
@@ -357,6 +358,7 @@ nav:
- 🐍 Python: python/saas-python.md
- 👾 JavaScript: javascript/modules.md
- REST API: cloud/rest.md
- FAQs: cloud/cloud_faq.md
extra_css:
- styles/global.css

@@ -0,0 +1,34 @@
This section provides answers to the most common questions asked about LanceDB Cloud. By following these guidelines, you can ensure a smooth, performant experience with LanceDB Cloud.
### Should I reuse the database connection?
Yes! We recommend establishing a single database connection and keeping it for all your interactions with the tables in that database.
LanceDB uses HTTP connections to communicate with the servers. Reusing the connection object avoids the overhead of repeatedly establishing HTTP connections, which significantly improves efficiency.
### Should I reuse the `Table` object?
Yes. Call `table = db.open_table()` once and use the returned object for all subsequent operations on that table. If the table changes after it is opened, `table` always reflects the **latest version** of the data.
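The two answers above boil down to one pattern: create the connection and each `Table` handle once, then keep handing out the same objects. A minimal sketch, assuming a hypothetical `connect_fn` that wraps your real connect call (for example `lancedb.connect(...)` with your Cloud URI and API key) and returns an object with `open_table(name)`; `CachedConnection` is an illustrative name, not part of LanceDB.

```python
from typing import Any, Callable, Dict, Optional


class CachedConnection:
    """Illustrative helper (not part of LanceDB): one connection, one Table per name."""

    def __init__(self, connect_fn: Callable[[], Any]) -> None:
        # connect_fn is a hypothetical stand-in for e.g. lancedb.connect(...).
        self._connect_fn = connect_fn
        self._db: Optional[Any] = None
        self._tables: Dict[str, Any] = {}

    @property
    def db(self) -> Any:
        # Connect lazily on first use, then keep reusing the same object.
        if self._db is None:
            self._db = self._connect_fn()
        return self._db

    def table(self, name: str) -> Any:
        # Open each table once; later calls return the cached handle.
        if name not in self._tables:
            self._tables[name] = self.db.open_table(name)
        return self._tables[name]
```

This sketch is not thread-safe; guard the lazy initialization with a lock if several threads may call it concurrently.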
### What should I do if I need to search for rows by `id`?
LanceDB Cloud currently does not support an ID or primary key column. We recommend adding a user-defined ID column. To significantly improve the performance of queries that filter on this column with SQL clauses, create a scalar BITMAP/BTREE index on it.
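Once the user-defined ID column exists, a lookup by ID is just a SQL filter on that column, which the scalar index can serve. A minimal sketch of building such a filter; `id_filter` is a hypothetical helper, and the SDK calls in the comment (`create_scalar_index`, `search().where(...)`) are assumptions about the Python SDK that you should check against the API reference for your version.

```python
from typing import Iterable, Union

# Hypothetical usage with the Python SDK (check names against your version):
#   table.create_scalar_index("id", index_type="BTREE")
#   rows = table.search().where(id_filter([1, 2, 3])).to_list()


def id_filter(ids: Iterable[Union[int, str]], column: str = "id") -> str:
    """Build a SQL filter such as "id IN (1, 2, 3)" (hypothetical helper)."""
    literals = []
    for value in ids:
        if isinstance(value, bool):
            raise TypeError("boolean ids are not supported")
        if isinstance(value, int):
            literals.append(str(value))
        elif isinstance(value, str):
            # Escape embedded single quotes by doubling them, as in standard SQL.
            literals.append("'" + value.replace("'", "''") + "'")
        else:
            raise TypeError(f"unsupported id type: {type(value).__name__}")
    if not literals:
        raise ValueError("at least one id is required")
    return f"{column} IN ({', '.join(literals)})"
```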
### What are the vector indexing types supported by LanceDB Cloud?
We support `IVF_PQ` and `IVF_HNSW_SQ` as the `index_type` which is passed to `create_index`. LanceDB Cloud tunes the indexing parameters automatically to achieve the best tradeoff between query latency and query quality.
### When I add new rows to a table, do I need to manually update the index?
No! LanceDB Cloud triggers an asynchronous background job to index the new vectors.
Even though indexing is asynchronous, your vectors are still immediately searchable: LanceDB uses brute-force search over the rows that have not been indexed yet. This ensures your new data is available right away, but temporarily increases query latency. To disable the brute-force part of the search, set the `fast_search` flag in your query to `true`; rows that are not yet indexed are then excluded from results until the background job indexes them.
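To make this tradeoff concrete, here is a toy model of a query against a table whose newest rows have not been indexed yet. It is not LanceDB's implementation: the "indexed" rows are scanned exactly as a stand-in for the ANN index lookup, and `search` and its `fast_search` parameter are illustrative names that only mirror the flag described above.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Row = Tuple[int, Sequence[float]]  # (row_id, vector)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query: Sequence[float], indexed: List[Row], unindexed: List[Row],
           k: int, fast_search: bool = False) -> List[int]:
    # Stand-in for the ANN index lookup: an exact scan of the indexed rows.
    candidates = [(l2(query, vec), row_id) for row_id, vec in indexed]
    if not fast_search:
        # Brute-force pass over rows the background job has not indexed yet.
        candidates += [(l2(query, vec), row_id) for row_id, vec in unindexed]
    # Merge both result sets and keep the k closest row ids.
    return [row_id for _, row_id in heapq.nsmallest(k, candidates)]
```

With `indexed = [(1, [0.0, 0.0]), (2, [5.0, 5.0])]` and a freshly added row `unindexed = [(3, [0.1, 0.0])]`, `search([0.0, 0.0], indexed, unindexed, k=2)` returns `[1, 3]`, while `fast_search=True` returns `[1, 2]` until row 3 is indexed.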
### Do I need to reindex the whole dataset if only a small portion of the data is deleted or updated?
No! As with adding data to the table, LanceDB Cloud triggers an asynchronous background job to update the existing indices. No action is needed from users, and no downtime is expected.
### How do I know whether an index has been created?
While index creation in LanceDB Cloud is generally fast, querying immediately after a `create_index` call may result in errors. It's recommended to use `list_indices` to verify index creation before querying.
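A minimal polling sketch for the check described above. `list_index_names` is a hypothetical callable you supply, for example one that extracts names from `table.list_indices()`; the shape of the objects `list_indices` returns depends on the SDK and version, so treat that mapping as an assumption. The timeout and poll-interval defaults are arbitrary.

```python
import time
from typing import Callable, Iterable


def wait_for_index(list_index_names: Callable[[], Iterable[str]], name: str,
                   timeout: float = 300.0, poll_interval: float = 5.0) -> None:
    """Block until `name` appears in the listed indices or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if name in set(list_index_names()):
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"index {name!r} not listed after {timeout} s")
        # Sleep for the poll interval, but never past the deadline.
        time.sleep(min(poll_interval, remaining))
```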
### Why is my query latency higher than expected?
Multiple factors can impact query latency. To reduce query latency, consider the following:
- Send pre-warm queries: send a few queries to warm up the cache before an actual user query.
- Check network latency: LanceDB Cloud is hosted in the AWS `us-east-1` region. We recommend running queries from an EC2 instance in the same region.
- Create scalar indices: If you are filtering on metadata, we recommend creating scalar indices on those columns. This speeds up searches with metadata filtering. See [here](../guides/scalar_index.md) for more details on creating a scalar index.
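A minimal sketch of pre-warming before measuring latency. `run_query` is a hypothetical callable wrapping your real search call (for example `lambda v: table.search(v).limit(10).to_list()` in the Python SDK), and the warm-up count is an arbitrary choice. Run it from the machine that serves your traffic, since the network distance to `us-east-1` is included in every sample.

```python
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def measure_latency(run_query: Callable[[Any], Any], queries: Sequence[Any],
                    warmup: int = 3) -> Dict[str, float]:
    """Send `warmup` untimed queries, then time each query in `queries` (ms)."""
    if not queries:
        raise ValueError("need at least one query to time")
    for i in range(warmup):
        # Pre-warm: results are discarded and not timed.
        run_query(queries[i % len(queries)])
    samples = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"p50_ms": statistics.median(samples), "max_ms": max(samples),
            "n": float(len(samples))}
```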

@@ -804,12 +804,13 @@ a table:
You can add new columns to the table with the `add_columns` method. New columns
are filled with values based on a SQL expression. For example, you can add a new
column `y` to the table and fill it with the value of `x + 1`.
column `y` to the table, fill it with the value of `x * 2` and set the expected
data type for it.
=== "Python"
```python
table.add_columns({"double_price": "price * 2"})
--8<-- "python/python/tests/docs/test_basic.py:add_columns"
```
**API Reference:** [lancedb.table.Table.add_columns][]
@@ -849,8 +850,7 @@ rewriting the column, which can be a heavy operation.
```python
import pyarrow as pa
table.alter_column({"path": "double_price", "rename": "dbl_price",
"data_type": pa.float32(), "nullable": False})
--8<-- "python/python/tests/docs/test_basic.py:alter_columns"
```
**API Reference:** [lancedb.table.Table.alter_columns][]
@@ -873,7 +873,7 @@ will remove the column from the schema.
=== "Python"
```python
table.drop_columns(["dbl_price"])
--8<-- "python/python/tests/docs/test_basic.py:drop_columns"
```
**API Reference:** [lancedb.table.Table.drop_columns][]

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.14.1-beta.2</version>
<version>0.14.1-beta.4</version>
<relativePath>../pom.xml</relativePath>
</parent>

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.14.1-beta.2</version>
<version>0.14.1-beta.4</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>

node/package-lock.json generated
@@ -1,12 +1,12 @@
{
"name": "vectordb",
"version": "0.14.1-beta.1",
"version": "0.14.1-beta.3",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "vectordb",
"version": "0.14.1-beta.1",
"version": "0.14.1-beta.3",
"cpu": [
"x64",
"arm64"
@@ -52,14 +52,14 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.14.1-beta.1",
"@lancedb/vectordb-darwin-x64": "0.14.1-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.14.1-beta.1",
"@lancedb/vectordb-linux-arm64-musl": "0.14.1-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.14.1-beta.1",
"@lancedb/vectordb-linux-x64-musl": "0.14.1-beta.1",
"@lancedb/vectordb-win32-arm64-msvc": "0.14.1-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.14.1-beta.1"
"@lancedb/vectordb-darwin-arm64": "0.14.1-beta.3",
"@lancedb/vectordb-darwin-x64": "0.14.1-beta.3",
"@lancedb/vectordb-linux-arm64-gnu": "0.14.1-beta.3",
"@lancedb/vectordb-linux-arm64-musl": "0.14.1-beta.3",
"@lancedb/vectordb-linux-x64-gnu": "0.14.1-beta.3",
"@lancedb/vectordb-linux-x64-musl": "0.14.1-beta.3",
"@lancedb/vectordb-win32-arm64-msvc": "0.14.1-beta.3",
"@lancedb/vectordb-win32-x64-msvc": "0.14.1-beta.3"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",

@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"description": " Serverless, low-latency vector database for AI applications",
"private": false,
"main": "dist/index.js",
@@ -92,13 +92,13 @@
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-x64": "0.14.1-beta.2",
"@lancedb/vectordb-darwin-arm64": "0.14.1-beta.2",
"@lancedb/vectordb-linux-x64-gnu": "0.14.1-beta.2",
"@lancedb/vectordb-linux-arm64-gnu": "0.14.1-beta.2",
"@lancedb/vectordb-linux-x64-musl": "0.14.1-beta.2",
"@lancedb/vectordb-linux-arm64-musl": "0.14.1-beta.2",
"@lancedb/vectordb-win32-x64-msvc": "0.14.1-beta.2",
"@lancedb/vectordb-win32-arm64-msvc": "0.14.1-beta.2"
"@lancedb/vectordb-darwin-x64": "0.14.1-beta.4",
"@lancedb/vectordb-darwin-arm64": "0.14.1-beta.4",
"@lancedb/vectordb-linux-x64-gnu": "0.14.1-beta.4",
"@lancedb/vectordb-linux-arm64-gnu": "0.14.1-beta.4",
"@lancedb/vectordb-linux-x64-musl": "0.14.1-beta.4",
"@lancedb/vectordb-linux-arm64-musl": "0.14.1-beta.4",
"@lancedb/vectordb-win32-x64-msvc": "0.14.1-beta.4",
"@lancedb/vectordb-win32-arm64-msvc": "0.14.1-beta.4"
}
}

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.14.1-beta.2"
version = "0.14.1-beta.4"
license.workspace = true
description.workspace = true
repository.workspace = true

@@ -1058,6 +1058,26 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(results[0].text).toBe(data[0].text);
});
test("full text search without lowercase", async () => {
const db = await connect(tmpDir.name);
const data = [
{ text: "hello world", vector: [0.1, 0.2, 0.3] },
{ text: "Hello World", vector: [0.4, 0.5, 0.6] },
];
const table = await db.createTable("test", data);
await table.createIndex("text", {
config: Index.fts({ withPosition: false }),
});
const results = await table.search("hello").toArray();
expect(results.length).toBe(2);
await table.createIndex("text", {
config: Index.fts({ withPosition: false, lowercase: false }),
});
const results2 = await table.search("hello").toArray();
expect(results2.length).toBe(1);
});
test("full text search phrase query", async () => {
const db = await connect(tmpDir.name);
const data = [

@@ -119,7 +119,9 @@ test("basic table examples", async () => {
{
// --8<-- [start:add_columns]
await tbl.addColumns([{ name: "double_price", valueSql: "price * 2" }]);
await tbl.addColumns([
{ name: "double_price", valueSql: "cast((price * 2) as Float)" },
]);
// --8<-- [end:add_columns]
// --8<-- [start:alter_columns]
await tbl.alterColumns([

@@ -349,6 +349,52 @@ export interface FtsOptions {
* which will make the index smaller and faster to build, but will not support phrase queries.
*/
withPosition?: boolean;
/**
* The tokenizer to use when building the index.
* The default is "simple".
*
* The following tokenizers are available:
*
* "simple" - Simple tokenizer. This tokenizer splits the text into tokens using whitespace and punctuation as delimiters.
*
* "whitespace" - Whitespace tokenizer. This tokenizer splits the text into tokens using whitespace as a delimiter.
*
* "raw" - Raw tokenizer. This tokenizer does not split the text into tokens and indexes the entire text as a single token.
*/
baseTokenizer?: "simple" | "whitespace" | "raw";
/**
* language for stemming and stop words
* this is only used when `stem` or `removeStopWords` is true
*/
language?: string;
/**
* maximum token length
* tokens longer than this length will be ignored
*/
maxTokenLength?: number;
/**
* whether to lowercase tokens
*/
lowercase?: boolean;
/**
* whether to stem tokens
*/
stem?: boolean;
/**
* whether to remove stop words
*/
removeStopWords?: boolean;
/**
* whether to fold non-ASCII characters (e.g. accented letters) into their ASCII equivalents
*/
asciiFolding?: boolean;
}
export class Index {
@@ -450,7 +496,18 @@ export class Index {
* For now, the full text search index only supports English, and doesn't support phrase search.
*/
static fts(options?: Partial<FtsOptions>) {
return new Index(LanceDbIndex.fts(options?.withPosition));
return new Index(
LanceDbIndex.fts(
options?.withPosition,
options?.baseTokenizer,
options?.language,
options?.maxTokenLength,
options?.lowercase,
options?.stem,
options?.removeStopWords,
options?.asciiFolding,
),
);
}
/**
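The tokenizer options added to `FtsOptions` can be illustrated with a small Python model. It approximates the documented behavior of the three base tokenizers plus the `maxTokenLength`, `lowercase` and `asciiFolding` filters; it is not the Rust tokenizer LanceDB actually runs, the filter order is an assumption, and stemming and stop-word removal are omitted because they need language data.

```python
import re
import unicodedata
from typing import List, Optional


def tokenize(text: str, base_tokenizer: str = "simple", lowercase: bool = True,
             max_token_length: Optional[int] = None,
             ascii_folding: bool = False) -> List[str]:
    if base_tokenizer == "simple":
        # Runs of letters/digits; whitespace, punctuation and "_" are delimiters.
        tokens = re.findall(r"[^\W_]+", text)
    elif base_tokenizer == "whitespace":
        # Split on whitespace only, so punctuation stays attached.
        tokens = text.split()
    elif base_tokenizer == "raw":
        # The whole text is a single token.
        tokens = [text]
    else:
        raise ValueError(f"unknown base tokenizer: {base_tokenizer!r}")
    if max_token_length is not None:
        # Drop (rather than truncate) tokens longer than the limit.
        tokens = [t for t in tokens if len(t) <= max_token_length]
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if ascii_folding:
        # Decompose characters, then drop whatever is still non-ASCII (e.g. accents).
        tokens = [unicodedata.normalize("NFKD", t).encode("ascii", "ignore").decode("ascii")
                  for t in tokens]
    return tokens


def matches(doc: str, query: str, **options) -> bool:
    # A document matches when it contains every query token (same tokenizer for both).
    doc_tokens = set(tokenize(doc, **options))
    return all(t in doc_tokens for t in tokenize(query, **options))
```

With the defaults, `matches("Hello World", "hello")` is `True`; with `lowercase=False` it is `False`. That is the behavior the `full text search without lowercase` test checks (2 results versus 1).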

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": [
"win32"
],

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.14.1-beta.2",
"version": "0.14.1-beta.4",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

@@ -96,11 +96,45 @@ impl Index {
}
#[napi(factory)]
pub fn fts(with_position: Option<bool>) -> Self {
#[allow(clippy::too_many_arguments)]
pub fn fts(
with_position: Option<bool>,
base_tokenizer: Option<String>,
language: Option<String>,
max_token_length: Option<u32>,
lower_case: Option<bool>,
stem: Option<bool>,
remove_stop_words: Option<bool>,
ascii_folding: Option<bool>,
) -> Self {
let mut opts = FtsIndexBuilder::default();
let mut tokenizer_configs = opts.tokenizer_configs.clone();
if let Some(with_position) = with_position {
opts = opts.with_position(with_position);
}
if let Some(base_tokenizer) = base_tokenizer {
tokenizer_configs = tokenizer_configs.base_tokenizer(base_tokenizer);
}
if let Some(language) = language {
tokenizer_configs = tokenizer_configs.language(&language).unwrap();
}
if let Some(max_token_length) = max_token_length {
tokenizer_configs = tokenizer_configs.max_token_length(Some(max_token_length as usize));
}
if let Some(lower_case) = lower_case {
tokenizer_configs = tokenizer_configs.lower_case(lower_case);
}
if let Some(stem) = stem {
tokenizer_configs = tokenizer_configs.stem(stem);
}
if let Some(remove_stop_words) = remove_stop_words {
tokenizer_configs = tokenizer_configs.remove_stop_words(remove_stop_words);
}
if let Some(ascii_folding) = ascii_folding {
tokenizer_configs = tokenizer_configs.ascii_folding(ascii_folding);
}
opts.tokenizer_configs = tokenizer_configs;
Self {
inner: Mutex::new(Some(LanceDbIndex::FTS(opts))),
}

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.17.1-beta.2"
current_version = "0.17.1-beta.4"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.17.1-beta.2"
version = "0.17.1-beta.4"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true

@@ -3,7 +3,7 @@ name = "lancedb"
# version in Cargo.toml
dependencies = [
"deprecation",
"pylance==0.20.0",
"pylance==0.21.0b3",
"tqdm>=4.27.0",
"pydantic>=1.10",
"packaging",
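The Python package pins `pylance==0.21.0b3`, while the Rust workspace points at the git tag `v0.21.0-beta.3`. Both name the same prerelease, spelled once in semver/Cargo style and once in PEP 440 style. Below is a minimal sketch of the mapping, assuming only `alpha`/`beta`/`rc` labels occur. The `packaging` library listed in the dependencies applies the same PEP 440 normalization through `packaging.version.Version`.

```python
import re


def semver_to_pep440(version: str) -> str:
    """Convert e.g. "v0.21.0-beta.3" to the PEP 440 form "0.21.0b3" (sketch)."""
    m = re.fullmatch(r"v?(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release, label, number = m.groups()
    if label is None:
        # Final releases need no conversion.
        return release
    # PEP 440 short forms for the prerelease signifier.
    short = {"alpha": "a", "beta": "b", "rc": "rc"}[label]
    return f"{release}{short}{int(number)}"
```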

@@ -75,6 +75,22 @@ def test_quickstart():
for _ in range(1000)
]
)
# --8<-- [start:add_columns]
tbl.add_columns({"double_price": "cast((price * 2) as float)"})
# --8<-- [end:add_columns]
# --8<-- [start:alter_columns]
tbl.alter_columns(
{
"path": "double_price",
"rename": "dbl_price",
"data_type": pa.float64(),
"nullable": True,
}
)
# --8<-- [end:alter_columns]
# --8<-- [start:drop_columns]
tbl.drop_columns(["dbl_price"])
# --8<-- [end:drop_columns]
# --8<-- [start:create_index]
# Synchronous client
tbl.create_index(num_sub_vectors=1)

@@ -1,6 +1,6 @@
[package]
name = "lancedb-node"
version = "0.14.1-beta.2"
version = "0.14.1-beta.4"
description = "Serverless, low-latency vector database for AI applications"
license.workspace = true
edition.workspace = true

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.14.1-beta.2"
version = "0.14.1-beta.4"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true