feat: add drop_index() method (#2039)

Closes #1665
Will Jones
2025-01-20 10:08:51 -08:00
committed by GitHub
parent 3dc1803c07
commit f059372137
18 changed files with 388 additions and 34 deletions


@@ -40,37 +40,4 @@ The [quickstart](../basic.md) contains a more complete example.
## Development
See [CONTRIBUTING.md](_media/CONTRIBUTING.md) for information on how to contribute to LanceDB.
```sh
npm run build
npm run test
```
### Running lint / format
LanceDB uses [biome](https://biomejs.dev/) for linting and formatting. If you are using VSCode, you will need to install the official [Biome](https://marketplace.visualstudio.com/items?itemName=biomejs.biome) extension.
To manually lint your code, you can run:
```sh
npm run lint
```
To automatically fix all fixable issues:
```sh
npm run lint-fix
```
If you do not have your workspace root set to the `nodejs` directory, unfortunately the extension will not work. You can still run the linting and formatting commands manually.
### Generating docs
```sh
npm run docs
cd ../docs
# Assume the virtual environment was created
# python3 -m venv venv
# pip install -r requirements.txt
. ./venv/bin/activate
mkdocs build
```


@@ -0,0 +1,76 @@
# Contributing to LanceDB TypeScript
This document outlines the process for contributing to LanceDB TypeScript.
For general contribution guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md).
## Project layout
The TypeScript package is a wrapper around the Rust library, `lancedb`. We use
the [napi-rs](https://napi.rs/) library to create the bindings between Rust and
TypeScript.
* `src/`: Rust bindings source code
* `lancedb/`: TypeScript package source code
* `__test__/`: Unit tests
* `examples/`: An npm package with the examples shown in the documentation
## Development environment
To set up your development environment, you will need to install the following:
1. Node.js 14 or later
2. Rust's package manager, Cargo. Use [rustup](https://rustup.rs/) to install.
3. [protoc](https://grpc.io/docs/protoc-installation/) (Protocol Buffers compiler)
Initial setup:
```shell
npm install
```
### Commit Hooks
It is **highly recommended** to install the [pre-commit](https://pre-commit.com/) hooks to ensure that your
code is formatted correctly and passes basic checks before committing:
```shell
pre-commit install
```
## Development
Most common development commands can be run using the npm scripts.
Build the package:
```shell
npm install
npm run build
```
Lint:
```shell
npm run lint
```
Format and fix lints:
```shell
npm run lint-fix
```
Run tests:
```shell
npm test
```
To run a single test:
```shell
# Single file: table.test.ts
npm test -- table.test.ts
# Single test: 'merge insert' in table.test.ts
npm test -- table.test.ts --testNamePattern=merge\ insert
```


@@ -317,6 +317,32 @@ then call ``cleanup_files`` to remove the old files.
***
### dropIndex()
```ts
abstract dropIndex(name): Promise<void>
```
Drop an index from the table.
#### Parameters
* **name**: `string`
The name of the index.
#### Returns
`Promise`&lt;`void`&gt;
#### Note
This does not delete the index from disk; it only removes it from the table.
To delete the index files, run [Table#optimize](Table.md#optimize) after dropping the index.
Use [Table.listIndices](Table.md#listindices) to find the names of the indices.
***
### indexStats()
```ts ```ts
@@ -336,6 +362,8 @@ List all the stats of a specified index
The stats of the index. If the index does not exist, it will return undefined
Use [Table.listIndices](Table.md#listindices) to find the names of the indices.
***
### isOpen()


@@ -128,6 +128,24 @@ whose data type is a fixed-size-list of floats.
***
### distanceRange()
```ts
distanceRange(lowerBound?, upperBound?): VectorQuery
```
#### Parameters
* **lowerBound?**: `number`
* **upperBound?**: `number`
#### Returns
[`VectorQuery`](VectorQuery.md)
***
### distanceType()
```ts ```ts
@@ -528,6 +546,22 @@ distance between the query vector and the actual uncompressed vector.
***
### rerank()
```ts
rerank(reranker): VectorQuery
```
#### Parameters
* **reranker**: [`Reranker`](../namespaces/rerankers/interfaces/Reranker.md)
#### Returns
[`VectorQuery`](VectorQuery.md)
***
### select()
```ts ```ts


@@ -7,6 +7,7 @@
## Namespaces
- [embedding](namespaces/embedding/README.md)
- [rerankers](namespaces/rerankers/README.md)
## Enumerations


@@ -68,6 +68,21 @@ The default value is 50.
***
### numBits?
```ts
optional numBits: number;
```
Number of bits per sub-vector.
This value controls how much each sub-vector is compressed. More bits make the
index more accurate but make search slower. The default is 8 bits.
The number of bits must be 4 or 8.
***
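To make the `numBits` trade-off concrete, here is a minimal Python sketch of standard product-quantization arithmetic. It is not LanceDB code: the helper names are hypothetical, and it assumes each sub-vector is stored as one `num_bits`-wide code, as in textbook PQ.

```python
def pq_centroids_per_sub_vector(num_bits: int) -> int:
    """Number of distinct codes (centroids) available to each sub-vector."""
    if num_bits not in (4, 8):
        raise ValueError("num_bits must be 4 or 8")
    return 2 ** num_bits


def pq_code_size_bytes(num_sub_vectors: int, num_bits: int) -> float:
    """Bytes needed to store one PQ-encoded vector (hypothetical helper)."""
    if num_bits not in (4, 8):
        raise ValueError("num_bits must be 4 or 8")
    return num_sub_vectors * num_bits / 8


# With 16 sub-vectors, halving the bits halves the stored code size
# and shrinks each codebook from 256 to 16 centroids.
size_8bit = pq_code_size_bytes(16, 8)
size_4bit = pq_code_size_bytes(16, 4)
centroids_8bit = pq_centroids_per_sub_vector(8)
centroids_4bit = pq_centroids_per_sub_vector(4)
```

Fewer centroids per sub-vector means a coarser approximation of each vector, which is why the lower setting trades accuracy for size and speed.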
### numPartitions?
```ts ```ts


@@ -0,0 +1,17 @@
[**@lancedb/lancedb**](../../README.md) • **Docs**
***
[@lancedb/lancedb](../../globals.md) / rerankers
# rerankers
## Index
### Classes
- [RRFReranker](classes/RRFReranker.md)
### Interfaces
- [Reranker](interfaces/Reranker.md)


@@ -0,0 +1,66 @@
[**@lancedb/lancedb**](../../../README.md) • **Docs**
***
[@lancedb/lancedb](../../../globals.md) / [rerankers](../README.md) / RRFReranker
# Class: RRFReranker
Reranks the results using the Reciprocal Rank Fusion (RRF) algorithm.
Internally, this uses the Rust implementation.
## Constructors
### new RRFReranker()
```ts
new RRFReranker(inner): RRFReranker
```
#### Parameters
* **inner**: `RrfReranker`
#### Returns
[`RRFReranker`](RRFReranker.md)
## Methods
### rerankHybrid()
```ts
rerankHybrid(
   query,
   vecResults,
   ftsResults): Promise<RecordBatch<any>>
```
#### Parameters
* **query**: `string`
* **vecResults**: `RecordBatch`&lt;`any`&gt;
* **ftsResults**: `RecordBatch`&lt;`any`&gt;
#### Returns
`Promise`&lt;`RecordBatch`&lt;`any`&gt;&gt;
***
### create()
```ts
static create(k): Promise<RRFReranker>
```
#### Parameters
* **k**: `number` = `60`
#### Returns
`Promise`&lt;[`RRFReranker`](RRFReranker.md)&gt;
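For readers unfamiliar with Reciprocal Rank Fusion, the following Python sketch shows the usual formulation: each item scores the sum of `1 / (k + rank)` over the result lists it appears in. This is an illustration only, not the Rust implementation that `RRFReranker` wraps. The function names are hypothetical, and ranks are assumed to start at 1.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_scores(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> Dict[Hashable, float]:
    """Sum 1 / (k + rank) for every list an item appears in (rank starts at 1)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return dict(scores)


def rrf_order(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Return items sorted by fused score, best first."""
    scores = rrf_scores(rankings, k)
    return sorted(scores, key=lambda item: scores[item], reverse=True)


# Row ids from a vector search and a full-text search, best match first.
vector_hits = ["a", "b", "c"]
fts_hits = ["b", "d", "a"]
scores = rrf_scores([vector_hits, fts_hits])
fused = rrf_order([vector_hits, fts_hits])
```

Items found by both searches ("a" and "b") rise above items found by only one, which is the property that makes RRF a common choice for hybrid search.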


@@ -0,0 +1,30 @@
[**@lancedb/lancedb**](../../../README.md) • **Docs**
***
[@lancedb/lancedb](../../../globals.md) / [rerankers](../README.md) / Reranker
# Interface: Reranker
## Methods
### rerankHybrid()
```ts
rerankHybrid(
   query,
   vecResults,
   ftsResults): Promise<RecordBatch<any>>
```
#### Parameters
* **query**: `string`
* **vecResults**: `RecordBatch`&lt;`any`&gt;
* **ftsResults**: `RecordBatch`&lt;`any`&gt;
#### Returns
`Promise`&lt;`RecordBatch`&lt;`any`&gt;&gt;


@@ -473,6 +473,10 @@ describe("When creating an index", () => {
    // test offset
    rst = await tbl.query().limit(2).offset(1).nearestTo(queryVec).toArrow();
    expect(rst.numRows).toBe(1);

    await tbl.dropIndex("vec_idx");
    const indices2 = await tbl.listIndices();
    expect(indices2.length).toBe(0);
  });

  it("should search with distance range", async () => {


@@ -226,6 +226,19 @@ export abstract class Table {
    column: string,
    options?: Partial<IndexOptions>,
  ): Promise<void>;

  /**
   * Drop an index from the table.
   *
   * @param name The name of the index.
   *
   * @note This does not delete the index from disk; it only removes it from the table.
   * To delete the index files, run {@link Table#optimize} after dropping the index.
   *
   * Use {@link Table.listIndices} to find the names of the indices.
   */
  abstract dropIndex(name: string): Promise<void>;

  /**
   * Create a {@link Query} Builder.
   *
@@ -426,6 +439,8 @@ export abstract class Table {
   *
   * @param {string} name The name of the index.
   * @returns {IndexStatistics | undefined} The stats of the index. If the index does not exist, it will return undefined
   *
   * Use {@link Table.listIndices} to find the names of the indices.
   */
  abstract indexStats(name: string): Promise<IndexStatistics | undefined>;
@@ -591,6 +606,10 @@ export class LocalTable extends Table {
    await this.inner.createIndex(nativeIndex, column, options?.replace);
  }

  async dropIndex(name: string): Promise<void> {
    await this.inner.dropIndex(name);
  }

  query(): Query {
    return new Query(this.inner);
  }


@@ -135,6 +135,14 @@ impl Table {
        builder.execute().await.default_error()
    }

    #[napi(catch_unwind)]
    pub async fn drop_index(&self, index_name: String) -> napi::Result<()> {
        self.inner_ref()?
            .drop_index(&index_name)
            .await
            .default_error()
    }

    #[napi(catch_unwind)]
    pub async fn update(
        &self,


@@ -586,6 +586,26 @@ class Table(ABC):
""" """
        raise NotImplementedError

    def drop_index(self, name: str) -> None:
        """
        Drop an index from the table.

        Parameters
        ----------
        name: str
            The name of the index to drop.

        Notes
        -----
        This does not delete the index from disk; it only removes it from the table.
        To delete the index files, run [optimize][lancedb.table.Table.optimize]
        after dropping the index.

        Use [list_indices][lancedb.table.Table.list_indices] to find the names of
        the indices.
        """
        raise NotImplementedError

    @abstractmethod
    def create_scalar_index(
        self,
@@ -1594,6 +1614,9 @@ class LanceTable(Table):
            )
        )

    def drop_index(self, name: str) -> None:
        return LOOP.run(self._table.drop_index(name))

    def create_scalar_index(
        self,
        column: str,
@@ -2716,6 +2739,26 @@ class AsyncTable:
                add_note(e, help_msg)
            raise e

    async def drop_index(self, name: str) -> None:
        """
        Drop an index from the table.

        Parameters
        ----------
        name: str
            The name of the index to drop.

        Notes
        -----
        This does not delete the index from disk; it only removes it from the table.
        To delete the index files, run [optimize][lancedb.table.AsyncTable.optimize]
        after dropping the index.

        Use [list_indices][lancedb.table.AsyncTable.list_indices] to find the names
        of the indices.
        """
        await self._inner.drop_index(name)

    async def add(
        self,
        data: DATA,
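As a usage illustration for the synchronous method, here is a minimal sketch of a guard that drops an index only when the table lists it. The helper `drop_index_if_present` and the `FakeTable`/`FakeIndex` stand-ins are hypothetical and exist only so the snippet runs without a database. It assumes that each entry returned by `list_indices()` exposes a `name` attribute, as the test's `scalar_index.name` suggests.

```python
from dataclasses import dataclass, field
from typing import List


def drop_index_if_present(table, name: str) -> bool:
    """Drop the index called `name` if the table lists it.

    Returns True when an index was dropped, False when no index had that name.
    """
    existing = [index.name for index in table.list_indices()]
    if name not in existing:
        return False
    table.drop_index(name)
    return True


# Hypothetical in-memory stand-ins so the sketch runs without LanceDB.
@dataclass
class FakeIndex:
    name: str


@dataclass
class FakeTable:
    indices: List[FakeIndex] = field(default_factory=list)

    def list_indices(self) -> List[FakeIndex]:
        return list(self.indices)

    def drop_index(self, name: str) -> None:
        self.indices = [i for i in self.indices if i.name != name]


table = FakeTable([FakeIndex("id_idx")])
dropped_first = drop_index_if_present(table, "id_idx")
dropped_again = drop_index_if_present(table, "id_idx")
remaining = table.list_indices()
```

With a real table, the docstring above still applies: the index files stay on disk until `optimize` is run.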


@@ -80,6 +80,10 @@ async def test_create_scalar_index(some_table: AsyncTable):
    # can also specify index type
    await some_table.create_index("id", config=BTree())

    await some_table.drop_index("id_idx")
    indices = await some_table.list_indices()
    assert len(indices) == 0


@pytest.mark.asyncio
async def test_create_bitmap_index(some_table: AsyncTable):
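The test above checks only that the index disappears from `list_indices()`. The sketch below illustrates the two-step behaviour described in the docstrings: dropping removes the listing entry, and a later `optimize` reclaims the files. `FakeAsyncTable` is a toy model invented for this example and says nothing about how Lance actually stores indices. The helper `drop_index_and_optimize` is likewise hypothetical, and it assumes the async table exposes `drop_index`, `optimize`, and `list_indices` as awaitables.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Set


async def drop_index_and_optimize(table, name: str) -> List[str]:
    """Drop `name`, run optimize, and return the index names still listed."""
    await table.drop_index(name)
    await table.optimize()
    return [index.name for index in await table.list_indices()]


@dataclass
class FakeIndex:
    name: str


@dataclass
class FakeAsyncTable:
    """Toy model: an index is a listing entry plus a file on disk."""

    indices: List[FakeIndex] = field(default_factory=list)
    files: Set[str] = field(default_factory=set)

    async def list_indices(self) -> List[FakeIndex]:
        return list(self.indices)

    async def drop_index(self, name: str) -> None:
        # Only the listing entry goes away; the file is left behind.
        self.indices = [i for i in self.indices if i.name != name]

    async def optimize(self) -> None:
        # Remove files that no listed index refers to.
        self.files &= {i.name for i in self.indices}


async def demo():
    table = FakeAsyncTable(
        [FakeIndex("id_idx"), FakeIndex("vec_idx")], {"id_idx", "vec_idx"}
    )
    await table.drop_index("id_idx")
    files_after_drop = set(table.files)
    remaining = await drop_index_and_optimize(table, "vec_idx")
    return files_after_drop, remaining, set(table.files)


files_after_drop, remaining, files_after_optimize = asyncio.run(demo())
```

In the real library, how soon the files disappear may also depend on the cleanup settings passed to `optimize`.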


@@ -1008,6 +1008,10 @@ def test_create_scalar_index(mem_db: DBConnection):
    results = table.search([5, 5]).where("x != 'b'").to_arrow()
    assert results["_distance"][0].as_py() > 0

    table.drop_index(scalar_index.name)
    indices = table.list_indices()
    assert len(indices) == 0


def test_empty_query(mem_db: DBConnection):
    table = mem_db.create_table(


@@ -194,6 +194,14 @@ impl Table {
        })
    }

    pub fn drop_index(self_: PyRef<'_, Self>, index_name: String) -> PyResult<Bound<'_, PyAny>> {
        let inner = self_.inner_ref()?.clone();
        future_into_py(self_.py(), async move {
            inner.drop_index(&index_name).await.infer_error()?;
            Ok(())
        })
    }

    pub fn list_indices(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
        let inner = self_.inner_ref()?.clone();
        future_into_py(self_.py(), async move {


@@ -816,6 +816,14 @@ impl<S: HttpSend> TableInternal for RemoteTable<S> {
        Ok(Some(stats))
    }

    /// Not yet supported on LanceDB Cloud.
    async fn drop_index(&self, _name: &str) -> Result<()> {
        Err(Error::NotSupported {
            message: "Drop index is not yet supported on LanceDB Cloud.".into(),
        })
    }

    async fn table_definition(&self) -> Result<TableDefinition> {
        Err(Error::NotSupported {
            message: "table_definition is not supported on LanceDB cloud.".into(),


@@ -410,6 +410,7 @@ pub(crate) trait TableInternal: std::fmt::Display + std::fmt::Debug + Send + Syn
    async fn update(&self, update: UpdateBuilder) -> Result<u64>;
    async fn create_index(&self, index: IndexBuilder) -> Result<()>;
    async fn list_indices(&self) -> Result<Vec<IndexConfig>>;
    async fn drop_index(&self, name: &str) -> Result<()>;
    async fn index_stats(&self, index_name: &str) -> Result<Option<IndexStatistics>>;
    async fn merge_insert(
        &self,
@@ -984,6 +985,18 @@ impl Table {
        self.inner.index_stats(index_name.as_ref()).await
    }

    /// Drop an index from the table.
    ///
    /// Note: This is not yet available in LanceDB Cloud.
    ///
    /// This does not delete the index from disk; it only removes it from the table.
    /// To delete the index files, run [`Self::optimize()`] after dropping the index.
    ///
    /// Use [`Self::list_indices()`] to find the names of the indices.
    pub async fn drop_index(&self, name: &str) -> Result<()> {
        self.inner.drop_index(name).await
    }

    // Take many execution plans and map them into a single plan that adds
    // a query_index column and unions them.
    pub(crate) fn multi_vector_plan(
@@ -1871,6 +1884,12 @@ impl TableInternal for NativeTable {
        }
    }

    async fn drop_index(&self, index_name: &str) -> Result<()> {
        let mut dataset = self.dataset.get_mut().await?;
        dataset.drop_index(index_name).await?;
        Ok(())
    }

    async fn update(&self, update: UpdateBuilder) -> Result<u64> {
        let dataset = self.dataset.get().await?.clone();
        let mut builder = LanceUpdateBuilder::new(Arc::new(dataset));
@@ -2897,6 +2916,9 @@ mod tests {
        assert_eq!(stats.num_unindexed_rows, 0);
        assert_eq!(stats.index_type, crate::index::IndexType::IvfPq);
        assert_eq!(stats.distance_type, Some(crate::DistanceType::L2));

        table.drop_index(index_name).await.unwrap();
        assert_eq!(table.list_indices().await.unwrap().len(), 0);
    }

    #[tokio::test]