feat: add drop_index() method (#2039)

Closes #1665
Will Jones
2025-01-20 10:08:51 -08:00
committed by GitHub
parent 3dc1803c07
commit f059372137
18 changed files with 388 additions and 34 deletions

@@ -40,37 +40,4 @@ The [quickstart](../basic.md) contains a more complete example.
## Development
```sh
npm run build
npm run test
```
### Running lint / format
LanceDB uses [biome](https://biomejs.dev/) for linting and formatting. If you are using VSCode, you will need to install the official [Biome](https://marketplace.visualstudio.com/items?itemName=biomejs.biome) extension.
To manually lint your code you can run:
```sh
npm run lint
```
To automatically fix all fixable issues:
```sh
npm run lint-fix
```
The Biome extension only works if your workspace root is set to the `nodejs` directory. Otherwise, you can still run the linting and formatting commands manually.
### Generating docs
```sh
npm run docs
cd ../docs
# Assumes the virtual environment was already created with:
# python3 -m venv venv
# pip install -r requirements.txt
. ./venv/bin/activate
mkdocs build
```
See [CONTRIBUTING.md](_media/CONTRIBUTING.md) for information on how to contribute to LanceDB.

@@ -0,0 +1,76 @@
# Contributing to LanceDB TypeScript
This document outlines the process for contributing to LanceDB TypeScript.
For general contribution guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md).
## Project layout
The TypeScript package is a wrapper around the Rust library, `lancedb`. We use
the [napi-rs](https://napi.rs/) library to create the bindings between Rust and
TypeScript.
* `src/`: Rust bindings source code
* `lancedb/`: TypeScript package source code
* `__test__/`: Unit tests
* `examples/`: An npm package with the examples shown in the documentation
## Development environment
To set up your development environment, you will need to install the following:
1. Node.js 14 or later
2. Rust's package manager, Cargo. Use [rustup](https://rustup.rs/) to install.
3. [protoc](https://grpc.io/docs/protoc-installation/) (Protocol Buffers compiler)
Initial setup:
```shell
npm install
```
### Commit Hooks
It is **highly recommended** to install the [pre-commit](https://pre-commit.com/) hooks to ensure that your
code is formatted correctly and passes basic checks before committing:
```shell
pre-commit install
```
## Development
Most common development commands can be run using the npm scripts.
Build the package:
```shell
npm install
npm run build
```
Lint:
```shell
npm run lint
```
Format and fix lints:
```shell
npm run lint-fix
```
Run tests:
```shell
npm test
```
To run a single test:
```shell
# Single file: table.test.ts
npm test -- table.test.ts
# Single test: 'merge insert' in table.test.ts
npm test -- table.test.ts --testNamePattern=merge\ insert
```

@@ -317,6 +317,32 @@ then call ``cleanup_files`` to remove the old files.
***
### dropIndex()
```ts
abstract dropIndex(name): Promise<void>
```
Drop an index from the table.
#### Parameters
* **name**: `string`
The name of the index.
#### Returns
`Promise`&lt;`void`&gt;
#### Note
This does not delete the index from disk; it only removes it from the table.
To delete the index, run [Table#optimize](Table.md#optimize) after dropping the index.
Use [Table.listIndices](Table.md#listindices) to find the names of the indices.
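The note above describes a two-step lifecycle: dropping an index only edits table metadata, and the index files are removed later by `optimize`. The toy model below illustrates that bookkeeping. It is not LanceDB code: `ToyTable`, its `files_on_disk` set, and the `_indices/<name>` paths are hypothetical, and the toy deletes unreferenced files immediately, whereas the real `optimize` may retain files that older table versions still reference.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical model of a table's index bookkeeping (not LanceDB code)."""

    # index name -> path of its files on "disk"
    indices: dict[str, str] = field(default_factory=dict)
    # every index file path currently present on "disk"
    files_on_disk: set[str] = field(default_factory=set)

    def create_index(self, name: str) -> None:
        path = f"_indices/{name}"
        self.indices[name] = path
        self.files_on_disk.add(path)

    def list_indices(self) -> list[str]:
        return sorted(self.indices)

    def drop_index(self, name: str) -> None:
        # Only the table metadata changes; the files stay on disk.
        # Raising on an unknown name is a choice of this toy model.
        if name not in self.indices:
            raise ValueError(f"index {name!r} does not exist")
        del self.indices[name]

    def optimize(self) -> None:
        # Cleanup step: delete files no longer referenced by any index.
        referenced = set(self.indices.values())
        self.files_on_disk &= referenced


table = ToyTable()
table.create_index("vec_idx")
table.drop_index("vec_idx")
print(table.list_indices(), sorted(table.files_on_disk))  # [] ['_indices/vec_idx']
table.optimize()
print(table.list_indices(), sorted(table.files_on_disk))  # [] []
```

Between the two calls, `list_indices` already reports no index while the files are still present, which is why the note recommends running `optimize` afterwards.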
***
### indexStats()
```ts
@@ -336,6 +362,8 @@ List all the stats of a specified index
The stats of the index. If the index does not exist, it returns `undefined`.
Use [Table.listIndices](Table.md#listindices) to find the names of the indices.
***
### isOpen()

@@ -128,6 +128,24 @@ whose data type is a fixed-size-list of floats.
***
### distanceRange()
```ts
distanceRange(lowerBound?, upperBound?): VectorQuery
```
#### Parameters
* **lowerBound?**: `number`
* **upperBound?**: `number`
#### Returns
[`VectorQuery`](VectorQuery.md)
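The generated reference above does not state how the two bounds are applied. The sketch below assumes a half-open range, keeping rows with `lowerBound <= distance < upperBound`, and treats an omitted bound as unbounded; both are assumptions to verify against the implementation. `filter_by_distance` is a hypothetical helper that operates on a plain list of distances, not a LanceDB API.

```python
import math
from typing import Optional


def filter_by_distance(
    distances: list[float],
    lower_bound: Optional[float] = None,
    upper_bound: Optional[float] = None,
) -> list[int]:
    """Return positions of rows whose distance lies in [lower_bound, upper_bound).

    Hypothetical illustration: an omitted bound is treated as unbounded.
    """
    lo = -math.inf if lower_bound is None else lower_bound
    hi = math.inf if upper_bound is None else upper_bound
    return [i for i, d in enumerate(distances) if lo <= d < hi]


dists = [0.1, 0.5, 0.9, 1.3]
print(filter_by_distance(dists, 0.5, 1.3))         # [1, 2]
print(filter_by_distance(dists, upper_bound=0.5))  # [0]
print(filter_by_distance(dists))                   # [0, 1, 2, 3]
```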
***
### distanceType()
```ts
@@ -528,6 +546,22 @@ distance between the query vector and the actual uncompressed vector.
***
### rerank()
```ts
rerank(reranker): VectorQuery
```
#### Parameters
* **reranker**: [`Reranker`](../namespaces/rerankers/interfaces/Reranker.md)
#### Returns
[`VectorQuery`](VectorQuery.md)
***
### select()
```ts

@@ -7,6 +7,7 @@
## Namespaces
- [embedding](namespaces/embedding/README.md)
- [rerankers](namespaces/rerankers/README.md)
## Enumerations

@@ -68,6 +68,21 @@ The default value is 50.
***
### numBits?
```ts
optional numBits: number;
```
Number of bits per sub-vector.
This value controls how much each sub-vector is compressed. The more bits, the more
accurate the index will be, but the slower the search. The default is 8 bits.
The number of bits must be 4 or 8.
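Under standard product-quantization bookkeeping, each sub-vector is stored as a `numBits`-bit code that indexes a codebook of `2 ** numBits` centroids, so one vector costs `num_sub_vectors * numBits / 8` bytes. The sketch below works through that arithmetic for a hypothetical 128-dimensional float32 vector split into 16 sub-vectors; it counts the codes only and ignores codebooks and any other index overhead.

```python
def pq_code_bytes(num_sub_vectors: int, num_bits: int) -> float:
    """Bytes needed to store the PQ codes of one vector (codes only)."""
    if num_bits not in (4, 8):
        raise ValueError("num_bits must be 4 or 8")
    return num_sub_vectors * num_bits / 8


def centroids_per_sub_vector(num_bits: int) -> int:
    # Each code indexes a codebook with 2**num_bits entries.
    return 2 ** num_bits


dim, num_sub_vectors = 128, 16  # hypothetical example values
raw_bytes = dim * 4  # uncompressed float32 vector
for bits in (4, 8):
    code = pq_code_bytes(num_sub_vectors, bits)
    # bits, centroids per sub-vector, code bytes, compression ratio
    print(bits, centroids_per_sub_vector(bits), code, raw_bytes / code)
# 4 16 8.0 64.0
# 8 256 16.0 32.0
```

Going from 4 to 8 bits doubles the code size but grows each codebook from 16 to 256 centroids, which is the accuracy trade-off the description above refers to.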
***
### numPartitions?
```ts

@@ -0,0 +1,17 @@
[**@lancedb/lancedb**](../../README.md) • **Docs**
***
[@lancedb/lancedb](../../globals.md) / rerankers
# rerankers
## Index
### Classes
- [RRFReranker](classes/RRFReranker.md)
### Interfaces
- [Reranker](interfaces/Reranker.md)

@@ -0,0 +1,66 @@
[**@lancedb/lancedb**](../../../README.md) • **Docs**
***
[@lancedb/lancedb](../../../globals.md) / [rerankers](../README.md) / RRFReranker
# Class: RRFReranker
Reranks the results using the Reciprocal Rank Fusion (RRF) algorithm.
Internally, this uses the Rust implementation.
## Constructors
### new RRFReranker()
```ts
new RRFReranker(inner): RRFReranker
```
#### Parameters
* **inner**: `RrfReranker`
#### Returns
[`RRFReranker`](RRFReranker.md)
## Methods
### rerankHybrid()
```ts
rerankHybrid(
query,
vecResults,
ftsResults): Promise<RecordBatch<any>>
```
#### Parameters
* **query**: `string`
* **vecResults**: `RecordBatch`&lt;`any`&gt;
* **ftsResults**: `RecordBatch`&lt;`any`&gt;
#### Returns
`Promise`&lt;`RecordBatch`&lt;`any`&gt;&gt;
***
### create()
```ts
static create(k): Promise<RRFReranker>
```
#### Parameters
* **k**: `number` = `60`
#### Returns
`Promise`&lt;[`RRFReranker`](RRFReranker.md)&gt;
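Reciprocal Rank Fusion scores each row by summing `1 / (k + rank)` over every result list in which it appears, so rows that rank well in both the vector and full-text results rise to the top. The sketch below applies the textbook formula with 1-based ranks and the same default `k = 60` that `create()` uses. It fuses plain lists of row ids rather than `RecordBatch`es, and `rrf_fuse` is a hypothetical helper; the rank base and tie-breaking of the Rust implementation are not specified here.

```python
from collections import defaultdict


def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of row ids with Reciprocal Rank Fusion.

    score(id) = sum over lists of 1 / (k + rank), with 1-based ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, row_id in enumerate(results, start=1):
            scores[row_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["a", "b", "c"]  # hypothetical vector-search ranking
fts_hits = ["c", "a", "d"]     # hypothetical full-text-search ranking
for row_id, score in rrf_fuse([vector_hits, fts_hits]):
    print(row_id, round(score, 5))
```

With these inputs, `a` (ranks 1 and 2) edges out `c` (ranks 3 and 1), and the rows found by only one search trail both.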

@@ -0,0 +1,30 @@
[**@lancedb/lancedb**](../../../README.md) • **Docs**
***
[@lancedb/lancedb](../../../globals.md) / [rerankers](../README.md) / Reranker
# Interface: Reranker
## Methods
### rerankHybrid()
```ts
rerankHybrid(
query,
vecResults,
ftsResults): Promise<RecordBatch<any>>
```
#### Parameters
* **query**: `string`
* **vecResults**: `RecordBatch`&lt;`any`&gt;
* **ftsResults**: `RecordBatch`&lt;`any`&gt;
#### Returns
`Promise`&lt;`RecordBatch`&lt;`any`&gt;&gt;

@@ -473,6 +473,10 @@ describe("When creating an index", () => {
// test offset
rst = await tbl.query().limit(2).offset(1).nearestTo(queryVec).toArrow();
expect(rst.numRows).toBe(1);
await tbl.dropIndex("vec_idx");
const indices2 = await tbl.listIndices();
expect(indices2.length).toBe(0);
});
it("should search with distance range", async () => {

@@ -226,6 +226,19 @@ export abstract class Table {
column: string,
options?: Partial<IndexOptions>,
): Promise<void>;
/**
* Drop an index from the table.
*
* @param name The name of the index.
*
* @note This does not delete the index from disk; it only removes it from the table.
* To delete the index, run {@link Table#optimize} after dropping the index.
*
* Use {@link Table.listIndices} to find the names of the indices.
*/
abstract dropIndex(name: string): Promise<void>;
/**
* Create a {@link Query} Builder.
*
@@ -426,6 +439,8 @@ export abstract class Table {
*
* @param {string} name The name of the index.
* @returns {IndexStatistics | undefined} The stats of the index. If the index does not exist, it will return undefined.
*
* Use {@link Table.listIndices} to find the names of the indices.
*/
abstract indexStats(name: string): Promise<IndexStatistics | undefined>;
@@ -591,6 +606,10 @@ export class LocalTable extends Table {
await this.inner.createIndex(nativeIndex, column, options?.replace);
}
async dropIndex(name: string): Promise<void> {
await this.inner.dropIndex(name);
}
query(): Query {
return new Query(this.inner);
}

@@ -135,6 +135,14 @@ impl Table {
builder.execute().await.default_error()
}
#[napi(catch_unwind)]
pub async fn drop_index(&self, index_name: String) -> napi::Result<()> {
self.inner_ref()?
.drop_index(&index_name)
.await
.default_error()
}
#[napi(catch_unwind)]
pub async fn update(
&self,

@@ -586,6 +586,26 @@ class Table(ABC):
"""
raise NotImplementedError
def drop_index(self, name: str) -> None:
"""
Drop an index from the table.
Parameters
----------
name: str
The name of the index to drop.
Notes
-----
This does not delete the index from disk; it only removes it from the table.
To delete the index, run [optimize][lancedb.table.Table.optimize]
after dropping the index.
Use [list_indices][lancedb.table.Table.list_indices] to find the names of
the indices.
"""
raise NotImplementedError
@abstractmethod
def create_scalar_index(
self,
@@ -1594,6 +1614,9 @@ class LanceTable(Table):
)
)
def drop_index(self, name: str) -> None:
return LOOP.run(self._table.drop_index(name))
def create_scalar_index(
self,
column: str,
@@ -2716,6 +2739,26 @@ class AsyncTable:
add_note(e, help_msg)
raise e
async def drop_index(self, name: str) -> None:
"""
Drop an index from the table.
Parameters
----------
name: str
The name of the index to drop.
Notes
-----
This does not delete the index from disk; it only removes it from the table.
To delete the index, run [optimize][lancedb.table.AsyncTable.optimize]
after dropping the index.
Use [list_indices][lancedb.table.AsyncTable.list_indices] to find the names
of the indices.
"""
await self._inner.drop_index(name)
async def add(
self,
data: DATA,

@@ -80,6 +80,10 @@ async def test_create_scalar_index(some_table: AsyncTable):
# can also specify index type
await some_table.create_index("id", config=BTree())
await some_table.drop_index("id_idx")
indices = await some_table.list_indices()
assert len(indices) == 0
@pytest.mark.asyncio
async def test_create_bitmap_index(some_table: AsyncTable):

@@ -1008,6 +1008,10 @@ def test_create_scalar_index(mem_db: DBConnection):
results = table.search([5, 5]).where("x != 'b'").to_arrow()
assert results["_distance"][0].as_py() > 0
table.drop_index(scalar_index.name)
indices = table.list_indices()
assert len(indices) == 0
def test_empty_query(mem_db: DBConnection):
table = mem_db.create_table(

@@ -194,6 +194,14 @@ impl Table {
})
}
pub fn drop_index(self_: PyRef<'_, Self>, index_name: String) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
future_into_py(self_.py(), async move {
inner.drop_index(&index_name).await.infer_error()?;
Ok(())
})
}
pub fn list_indices(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
future_into_py(self_.py(), async move {

@@ -816,6 +816,14 @@ impl<S: HttpSend> TableInternal for RemoteTable<S> {
Ok(Some(stats))
}
/// Not yet supported on LanceDB Cloud.
async fn drop_index(&self, _name: &str) -> Result<()> {
Err(Error::NotSupported {
message: "Drop index is not yet supported on LanceDB Cloud.".into(),
})
}
async fn table_definition(&self) -> Result<TableDefinition> {
Err(Error::NotSupported {
message: "table_definition is not supported on LanceDB cloud.".into(),

@@ -410,6 +410,7 @@ pub(crate) trait TableInternal: std::fmt::Display + std::fmt::Debug + Send + Syn
async fn update(&self, update: UpdateBuilder) -> Result<u64>;
async fn create_index(&self, index: IndexBuilder) -> Result<()>;
async fn list_indices(&self) -> Result<Vec<IndexConfig>>;
async fn drop_index(&self, name: &str) -> Result<()>;
async fn index_stats(&self, index_name: &str) -> Result<Option<IndexStatistics>>;
async fn merge_insert(
&self,
@@ -984,6 +985,18 @@ impl Table {
self.inner.index_stats(index_name.as_ref()).await
}
/// Drop an index from the table.
///
/// Note: This is not yet available in LanceDB Cloud.
///
/// This does not delete the index from disk; it only removes it from the table.
/// To delete the index, run [`Self::optimize()`] after dropping the index.
///
/// Use [`Self::list_indices()`] to find the names of the indices.
pub async fn drop_index(&self, name: &str) -> Result<()> {
self.inner.drop_index(name).await
}
// Take many execution plans and map them into a single plan that adds
// a query_index column and unions them.
pub(crate) fn multi_vector_plan(
@@ -1871,6 +1884,12 @@ impl TableInternal for NativeTable {
}
}
async fn drop_index(&self, index_name: &str) -> Result<()> {
let mut dataset = self.dataset.get_mut().await?;
dataset.drop_index(index_name).await?;
Ok(())
}
async fn update(&self, update: UpdateBuilder) -> Result<u64> {
let dataset = self.dataset.get().await?.clone();
let mut builder = LanceUpdateBuilder::new(Arc::new(dataset));
@@ -2897,6 +2916,9 @@ mod tests {
assert_eq!(stats.num_unindexed_rows, 0);
assert_eq!(stats.index_type, crate::index::IndexType::IvfPq);
assert_eq!(stats.distance_type, Some(crate::DistanceType::L2));
table.drop_index(index_name).await.unwrap();
assert_eq!(table.list_indices().await.unwrap().len(), 0);
}
#[tokio::test]
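The Rust hunks above route the public `Table::drop_index` through the `TableInternal` trait: `NativeTable` drops the index on the underlying dataset, while `RemoteTable` returns `Error::NotSupported`. The Python sketch below mirrors that dispatch shape with hypothetical classes; `NotSupportedError` stands in for the Rust error variant, and raising `KeyError` for an unknown index name is a choice of this toy, not documented behavior.

```python
from abc import ABC, abstractmethod


class NotSupportedError(Exception):
    """Stand-in for the Rust `Error::NotSupported` variant."""


class TableInternal(ABC):
    """Backend interface, analogous to the Rust `TableInternal` trait."""

    @abstractmethod
    def drop_index(self, name: str) -> None: ...


class NativeTable(TableInternal):
    def __init__(self, indices: set[str]) -> None:
        self.indices = set(indices)

    def drop_index(self, name: str) -> None:
        # Local backend mutates its own index metadata (KeyError if unknown).
        self.indices.remove(name)


class RemoteTable(TableInternal):
    def drop_index(self, name: str) -> None:
        # Mirrors the Cloud stub: the operation is always refused.
        raise NotSupportedError("Drop index is not yet supported on LanceDB Cloud.")


class Table:
    """Public wrapper that forwards to whichever backend it holds."""

    def __init__(self, inner: TableInternal) -> None:
        self.inner = inner

    def drop_index(self, name: str) -> None:
        self.inner.drop_index(name)


local = Table(NativeTable({"vec_idx"}))
local.drop_index("vec_idx")
print(local.inner.indices)  # set()

remote = Table(RemoteTable())
try:
    remote.drop_index("vec_idx")
except NotSupportedError as err:
    print(err)  # Drop index is not yet supported on LanceDB Cloud.
```

Keeping the public `Table` method a one-line delegation means each backend decides independently whether the operation is supported.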