feat(python): expose prefilter to lancedb (#522)

We have experimental support for prefiltering (without ANN) in pylance.
This means that we can now apply a filter BEFORE vector search is
performed. This can be done via the `.where(filter_string,
prefilter=True)` kwargs of the query.

Limitations:
- When connecting to LanceDB cloud, `prefilter=True` will raise
NotImplemented
- When an ANN index is present, `prefilter=True` will raise
NotImplemented
- This option is not available for full text search query
- This option is not available for empty search query (just
filter/project)

Additional changes in this PR:
- Bump pylance version to v0.8.0 which supports the experimental
prefiltering.

---------

Co-authored-by: Chang She <chang@lancedb.com>
This commit is contained in:
Chang She
2023-10-01 10:34:12 -07:00
committed by GitHub
parent 343e274ea5
commit 693bca1eba
6 changed files with 68 additions and 9 deletions

View File

@@ -26,6 +26,7 @@ import numpy as np
import pyarrow as pa
from cachetools import cached
from pydantic import BaseModel, Field, PrivateAttr
from tqdm import tqdm
class EmbeddingFunctionRegistry:
@@ -514,7 +515,7 @@ class OpenClipEmbeddings(EmbeddingFunction):
executor.submit(self.generate_image_embedding, image)
for image in images
]
return [future.result() for future in futures]
return [future.result() for future in tqdm(futures)]
def generate_image_embedding(
self, image: Union[str, bytes, "PIL.Image.Image"]
@@ -557,7 +558,7 @@ class OpenClipEmbeddings(EmbeddingFunction):
"""
encode a single image tensor and optionally normalize the output
"""
image_features = self._model.encode_image(image_tensor)
image_features = self._model.encode_image(image_tensor.to(self.device))
if self.normalize:
image_features /= image_features.norm(dim=-1, keepdim=True)
return image_features.cpu().numpy().squeeze()