fix: use Self return type annotation for abstract query builder (#2127)

Hello LanceDB team,

while developing using `lancedb` as a library I encountered a typing
problem affecting IDE hints and completions during development.

---

## Current Situation

Currently, the abstract base class `lancedb.query:LanceQueryBuilder`
uses method chaining to build up the search parameters, where the
methods have `LanceQueryBuilder` as a return type hint.

This leads to two issues:
1. Implementing subclasses of `LanceQueryBuilder` need to override
methods to modify the return type hint, even when they don't need to
change its implementation, just to ensure adequate IDE hints and
completions.
2. When using method chaining the first method directly inherited from
the abstract `LanceQueryBuilder` causes the inferred type to switch back
to `LanceQueryBuilder`. So even when the type starts from
`lancdb.table:LanceTable.search(query_type="vector", ...)` and therefor
correctly is inferred as `LanceVectorQueryBuilder`, after calling e.g.
`LanceVectorQueryBuilder.limit(...)` it is seen as the abstract
`LanceQueryBuilder` from that point on.

### Example of current situation


![image](https://github.com/user-attachments/assets/09678727-8722-43bd-a8a2-67d9b5fc0db5)

## Proposed changes

I propose to change the return type hints of the corresponding methods
(including classmethod `create()`) in the abstract base class
`LanceQueryBuilder` from `LanceQueryBuilder` to `Self`.
`Self` is already imported in the module:

```py
    if sys.version_info >= (3, 11):
        from typing import Self
    else:
        from typing_extensions import Self
```

### Further possible changes

Additionally, the implementing subclasses could also change the return
type hints to `Self` to potentially allow for further inheritance
easily.
> [!NOTE]
> **However this is not part of this pull request as of writing.**

### Example after proposed changes


![image](https://github.com/user-attachments/assets/a9aea636-e426-477a-86ee-2dad3af2876f)

---

Best regards
Martin
This commit is contained in:
Martin Schorfmann
2025-03-12 18:08:25 +01:00
committed by GitHub
parent 7747c9bcbf
commit dd22a379b2

View File

@@ -155,7 +155,7 @@ class LanceQueryBuilder(ABC):
ordering_field_name: Optional[str] = None, ordering_field_name: Optional[str] = None,
fts_columns: Union[str, List[str]] = [], fts_columns: Union[str, List[str]] = [],
fast_search: bool = False, fast_search: bool = False,
) -> LanceQueryBuilder: ) -> Self:
""" """
Create a query builder based on the given query and query type. Create a query builder based on the given query and query type.
@@ -361,7 +361,7 @@ class LanceQueryBuilder(ABC):
return pl.from_arrow(self.to_arrow()) return pl.from_arrow(self.to_arrow())
def limit(self, limit: Union[int, None]) -> LanceQueryBuilder: def limit(self, limit: Union[int, None]) -> Self:
"""Set the maximum number of results to return. """Set the maximum number of results to return.
Parameters Parameters
@@ -391,7 +391,7 @@ class LanceQueryBuilder(ABC):
self._limit = limit self._limit = limit
return self return self
def offset(self, offset: int) -> LanceQueryBuilder: def offset(self, offset: int) -> Self:
"""Set the offset for the results. """Set the offset for the results.
Parameters Parameters
@@ -410,7 +410,7 @@ class LanceQueryBuilder(ABC):
self._offset = offset self._offset = offset
return self return self
def select(self, columns: Union[list[str], dict[str, str]]) -> LanceQueryBuilder: def select(self, columns: Union[list[str], dict[str, str]]) -> Self:
"""Set the columns to return. """Set the columns to return.
Parameters Parameters
@@ -431,7 +431,7 @@ class LanceQueryBuilder(ABC):
raise ValueError("columns must be a list or a dictionary") raise ValueError("columns must be a list or a dictionary")
return self return self
def where(self, where: str, prefilter: bool = True) -> LanceQueryBuilder: def where(self, where: str, prefilter: bool = True) -> Self:
"""Set the where clause. """Set the where clause.
Parameters Parameters
@@ -455,7 +455,7 @@ class LanceQueryBuilder(ABC):
self._prefilter = prefilter self._prefilter = prefilter
return self return self
def with_row_id(self, with_row_id: bool) -> LanceQueryBuilder: def with_row_id(self, with_row_id: bool) -> Self:
"""Set whether to return row ids. """Set whether to return row ids.
Parameters Parameters
@@ -516,7 +516,7 @@ class LanceQueryBuilder(ABC):
offset=self._offset, offset=self._offset,
).explain_plan(verbose) ).explain_plan(verbose)
def vector(self, vector: Union[np.ndarray, list]) -> LanceQueryBuilder: def vector(self, vector: Union[np.ndarray, list]) -> Self:
"""Set the vector to search for. """Set the vector to search for.
Parameters Parameters
@@ -531,7 +531,7 @@ class LanceQueryBuilder(ABC):
""" """
raise NotImplementedError raise NotImplementedError
def text(self, text: str) -> LanceQueryBuilder: def text(self, text: str) -> Self:
"""Set the text to search for. """Set the text to search for.
Parameters Parameters
@@ -547,7 +547,7 @@ class LanceQueryBuilder(ABC):
raise NotImplementedError raise NotImplementedError
@abstractmethod @abstractmethod
def rerank(self, reranker: Reranker) -> LanceQueryBuilder: def rerank(self, reranker: Reranker) -> Self:
"""Rerank the results using the specified reranker. """Rerank the results using the specified reranker.
Parameters Parameters