Module shoji.filter
Using filters
Filters are expressions used to select tensor rows, columns, etc., for reading or writing to the database.
Filters can be applied to a workspace, dimension or tensor using the slicing expression [].
Applying filters to tensors
Filtering on a tensor selects a subtensor by filtering along each dimension, returning a numpy.ndarray:
vals = ws.scRNA.Age[ws.scRNA.Tissue == "Cortex"]
# Returns those rows of Age where Tissue equals "Cortex"
Filters applied to a tensor must match tensor dimensions. For example, if a tensor is defined on ("cells", "genes"), then the first filter expression must run along the "cells" dimension and the second filter expression along the "genes" dimension. If a filter expression is not given for a dimension, then all indices along that dimension are included in the result. If you want to omit dimensions in the middle, use ellipsis:
vals = ws.images.ImageStack[10:20, ..., 90:100]
# Returns indices 10 to 19 along dimension 1, all indices along dimension 2, and indices 90 to 99 along dimension 3.
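The slicing rules above mirror ordinary NumPy indexing. The sketch below uses a plain in-memory array as a stand-in for a rank-3 tensor; the shape is an arbitrary assumption, and it is not a shoji call:

```python
import numpy as np

# Stand-in for a rank-3 tensor such as ImageStack (shape chosen arbitrarily)
stack = np.zeros((30, 5, 120))

# Slice the first and last dimensions, keep everything in between
sub = stack[10:20, ..., 90:100]
print(sub.shape)  # (10, 5, 10)

# Omitting trailing filters keeps all indices along those dimensions
head = stack[10:20]
print(head.shape)  # (10, 5, 120)
```

Slices are half-open, so 10:20 covers indices 10 through 19.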
Writing through a filter (view)
Assigning values (which must be a numpy.ndarray of the right shape and dtype) to a filtered tensor causes the corresponding tensor elements in the database to be updated:
vals = np.array(...) # A numpy array of the right shape and dtype
ws.scRNA.Age[ws.scRNA.Tissue == "Cortex"] = vals
# Those rows of Age where Tissue equals "Cortex" are updated
Assigning values in this way is an atomic operation (it will either succeed or fail completely), and is subject to the size and time limits of shoji transactions.
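As an in-memory analogy only, the read and write through a filter shown above behave like boolean-mask indexing in NumPy. The arrays below are made-up stand-ins for the Tissue and Age tensors, and the sketch says nothing about atomicity or transaction limits:

```python
import numpy as np

# Made-up stand-ins for the Tissue and Age tensors
tissue = np.array(["Cortex", "Liver", "Cortex", "Heart"])
age = np.array([10, 20, 30, 40])

mask = tissue == "Cortex"   # plays the role of the filter expression
vals = age[mask]            # read through the filter
age[mask] = vals + 1        # write through the filter; only selected rows change
print(age)                  # [11 20 31 40]
```

The assigned array must have one value per selected row, which is what "the right shape" means in the text above.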
Applying filters to dimensions
Filtering on a dimension selects rows of that dimension and returns a View.
You can then read tensors from the view, or assign values to tensors through the view:
view = ws.scRNA.cells[:10]
# A view that includes the first ten rows along the 'cells' dimension
t = view.Tissue # Returns a numpy.ndarray of the first ten values of the Tissue tensor
a = view.Age # Returns a numpy.ndarray of the first ten values of the Age tensor
view.Age = vals # assign a np.ndarray of suitable shape and dtype
Assigning values in this way is an atomic operation (it will either succeed or fail completely), and is subject to the size and time limits of shoji transactions.
Applying filters to workspaces
Recall that workspaces may contain multiple dimensions. When filtering on a workspace, the dimension that the filter applies to is inferred from the expression. For example, if the filter expression is ws.Age > 10 and Age is a tensor with dims=("cells",), then the filter expression applies along the cells dimension.
Filtering on a workspace selects rows of the inferred dimension and returns a View.
You can then read tensors from the view, or assign values to tensors through the view as above.
However, when filtering on workspaces, you can also simultaneously filter on multiple dimensions, by providing two or more filter expressions separated by commas:
ws = db.scRNA
ws.cells = shoji.Dimension(shape=None)
ws.genes = shoji.Dimension(shape=31768)
ws.Age = shoji.Tensor("uint32", ("cells",))  # numeric dtype, so that Age can be compared to 10
ws.Chromosome = shoji.Tensor("string", ("genes",))
# Slice both dimensions:
view = ws[ws.Age > 10, ws.Chromosome == "chr1"]
Creating views on workspaces like this is a powerful way to focus on a defined subset of the dataset.
To work with views, see the shoji.view API reference.
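The dimension inference described in this section can be pictured with a small stand-alone sketch. The helper infer_dimension and the tensor_dims dictionary are hypothetical names introduced for illustration; they are not part of shoji, which performs this step inside its filter classes:

```python
from typing import Dict, Tuple

def infer_dimension(tensor_dims: Dict[str, Tuple[str, ...]], tensor_name: str) -> str:
    """Return the dimension a filter on `tensor_name` would apply along."""
    dims = tensor_dims[tensor_name]
    # The filter classes in this module only accept rank-1 tensors
    if len(dims) != 1:
        raise ValueError("Only rank-1 tensors can be used in filters")
    return dims[0]

# Hypothetical workspace layout matching the example above
tensor_dims = {"Age": ("cells",), "Chromosome": ("genes",)}
age_dim = infer_dimension(tensor_dims, "Age")                 # "cells"
chromosome_dim = infer_dimension(tensor_dims, "Chromosome")   # "genes"
```

Because the two expressions resolve to different dimensions, they can be given together, separated by a comma.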
Kinds of filters
Comparisons
You can compare a tensor to a constant:
ws = db.cancer_project
view = ws[ws.Age > 12] # Create a view of the workspace including only samples where Age > 12
Or compare two tensors:
ws = db.cancer_project
view = ws[ws.Age == ws.OriginalAge] # Create a view of the workspace including only samples where Age == OriginalAge
Comparison operators ==, !=, >, >=, <, and <= are supported.
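To make the result of a comparison filter concrete, here is a minimal in-memory sketch. The function const_compare is a hypothetical stand-in that works on a NumPy array; in shoji the comparison is evaluated against the database, so this only illustrates that the outcome is an array of matching row indices:

```python
import operator
import numpy as np

# The six comparison operators listed above, mapped to Python functions
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

def const_compare(values: np.ndarray, op: str, constant) -> np.ndarray:
    """Return the indices of the rows where `values <op> constant` holds."""
    if op not in OPS:
        raise SyntaxError(f"Invalid operator {op}")
    return np.where(OPS[op](values, constant))[0]

age = np.array([5, 12, 30, 12])       # made-up Age values
older = const_compare(age, ">", 12)   # rows where Age > 12
same = const_compare(age, "==", 12)   # rows where Age == 12
```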
Slices
You can use Python slices on dimensions and tensors (but not on workspaces):
ws = db.cancer_project
view = ws.samples[3:10] # Create a view along the 'samples' dimension including only rows 3 - 9 (zero-based)
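A slice is turned into explicit row indices with the standard slice.indices method, as DimensionSliceFilter.get_rows in the source listing does. The helper name slice_to_rows and the dimension length of 100 are assumptions made for this sketch:

```python
import numpy as np

def slice_to_rows(slice_: slice, length: int) -> np.ndarray:
    """Expand a Python slice into row indices for a dimension of the given length."""
    start, stop, step = slice_.indices(length)  # clips and resolves None/negative bounds
    return np.arange(start, stop, step)

rows = slice_to_rows(slice(3, 10), 100)      # rows 3 through 9
tail = slice_to_rows(slice(-2, None), 100)   # the last two rows
```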
Index arrays
You can use lists, tuples or np.ndarrays of integers to select rows on dimensions and tensors (but not on workspaces):
ws = db.cancer_project
view = ws.samples[(0, 1, 2, 3, 10, 20, 21)]
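Index arrays may contain negative values, which count from the end of the dimension. The sketch below follows the idea of DimensionIndicesFilter.get_rows in the source listing, with two deliberate differences that are assumptions of this example: it works on a copy rather than modifying the caller's array, and it also rejects indices that remain negative. The name normalize_indices is hypothetical:

```python
import numpy as np

def normalize_indices(indices, length: int) -> np.ndarray:
    """Resolve negative indices and check bounds for a dimension of the given length."""
    idx = np.array(indices, dtype=np.int64)  # np.array copies, so the input is untouched
    idx[idx < 0] += length                   # -1 becomes length - 1, and so on
    if np.any(idx < 0) or np.any(idx >= length):
        raise IndexError("Index out of range")
    return idx

resolved = normalize_indices((0, 1, -1), 5)
```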
Boolean arrays
You can use lists, tuples or np.ndarrays of bools to select rows on dimensions and tensors (but not on workspaces):
ws = db.cancer_project
view = ws.samples[(True, False, False, True, False)]
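A boolean array must have exactly one element per row of the dimension; the selected rows are the positions holding True. This mirrors DimensionBoolFilter.get_rows in the source listing. The helper name bool_to_rows is an assumption made for this sketch:

```python
import numpy as np

def bool_to_rows(selected, length: int) -> np.ndarray:
    """Convert a boolean selection into row indices, checking its length first."""
    mask = np.asarray(selected, dtype=bool)
    if mask.shape[0] != length:
        raise IndexError(
            f"Boolean array has {mask.shape[0]} elements but dimension length is {length}"
        )
    return np.where(mask)[0]

picked = bool_to_rows((True, False, False, True, False), 5)
```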
Compound filters
You can combine filters using & (and), | (or), ~ (not), ^ (xor), and - (set difference).
Note: The individual filter expressions must be surrounded by parentheses:
ws = db.cancer_project
view = ws[(ws.Age > 12) & (ws.SampleID < 10)]
The set difference operator - returns all rows selected by the left-hand expression except those selected by the right-hand expression.
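Each compound operator corresponds to a set operation on the row indices selected by its operands, using the same NumPy functions that CompoundFilter.get_rows calls in the source listing. The index arrays below are made-up examples:

```python
import numpy as np

left = np.array([0, 1, 2, 3])   # rows selected by the left-hand filter
right = np.array([2, 3, 4])     # rows selected by the right-hand filter
all_rows = np.arange(6)         # every row of the dimension

and_rows = np.intersect1d(left, right)   # &  -> [2 3]
or_rows = np.union1d(left, right)        # |  -> [0 1 2 3 4]
diff_rows = np.setdiff1d(left, right)    # -  -> [0 1]
xor_rows = np.setxor1d(left, right)      # ^  -> [0 1 4]
not_rows = np.setdiff1d(all_rows, left)  # ~  -> [4 5]
```

The parentheses in the example above are required because Python's & and | bind more tightly than comparison operators such as > and <.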
Source code
from typing import Union, Optional
import numpy as np
import shoji
import logging


class Filter:
    def __init__(self) -> None:
        self.dim: Union[str, int, None]

    def _combine(self, operator: str, this: Union["Filter", "shoji.View"], other: Union["Filter", "shoji.View"]) -> "Filter":
        def fixup(arg):
            if isinstance(arg, Filter):
                return arg
            # Inspect the argument itself (not `this`), so that a View passed as `other` is handled too
            elif isinstance(arg, shoji.View):
                if len(arg.filters) == 1:
                    for f in arg.filters.values():
                        return f
                else:
                    raise ValueError("Cannot use logical expression on compound view")
        a = fixup(this)
        b = fixup(other)
        return shoji.CompoundFilter(operator, a, b)

    def __and__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("&", self, other)

    def __rand__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("&", other, self)

    def __or__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("|", self, other)

    def __ror__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("|", other, self)

    def __sub__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("-", self, other)

    def __rsub__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("-", other, self)

    def __xor__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("^", self, other)

    def __rxor__(self, other: Union["Filter", "shoji.View"]) -> "Filter":
        return self._combine("^", other, self)

    def __invert__(self) -> "Filter":
        return shoji.CompoundFilter("~", self, None)

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        pass

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        pass

class CompoundFilter(Filter):
    """Filter that compares two filters"""
    def __init__(self, operator: str, left_operand: Filter, right_operand: Optional[Filter]) -> None:
        self.operator = operator
        if operator not in ("~", "&", "|", "-", "^"):
            raise SyntaxError(f"Invalid operator {operator}")
        self.left_operand = left_operand
        self.right_operand = right_operand
        if left_operand.dim is not None:
            self.dim = left_operand.dim
            if (right_operand is not None) and (right_operand.dim is not None) and left_operand.dim != right_operand.dim:
                raise SyntaxError("All tensors in an expression must have same first dimensions")
        else:
            self.dim = right_operand.dim if right_operand is not None else None

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        # The operand already returns an array of row indices, so return it as-is
        # (wrapping it in np.arange would fail on an array argument)
        return self.left_operand.get_all_rows(wsm)

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        if self.operator == "&":
            assert isinstance(self.left_operand, Filter)
            assert isinstance(self.right_operand, Filter)
            return np.intersect1d(self.left_operand.get_rows(wsm), self.right_operand.get_rows(wsm))
        if self.operator == "|":
            assert isinstance(self.left_operand, Filter)
            assert isinstance(self.right_operand, Filter)
            return np.union1d(self.left_operand.get_rows(wsm), self.right_operand.get_rows(wsm))
        if self.operator == "-":
            assert isinstance(self.left_operand, Filter)
            assert isinstance(self.right_operand, Filter)
            return np.setdiff1d(self.left_operand.get_rows(wsm), self.right_operand.get_rows(wsm))
        if self.operator == "^":
            assert isinstance(self.left_operand, Filter)
            assert isinstance(self.right_operand, Filter)
            return np.setxor1d(self.left_operand.get_rows(wsm), self.right_operand.get_rows(wsm))
        if self.operator == "~":
            assert isinstance(self.left_operand, Filter)
            return np.setdiff1d(self.left_operand.get_all_rows(wsm), self.left_operand.get_rows(wsm))

    def __repr__(self) -> str:
        if self.operator == "~":
            return f"~{self.left_operand}"
        else:
            return f"({self.left_operand} {self.operator} {self.right_operand})"

class TensorFilter(Filter):
    """Filter that compares two tensors"""
    def __init__(self, operator: str, left_operand: shoji.Tensor, right_operand: shoji.Tensor) -> None:
        self.operator = operator
        if operator not in (">", "<", ">=", "<=", "==", "!="):
            raise SyntaxError(f"Invalid operator {operator}")
        self.left_operand = left_operand
        if left_operand.rank != 1:
            raise SyntaxError(f"Only rank-1 tensors can be used in filters")
        self.right_operand = right_operand
        if right_operand.rank != 1:
            raise SyntaxError(f"Only rank-1 tensors can be used in filters")
        if isinstance(left_operand.dims[0], str):
            self.dim = left_operand.dims[0]
            if isinstance(right_operand.dims[0], str) and left_operand.dims[0] != right_operand.dims[0]:
                raise SyntaxError("All tensors in an expression must have same first dimensions")
        elif isinstance(right_operand.dims[0], str):
            self.dim = right_operand.dims[0]
        else:
            self.dim = None
            if left_operand.shape[0] != right_operand.shape[0]:
                raise SyntaxError(f"Tensor first dimensions mismatch")

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        return np.arange(self.left_operand.shape[0])  # TODO: maybe read this from db instead, to avoid stale state

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        if self.operator == ">":
            # Do the range lookups on the tensor indexes
            raise NotImplementedError("Tensor-tensor comparisons not yet supported")
        # etc

    def __repr__(self) -> str:
        return f"({self.left_operand} {self.operator} {self.right_operand})"

class ConstFilter(Filter):
    """Filter that compares a tensor to a constant"""
    def __init__(self, operator: str, left_operand: shoji.Tensor, right_operand: Union[str, int, float, bool]) -> None:
        self.operator = operator
        if operator not in (">", "<", ">=", "<=", "==", "!="):
            raise SyntaxError(f"Invalid operator {operator}")
        self.left_operand = left_operand
        if left_operand.rank != 1:
            raise SyntaxError(f"Only rank-1 tensors can be used in filters")
        self.dim = left_operand.dims[0]
        if not isinstance(self.dim, str):
            # TODO: relax this limitation by reading the whole tensor and filtering on the values (maybe?)
            raise SyntaxError(f"Only tensors with named first dimension can be used in filters")
        self.right_operand = right_operand
        if type(right_operand) not in (str, int, float, bool):
            raise SyntaxError(f"Only str, int, float and bool can be used as constants in filters")

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        return np.arange(self.left_operand.shape[0])

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        # TODO: this might cause concurrency problems when assigning to a filter expression
        result = shoji.io.const_compare_non_transactional(wsm, self.left_operand.name, self.operator, self.right_operand)
        return result

    def __repr__(self) -> str:
        return f"({self.left_operand.name} {self.operator} {self.right_operand})"

class DimensionSliceFilter(Filter):
    def __init__(self, dim: shoji.Dimension, slice_: slice) -> None:
        self.dim = dim.name
        self.dimension = dim
        self.slice_ = slice_

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        return np.arange(self.dimension.length)

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        s = self.slice_.indices(self.dimension.length)
        return np.arange(s[0], s[1], s[2])

    def __repr__(self) -> str:
        s = self.slice_.indices(self.dimension.length)
        return f"({self.dim}[{s[0]}:{s[1]}:{s[2]}])"

class DimensionIndicesFilter(Filter):
    def __init__(self, dim: shoji.Dimension, indices: np.ndarray) -> None:
        self.dim = dim.name
        self.dimension = dim
        self.indices = indices

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        return np.arange(self.dimension.length)

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        self.indices[self.indices < 0] = self.indices[self.indices < 0] + self.dimension.length
        if not np.all(self.indices < self.dimension.length):
            raise IndexError("Index out of range")
        return self.indices[self.indices < self.dimension.length]

    def __repr__(self) -> str:
        return f"({self.dim}[{self.indices}])"

class DimensionBoolFilter(Filter):
    def __init__(self, dim: shoji.Dimension, selected: np.ndarray) -> None:
        self.dim = dim.name
        self.dimension = dim
        self.selected = selected

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        return np.arange(self.dimension.length)

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        if self.selected.shape[0] != self.dimension.length:
            raise IndexError(f"Boolean array used for fancy indexing along '{self.dim}' has {self.selected.shape[0]} elements but dimension length is {self.dimension.length}")
        return np.where(self.selected)[0]

    def __repr__(self) -> str:
        return f"({self.dim}[{self.selected}])"

class TensorSliceFilter(Filter):
    def __init__(self, tensor: shoji.Tensor, slice_: slice, axis: int) -> None:
        self.dim = tensor.dims[axis] if tensor.rank > 0 else None
        self.tensor = tensor
        self.slice_ = slice_
        self.axis = axis

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        raise NotImplementedError()

    def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
        if n_rows is None:
            n_rows = self.tensor.shape[self.axis]
        s = self.slice_.indices(n_rows)
        return np.arange(s[0], s[1], s[2])

    def __repr__(self) -> str:
        s = self.slice_.indices(self.tensor.shape[self.axis])
        return f"({self.tensor.name}[{s[0]}:{s[1]}:{s[2]}])"

class TensorIndicesFilter(Filter):
    def __init__(self, tensor: shoji.Tensor, indices: np.ndarray, axis: int) -> None:
        self.axis = axis
        self.dim = tensor.dims[axis] if tensor.rank > 0 else None
        self.tensor = tensor
        self.indices = indices

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        raise NotImplementedError()

    def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
        if n_rows is None:
            n_rows = self.tensor.shape[self.axis]
        self.indices[self.indices < 0] = self.indices[self.indices < 0] + self.tensor.shape[self.axis]
        if not np.all(self.indices < n_rows):
            raise IndexError("Index out of range")
        return self.indices[self.indices < n_rows]

    def __repr__(self) -> str:
        return f"({self.tensor.name}[{self.indices}])"

class TensorBoolFilter(Filter):
    def __init__(self, tensor: shoji.Tensor, selected: np.ndarray, axis: int) -> None:
        self.axis = axis
        self.dim = tensor.dims[axis] if tensor.rank > 0 else None
        self.tensor = tensor
        self.selected = selected

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        raise NotImplementedError()

    def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
        if n_rows is None:
            n_rows = self.tensor.shape[self.axis]
        if self.selected.shape[0] != n_rows:
            raise IndexError(f"Boolean array used for fancy indexing along axis {self.axis} of '{self.tensor.name}' has {self.selected.shape[0]} elements but tensor length is {n_rows}")
        return np.where(self.selected)[0]

    def __repr__(self) -> str:
        return f"({self.tensor.name}[{self.selected}])"
Classes
class CompoundFilter (operator: str, left_operand: Filter, right_operand: Union[Filter, NoneType])
Filter that compares two filters
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
def get_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
class ConstFilter (operator: str, left_operand: Tensor, right_operand: Union[str, int, float, bool])
Filter that compares a tensor to a constant
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
def get_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
class DimensionBoolFilter (dim: Dimension, selected: numpy.ndarray)
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
def get_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
class DimensionIndicesFilter (dim: Dimension, indices: numpy.ndarray)
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
def get_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
class DimensionSliceFilter (dim: Dimension, slice_: slice)
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
def get_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
class Filter
Subclasses
- CompoundFilter
- ConstFilter
- DimensionBoolFilter
- DimensionIndicesFilter
- DimensionSliceFilter
- TensorBoolFilter
- TensorFilter
- TensorIndicesFilter
- TensorSliceFilter
Methods
def get_all_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
def get_rows(self, wsm: WorkspaceManager) ‑> numpy.ndarray
class TensorBoolFilter (tensor: Tensor, selected: numpy.ndarray, axis: int)
class TensorBoolFilter(Filter):
    def __init__(self, tensor: shoji.Tensor, selected: np.ndarray, axis: int) -> None:
        self.axis = axis
        self.dim = tensor.dims[axis] if tensor.rank > 0 else None
        self.tensor = tensor
        self.selected = selected

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        raise NotImplementedError()

    def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
        if n_rows is None:
            n_rows = self.tensor.shape[self.axis]
        if self.selected.shape[0] != n_rows:
            raise IndexError(f"Boolean array used for fancy indexing along axis {self.axis} of '{self.tensor.name}' has {self.selected.shape[0]} elements but tensor length is {n_rows}")
        return np.where(self.selected)[0]

    def __repr__(self) -> str:
        return f"({self.tensor.name}[{self.selected}])"
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) -> numpy.ndarray
def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
    raise NotImplementedError()
def get_rows(self, wsm: WorkspaceManager, n_rows: int = None) -> numpy.ndarray
def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
    if n_rows is None:
        n_rows = self.tensor.shape[self.axis]
    if self.selected.shape[0] != n_rows:
        raise IndexError(f"Boolean array used for fancy indexing along axis {self.axis} of '{self.tensor.name}' has {self.selected.shape[0]} elements but tensor length is {n_rows}")
    return np.where(self.selected)[0]
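The core of TensorBoolFilter.get_rows is a length check followed by np.where, which turns a boolean mask into the integer positions of its True entries. A minimal standalone sketch of that step, using plain NumPy and a hypothetical helper name rows_from_mask that is not part of shoji:

```python
import numpy as np


def rows_from_mask(selected: np.ndarray, n_rows: int) -> np.ndarray:
    """Convert a boolean mask into row indices (hypothetical helper for illustration)."""
    if selected.shape[0] != n_rows:
        raise IndexError(
            f"Boolean array has {selected.shape[0]} elements but axis length is {n_rows}"
        )
    # np.where on a 1-D array returns a 1-tuple; element 0 holds the indices
    return np.where(selected)[0]


mask = np.array([True, False, True, True, False])
rows = rows_from_mask(mask, 5)
print(rows)  # [0 2 3]
```

A mask whose length differs from the axis length is rejected rather than silently truncated or padded.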
class TensorFilter (operator: str, left_operand: Tensor, right_operand: Tensor)
Filter that compares two tensors
class TensorFilter(Filter):
    """Filter that compares two tensors"""
    def __init__(self, operator: str, left_operand: shoji.Tensor, right_operand: shoji.Tensor) -> None:
        self.operator = operator
        if operator not in (">", "<", ">=", "<=", "==", "!="):
            raise SyntaxError(f"Invalid operator {operator}")
        self.left_operand = left_operand
        if left_operand.rank != 1:
            raise SyntaxError(f"Only rank-1 tensors can be used in filters")
        self.right_operand = right_operand
        if right_operand.rank != 1:
            raise SyntaxError(f"Only rank-1 tensors can be used in filters")

        if isinstance(left_operand.dims[0], str):
            self.dim = left_operand.dims[0]
            if isinstance(right_operand.dims[0], str) and left_operand.dims[0] != right_operand.dims[0]:
                raise SyntaxError("All tensors in an expression must have same first dimensions")
        elif isinstance(right_operand.dims[0], str):
            self.dim = right_operand.dims[0]
        else:
            self.dim = None
            if left_operand.shape[0] != right_operand.shape[0]:
                raise SyntaxError(f"Tensor first dimensions mismatch")

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        return np.arange(self.left_operand.shape[0])  # TODO: maybe read this from db instead, to avoid stale state

    def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        if self.operator == ">":
            # Do the range lookups on the tensor indexes
            raise NotImplementedError("Tensor-tensor comparisons not yet supported")
        # etc

    def __repr__(self) -> str:
        return f"({self.left_operand} {self.operator} {self.right_operand})"
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) -> numpy.ndarray
def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
    return np.arange(self.left_operand.shape[0])  # TODO: maybe read this from db instead, to avoid stale state
def get_rows(self, wsm: WorkspaceManager) -> numpy.ndarray
def get_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
    if self.operator == ">":
        # Do the range lookups on the tensor indexes
        raise NotImplementedError("Tensor-tensor comparisons not yet supported")
    # etc
class TensorIndicesFilter (tensor: Tensor, indices: numpy.ndarray, axis: int)
class TensorIndicesFilter(Filter):
    def __init__(self, tensor: shoji.Tensor, indices: np.ndarray, axis: int) -> None:
        self.axis = axis
        self.dim = tensor.dims[axis] if tensor.rank > 0 else None
        self.tensor = tensor
        self.indices = indices

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        raise NotImplementedError()

    def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
        if n_rows is None:
            n_rows = self.tensor.shape[self.axis]
        self.indices[self.indices < 0] = self.indices[self.indices < 0] + self.tensor.shape[self.axis]
        if not np.all(self.indices < n_rows):
            raise IndexError("Index out of range")
        return self.indices[self.indices < n_rows]

    def __repr__(self) -> str:
        return f"({self.tensor.name}[{self.indices}])"
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) -> numpy.ndarray
def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
    raise NotImplementedError()
def get_rows(self, wsm: WorkspaceManager, n_rows: int = None) -> numpy.ndarray
def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
    if n_rows is None:
        n_rows = self.tensor.shape[self.axis]
    self.indices[self.indices < 0] = self.indices[self.indices < 0] + self.tensor.shape[self.axis]
    if not np.all(self.indices < n_rows):
        raise IndexError("Index out of range")
    return self.indices[self.indices < n_rows]
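TensorIndicesFilter.get_rows maps negative indices to positions counted from the end of the axis and rejects indices that are too large. The listing above rewrites self.indices in place and checks only the upper bound. The sketch below shows the same normalization as a standalone function, with two deliberate differences that are assumptions of this example rather than shoji behavior: it works on a copy, and it also rejects indices that are still negative after the shift. The name normalize_indices is hypothetical.

```python
import numpy as np


def normalize_indices(indices: np.ndarray, n_rows: int) -> np.ndarray:
    """Map negative indices to positions from the end (hypothetical helper)."""
    idx = np.asarray(indices).copy()  # leave the caller's array untouched
    idx[idx < 0] += n_rows            # -1 becomes n_rows - 1, and so on
    # Assumption of this sketch: check both bounds, not only the upper one
    if np.any(idx < 0) or np.any(idx >= n_rows):
        raise IndexError("Index out of range")
    return idx


original = np.array([0, -1, 3, -5])
rows = normalize_indices(original, 5)
print(rows)      # [0 4 3 0]
print(original)  # [ 0 -1  3 -5]
```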
class TensorSliceFilter (tensor: Tensor, slice_: slice, axis: int)
class TensorSliceFilter(Filter):
    def __init__(self, tensor: shoji.Tensor, slice_: slice, axis: int) -> None:
        self.dim = tensor.dims[axis] if tensor.rank > 0 else None
        self.tensor = tensor
        self.slice_ = slice_
        self.axis = axis

    def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
        raise NotImplementedError()

    def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
        if n_rows is None:
            n_rows = self.tensor.shape[self.axis]
        s = self.slice_.indices(n_rows)
        return np.arange(s[0], s[1], s[2])

    def __repr__(self) -> str:
        s = self.slice_.indices(self.tensor.shape[self.axis])
        return f"({self.tensor.name}[{s[0]}:{s[1]}:{s[2]}])"
Ancestors
- Filter
Methods
def get_all_rows(self, wsm: WorkspaceManager) -> numpy.ndarray
def get_all_rows(self, wsm: shoji.WorkspaceManager) -> np.ndarray:
    raise NotImplementedError()
def get_rows(self, wsm: WorkspaceManager, n_rows: int = None) -> numpy.ndarray
def get_rows(self, wsm: shoji.WorkspaceManager, n_rows: int = None) -> np.ndarray:
    if n_rows is None:
        n_rows = self.tensor.shape[self.axis]
    s = self.slice_.indices(n_rows)
    return np.arange(s[0], s[1], s[2])
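TensorSliceFilter.get_rows relies on the built-in slice.indices(length), which resolves omitted, negative and out-of-range bounds into a concrete (start, stop, step) triple for an axis of the given length. The sketch below shows that step on its own with plain Python and NumPy; the helper name rows_from_slice is hypothetical.

```python
import numpy as np


def rows_from_slice(slice_: slice, n_rows: int) -> np.ndarray:
    """Expand a slice into explicit row indices (hypothetical helper)."""
    start, stop, step = slice_.indices(n_rows)  # clamps bounds to the axis length
    return np.arange(start, stop, step)


print(rows_from_slice(slice(-3, None), 10))       # [7 8 9]
print(rows_from_slice(slice(8, 20), 10))          # [8 9]
print(rows_from_slice(slice(None, None, -2), 5))  # [4 2 0]
```

A stop value beyond the end of the axis is clamped rather than raising an error, which matches ordinary Python slicing.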