LoomConnection (class)
-
class
loompy.loompy.
LoomConnection
(filename: str, mode: str = 'r+', *, validate: bool = True)
A connection to a Loom file on disk. Typically, LoomConnection objects are created using one of the functions on the loompy module, such as loompy.connect() or loompy.new(). LoomConnection objects are context managers and should normally be wrapped in a with block:

    import loompy
    with loompy.connect("mydata.loom") as ds:
        print(ds.ca.keys())

Inside the with block, you can access the dataset (here using the variable ds). When execution leaves the with block, the connection is automatically closed, freeing up resources.
-
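The with block is roughly equivalent to closing the connection explicitly in a finally clause. The following is a minimal sketch of that pattern, not part of loompy itself: the helper name column_attribute_names and its connect parameter are hypothetical, and exist only so the pattern can be tried without a real file.

```python
def column_attribute_names(filename, connect=None):
    """Return the sorted column attribute names of a Loom file.

    `connect` is a hypothetical injection point for this sketch; when it is
    None, loompy.connect() is used.
    """
    if connect is None:
        import loompy  # imported lazily, only needed for real files
        connect = loompy.connect

    ds = connect(filename)
    try:
        # Same access as inside a `with` block
        return sorted(ds.ca.keys())
    finally:
        # This is what leaving a `with` block does automatically
        ds.close()
```

In ordinary code the with form is preferable, since the connection is closed without any explicit call.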
__init__
(filename: str, mode: str = 'r+', *, validate: bool = True) → None
Establish a connection to a Loom file.
- Parameters
filename – Name of the .loom file to open
mode – Read/write mode; accepts ‘r+’ (read/write) or ‘r’ (read-only). Defaults to ‘r+’ when omitted, and falls back to ‘r’ if an unrecognized value is given
validate – Validate that the file conforms with the Loom specification
- Returns
Nothing.
-
filename
= None
Path to the file (as given when the LoomConnection was created)
-
shape
= None
Shape of the dataset (n_rows, n_cols)
-
view
= None
Create a view of the file by slicing this attribute, like ds.view[:100, :100]
-
ra
= None
Row attributes, dict-like with np.ndarray values
-
ca
= None
Column attributes, dict-like with np.ndarray values
-
attrs
= None
Global attributes
-
row_graphs
= None
Row graphs, dict-like with values that are scipy.sparse.coo_matrix objects
-
col_graphs
= None
Column graphs, dict-like with values that are scipy.sparse.coo_matrix objects
-
property
mode
The access mode of the connection (‘r’ or ‘r+’)
-
last_modified
() → str
Return an ISO8601 timestamp indicating when the file was last modified
- Returns
An ISO8601 timestamp indicating when the file was last modified
- Remarks:
If the file has no timestamp and mode is ‘r+’, a new timestamp is created and returned. Otherwise, the current time in UTC is returned.
-
get_changes_since
(timestamp: str) → Dict[str, List]
Get a summary of the parts of the file that changed since the given time
- Parameters
timestamp – ISO8601 timestamp
- Returns
Dictionary like
{"row_graphs": rg, "col_graphs": cg, "row_attrs": ra, "col_attrs": ca, "layers": layers}
listing the names of objects that were modified since the given time
- Return type
Dict[str, List]
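last_modified() and get_changes_since() can be combined to find out what a block of edits touched. The sketch below assumes a connection opened in ‘r+’ mode; the helper name changes_made_by and its modify callback are hypothetical and not part of loompy.

```python
def changes_made_by(ds, modify):
    """Run `modify(ds)` and report which parts of the file it changed.

    `ds` is an open LoomConnection; `modify` is a caller-supplied function
    (hypothetical, for illustration) that edits the dataset.
    """
    before = ds.last_modified()             # ISO8601 timestamp string
    modify(ds)                              # apply the edits
    changes = ds.get_changes_since(before)  # dict of name lists per category
    # Keep only the categories that actually list modified objects
    return {kind: names for kind, names in changes.items() if len(names) > 0}
```

Whether an edit made immediately after the timestamp is reported depends on the timestamp resolution, so treat the result as a summary rather than an exact audit log.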
-
sparse
(rows: numpy.ndarray = None, cols: numpy.ndarray = None, layer: str = None) → scipy.sparse.coo.coo_matrix
Return the main matrix or the specified layer as a scipy.sparse.coo_matrix, without loading the dense matrix into RAM
- Parameters
rows – Rows to include, or None to include all
cols – Columns to include, or None to include all
layer – Layer to return, or None to return the default layer
- Returns
Sparse matrix (scipy.sparse.coo_matrix)
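As an illustration of working with the returned matrix, the sketch below reports how densely a layer is populated. The helper name stored_fraction is hypothetical; it relies only on the shape and nnz attributes that scipy.sparse.coo_matrix provides.

```python
def stored_fraction(ds, layer=None):
    """Fraction of matrix entries that are stored in the sparse representation.

    `ds` is an open LoomConnection; `layer` is a layer name, or None for the
    default layer.
    """
    m = ds.sparse(layer=layer)   # scipy.sparse.coo_matrix
    n_rows, n_cols = m.shape
    if n_rows == 0 or n_cols == 0:
        return 0.0               # empty matrix: avoid dividing by zero
    # nnz counts stored entries, which can include explicitly stored zeros
    return m.nnz / (n_rows * n_cols)
```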
-
close
(suppress_warning: bool = False) → None
Close the connection. After this, the connection object becomes invalid. Warns the user if called on a connection that is already closed.
- Parameters
suppress_warning – Suppresses the warning message if True (defaults to False)
-
property
closed
True if the connection is closed.
-
add_columns
(layers: Union[numpy.ndarray, Dict[str, numpy.ndarray], loompy.layer_manager.LayerManager], col_attrs: Dict[str, numpy.ndarray], *, row_attrs: Dict[str, numpy.ndarray] = None, fill_values: Dict[str, numpy.ndarray] = None) → None
Add columns of data and attribute values to the dataset.
- Parameters
layers (dict or numpy.ndarray or LayerManager) – One of:
1) An N-by-M matrix of float32s (N rows, M columns); in this case, columns are added to the default layer
2) A dict {layer_name: matrix}, so that each matrix (N, M) is added to layer layer_name
3) A LayerManager object (such as what is returned by view.layers)
col_attrs (dict) – Column attributes, where keys are attribute names and values are numpy arrays (float or string) of length M
row_attrs (dict) – Optional row attributes, where keys are attribute names and values are numpy arrays (float or string) of length N
fill_values – dictionary of values to use if a column attribute is missing, or “auto” to fill with zeros or empty strings
- Returns
Nothing.
Notes
This will modify the underlying HDF5 file, which will interfere with any concurrent readers.
Column attributes in the file that are NOT provided will be deleted (unless a fill value is provided).
Arrays containing NaN should not be provided.
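Because missing column attributes are deleted silently, it can help to check the inputs before calling add_columns(). The sketch below is one possible guard, assuming a non-empty dataset and a single matrix for the default layer; the helper name append_columns_checked is hypothetical.

```python
def append_columns_checked(ds, matrix, col_attrs):
    """Validate shapes and attribute names, then append columns to `ds`.

    `matrix` is an N-by-M array for the default layer and `col_attrs` maps
    attribute names to arrays of length M.
    """
    n_rows, n_new_cols = matrix.shape
    if n_rows != ds.shape[0]:
        raise ValueError(f"matrix has {n_rows} rows, dataset has {ds.shape[0]}")

    for name, values in col_attrs.items():
        if len(values) != n_new_cols:
            raise ValueError(
                f"column attribute {name!r} has length {len(values)}, "
                f"expected {n_new_cols}"
            )

    # Attributes present in the file but not supplied would be deleted
    missing = sorted(set(ds.ca.keys()) - set(col_attrs.keys()))
    if missing:
        raise ValueError(f"column attributes would be dropped: {missing}")

    ds.add_columns(matrix, col_attrs)
```

Raising on missing attributes is a design choice of this sketch; passing fill_values to add_columns() is the alternative when defaults are acceptable.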
-
add_loom
(other_file: str, key: str = None, fill_values: Dict[str, numpy.ndarray] = None, batch_size: int = 1000, convert_attrs: bool = False, include_graphs: bool = False) → None
Add the content of another loom file
- Parameters
other_file – filename of the loom file to append
key – Primary key to use to align rows in the other file with this file
fill_values – default values to use for missing attributes (or None to drop missing attrs, or ‘auto’ to fill with sensible defaults)
batch_size – the batch size used by batchscan (limits the number of rows/columns read in memory)
convert_attrs – convert file attributes that differ between files into column attributes
include_graphs – if true, include all the column graphs from other_file that are also present in this file
- Returns
Nothing, but adds the loom file. Note that the other loom file must have exactly the same number of rows and exactly the same column attributes. All contents are added, including layers, but layers in other_file that are not already present in self are ignored. Graphs are normally not added, unless include_graphs == True, in which case column graphs are added.
-
scan
(*, items: numpy.ndarray = None, axis: int = None, layers: Iterable = None, key: str = None, batch_size: int = 512, what: List[str] = ['col_attrs', 'row_attrs', 'layers', 'col_graphs', 'row_graphs']) → Iterable[Tuple[int, numpy.ndarray, loompy.loom_view.LoomView]]
Scan across one axis and return batches of rows (columns) as LoomView objects
- Parameters
items (np.ndarray) – the indexes [0, 2, 13, … ,973] of the rows/cols to include along the axis OR: boolean mask array giving the rows/cols to include
axis (int) – 0:rows or 1:cols
batch_size (int) – the size of the chunks returned at each step of the iterator
layers (iterable) – if specified, batch scan only across the given layers of the loom file; if layers == None, all layers will be scanned; if layers == [“”] or “”, only the default layer will be scanned
key – Name of primary key attribute. If specified, return the values sorted by the key
- Returns
Iterable that yields triplets of (ix, indexes, view) where
ix (int) – first position / how many rows/cols have been yielded already
indexes (np.ndarray[int]) – the indexes with the same numbering as the input args cells / genes (i.e. np.arange(ds.shape[axis])); this is ix + selection
view (LoomView) – a view corresponding to the current chunk
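The triplets yielded by scan() can be consumed in an ordinary for loop. The sketch below only records where each batch starts and how many rows or columns it covers; the helper name batch_layout is hypothetical, and real code would typically do its work on view inside the loop.

```python
def batch_layout(ds, axis=1, batch_size=512):
    """List (start, count) for each batch yielded by ds.scan().

    `ds` is an open LoomConnection; axis 0 scans rows, axis 1 scans columns.
    """
    layout = []
    for ix, selection, view in ds.scan(axis=axis, batch_size=batch_size):
        # `ix` is the starting position of the batch, `selection` holds the
        # indexes it covers in the numbering of the full dataset, and `view`
        # is a LoomView containing only that chunk (unused in this sketch)
        layout.append((ix, len(selection)))
    return layout
```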
-
map
(f_list: List[Callable[[numpy.ndarray], int]], *, axis: int = 0, chunksize: int = 1000, selection: numpy.ndarray = None) → List[numpy.ndarray]
Apply a function along an axis without loading the entire dataset in memory.
- Parameters
f_list – Function(s) that take a numpy ndarray as argument
axis – Axis along which to apply the function (0 = rows, 1 = columns)
chunksize – Number of rows (columns) to load per chunk
selection – Columns (rows) to include
- Returns
A list of numpy arrays, one per function supplied in f_list, each holding the result of applying that function. This is more efficient than repeatedly calling map() one function at a time.
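Since map() returns results in the same order as f_list, pairing them with names keeps the output readable. The sketch below computes several per-row statistics in one pass; the helper name named_row_statistics is hypothetical.

```python
def named_row_statistics(ds, functions):
    """Apply several functions to every row in a single pass over the file.

    `ds` is an open LoomConnection; `functions` maps a label to a callable
    that takes one row (a numpy array) and returns a number.
    """
    names = list(functions.keys())
    # One call to map() with all functions, rather than one call per function
    results = ds.map([functions[name] for name in names], axis=0)
    return dict(zip(names, results))
```

With numpy imported as np, a call such as named_row_statistics(ds, {"mean": np.mean, "std": np.std}) would return one array per label.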
-
permute
(ordering: numpy.ndarray, axis: int) → None
Permute the dataset along the indicated axis.
- Parameters
ordering (list of int) – The desired order along the axis
axis (int) – The axis along which to permute
- Returns
Nothing.
-
aggregate
(out_file: str = None, select: numpy.ndarray = None, group_by: Union[str, numpy.ndarray] = 'Clusters', aggr_by: str = 'mean', aggr_ca_by: Dict[str, str] = None) → numpy.ndarray
Aggregate the Loom file by applying aggregation functions to the main matrix as well as to the column attributes
- Parameters
out_file – The name of the output Loom file
select – Bool array giving the columns to include
group_by – The column attribute to group by, or an np.ndarray of integer group labels
aggr_by – The aggregation function for the main matrix
aggr_ca_by – A dictionary of aggregation functions for the column attributes
- Returns
m – Aggregated main matrix
- Remarks:
aggr_by gives the aggregation function for the main matrix. aggr_ca_by is a dictionary with column attributes as keys and aggregation functions as values.
Aggregation functions can be any valid aggregation function from here: https://github.com/ml31415/numpy-groupies
- In addition, you can specify:
“tally” to count the number of occurrences of each value of a categorical attribute
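A call to aggregate() mostly comes down to building the aggr_ca_by dictionary. The sketch below assumes the file has a ‘Clusters’ column attribute (the default for group_by) and uses “tally” for categorical attributes and “mean” for numeric ones; the helper name aggregate_by_cluster and the split into categorical and numeric attribute names are hypothetical.

```python
def aggregate_by_cluster(ds, out_file, categorical=(), numeric=()):
    """Aggregate `ds` per cluster and return the aggregated main matrix.

    `categorical` and `numeric` are names of column attributes to carry over
    into the aggregated file.
    """
    # "tally" counts occurrences of each value of a categorical attribute
    aggr_ca_by = {name: "tally" for name in categorical}
    # "mean" averages a numeric attribute within each group
    aggr_ca_by.update({name: "mean" for name in numeric})
    return ds.aggregate(
        out_file,
        group_by="Clusters",
        aggr_by="mean",
        aggr_ca_by=aggr_ca_by,
    )
```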
-
export
(out_file: str, layer: str = None, format: str = 'tab') → None
Export the specified layer and row/col attributes as a tab-delimited file.
- Parameters
out_file – Path to the output file
layer – Name of the layer to export, or None to export the main matrix
format – Desired file format (only ‘tab’ is supported)
-