LoomConnection (class)

class loompy.loompy.LoomConnection(filename: str, mode: str = 'r+', *, validate: bool = True, spec_version: str = '2.0.1')[source]

A connection to a Loom file on disk. Typically LoomConnection objects are created using one of the functions on the loompy module, such as loompy.connect() or loompy.new(). LoomConnection objects are context managers and should normally be wrapped in a with block:

import loompy
with loompy.connect("mydata.loom") as ds:
     print(ds.ca.keys())

Inside the with block, you can access the dataset (here using the variable ds). When execution leaves the with block, the connection is automatically closed, freeing up resources.

__init__(filename: str, mode: str = 'r+', *, validate: bool = True, spec_version: str = '2.0.1') → None[source]

Establish a connection to a Loom file.

Parameters:
  • filename – Name of the .loom file to open
  • mode – read/write mode: ‘r+’ (read/write) or ‘r’ (read-only); defaults to ‘r+’, and any other value falls back to ‘r’
  • validate – Validate that the file conforms with the Loom specification
Returns:

Nothing.

filename = None

Path to the file (as given when the LoomConnection was created)

shape = None

Shape of the dataset (n_rows, n_cols)

layers = None

A dict-like interface to named layers. Keys are strings (the main matrix is named “”) and values are sliceable LoomLayer objects that support fancy indexing like numpy.ndarray objects.

To read an entire layer into memory, use ds.layers[name][:, :] (i.e. select all rows and all columns). Layers can also be loaded as sparse matrices by ds.layers[name].sparse().

with loompy.connect("mydataset.loom") as ds:
   print(ds.layers.keys())
   print(f"There are {len(ds.layers)} layers")
   for name, layer in ds.layers.items():
       print(name, layer.shape, layer.dtype)

view = None

Create a view of the file by slicing this attribute, like ds.view[:100, :100]
ra = None

Row attributes, dict-like with np.ndarray values

ca = None

Column attributes, dict-like with np.ndarray values

attrs = None

Global attributes

row_graphs = None

Row graphs, dict-like with values that are scipy.sparse.coo_matrix objects

col_graphs = None

Column graphs, dict-like with values that are scipy.sparse.coo_matrix objects

mode

The access mode of the connection (‘r’ or ‘r+’)

last_modified() → str[source]

Return an ISO8601 timestamp indicating when the file was last modified

Returns:An ISO8601 timestamp indicating when the file was last modified
Remarks:
If the file has no timestamp, and mode is ‘r+’, a new timestamp is created and returned. Otherwise, the current time in UTC is returned
get_changes_since(timestamp: str) → Dict[str, List][source]

Get a summary of the parts of the file that changed since the given time

Parameters:timestamp – ISO8601 timestamp
Returns:Dictionary like {"row_graphs": rg, "col_graphs": cg, "row_attrs": ra, "col_attrs": ca, "layers": layers} listing the names of objects that were modified since the given time
Return type:dict
sparse(rows: numpy.ndarray = None, cols: numpy.ndarray = None, layer: str = None) → scipy.sparse.coo.coo_matrix[source]

Return the main matrix or specified layer as a scipy.sparse.coo_matrix, without loading the dense matrix into RAM

Parameters:
  • rows – Rows to include, or None to include all
  • cols – Columns to include, or None to include all
  • layer – Layer to return, or None to return the default layer
Returns:

Sparse matrix (scipy.sparse.coo_matrix)

close(suppress_warning: bool = False) → None[source]

Close the connection. After this, the connection object becomes invalid. Warns user if called after closing.

Parameters:suppress_warning – Suppresses the warning message if True (defaults to False)
closed

True if the connection is closed.

add_columns(layers: Union[numpy.ndarray, Dict[str, numpy.ndarray], loompy.layer_manager.LayerManager], col_attrs: Dict[str, numpy.ndarray], *, row_attrs: Dict[str, numpy.ndarray] = None, fill_values: Dict[str, numpy.ndarray] = None) → None[source]

Add columns of data and attribute values to the dataset.

Parameters:
  • layers (dict or numpy.ndarray or LayerManager) – Either: 1) an N-by-M matrix of float32s (N rows, M columns), in which case the columns are added to the default layer; 2) a dict {layer_name: matrix} specifying that each (N, M) matrix should be added to the layer layer_name; 3) a LayerManager object (such as the one returned by view.layers)
  • col_attrs (dict) – Column attributes, where keys are attribute names and values are numpy arrays (float or string) of length M
  • row_attrs (dict) – Optional row attributes, where keys are attribute names and values are numpy arrays (float or string) of length N
  • fill_values – dictionary of values to use if a column attribute is missing, or “auto” to fill with zeros or empty strings
Returns:

Nothing.

Notes

  • This will modify the underlying HDF5 file, which will interfere with any concurrent readers.
  • Column attributes that exist in the file but are NOT provided will be deleted (unless a fill value is provided).
  • Arrays containing NaN should not be provided.
add_loom(other_file: str, key: str = None, fill_values: Dict[str, numpy.ndarray] = None, batch_size: int = 1000, convert_attrs: bool = False) → None[source]

Add the content of another loom file

Parameters:
  • other_file – filename of the loom file to append
  • key – Primary key to use to align rows in the other file with this file
  • fill_values – default values to use for missing attributes (or None to drop missing attrs, or ‘auto’ to fill with sensible defaults)
  • batch_size – the batch size used by batchscan (limits the number of rows/columns read in memory)
  • convert_attrs – convert file attributes that differ between files into column attributes
Returns:

Nothing. Note that the other loom file must have exactly the same number of rows and exactly the same set of column attributes. All content is appended, including layers, but layers in other_file that are not already present in self are ignored.

scan(*, items: numpy.ndarray = None, axis: int = None, layers: Iterable = None, key: str = None, batch_size: int = 512) → Iterable[Tuple[int, numpy.ndarray, loompy.loom_view.LoomView]][source]

Scan across one axis and return batches of rows (columns) as LoomView objects

Parameters:
  • items (np.ndarray) – the indexes [0, 2, 13, … ,973] of the rows/cols to include along the axis, OR a boolean mask array selecting the rows/cols to include
  • axis (int) – 0:rows or 1:cols
  • batch_size (int) – the size of the chunks returned at each step of the iterator
  • layers (iterable) – if specified, batch scan only across the given layers of the loom file; if layers == None, all layers will be scanned; if layers == [“”] or “”, only the default layer will be scanned
  • key – Name of primary key attribute. If specified, return the values sorted by the key
Returns:

  • Iterable that yields triplets of (ix, indexes, view), where
  • ix (int) – the position of the first element of the chunk, i.e. how many rows/cols have been yielded already
  • indexes (np.ndarray[int]) – the indexes of the chunk’s rows/cols, using the same numbering as the input (i.e. relative to np.arange(ds.shape[axis])); this equals ix + selection
  • view (LoomView) – a view corresponding to the current chunk

map(f_list: List[Callable[numpy.ndarray, int]], *, axis: int = 0, chunksize: int = 1000, selection: numpy.ndarray = None) → List[numpy.ndarray][source]

Apply a function along an axis without loading the entire dataset in memory.

Parameters:
  • f_list – List of functions, each taking a numpy ndarray as its argument
  • axis – Axis along which to apply the function (0 = rows, 1 = columns)
  • chunksize – Number of rows (columns) to load per chunk
  • selection – Columns (rows) to include
Returns:

A list of numpy arrays, one per supplied function in f_list, each holding the result of applying that function along the axis. Supplying several functions at once is more efficient than calling map() repeatedly with one function at a time.

permute(ordering: numpy.ndarray, axis: int) → None[source]

Permute the dataset along the indicated axis.

Parameters:
  • ordering (list of int) – The desired order along the axis
  • axis (int) – The axis along which to permute
Returns:

Nothing.

pandas()[source]

Create a Pandas DataFrame corresponding to (selected parts of) the Loom file.

Parameters:
  • row_attr – Name of the row attribute to use for selecting rows to include (or None to omit row data)
  • selector – A list, a tuple, a numpy.ndarray or a slice; used to select rows (or None to include all rows)
  • columns – A list of column attributes to include, or None to include all
Returns:

Pandas DataFrame

Remarks:
The method returns a Pandas DataFrame with one column per row of the Loom file (i.e. transposed), which is usually what is required for plotting and statistical analysis. By default, all column attributes and no rows are included. To include row data, provide a row_attr and a selector. The selector is matched against values of the given row attribute, and matching rows are included.

Examples

import loompy
with loompy.connect("mydata.loom") as ds:
    # Include all column attributes, and rows where attribute "Gene" matches one of the given genes
    df1 = ds.pandas("Gene", ["Actb", "Npy", "Vip", "Pvalb"])
    # Include the top 100 rows and name them after values of the "Gene" attribute
    df2 = ds.pandas("Gene", slice(100))
    # Include the entire dataset, and name the rows after values of the "Accession" attribute
    df3 = ds.pandas("Accession")

export(out_file: str, layer: str = None, format: str = 'tab') → None[source]

Export the specified layer and row/col attributes as tab-delimited file.

Parameters:
  • out_file – Path to the output file
  • layer – Name of the layer to export, or None to export the main matrix
  • format – Desired file format (only ‘tab’ is supported)