LoomConnection (class)

class loompy.loompy.LoomConnection(filename: str, mode: str = 'r+', *, validate: bool = True)[source]

A connection to a Loom file on disk. Typically LoomConnection objects are created using one of the functions on the loompy module, such as loompy.connect() or loompy.new(). LoomConnection objects are context managers and should normally be wrapped in a with block:

import loompy
with loompy.connect("mydata.loom") as ds:
    print(ds.ca.keys())

Inside the with block, you can access the dataset (here using the variable ds). When execution leaves the with block, the connection is automatically closed, freeing up resources.

__init__(filename: str, mode: str = 'r+', *, validate: bool = True) → None[source]

Establish a connection to a Loom file.

Parameters
  • filename – Name of the .loom file to open

  • mode – read/write mode, accepts ‘r+’ (read/write) or ‘r’ (read-only); defaults to ‘r+’, and falls back to ‘r’ if an invalid mode is given

  • validate – Validate that the file conforms with the Loom specification

Returns

Nothing.

filename = None

Path to the file (as given when the LoomConnection was created)

shape = None

Shape of the dataset (n_rows, n_cols)

view = None

Create a view of the file by slicing this attribute, like ds.view[:100, :100]

ra = None

Row attributes, dict-like with np.ndarray values

ca = None

Column attributes, dict-like with np.ndarray values

attrs = None

Global attributes

row_graphs = None

Row graphs, dict-like with values that are scipy.sparse.coo_matrix objects

col_graphs = None

Column graphs, dict-like with values that are scipy.sparse.coo_matrix objects

property mode

The access mode of the connection (‘r’ or ‘r+’)

last_modified() → str[source]

Return an ISO8601 timestamp indicating when the file was last modified

Returns

An ISO8601 timestamp indicating when the file was last modified

Remarks:

If the file has no timestamp and the mode is ‘r+’, a new timestamp is created and returned. Otherwise, the current time in UTC is returned.

get_changes_since(timestamp: str) → Dict[str, List][source]

Get a summary of the parts of the file that changed since the given time

Parameters

timestamp – ISO8601 timestamp

Returns

Dictionary like {"row_graphs": rg, "col_graphs": cg, "row_attrs": ra, "col_attrs": ca, "layers": layers} listing the names of objects that were modified since the given time

Return type

dict
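
For example, a minimal sketch of checking what changed after a write (the file name and the QC_passed attribute are placeholders, not part of the API):

import numpy as np
import loompy

with loompy.connect("mydata.loom") as ds:
    before = ds.last_modified()                 # ISO8601 timestamp
    ds.ca["QC_passed"] = np.ones(ds.shape[1])   # hypothetical new column attribute
    changed = ds.get_changes_since(before)
    print(changed["col_attrs"])                 # names of column attributes modified since `before`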

sparse(rows: numpy.ndarray = None, cols: numpy.ndarray = None, layer: str = None) → scipy.sparse.coo.coo_matrix[source]

Return the main matrix or the specified layer as a scipy.sparse.coo_matrix, without loading the dense matrix into RAM

Parameters
  • rows – Rows to include, or None to include all

  • cols – Columns to include, or None to include all

  • layer – Layer to return, or None to return the default layer

Returns

Sparse matrix (scipy.sparse.coo_matrix)
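
For example (the file name is a placeholder, and the file is assumed to have at least 100 rows and 50 columns), the main matrix can be read in sparse form without densifying it:

import numpy as np
import loompy

with loompy.connect("mydata.loom", mode="r") as ds:
    m = ds.sparse()                                           # whole main matrix as scipy.sparse.coo_matrix
    sub = ds.sparse(rows=np.arange(100), cols=np.arange(50))  # a 100 x 50 submatrix
    print(m.shape, sub.nnz)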

close(suppress_warning: bool = False) → None[source]

Close the connection. After this, the connection object becomes invalid. Warns user if called after closing.

Parameters

suppress_warning – Suppresses the warning message if True (defaults to False)

property closed

True if the connection is closed.

add_columns(layers: Union[numpy.ndarray, Dict[str, numpy.ndarray], loompy.layer_manager.LayerManager], col_attrs: Dict[str, numpy.ndarray], *, row_attrs: Dict[str, numpy.ndarray] = None, fill_values: Dict[str, numpy.ndarray] = None) → None[source]

Add columns of data and attribute values to the dataset.

Parameters
  • layers (dict or numpy.ndarray or LayerManager) – Either: 1) an N-by-M matrix of float32s (N rows, M columns), in which case the columns are added to the default layer; 2) a dict {layer_name : matrix}, in which case each (N, M) matrix is added to the layer layer_name; or 3) a LayerManager object (such as what is returned by view.layers)

  • col_attrs (dict) – Column attributes, where keys are attribute names and values are numpy arrays (float or string) of length M

  • row_attrs (dict) – Optional row attributes, where keys are attribute names and values are numpy arrays (float or string) of length N

  • fill_values – dictionary of values to use if a column attribute is missing, or “auto” to fill with zeros or empty strings

Returns

Nothing.

Notes

  • This will modify the underlying HDF5 file, which will interfere with any concurrent readers.

  • Column attributes that exist in the file but are NOT provided will be deleted (unless a fill value is provided).

  • Arrays containing NaN values should not be provided.
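
A minimal sketch of appending columns (the attribute name CellID and the shapes are illustrative; fill_values=“auto” fills any column attributes in the file that the new columns do not provide):

import numpy as np
import loompy

with loompy.connect("mydata.loom") as ds:
    n_new = 10
    new_cols = np.zeros((ds.shape[0], n_new), dtype="float32")   # 10 new columns for the default layer
    new_ids = np.array([f"new_cell_{i}" for i in range(n_new)])  # hypothetical CellID values
    ds.add_columns(new_cols, col_attrs={"CellID": new_ids}, fill_values="auto")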

add_loom(other_file: str, key: str = None, fill_values: Dict[str, numpy.ndarray] = None, batch_size: int = 1000, convert_attrs: bool = False, include_graphs: bool = False) → None[source]

Add the content of another loom file

Parameters
  • other_file – filename of the loom file to append

  • key – Primary key to use to align rows in the other file with this file

  • fill_values – default values to use for missing attributes (or None to drop missing attrs, or ‘auto’ to fill with sensible defaults)

  • batch_size – the batch size used by batchscan (limits the number of rows/columns read in memory)

  • convert_attrs – convert file attributes that differ between files into column attributes

  • include_graphs – if True, include all the column graphs from other_file that are also present in this file

Returns

Nothing, but the contents of the other loom file are appended as new columns. The other file must have exactly the same number of rows and exactly the same set of column attributes as this file. All contents including layers are added, but layers in other_file that are not already present in this file are ignored. Graphs are normally not added; if include_graphs == True, column graphs are added as well.
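
A sketch of appending a second file, assuming both files share a row attribute named Accession that can act as the primary key (the file and attribute names are placeholders):

import loompy

with loompy.connect("first.loom") as ds:
    ds.add_loom("second.loom", key="Accession", fill_values="auto")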

scan(*, items: numpy.ndarray = None, axis: int = None, layers: Iterable = None, key: str = None, batch_size: int = 512, what: List[str] = ['col_attrs', 'row_attrs', 'layers', 'col_graphs', 'row_graphs']) → Iterable[Tuple[int, numpy.ndarray, loompy.loom_view.LoomView]][source]

Scan across one axis and return batches of rows (columns) as LoomView objects

Parameters
  • items (np.ndarray) – the indexes [0, 2, 13, …, 973] of the rows/cols to include along the axis, OR a boolean mask array giving the rows/cols to include

  • axis (int) – 0:rows or 1:cols

  • batch_size (int) – the number of rows/cols in each chunk returned by the iterator

  • layers (iterable) – if specified, only the named layers of the loom file are scanned; if layers == None, all layers are scanned; if layers == [“”] or “”, only the default layer is scanned

  • key – Name of primary key attribute. If specified, return the values sorted by the key

Returns

  • Iterable that yields triplets of (ix, indexes, view) where

  • ix (int) – first position, i.e. how many rows/cols have been yielded already

  • indexes (np.ndarray[int]) – the indexes of the rows/cols in the current chunk, using the same numbering as the input items (i.e. values from np.arange(ds.shape[axis])); equivalently, ix plus the within-chunk selection

  • view (LoomView) – a view corresponding to the current chunk
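
For example, summing the main matrix over columns in batches (a sketch; the file name is a placeholder):

import numpy as np
import loompy

with loompy.connect("mydata.loom") as ds:
    totals = np.zeros(ds.shape[0])
    for ix, selection, view in ds.scan(axis=1, batch_size=512, layers=[""]):
        totals += view[:, :].sum(axis=1)   # the view holds only the current chunk of columns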

map(f_list: List[Callable[[numpy.ndarray], int]], *, axis: int = 0, chunksize: int = 1000, selection: numpy.ndarray = None) → List[numpy.ndarray][source]

Apply a function along an axis without loading the entire dataset in memory.

Parameters
  • f_list – List of functions, each taking a numpy ndarray as its argument

  • axis – Axis along which to apply the function (0 = rows, 1 = columns)

  • chunksize – Number of rows (columns) to load per chunk

  • selection – Columns (rows) to include

Returns

A list of numpy arrays, one per supplied function in f_list, each holding the result of applying that function along the axis. Passing several functions at once is more efficient than calling map() repeatedly with one function at a time.
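
For example, computing row-wise means and standard deviations in a single pass (a sketch; the file name is a placeholder):

import numpy as np
import loompy

with loompy.connect("mydata.loom") as ds:
    means, sds = ds.map([np.mean, np.std], axis=0)   # two arrays, each of length ds.shape[0]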

permute(ordering: numpy.ndarray, axis: int) → None[source]

Permute the dataset along the indicated axis.

Parameters
  • ordering (list of int) – The desired order along the axis

  • axis (int) – The axis along which to permute

Returns

Nothing.
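
A sketch of reordering columns by a column attribute (the attribute name Age is an assumption about the file):

import numpy as np
import loompy

with loompy.connect("mydata.loom") as ds:
    ordering = np.argsort(ds.ca["Age"])   # assumes a numeric column attribute named "Age"
    ds.permute(ordering, axis=1)          # reorders columns on disk, in place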

aggregate(out_file: str = None, select: numpy.ndarray = None, group_by: Union[str, numpy.ndarray] = 'Clusters', aggr_by: str = 'mean', aggr_ca_by: Dict[str, str] = None) → numpy.ndarray[source]

Aggregate the Loom file by applying aggregation functions to the main matrix as well as to the column attributes

Parameters
  • out_file – The name of the output Loom file

  • select – Bool array giving the columns to include

  • group_by – The column attribute to group by, or an np.ndarray of integer group labels

  • aggr_by – The aggregation function for the main matrix

  • aggr_ca_by – A dictionary of aggregation functions for the column attributes

Returns

Aggregated main matrix (numpy.ndarray)

Remarks:

aggr_by gives the aggregation function for the main matrix. aggr_ca_by is a dictionary with column attribute names as keys and aggregation functions as values.

Aggregation functions can be any valid aggregation function from here: https://github.com/ml31415/numpy-groupies

In addition, you can specify:

“tally” to count the number of occurrences of each value of a categorical attribute
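
A sketch of aggregating by cluster, assuming the file has column attributes named Clusters (the default group_by), Age and Tissue (all attribute names here are assumptions):

import loompy

with loompy.connect("mydata.loom") as ds:
    m = ds.aggregate(group_by="Clusters", aggr_by="mean",
                     aggr_ca_by={"Age": "mean", "Tissue": "tally"})
    print(m.shape)   # (n_rows, n_groups)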

export(out_file: str, layer: str = None, format: str = 'tab') → None[source]

Export the specified layer and row/col attributes as a tab-delimited file.

Parameters
  • out_file – Path to the output file

  • layer – Name of the layer to export, or None to export the main matrix

  • format – Desired file format (only ‘tab’ is supported)
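
For example (the file names are placeholders):

import loompy

with loompy.connect("mydata.loom", mode="r") as ds:
    ds.export("mydata.tab", layer=None, format="tab")   # writes the main matrix with row/col attributes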