LoomConnection (class)

class loompy.loompy.LoomConnection(filename: str, mode: str = 'r+', *, validate: bool = True, spec_version: str = '2.0.1')[source]

A connection to a Loom file on disk. Typically LoomConnection objects are created using one of the functions on the loompy module, such as loompy.connect() or loompy.new(). LoomConnection objects are context managers and should normally be wrapped in a with block:

import loompy
with loompy.connect("mydata.loom") as ds:
     print(ds.ca.keys())

Inside the with block, you can access the dataset (here using the variable ds). When execution leaves the with block, the connection is automatically closed, freeing up resources.

__init__(filename: str, mode: str = 'r+', *, validate: bool = True, spec_version: str = '2.0.1') → None[source]

Establish a connection to a Loom file.

Parameters:
  • filename – Name of the .loom file to open
  • mode – read/write mode: ‘r+’ (read/write) or ‘r’ (read-only); defaults to ‘r+’, and any other value falls back to ‘r’
  • validate – Validate that the file conforms with the Loom specification
Returns:

Nothing.

filename = None

Path to the file (as given when the LoomConnection was created)

shape = None

Shape of the dataset (n_rows, n_cols)

layers = None

A dict-like interface to named layers. Keys are strings (the main matrix is named “”) and values are sliceable LoomLayer objects that support fancy indexing like numpy.ndarray objects.

To read an entire layer into memory, use ds.layers[name][:, :] (i.e. select all rows and all columns). Layers can also be loaded as sparse matrices by ds.layers[name].sparse().

with loompy.connect("mydataset.loom") as ds:
   print(ds.layers.keys())
   print(f"There are {len(ds.layers)} layers")
   for name, layer in ds.layers.items():
       print(name, layer.shape, layer.dtype)

view = None

Create a view of the file by slicing this attribute, like ds.view[:100, :100]
ra = None

Row attributes, dict-like with np.ndarray values

ca = None

Column attributes, dict-like with np.ndarray values

attrs = None

Global attributes

row_graphs = None

Row graphs, dict-like with values that are scipy.sparse.coo_matrix objects

col_graphs = None

Column graphs, dict-like with values that are scipy.sparse.coo_matrix objects

mode

The access mode of the connection (‘r’ or ‘r+’)

last_modified() → str[source]

Return an ISO8601 timestamp indicating when the file was last modified

Returns:An ISO8601 timestamp indicating when the file was last modified
Remarks:
If the file has no timestamp, and mode is ‘r+’, a new timestamp is created and returned. Otherwise, the current time in UTC is returned
get_changes_since(timestamp: str) → Dict[str, List][source]

Get a summary of the parts of the file that changed since the given time

Parameters:timestamp – ISO8601 timestamp
Returns:Dictionary like {"row_graphs": rg, "col_graphs": cg, "row_attrs": ra, "col_attrs": ca, "layers": layers} listing the names of objects that were modified since the given time
Return type:dict
sparse(rows: numpy.ndarray = None, cols: numpy.ndarray = None, layer: str = None) → scipy.sparse.coo.coo_matrix[source]

Return the main matrix or specified layer as a scipy.sparse.coo_matrix, without loading the dense matrix into RAM

Parameters:
  • rows – Rows to include, or None to include all
  • cols – Columns to include, or None to include all
  • layer – Layer to return, or None to return the default layer
Returns:

Sparse matrix (scipy.sparse.coo_matrix)

close(suppress_warning: bool = False) → None[source]

Close the connection. After this, the connection object becomes invalid. Warns user if called after closing.

Parameters:suppress_warning – Suppresses the warning message if True (defaults to False)
closed

True if the connection is closed.

add_columns(layers: Union[numpy.ndarray, Dict[str, numpy.ndarray], loompy.layer_manager.LayerManager], col_attrs: Dict[str, numpy.ndarray], *, row_attrs: Dict[str, numpy.ndarray] = None, fill_values: Dict[str, numpy.ndarray] = None) → None[source]

Add columns of data and attribute values to the dataset.

Parameters:
  • layers (dict or numpy.ndarray or LayerManager) – Either: 1) an N-by-M matrix of float32s (N rows, M columns), in which case the columns are added to the default layer; 2) a dict {layer_name: matrix} specifying that each (N, M) matrix should be added to the layer layer_name; 3) a LayerManager object (such as the one returned by view.layers)
  • col_attrs (dict) – Column attributes, where keys are attribute names and values are numpy arrays (float or string) of length M
  • row_attrs (dict) – Optional row attributes, where keys are attribute names and values are numpy arrays (float or string) of length N
  • fill_values – dictionary of values to use if a column attribute is missing, or “auto” to fill with zeros or empty strings
Returns:

Nothing.

Notes

  • This will modify the underlying HDF5 file, which will interfere with any concurrent readers.
  • Column attributes that exist in the file but are NOT provided will be deleted (unless a fill value is provided).
  • Arrays containing NaN should not be provided.
add_loom(other_file: str, key: str = None, fill_values: Dict[str, numpy.ndarray] = None, batch_size: int = 1000, convert_attrs: bool = False) → None[source]

Add the content of another loom file

Parameters:
  • other_file – filename of the loom file to append
  • key – Primary key to use to align rows in the other file with this file
  • fill_values – default values to use for missing attributes (or None to drop missing attrs, or ‘auto’ to fill with sensible defaults)
  • batch_size – the batch size used by batchscan (limits the number of rows/columns read in memory)
  • convert_attrs – convert file attributes that differ between files into column attributes
Returns:

Nothing. Note that the other loom file must have exactly the same number of rows and exactly the same set of column attributes. All content is appended, including layers, but layers in other_file that are not already present in self are ignored.

scan(*, items: numpy.ndarray = None, axis: int = None, layers: Iterable = None, key: str = None, batch_size: int = 512) → Iterable[Tuple[int, numpy.ndarray, loompy.loom_view.LoomView]][source]

Scan across one axis and return batches of rows (columns) as LoomView objects

Parameters:
  • items (np.ndarray) – the indexes [0, 2, 13, … ,973] of the rows/cols to include along the axis, OR a boolean mask array selecting the rows/cols to include
  • axis (int) – 0:rows or 1:cols
  • batch_size (int) – the size of the chunks returned at each step of the iterator
  • layers (iterable) – if specified, batch scan only across the given layers of the loom file; if layers == None, all layers will be scanned; if layers == [“”] or “”, only the default layer will be scanned
  • key – Name of primary key attribute. If specified, return the values sorted by the key
Returns:

  • Iterable that yields triplets of (ix, indexes, view), where
  • ix (int) – the position of the first element of the chunk, i.e. how many rows/cols have been yielded already
  • indexes (np.ndarray[int]) – the indexes of the chunk’s rows/cols, using the same numbering as the input (i.e. relative to np.arange(ds.shape[axis])); this equals ix + selection
  • view (LoomView) – a view corresponding to the current chunk

map(f_list: List[Callable[numpy.ndarray, int]], *, axis: int = 0, chunksize: int = 1000, selection: numpy.ndarray = None) → List[numpy.ndarray][source]

Apply a function along an axis without loading the entire dataset in memory.

Parameters:
  • f_list – List of functions, each taking a numpy ndarray as its argument
  • axis – Axis along which to apply the function (0 = rows, 1 = columns)
  • chunksize – Number of rows (columns) to load per chunk
  • selection – Columns (rows) to include
Returns:

A list of numpy arrays, one per supplied function in f_list, each holding the result of applying that function along the axis. Supplying several functions at once is more efficient than calling map() repeatedly with one function at a time.

permute(ordering: numpy.ndarray, axis: int) → None[source]

Permute the dataset along the indicated axis.

Parameters:
  • ordering (list of int) – The desired order along the axis
  • axis (int) – The axis along which to permute
Returns:

Nothing.

pandas()[source]

Create a Pandas DataFrame corresponding to (selected parts of) the Loom file.

Parameters:
  • row_attr – Name of the row attribute to use for selecting rows to include (or None to omit row data)
  • selector – A list, a tuple, a numpy.ndarray or a slice; used to select rows (or None to include all rows)
  • columns – A list of column attributes to include, or None to include all
Returns:

Pandas DataFrame

Remarks:
The method returns a Pandas DataFrame with one column per row of the Loom file (i.e. transposed), which is usually what is required for plotting and statistical analysis. By default, all column attributes and no rows are included. To include row data, provide a row_attr and a selector. The selector is matched against values of the given row attribute, and matching rows are included.

Examples

import loompy
with loompy.connect("mydata.loom") as ds:
    # Include all column attributes, and rows where attribute "Gene" matches one of the given genes
    df1 = ds.pandas("Gene", ["Actb", "Npy", "Vip", "Pvalb"])
    # Include the top 100 rows and name them after values of the "Gene" attribute
    df2 = ds.pandas("Gene", slice(100))
    # Include the entire dataset, and name the rows after values of the "Accession" attribute
    df3 = ds.pandas("Accession")

export(out_file: str, layer: str = None, format: str = 'tab') → None[source]

Export the specified layer and row/col attributes as tab-delimited file.

Parameters:
  • out_file – Path to the output file
  • layer – Name of the layer to export, or None to export the main matrix
  • format – Desired file format (only ‘tab’ is supported)