Cookbook¶
In this section, we will show by example how to complete common tasks with idiomatic use of loompy.
Working with a newly created file¶
Loompy 2 changes the behaviour of loompy.create
: it no longer returns a value. Thus in order to work with a newly created file you have to do this:
loompy.create("filename.loom", m, row_attrs, col_attrs)
with loompy.connect("filename.loom") as ds:
....do something with ds
# File closes automatically
The reason for the change is that we would often create files without closing the returned file handle, which causes issues especially in multi-process scenarios.
Note: if you simply want to create the file, and not access it, there is no need to use a with
statement:
loompy.create("filename.loom", m, row_attrs, col_attrs)
This will leave the file closed.
Loading attributes from Pandas¶
If you have your metadata in a Pandas DataFrame, you can easily use it to create a Loom file. You will need one DataFrame for the column metadata and one for the row metadata. Convert each DataFrame into a dictionary of lists:
df_row_metadata = ... # A pandas DataFrame holding row metadata
df_col_metadata = ... # A pandas DataFrame holding column metadata
data = ... # A numpy ndarray holding the main dataset
loompy.create(filename, data, df_row_metadata.to_dict("list"), df_col_metadata.to_dict("list"))
Combining data using scan() and new()¶
We often want to scan through a number of input files (for example, raw
data files from multiple experiments), select a subset of the columns (e.g. cells passing QC)
and write them to a new file. This can be accomplished by creating an empty file using loompy.new()
and
then filling it up using the scan()
method.
For example, let’s select cells that have more than 500 detected UMIs in each of several files:
with loompy.new(out_file) as dsout: # Create a new, empty, loom file
for f in input_files:
with loompy.connect(f) as ds:
totals = ds.map([np.sum], axis=1)[0]
cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
for (ix, selection, view) in ds.scan(items=cells, axis=1):
dsout.add_columns(view.layers, col_attrs=view.ca, row_attrs=view.ra)
Note that by using new()
we will first be creating the new file, then appending columns to it.
But what if the input files do not have their rows in the same order? scan()
accepts a key
argument
to designate a primary key; each view is then returned sorted on the primary key on the other axis.
For example, if you’re scanning across columns, you should provide a row attribute as key, and each view will be sorted
on that attribute.
Here’s the same example, but this time we provide the key="Accession"
argument to ensure that the input files
are sorted on the accession identifier along rows:
with loompy.new(out_file) as dsout: # Create a new, empty, loom file
for f in input_files:
with loompy.connect(f) as ds:
totals = ds.map([np.sum], axis=1)[0]
cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
for (ix, selection, view) in ds.scan(items=cells, axis=1, key="Accession"):
dsout.add_columns(view.layers, col_attrs=view.ca, row_attrs=view.ra)
Fitting an incremental PCA¶
Incremental algorithms are a powerful way of working with datasets that won’t fit in RAM. For example, we can use incremental PCA to learn a PCA transform by batch-wise partial fits:
from sklearn.decomposition import IncrementalPCA
genes = (ds.ra.Selected == 1)
pca = IncrementalPCA(n_components=50)
for (ix, selection, view) in ds.scan(axis=1):
self.pca.partial_fit(view[genes, :].transpose())