demuxEM for cell-hashing/nucleus-hashing data analysis¶
demuxEM is a module for analyzing cell-hashing/nucleus-hashing data. It is used by Cumulus in its demultiplexing step.
Version 0.1.7 October 05, 2021¶
Filter out cell barcodes with 0 hashtag counts in estimate_background_probs.
Version 0.1.6 May 28, 2021¶
Fix a bug that occurred when the input multi-modal object contains non-RNA modality data.
Version 0.1.5 September 16, 2020¶
Add barplot showing percentage of RNA barcodes with HTO counts.
Version 0.1.4 July 15, 2020¶
Adapt to pegasusio 0.2.5.
Version 0.1.1 April 9, 2020¶
Adapt to pegasusio 0.1.3.
Version 0.1.0 April 8, 2020¶
Initial release under new name demuxEM.
Installation¶
demuxEM is published on PyPI as a Python package, and you can install it via pip:
pip install demuxEM
Use demuxEM as a command line tool¶
If you have data generated by cell-hashing or nucleus-hashing, you can use demuxEM as a command line tool to demultiplex your data. Type:
demuxEM -h
to see the usage information:
Usage:
demuxEM [options] <input_raw_gene_bc_matrices_h5> <input_hto_csv_file> <output_name>
demuxEM -h | --help
demuxEM -v | --version
Arguments:
- input_raw_gene_bc_matrices_h5
Input raw RNA expression matrix in 10x hdf5 format.
- input_hto_csv_file
Input HTO (antibody tag) count matrix in CSV format.
- output_name
Output name. All outputs will use it as the prefix.
Options:
- -p <number>, --threads <number>
Number of threads. [default: 1]
- --genome <genome>
Reference genome name. If not provided, we will infer it from the expression matrix file.
- --alpha-on-samples <alpha>
The Dirichlet prior concentration parameter (alpha) on samples. An alpha value < 1.0 will make the prior sparse. [default: 0.0]
- --min-num-genes <number>
We only demultiplex cells/nuclei with at least <number> expressed genes. [default: 100]
- --min-num-umis <number>
We only demultiplex cells/nuclei with at least <number> UMIs. [default: 100]
- --min-signal-hashtag <count>
Any cell/nucleus with fewer than <count> hashtags from the signal will be marked as unknown. [default: 10.0]
- --random-state <seed>
The random seed used in the KMeans algorithm to separate empty ADT droplets from others. [default: 0]
- --generate-diagnostic-plots
Generate a series of diagnostic plots, including the background/signal between HTO counts, estimated background probabilities, HTO distributions of cells and non-cells, etc.
- --generate-gender-plot <genes>
Generate violin plots using gender-specific genes (e.g. Xist). <genes> is a comma-separated list of gene names.
- -h, --help
Print out help information.
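The remark that an alpha below 1.0 makes the Dirichlet prior sparse can be illustrated with a small, generic sketch. This is not demuxEM's code: the helper names, the dimension of 8 and the two alpha values are assumptions chosen for illustration, and the default of 0.0 cannot be sampled this way because Gamma draws need a positive shape. The sketch draws from a symmetric Dirichlet and compares how much mass the largest component takes.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha, ..., alpha) of dimension k."""
    # A Dirichlet draw is k independent Gamma(alpha, 1) draws normalized to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_largest_share(alpha, k=8, n=500, seed=0):
    """Average size of the largest component over n draws (hypothetical helper)."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n)) / n

sparse = mean_largest_share(0.1)   # alpha < 1: mass concentrates on few components
dense = mean_largest_share(10.0)   # alpha > 1: mass spreads across components
print(f"alpha=0.1: {sparse:.2f}  alpha=10: {dense:.2f}")
```

With a small alpha, one component tends to dominate each draw, which is the sense in which the prior favors a barcode coming from few samples.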
Outputs:
- output_name_demux.zarr
RNA expression matrix with demultiplexed sample identities in Zarr format.
- output_name.out.demuxEM.zarr
DemuxEM-calculated results in Zarr format, containing two datasets, one for HTO and one for RNA.
- output_name.ambient_hashtag.hist.pdf
Optional output. A histogram plot depicting hashtag distributions of empty droplets and non-empty droplets.
- output_name.background_probabilities.bar.pdf
Optional output. A bar plot visualizing the estimated hashtag background probability distribution.
- output_name.real_content.hist.pdf
Optional output. A histogram plot depicting hashtag distributions of not-real-cells and real-cells as defined by total number of expressed genes in the RNA assay.
- output_name.rna_demux.hist.pdf
Optional output. This figure consists of two plots. The first one is a horizontal bar plot depicting the percentage of RNA barcodes with at least one HTO count. The second plot is a histogram plot depicting RNA UMI distribution for singlets, doublets and unknown cells.
- output_name.gene_name.violin.pdf
Optional outputs. Violin plots depicting gender-specific gene expression across samples. There can be multiple plots if a gene list is provided to the --generate-gender-plot option.
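As a quick way to check a finished run, the file names listed above can be derived from the output prefix. The helper below is a hypothetical sketch, not part of demuxEM; it assumes the four diagnostic PDFs appear only when diagnostic plots are requested and that one violin plot is written per gene.

```python
def expected_outputs(output_name, diagnostic_plots=False, gender_genes=()):
    """List the output file names described above for a given prefix (hypothetical helper)."""
    outputs = [
        f"{output_name}_demux.zarr",
        f"{output_name}.out.demuxEM.zarr",
    ]
    if diagnostic_plots:
        # Assumption: these four PDFs are tied to the diagnostic-plot option.
        stems = (
            "ambient_hashtag.hist",
            "background_probabilities.bar",
            "real_content.hist",
            "rna_demux.hist",
        )
        outputs += [f"{output_name}.{stem}.pdf" for stem in stems]
    # Assumption: one violin plot per gene passed to the gender-plot option.
    outputs += [f"{output_name}.{gene}.violin.pdf" for gene in gender_genes]
    return outputs

for name in expected_outputs("sample_output", diagnostic_plots=True, gender_genes=["Xist"]):
    print(name)
```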
Examples:
demuxEM -p 8 --generate-diagnostic-plots sample_raw_gene_bc_matrices.h5 sample_hto.csv sample_output
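When demuxEM is driven from a script, the same command can be assembled as an argument list. The sketch below uses only the options documented above; the function name and its keyword arguments are hypothetical, and it only builds and prints the command rather than running it.

```python
import shlex

def build_demuxem_command(rna_h5, hto_csv, output_name, threads=1,
                          min_num_genes=None, min_signal_hashtag=None,
                          diagnostic_plots=False, gender_genes=()):
    """Return the demuxEM argument list; options precede the three positional arguments."""
    cmd = ["demuxEM", "-p", str(threads)]
    if min_num_genes is not None:
        cmd += ["--min-num-genes", str(min_num_genes)]
    if min_signal_hashtag is not None:
        cmd += ["--min-signal-hashtag", str(min_signal_hashtag)]
    if diagnostic_plots:
        cmd.append("--generate-diagnostic-plots")
    if gender_genes:
        # The option takes a single comma-separated list of gene names.
        cmd += ["--generate-gender-plot", ",".join(gender_genes)]
    cmd += [rna_h5, hto_csv, output_name]
    return cmd

cmd = build_demuxem_command(
    "sample_raw_gene_bc_matrices.h5", "sample_hto.csv", "sample_output",
    threads=8, diagnostic_plots=True, gender_genes=["Xist"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once demuxEM is installed.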
API¶
demuxEM can also be used as a Python package. Import it with:
import demuxEM
Demultiplexing¶
- For cell-hashing data, estimate the antibody background probability using the KMeans algorithm.
- Demultiplex cell/nucleus-hashing data, using the antibody background probability estimated in the previous step.
- Write the demultiplexing results into the raw gene expression matrix.
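The idea of using KMeans to separate empty droplets from the rest can be sketched with a two-cluster split on log-scaled total hashtag counts. This is an illustration under stated assumptions, not demuxEM's implementation: the function name, the toy counts and the choice of log10 are all assumed, and the centers start at the minimum and maximum so no random seed is involved.

```python
import math

def two_means_1d(values, n_iter=100):
    """Lloyd's algorithm with k=2 on 1-D data; returns ((low_center, high_center), labels)."""
    lo, hi = min(values), max(values)  # deterministic initial centers
    labels = []
    for _ in range(n_iter):
        # Assign each point to the nearer center (0 = low cluster, 1 = high cluster).
        new_labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return (lo, hi), labels

# Toy total hashtag counts per barcode (made-up numbers); zeros are dropped
# here because log10(0) is undefined.
total_counts = [1, 2, 3, 2, 1, 4, 850, 1200, 640, 2, 990]
log_counts = [math.log10(c) for c in total_counts if c > 0]
centers, labels = two_means_1d(log_counts)
print(labels)
```

Barcodes labeled 0 fall in the low-count cluster, which in this sketch plays the role of the empty droplets.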
Contact us¶
demuxEM is maintained by the Cumulus team. If you have any questions, please feel free to contact us via the Cumulus Support Google Group.