demuxEM for cell-hashing/nucleus-hashing data analysis

demuxEM is a module for analyzing cell-hashing/nucleus-hashing data. It is used by Cumulus in its demultiplexing step.

Version 0.1.7 October 05, 2021

Filter cell barcodes with 0 hashtag counts in estimate_background_probs.

Version 0.1.6 May 28, 2021

Fix a bug that occurred when the input multi-modal object contains non-RNA modality data.

Version 0.1.5 September 16, 2020

Add a bar plot showing the percentage of RNA barcodes with HTO counts.

Version 0.1.4 July 15, 2020

Adapt to pegasusio 0.2.5.

Version 0.1.1 April 9, 2020

Adapt to pegasusio 0.1.3.

Version 0.1.0 April 8, 2020

Initial release under new name demuxEM.

Installation

demuxEM is published on PyPI as a Python package, and you can simply install it via pip:

pip install demuxEM

Use demuxEM as a command line tool

If you have data generated by cell-hashing or nucleus-hashing, you can use demuxEM as a command line tool to demultiplex your data. Type:

demuxEM -h

to see the usage information:

Usage:
        demuxEM [options] <input_raw_gene_bc_matrices_h5> <input_hto_csv_file> <output_name>
        demuxEM -h | --help
        demuxEM -v | --version
  • Arguments:

    input_raw_gene_bc_matrices_h5

    Input raw RNA expression matrix in 10x hdf5 format.

    input_hto_csv_file

    Input HTO (antibody tag) count matrix in CSV format.

    output_name

    Output name. All outputs will use it as the prefix.

  • Options:

    -p <number>, --threads <number>

    Number of threads. [default: 1]

    --genome <genome>

    Reference genome name. If not provided, we will infer it from the expression matrix file.

    --alpha-on-samples <alpha>

    The Dirichlet prior concentration parameter (alpha) on samples. An alpha value < 1.0 will make the prior sparse. [default: 0.0]

    --min-num-genes <number>

    Only cells/nuclei with at least <number> expressed genes are demultiplexed. [default: 100]

    --min-num-umis <number>

    Only cells/nuclei with at least <number> UMIs are demultiplexed. [default: 100]

    --min-signal-hashtag <count>

    Any cell/nucleus with fewer than <count> hashtags from the signal will be marked as unknown. [default: 10.0]

    --random-state <seed>

    The random seed used in the KMeans algorithm to separate empty ADT droplets from others. [default: 0]

    --generate-diagnostic-plots

    Generate a series of diagnostic plots, including the background/signal between HTO counts, estimated background probabilities, HTO distributions of cells and non-cells, etc.

    --generate-gender-plot <genes>

    Generate violin plots using gender-specific genes (e.g. Xist). <genes> is a comma-separated list of gene names.

    -h, --help

    Print out help information.

  • Outputs:

    output_name_demux.zarr

    RNA expression matrix with demultiplexed sample identities in Zarr format.

    output_name.out.demuxEM.zarr

    DemuxEM-calculated results in Zarr format, containing two datasets, one for HTO and one for RNA.

    output_name.ambient_hashtag.hist.pdf

    Optional output. A histogram plot depicting hashtag distributions of empty droplets and non-empty droplets.

    output_name.background_probabilities.bar.pdf

    Optional output. A bar plot visualizing the estimated hashtag background probability distribution.

    output_name.real_content.hist.pdf

    Optional output. A histogram plot depicting hashtag distributions of not-real-cells and real-cells as defined by total number of expressed genes in the RNA assay.

    output_name.rna_demux.hist.pdf

    Optional output. This figure consists of two plots. The first one is a horizontal bar plot depicting the percentage of RNA barcodes with at least one HTO count. The second plot is a histogram plot depicting RNA UMI distribution for singlets, doublets and unknown cells.

    output_name.gene_name.violin.pdf

    Optional outputs. Violin plots depicting gender-specific gene expression across samples. Multiple plots are generated if a gene list is provided via the --generate-gender-plot option.

  • Examples:

    demuxEM -p 8 --generate-diagnostic-plots sample_raw_gene_bc_matrices.h5 sample_hto.csv sample_output
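
When demuxEM is driven from a Python script or a workflow, the command line above can be assembled programmatically. The following is a minimal sketch that uses only flags listed in the usage text; the helper name build_demuxem_command and its parameters are hypothetical and not part of demuxEM.

```python
import shlex


def build_demuxem_command(rna_h5, hto_csv, output_name,
                          threads=1, diagnostic_plots=False, gender_genes=None):
    """Assemble a demuxEM command line as an argument list (hypothetical helper)."""
    cmd = ["demuxEM", "-p", str(threads)]
    if diagnostic_plots:
        cmd.append("--generate-diagnostic-plots")
    if gender_genes:
        # The option takes a comma-separated list of gene names.
        cmd += ["--generate-gender-plot", ",".join(gender_genes)]
    # Positional arguments come last, in the order given in the usage text.
    cmd += [rna_h5, hto_csv, output_name]
    return cmd


cmd = build_demuxem_command(
    "sample_raw_gene_bc_matrices.h5", "sample_hto.csv", "sample_output",
    threads=8, diagnostic_plots=True,
)
# Prints the same command as the example above.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch the tool, provided demuxEM is installed and available on the PATH.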
    
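The three threshold options listed under Options (--min-num-genes, --min-num-umis and --min-signal-hashtag) can be read as simple per-barcode rules. The sketch below only illustrates those rules with the documented defaults; it is not demuxEM's implementation. The function names, labels and barcode statistics are made up, and the "signal" values stand in for the hashtag signal that demuxEM estimates internally.

```python
def passes_rna_filters(n_genes, n_umis, min_num_genes=100, min_num_umis=100):
    """A barcode is considered for demultiplexing only if it meets both RNA thresholds."""
    return n_genes >= min_num_genes and n_umis >= min_num_umis


def classify(stats, min_signal_hashtag=10.0):
    """Return 'skipped', 'unknown' or 'assignable' for one barcode (illustrative labels)."""
    if not passes_rna_filters(stats["n_genes"], stats["n_umis"]):
        return "skipped"
    if stats["signal"] < min_signal_hashtag:
        return "unknown"
    return "assignable"


# Made-up per-barcode statistics, for illustration only.
barcodes = {
    "barcode_a": {"n_genes": 850, "n_umis": 2400, "signal": 152.0},
    "barcode_b": {"n_genes": 40, "n_umis": 95, "signal": 30.0},
    "barcode_c": {"n_genes": 300, "n_umis": 700, "signal": 4.5},
}
labels = {name: classify(stats) for name, stats in barcodes.items()}
print(labels)
```

Here barcode_b fails the RNA filters and is never demultiplexed, while barcode_c passes them but has too little hashtag signal and would be reported as unknown.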

API

demuxEM can also be used as a Python package. Import demuxEM by:

import demuxEM

Demultiplexing

estimate_background_probs(hashing_data[, ...])

For cell-hashing data, estimate antibody background probability using the KMeans algorithm.

demultiplex(rna_data, hashing_data[, ...])

Demultiplex cell/nucleus-hashing data, using the estimated antibody background probability calculated in demuxEM.estimate_background_probs.

attach_demux_results(input_rna_file, rna_data)

Write demultiplexing results into the raw gene expression matrix.
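
Because demultiplex relies on the background probability computed by estimate_background_probs, the three functions are naturally called in the order listed. A minimal sketch of that sequence follows, using only the positional arguments shown in the signatures above. The wrapper name run_demux is hypothetical, and loading rna_data and hashing_data (for example with pegasusio) is assumed to have happened already and is left to the caller.

```python
def run_demux(rna_data, hashing_data, input_rna_file):
    """Run the three documented steps in order (hypothetical wrapper)."""
    # Imported here so the sketch can be defined even where demuxEM is not installed.
    import demuxEM

    # 1. Estimate the antibody background probabilities from the hashing data.
    demuxEM.estimate_background_probs(hashing_data)
    # 2. Demultiplex, using the background estimated in step 1.
    demuxEM.demultiplex(rna_data, hashing_data)
    # 3. Write the results into the raw gene expression matrix and
    #    hand back whatever attach_demux_results returns.
    return demuxEM.attach_demux_results(input_rna_file, rna_data)
```

The optional keyword arguments (the [, ...] in the signatures) are omitted, so each call runs with the library defaults.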

Contact us

demuxEM is maintained by the Cumulus team. If you have any questions, please feel free to contact us via the Cumulus Support Google Group.