Use demuxEM as a command line tool

If you have data generated by cell-hashing or nucleus-hashing, you can use demuxEM as a command line tool to demultiplex your data. Type:

demuxEM -h

to see the usage information:

        demuxEM [options] <input_raw_gene_bc_matrices_h5> <input_hto_csv_file> <output_name>
        demuxEM -h | --help
        demuxEM -v | --version
  • Arguments:


    <input_raw_gene_bc_matrices_h5>

    Input raw RNA expression matrix in 10x HDF5 format.


    <input_hto_csv_file>

    Input HTO (antibody tag) count matrix in CSV format.


    <output_name>

    Output name. All outputs will use it as the prefix.

  • Options:

    -p <number>, --threads <number>

    Number of threads. [default: 1]

    --genome <genome>

    Reference genome name. If not provided, we will infer it from the expression matrix file.

    --alpha-on-samples <alpha>

    The Dirichlet prior concentration parameter (alpha) on samples. An alpha value < 1.0 will make the prior sparse. [default: 0.0]

    --min-num-genes <number>

    We only demultiplex cells/nuclei with at least <number> expressed genes. [default: 100]

    --min-num-umis <number>

    We only demultiplex cells/nuclei with at least <number> UMIs. [default: 100]

    --min-signal-hashtag <count>

    Any cell/nucleus with fewer than <count> hashtags from the signal will be marked as unknown. [default: 10.0]

    --random-state <seed>

    The random seed used in the KMeans algorithm to separate empty ADT droplets from others. [default: 0]


    --generate-diagnostic-plots

    Generate a series of diagnostic plots, including the background/signal between HTO counts, estimated background probabilities, HTO distributions of cells and non-cells, etc.

    --generate-gender-plot <genes>

    Generate violin plots using gender-specific genes (e.g. Xist). <genes> is a comma-separated list of gene names.

    -h, --help

    Print out help information.
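
The remark that an alpha below 1.0 makes the prior sparse follows from the general form of a symmetric Dirichlet density. The notation below is an assumption made for illustration and is not taken from the demuxEM source: theta stands for the vector of per-sample proportions and K for the number of hashed samples.

```latex
% Assumed notation: \theta = (\theta_1, ..., \theta_K) are sample proportions,
% K is the number of samples, \alpha > 0 is the concentration parameter.
p(\theta \mid \alpha)
  = \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^{K}}
    \prod_{k=1}^{K} \theta_k^{\alpha - 1},
\qquad \theta_k \ge 0, \quad \sum_{k=1}^{K} \theta_k = 1 .
```

When alpha < 1, every exponent alpha - 1 is negative, so the density grows without bound as any theta_k approaches 0; the prior therefore favors proportion vectors in which only a few samples carry appreciable weight. At alpha = 1 the density is uniform. The default of 0.0 lies at the limit of this family, where the expression above is no longer a proper density; how demuxEM treats that case internally is not described here.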

  • Outputs:


    RNA expression matrix with demultiplexed sample identities in Zarr format.


    DemuxEM-calculated results in Zarr format, containing two datasets, one for HTO and one for RNA.


    Optional output. A histogram plot depicting hashtag distributions of empty droplets and non-empty droplets.

    Optional output. A bar plot visualizing the estimated hashtag background probability distribution.


    Optional output. A histogram plot depicting hashtag distributions of not-real-cells and real-cells as defined by total number of expressed genes in the RNA assay.


    Optional output. This figure consists of two plots. The first one is a horizontal bar plot depicting the percentage of RNA barcodes with at least one HTO count. The second plot is a histogram plot depicting RNA UMI distribution for singlets, doublets and unknown cells.


    Optional outputs. Violin plots depicting gender-specific gene expression across samples. Multiple plots are produced if a gene list is provided to the '--generate-gender-plot' option.
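
Because every output uses <output_name> as its prefix, a shell glob is one simple way to see what a run produced. The sketch below assumes the outputs were written to the current directory; `list_outputs` is a hypothetical helper, not part of demuxEM.

```shell
# List every file whose name starts with the given output prefix.
# list_outputs is a hypothetical helper, not a demuxEM command.
list_outputs() {
    for f in "$1"*; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# "sample_output" is the <output_name> used in the example command.
list_outputs sample_output
```

If nothing is printed, no file with that prefix exists in the current directory.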

  • Examples:

    demuxEM -p 8 --generate-diagnostic-plots sample_raw_gene_bc_matrices.h5 sample_hto.csv sample_output
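
When several hashed samples need the same settings, the call above can be wrapped in a small loop. This is a minimal sketch, not a recommended pipeline: the sample prefixes (sampleA, sampleB), the file naming pattern, and the `run_demux` helper are all hypothetical, and the threshold of 200 genes is an arbitrary illustration of the --min-num-genes option. The command is only executed when demuxEM is installed and both input files exist.

```shell
# Build, print, and (when possible) run one demuxEM command per sample.
# run_demux and the file naming pattern are hypothetical.
run_demux() {
    prefix="$1"
    rna="${prefix}_raw_gene_bc_matrices.h5"
    hto="${prefix}_hto.csv"
    # Store the full command as positional parameters so it can be
    # printed and executed without re-quoting.
    set -- demuxEM -p 8 --min-num-genes 200 --generate-diagnostic-plots \
        "$rna" "$hto" "${prefix}_output"
    echo "$*"
    if command -v demuxEM >/dev/null 2>&1 && [ -f "$rna" ] && [ -f "$hto" ]; then
        "$@"
    else
        echo "skipped: demuxEM or inputs for $prefix not found" >&2
    fi
}

# Hypothetical sample prefixes.
for s in sampleA sampleB; do
    run_demux "$s"
done
```

Printing each command before running it leaves a record of the exact options used for every sample.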