`mixcr analyze`

A single command to execute a complete upstream analysis pipeline from the raw fastq files to clonotype tables.

The analyze command takes a preset name as a required argument and runs the sequence of analysis steps defined by that preset. It assigns meaningful names to the intermediate and final files and saves the reports from every step in both txt and json formats (unless command line options specify otherwise). A preset defines parameters optimized for a particular data type for each of the analysis steps. File name expansion lets you process a batch of raw sequencing files at once and optionally assign molecular, cell and sample barcodes extracted from the file names. Sample tables let you analyze several patient samples at once using sample barcodes picked up from any available source. MiXCR supports paired-end and single-end .fastq, .fasta, .bam and .sam formats.

MiXCR provides a comprehensive list of built-in presets for many commercially available kits and public protocols.

Command line options

mixcr analyze [--help]

    # analyze-specific options

    [--no-reports] 
    [--no-json-reports]
    [--output-not-used-reads]  
    [--use-local-temp]
    [--threads <n>] 
    [--force-overwrite]

    # mix-ins

    [--add-step <step>] 
    [--remove-step <step>] 
    [--limit-input <n>]
    [--species <species>] 
    [--library <library>] 
    [--split-by-sample]
    [--dont-split-by-sample]
    [--sample-table sample_table.tsv]
    [--dna] [--rna] 
    [--floating-left-alignment-boundary [<anchor_point>]]
    [--rigid-left-alignment-boundary [<anchor_point>]]
    [--floating-right-alignment-boundary (<gene_type>|<anchor_point>)] 
    [--rigid-right-alignment-boundary [(<gene_type>|<anchor_point>)]] 
    [--tag-pattern <pattern>] 
    [--keep-non-CDR3-alignments] [--drop-non-CDR3-alignments] 
    [--assemble-clonotypes-by <gene_features>]
    [--split-clones-by <gene_type>]... [--dont-split-clones-by <gene_type>]...  
    [--assemble-contigs-by <gene_features>] 
    [--impute-germline-on-export]
    [--dont-impute-germline-on-export]
    [--prepend-export-clones-field <field> [<param>...]]...
    [--append-export-clones-field <field> [<param>...]]...
    [--prepend-export-alignments-field <field> [<param>...]]...
    [--append-export-alignments-field <field> [<param>...]]... 
    [--add-export-clone-table-splitting <(geneLabel|tag):key>]
    [--reset-export-clone-table-splitting] 
    [--add-export-clone-grouping <(geneLabel|tag):key>]
    [--reset-export-clone-grouping]
    [-M <key=value>]...      

    # inputs and outputs

    <preset_name> 
    ([I1.fastq[.gz] [I2.fastq[.gz]]] R1.fastq[.gz] [R2.fastq[.gz]] 
     | file.(fasta|bam|sam))  
    output_prefix
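
As a rough illustration of the synopsis above, the following sketch assembles a paired-end invocation and prints it before running anything. The preset name, input file names and output prefix are placeholders invented for this example, not values from this page; substitute a real preset from the list of available presets. The command is executed only if `mixcr` is on the PATH and both input files exist.

```shell
# Placeholders (assumptions for illustration only).
preset="my-preset"              # hypothetical; use a real preset name
r1="sample_R1.fastq.gz"         # hypothetical input file
r2="sample_R2.fastq.gz"         # hypothetical input file
out_prefix="results/sample"     # hypothetical output prefix

# Argument order follows the synopsis: options, preset, inputs, output prefix.
set -- mixcr analyze --threads 4 "$preset" "$r1" "$r2" "$out_prefix"
cmd_line="$*"

# Show the command for review.
echo "$cmd_line"

# Run it only when mixcr and both inputs are actually available.
if command -v mixcr >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
  "$@"
fi
```

Printing the command first makes it easy to check the argument order before committing to a long run.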

To process a batch of input sequencing files at once, and optionally assign molecular, cell and sample barcodes extracted from the file names, use file name expansion. Sample tables let you analyze several patient samples at once using sample barcodes picked up from any available source.

Analyze-specific command line options:

<preset_name>: Name of the analysis preset (see complete list of available presets). This is the only required option to run the analysis.
output_prefix: Path prefix that tells mixcr where to put all output files. Individual intermediate and final files get suffixes according to the steps that produced them. If the argument ends with a file separator, outputs are written to the specified directory.
--no-reports: Don't output txt report files for each of the steps
--no-json-reports: Don't output json report files for each of the steps
--output-not-used-reads: If specified, reads that were not aligned are written to {output_prefix}.not_aligned.{(I1|I2|R1|R2)}.fastq.gz, and reads that were not parsed are written to {output_prefix}.not_parsed.{(I1|I2|R1|R2)}.fastq.gz
--use-local-temp: Put temporary files in the same folder as the output files.
-t, --threads <n>: Number of processing threads
-f, --force-overwrite: Force overwrite of output file(s). Beware: the new incarnation of analyze does not yet implement a "smart resume / reanalysis" feature; with this option, analyze simply removes all output files and starts the analysis from scratch.
-h, --help: Show the help message and exit.
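
The naming rules for output_prefix and --output-not-used-reads can be sketched with two small helper functions. Both helpers are hypothetical and written only for this illustration; they are not part of MiXCR. The sketch assumes `/` as the file separator and assumes one file per supplied read type (I1, I2, R1, R2); the exact set of files MiXCR writes depends on the inputs.

```shell
# Describe how an output_prefix is interpreted (assumes "/" as file separator).
describe_prefix() {
  case "$1" in
    */) printf 'directory: files are written inside %s\n' "$1" ;;
    *)  printf 'prefix: files are named %s.<step suffix>\n' "$1" ;;
  esac
}

# Expand the --output-not-used-reads name pattern for the given read types.
not_used_names() {
  prefix="$1"
  shift
  for kind in not_aligned not_parsed; do
    for r in "$@"; do
      printf '%s.%s.%s.fastq.gz\n' "$prefix" "$kind" "$r"
    done
  done
}

describe_prefix "results/sample"
describe_prefix "results/"
not_used_names "results/sample" R1 R2
```

For paired-end input with the prefix `results/sample`, the second helper lists four names, starting with `results/sample.not_aligned.R1.fastq.gz`.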

In addition to these parameters, any of the available mix-in options may also be specified with analyze.
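
As a hedged sketch of combining mix-ins with analyze, the command below adds `--species` and `--limit-input` from the synopsis to a paired-end run. The preset name, file names, the species value `hsa` and the read limit are illustrative assumptions, not recommendations from this page. The command is only printed here so it can be reviewed before running.

```shell
# Placeholders (assumptions for illustration only).
preset="my-preset"              # hypothetical; use a real preset name

# Mix-in flags are taken from the synopsis; their values are example choices.
set -- mixcr analyze \
  --species hsa \
  --limit-input 10000 \
  "$preset" sample_R1.fastq.gz sample_R2.fastq.gz results/trial
mixin_cmd="$*"

# Print the assembled command; run it manually once the values are real.
echo "$mixin_cmd"
```

Limiting the input this way is one plausible approach to a quick trial run on a subset of reads before processing the full data set.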