Fast and accurate sample inference from amplicon data with single-nucleotide resolution


Nov 1: Version 1.6 of the dada2 R package has been released on Bioconductor!

 

Installation

Binaries for the current release version of DADA2 (1.6) are available from Bioconductor. Note that you must have R 3.4.2 or newer and Bioconductor version 3.6 to install the current release from Bioconductor.

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("dada2")

If you wish to install the latest development version, or to install on earlier versions of R, see our from-source installation instructions.

 

Tutorials

Start here: The DADA2 tutorial goes through a typical workflow for paired-end Illumina MiSeq data: raw amplicon sequencing data is processed into a table of the exact amplicon sequence variants (ASVs) present in each sample.

The DADA2 Workflow on Big Data goes through a workflow optimized to run on large datasets (tens of millions to billions of reads).

Short demonstrations of assigning taxonomy and assigning species to sequences.

 

Benchmarking

Our manuscript introducing DADA2 (OA link) compares the accuracy of DADA2 and other methods on several mock community datasets.

We describe the broad advantages of exact sequence variants over OTUs in our recent open-access ISMEJ paper.

Further benchmarking of DADA2 against the methods evaluated in a recent QIIME1 benchmarking paper is available.

And sometimes a picture says a thousand words:

 

Advantages

 

Support and Development

Planned feature improvements are publicly catalogued at the main DADA2 development site on GitHub, specifically on the issue tracker for DADA2. If the feature you are hoping for is not listed, you are welcome to add it as a feature request there.

Bug reports and questions about problems using DADA2 are also welcome on the issue tracker. We prefer posts to the issue tracker over email, as these posts are searchable by other users who may experience the same problems.

 

How?

Accuracy: DADA2’s crucial advantage is that it uses more of the data. The DADA2 error model incorporates quality information, which is ignored by all other methods after filtering. It incorporates quantitative abundances, whereas most other methods use abundance ranks, if they use abundance at all. It identifies the specific differences between sequences (e.g., A->C), whereas other methods merely count the mismatches. Finally, DADA2 can parameterize its error model from the data itself, rather than relying on previous datasets that may or may not reflect the PCR and sequencing protocols used in your study.
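
To make the contrast between counting mismatches and identifying them concrete, here is a minimal base-R sketch. It is an illustration only, not the DADA2 error model: the helper `describe_mismatches`, the toy sequences, and the quality scores are all hypothetical, and the two sequences are assumed to be already aligned and of equal length. The only outside fact used is the standard Phred relation between a quality score Q and an error probability, p = 10^(-Q/10).

```r
# Hypothetical helper (not part of the dada2 package): list every mismatch
# between a reference sequence and a read, with its direction and quality.
describe_mismatches <- function(ref, read, phred) {
  r <- strsplit(ref, "")[[1]]
  q <- strsplit(read, "")[[1]]
  stopifnot(length(r) == length(q), length(q) == length(phred))
  pos <- which(r != q)
  data.frame(
    position     = pos,
    substitution = sprintf("%s->%s", r[pos], q[pos]),  # e.g. "A->C"
    phred        = phred[pos],
    p_error      = 10^(-phred[pos] / 10),  # standard Phred conversion
    stringsAsFactors = FALSE
  )
}

# Toy data (made up for illustration)
ref   <- "ACGTACGTAC"
read  <- "ACCTACGTCC"
phred <- c(38, 38, 12, 37, 38, 36, 38, 35, 30, 38)

mm <- describe_mismatches(ref, read, phred)

# A mismatch-counting method only sees this single number:
n_mismatch <- nrow(mm)
print(n_mismatch)

# A method that uses more of the data can also see which substitution
# occurred and how reliable the base call was at that position:
print(mm)
```

In this toy case both reads of the data agree that there are two mismatches, but only the second view records that one of them sits at a low-quality position (Q12) and that the two substitutions are of different types.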

Performance: DADA2’s computational scaling gains come from the fact that it infers sequences exactly rather than constructing OTUs. De novo OTUs cannot be compared across samples unless all samples were pooled during OTU construction. Exact sequences, however, are comparable across samples, because an exact sequence is a consistent label. Thus, DADA2 can analyze each sample independently, resulting in linear scaling with the number of samples and trivial parallelization.
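
The point that exact sequences act as consistent labels can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the package's implementation: the helper `build_table` and the per-sample counts are hypothetical, and each named vector stands in for the result of denoising one sample on its own. (The dada2 package provides its own function for this step, `makeSequenceTable`.)

```r
# Toy per-sample results (made up): names are exact sequences, values are
# read counts. Each vector is assumed to have been inferred independently.
samples <- list(
  A = c(ACGTAC = 120L, ACGTTC = 30L),
  B = c(ACGTAC = 80L,  GGGTAC = 15L)
)

# Hypothetical helper: combine independent per-sample results into one
# sample-by-sequence table, using the sequences themselves as column labels.
build_table <- function(samples) {
  all_seqs <- sort(unique(unlist(lapply(samples, names))))
  tab <- matrix(0L, nrow = length(samples), ncol = length(all_seqs),
                dimnames = list(names(samples), all_seqs))
  for (s in names(samples)) {
    tab[s, names(samples[[s]])] <- samples[[s]]
  }
  tab
}

tab <- build_table(samples)
print(tab)
```

Because the column labels are the sequences themselves, no pooled clustering step is needed to line the samples up, and a new sample can be added later without reprocessing the existing ones. The per-sample step that produces each vector is what could be distributed across cores or machines.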


Maintained by Benjamin Callahan (benjamin DOT j DOT callahan AT gmail DOT com)