  • The new seqComplexity function quantifies the complexity of sequences in terms of the Shannon richness of their kmers. plotComplexity plots the distribution of sequence complexities in fastq files, and the rm.lowcomplex argument of the filterAndTrim function allows low-complexity sequences to be filtered out.

  • The new removePrimers function removes forward and reverse primers from sequencing reads, and can orient reads based on the location of the forward primer. We currently recommend removePrimers for PacBio CCS data; for Illumina data, external solutions remain the recommended option.


  • The dada function can now accept fastq filenames directly, rather than requiring that files first be dereplicated and loaded into memory. This keeps memory requirements flat when processing large numbers of samples.

  • Pseudo-pooling, an algorithmic approximation to sample inference from pooled samples, now has memory requirements that remain flat with sample number when invoked using filenames, e.g. dada(fastqFiles, err=err, pool="pseudo"). We now recommend pseudo-pooling for those interested in detecting singleton ASVs in their samples.


  • Pooled sample inference with dada(..., pool=TRUE) no longer fails to output the most abundant ASV in the first sample under certain conditions.

  • The data.frame returned by mergePairs is now properly formatted even when only one sample was provided as a list.
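
As an illustration of the complexity measure mentioned for seqComplexity above, the following is a minimal Python sketch of a kmer Shannon richness calculation. It is not the package's implementation: the function names (kmer_shannon_richness, is_low_complexity), the use of overlapping kmers, the default k=2, and the strict "less than threshold" comparison are all assumptions made for the example. Shannon richness is taken here to be the exponential of the Shannon entropy of the kmer frequencies, i.e. the effective number of kmers if all were equally frequent.

```python
import math
from collections import Counter


def kmer_shannon_richness(seq: str, k: int = 2) -> float:
    """Effective number of kmers: exp(Shannon entropy of kmer frequencies).

    Hypothetical helper for illustration; overlapping kmers and k=2 are
    assumptions, not a statement about how the package computes this.
    """
    seq = seq.upper()
    # Slide a window of width k across the sequence, one base at a time.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # Shannon entropy in nats; exponentiating gives the effective count.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


def is_low_complexity(seq: str, threshold: float, k: int = 2) -> bool:
    """Flag a sequence whose effective kmer count is below the threshold.

    The strict '<' comparison is an assumption made for this sketch.
    """
    return kmer_shannon_richness(seq, k) < threshold
```

Under these assumptions, a homopolymer such as "AAAAAAAA" has an effective kmer count of 1, while a sequence in which all 16 dinucleotides occur equally often reaches the maximum of 16 for k=2. A filtering threshold would sit somewhere between those extremes.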

Maintained by Benjamin Callahan (benjamin DOT j DOT callahan AT gmail DOT com)
Documentation License: CC-BY 4.0