
DADA2 1.14 RELEASE NOTES

Directories containing fastq files (possibly compressed) can now be provided to core dada2 functions in place of a character vector of fastq filenames. This functionality is supported by filterAndTrim, learnErrors, dada, mergePairs and derepFastq. Note that this feature requires the fastq files in the provided directory to have standard file extensions: .fastq, .fastq.gz or .fastq.bz2.
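
The extension rule above can be illustrated with a short sketch. dada2 is an R package and this Python is not its implementation. It only mirrors the stated requirement that files end in .fastq, .fastq.gz or .fastq.bz2. The helper name `list_fastqs` is hypothetical, and treating the match as case-sensitive is an assumption of the sketch.

```python
import os

# Extensions named in the release note; anything else (e.g. .fq.gz) is ignored.
FASTQ_EXTENSIONS = (".fastq", ".fastq.gz", ".fastq.bz2")


def list_fastqs(directory):
    """Hypothetical helper: return sorted paths of regular files in `directory`
    whose names end in a standard fastq extension (case-sensitive here)."""
    paths = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.isfile(full) and name.endswith(FASTQ_EXTENSIONS):
            paths.append(full)
    return paths
```

Under this rule a file named `sample.fq.gz` would not be picked up, so such files would need renaming or would have to be passed as an explicit vector of filenames.
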
The new DETECT_SINGLETONS option removes the conditional in the calculation of the probabilities used by the core dada algorithm; that conditional effectively discounts the first read of any novel sequence. In practice, setting DETECT_SINGLETONS = TRUE allows singletons to be detected (of course) and also slightly increases sensitivity to other low-abundance sequences, i.e. those present in just 2, 3 or 4 reads. However, we do not generally recommend this option, as it also results in a large increase in false positives in typical datasets. For typical datasets we instead recommend pool = "pseudo" or pool = TRUE, which increase sensitivity to rare sequences with less impact on specificity. That said, for those prepared to accept this trade-off, this is a useful new option for increasing sensitivity to rare sequences, and it may be particularly effective in certain contexts (e.g. very low-depth samples or very well-behaved sequencing technologies).
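
As a worked illustration of how the conditional discounts the first read, assume the Poisson abundance model described in the original DADA2 publication: a sequence observed a times, whose expected number of error-derived reads from a more abundant parent is mu = n * lambda, is scored by P(N >= a | N >= 1) by default, versus P(N >= a) once the conditional is removed. The function names and the numbers used (10,000 parent reads, lambda = 1e-6) are hypothetical, and this Python sketch is not the dada2 code.

```python
import math


def poisson_tail(mu, a):
    """P(N >= a) for N ~ Poisson(mu)."""
    if a <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    if a <= mu:
        # The tail is large here, so 1 - CDF is accurate.
        cdf = sum(math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
                  for k in range(a))
        return max(0.0, 1.0 - cdf)
    # a > mu: terms shrink monotonically from k = a, so sum them directly
    # (avoids cancellation when the tail is tiny).
    term = math.exp(-mu + a * math.log(mu) - math.lgamma(a + 1))
    total, k = 0.0, a
    while term > total * 1e-17:
        total += term
        k += 1
        term *= mu / k  # pmf(k) = pmf(k - 1) * mu / k
    return total


def abundance_pvalue(n_parent, lam, a, detect_singletons=False):
    """Hypothetical abundance p-value for a sequence seen `a` times, given
    `n_parent` reads of a parent and a per-read error probability `lam`."""
    mu = n_parent * lam
    p = poisson_tail(mu, a)
    if not detect_singletons:
        # Condition on the sequence having been observed at all: P(N >= 1).
        p /= -math.expm1(-mu)
    return min(p, 1.0)


for a in (1, 2, 3):
    cond = abundance_pvalue(10_000, 1e-6, a)
    uncond = abundance_pvalue(10_000, 1e-6, a, detect_singletons=True)
    print(f"a={a}: conditional p={cond:.3g}, unconditional p={uncond:.3g}")
```

With the conditional in place, a singleton always gets p = 1 and so can never be significant, while a doubleton gets a p-value of the same order as an unconditioned singleton: in effect, one read is spent just establishing that the sequence exists.
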

The removePrimers function has been improved in several ways. Indels are now allowed when matching primers if allow.indels=TRUE. This option can increase the number of primers matched, but makes matching roughly 4x slower. Multiple files are now handled properly, and a previous bug in handling the absence of a reverse primer sequence has been fixed. Note that, for speed reasons, removePrimers is still recommended only for PacBio or other long-read technologies. For deeper short-read data (e.g. Illumina) we recommend external tools such as cutadapt or Trimmomatic.
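
Why allowing indels can rescue primer matches can be sketched by comparing a mismatch-only check of the read start against a semi-global edit distance (the whole primer aligned, the read free to continue past it). This is a hypothetical Python illustration, not how removePrimers is implemented; the helper names, primer and read are made up, and IUPAC ambiguity codes are not handled.

```python
def prefix_mismatches(primer, read):
    """Mismatches between the primer and the read start, position by position."""
    diffs = sum(p != r for p, r in zip(primer, read))
    return diffs + max(0, len(primer) - len(read))  # missing read bases count too


def prefix_edit_distance(primer, read):
    """Smallest edit distance between the whole primer and any prefix of the read."""
    # prev[j] = edit distance between primer[:i] and read[:j]; row i = 0 is j insertions.
    prev = list(range(len(read) + 1))
    for i in range(1, len(primer) + 1):
        cur = [i] + [0] * len(read)
        for j in range(1, len(read) + 1):
            cost = 0 if primer[i - 1] == read[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # primer base absent from read
                         cur[j - 1] + 1,      # extra base in read
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    # The read continues past the primer, so take the best end point.
    return min(prev)


primer = "GTGCCAGCAGCCGCGGTAA"
read = "GTGCCAGAGCCGCGGTAATACG"  # the primer with one C deleted, then more sequence
print(prefix_mismatches(primer, read), prefix_edit_distance(primer, read))
```

A single deleted base shifts every downstream position, so here the mismatch count is 9 while the edit distance is 1. The alignment does work proportional to primer length times read length rather than primer length alone, which illustrates why indel-tolerant matching tends to be slower.
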
Sequence lengths up to 9999 nucleotides are now supported throughout the dada2 package.
The new tryRC option in the mergeSequenceTables function collapses together sequences that are identical up to reverse-complementation. This is most useful when combining datasets that cover the same gene region but may have been sequenced in different orientations.
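
The effect of tryRC can be sketched with per-sample count tables held as plain dictionaries (sample -> {sequence: count}). This Python is a hypothetical illustration, not mergeSequenceTables itself: the names `revcomp` and `merge_tables` are made up, keeping whichever orientation is seen first is a choice of this sketch (the notes do not say which orientation dada2 keeps), and IUPAC ambiguity codes other than N are left uncomplemented.

```python
# Complement table for A/C/G/T; any other character passes through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_tables(tables, try_rc=False):
    """Merge per-sample count tables. With try_rc=True, a sequence whose reverse
    complement has already been seen is counted under that existing column."""
    merged = {}
    column_of = {}  # sequence as observed -> column it is counted under
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample name: {sample}")
            row = {}
            for seq, n in counts.items():
                if seq not in column_of:
                    rc = revcomp(seq)
                    if try_rc and rc in column_of:
                        column_of[seq] = column_of[rc]
                    else:
                        column_of[seq] = seq
                col = column_of[seq]
                row[col] = row.get(col, 0) + n
            merged[sample] = row
    return merged


t1 = {"s1": {"ACGTT": 5}}
t2 = {"s2": {"AACGT": 3}}  # AACGT is the reverse complement of ACGTT
print(merge_tables([t1, t2], try_rc=True))
```

Without try_rc the two samples share no column; with it, both counts land in the ACGTT column.
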

collapseNoMismatch now properly collapses together sequences that differ substantially in length.
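
The length-variation case can be sketched as follows: process sequences from most to least abundant, and fold a sequence into an existing representative when one lies entirely within the other with no mismatches. This hypothetical Python sketch covers only that containment case, not shifted partial overlaps; the name `collapse_contained` is made up, and keeping the most abundant sequence as the representative is an assumption of the sketch.

```python
def collapse_contained(counts, min_overlap=20):
    """Hypothetical sketch: merge each sequence into a more abundant representative
    when the shorter one is an exact substring of the longer one."""
    reps = {}
    # Most abundant first (ties broken alphabetically for determinism).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in reps:
            overlap = min(len(seq), len(rep))  # containment: overlap = shorter length
            if overlap >= min_overlap and (seq in rep or rep in seq):
                reps[rep] += n
                break
        else:
            reps[seq] = n  # no representative contains it (or vice versa)
    return reps


long_seq = "ACGT" * 50   # 200 nt
short_seq = "ACGT" * 6   # 24 nt, lies entirely within long_seq
print(collapse_contained({long_seq: 1, short_seq: 100}))
```

Here the two sequences differ in length by 176 nt yet collapse into one, counted under the more abundant 24 nt sequence.
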
getSequences now coerces sequences to upper case, as expected by other dada2 functions.