Load packages.
library(decontam); packageVersion("decontam")
## [1] '1.11.1'
library(ggplot2); packageVersion("ggplot2")
## [1] '3.3.5'
library(dada2); packageVersion("dada2")
## Loading required package: Rcpp
## [1] '1.19.2'
Note, the imported data has already been curated in the companion script FelineBileCuration.Rmd.
load(file="../RDS/bile.rda")
The decontam package can be used to discriminate between contaminants, which could be likely given the very low biomass (often below-limit-of-detection DNA concentrations) samples we are working with here, and “real” sequences actually present in the sample.
We’ll apply the decontam package to score ASVs as being more contaminant-like or real-sequence-like using the prevalence approach (see the decontam manuscript for more detail).
contam <- isContaminant(st.filt, neg=df$NegativeControl, method="prevalence")
Add some additional annotations to the data.frame output by isContaminant
.
sq.filt <- colnames(st.filt)
if(!identical(sq.filt, rownames(contam))) stop("Mismatch between sq.filt and contam.")
names(sq.filt) <- paste0("ASV", seq_along(sq.filt))
contam$Phylum <- tax.filt[,"Phylum"]
contam$Genus <- tax.filt[,"Genus"]
contam$ASV <- names(sq.filt)
contam$Prevalence <- cut(contam$prev, c(0, 10, 25, 45, 9999), labels=c("1-10", "11-25", "26-45", "45+"))
contam$Abundance <- colSums(st.filt)
rownames(contam) <- NULL # To avoid printing long sequences
head(contam)
## freq prev p.freq p.prev p contaminant Phylum
## 1 0.11387662 72 NA 0.2954461 0.2954461 FALSE Firmicutes
## 2 0.05388086 69 NA 0.8003070 0.8003070 FALSE Proteobacteria
## 3 0.05487483 73 NA 0.3851351 0.3851351 FALSE Firmicutes
## 4 0.02419335 73 NA 0.3851351 0.3851351 FALSE Firmicutes
## 5 0.02054138 70 NA 0.5594520 0.5594520 FALSE Bacteroidota
## 6 0.01668738 69 NA 0.4683594 0.4683594 FALSE Actinobacteriota
## Genus ASV Prevalence Abundance
## 1 Anoxybacillus ASV1 45+ 611129
## 2 Escherichia-Shigella ASV2 45+ 395226
## 3 Ileibacterium ASV3 45+ 284515
## 4 Dubosiella ASV4 45+ 124657
## 5 <NA> ASV5 45+ 107794
## 6 Bifidobacterium ASV6 45+ 90194
The “score” we will leverage is in the column p
. The score ranges from 0 to 1, with high scores indicating a better match with a real-sample origin, and low scores a better match with a contaminant origin.
Borrowing some code from the reproducible analyses associated with the decontam manuscript to visualize the distribution of scores:
histo <- ggplot(contam, aes(x=p)) +
labs(x = 'decontam Score', y='Number of ASVs') +
geom_histogram(binwidth=0.02)
histo + facet_wrap(~Prevalence)
## Warning: Removed 1 rows containing non-finite values (stat_bin).
This distribution of scores provides little to no evidence for the presence of a core microbiome that is distinguishable from contamination. There is no high-score (real-sequence-like) mode in the scores, outside of perhaps the low prevalence (present in 1-10 samples) category. And, more concerning, the high-score fraction seems to decrease as the higher-prevalence ASVs are considered, which is where we would expect to observe the core microbiome if there was one.
Given these results, and previous observations of very low DNA concentration in these samples, is to use the score to identify the best candidate non-contaminants, and investigate those further.
i.top10 <- order(contam$p, decreasing=TRUE)[1:10]
contam[i.top10,]
## freq prev p.freq p.prev p contaminant Phylum
## 274 0.0007062372 15 NA 0.9939534 0.9939534 FALSE Proteobacteria
## 111 0.0016812982 43 NA 0.9929038 0.9929038 FALSE Firmicutes
## 206 0.0012225258 14 NA 0.9915629 0.9915629 FALSE Proteobacteria
## 275 0.0006954256 14 NA 0.9915629 0.9915629 FALSE <NA>
## 360 0.0004252741 14 NA 0.9915629 0.9915629 FALSE <NA>
## 193 0.0013621147 13 NA 0.9883032 0.9883032 FALSE Proteobacteria
## 381 0.0003861779 13 NA 0.9883032 0.9883032 FALSE Proteobacteria
## 529 0.0001009390 13 NA 0.9883032 0.9883032 FALSE Firmicutes
## 198 0.0013314009 12 NA 0.9838844 0.9838844 FALSE Proteobacteria
## 326 0.0005174679 12 NA 0.9838844 0.9838844 FALSE Proteobacteria
## Genus ASV Prevalence Abundance
## 274 <NA> ASV274 11-25 2245
## 111 [Ruminococcus] torques group ASV111 26-45 7970
## 206 <NA> ASV206 11-25 3503
## 275 <NA> ASV275 11-25 2227
## 360 <NA> ASV360 11-25 1439
## 193 <NA> ASV193 11-25 3868
## 381 <NA> ASV381 11-25 1346
## 529 <NA> ASV529 11-25 636
## 198 <NA> ASV198 11-25 3710
## 326 <NA> ASV326 11-25 1665
Only one of these top 10 candidate non-contaminant ASVs was classified at the Genus level, and a couple weren’t even classified at the Phylum level.
Inspect the [BLAST against nt] results for the top10 candidates, excluding “Uncultured/environmental sample sequences”.
dada2:::pfasta(sq.filt[contam$ASV[i.top10]], id=contam$Phylum[i.top10])
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCCTGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >Firmicutes
## TACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGACTTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGGTCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGCCATACGATTAACCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTACAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGATACCCCTGTAGTCCCTGTCTCTTATA
## >NA
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCCAGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >NA
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGATACCCCAGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGCCATACGATTAACCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTACAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCTTGTAGTCCCTGTCTCTTATA
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCTGGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >Firmicutes
## TACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTTATGTAAGTCTGGAGTGAAAGCCCGGGGCCCAACCCCGGGACTGCTTTGGAAACTGTGTAACTGGAGTACAGGAGGGGCAGGCGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAACACCGGCGGCGAAGGCGGCCTGCTGGACTGAAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGCCATACGATTAACCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTACAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCCGGTAGTCCCTGTCTCTTATA
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCTTGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
On Dec 7, 2022, the best hits for these top 10 candidate non-contaminant ASVs were:
Candidates 2 and 8 are high-quality (100% coverage, >97%) hits to genuine bacterial sequences, but all others are clearly derived from the cat itself, likely off-target amplification of the mitochondria.
The prevalence score from decontam can be interpreted as a p-value. Let’s do that here, and perform Benjamini-Hochberg false-discovery rate correction to get an idea how strong the statistical evidence is supporting these top candidates.
sort(p.adjust(1-contam$p, method="BH"))[1:10]
## [1] 0.6612035 0.6612035 0.6612035 0.6612035 0.6612035 0.6612035 0.6612035
## [8] 0.6612035 0.6612035 0.6612035
After FDR correction, the “adjusted p-value” is far from standard significance thresholds.
In sum, the totality of these results provides little to no evidence non-contaminant ASVs in these samples that can’t be explained by off-target amplification of cat DNA. Thus, we conclude that a core microbiome does not exist in feline bile.
That said, this analysis does not rule out transient or patchy opportunistic bacterial colonization of feline bile.