Load packages.

library(decontam); packageVersion("decontam")
## [1] '1.11.1'
library(ggplot2); packageVersion("ggplot2")
## [1] '3.3.5'
library(dada2); packageVersion("dada2")
## Loading required package: Rcpp
## [1] '1.19.2'

Import Curated Data

Note, the imported data has already been curated in the companion script FelineBileCuration.Rmd.

load(file="../RDS/bile.rda")

Using decontam to look for a core microbiome

The decontam package can be used to discriminate between contaminants, which could be likely given the very low biomass (often below-limit-of-detection DNA concentrations) samples we are working with here, and “real” sequences actually present in the sample.

We’ll apply the decontam package to score ASVs as being more contaminant-like or real-sequence-like using the prevalence approach (see the decontam manuscript for more detail).

contam <- isContaminant(st.filt, neg=df$NegativeControl, method="prevalence")

Add some additional annotations to the data.frame output by isContaminant.

sq.filt <- colnames(st.filt)
if(!identical(sq.filt, rownames(contam))) stop("Mismatch between sq.filt and contam.")
names(sq.filt) <- paste0("ASV", seq_along(sq.filt))
contam$Phylum <- tax.filt[,"Phylum"]
contam$Genus <- tax.filt[,"Genus"]
contam$ASV <- names(sq.filt)
contam$Prevalence <- cut(contam$prev, c(0, 10, 25, 45, 9999), labels=c("1-10", "11-25", "26-45", "45+"))
contam$Abundance <- colSums(st.filt)
rownames(contam) <- NULL # To avoid printing long sequences
head(contam)
##         freq prev p.freq    p.prev         p contaminant           Phylum
## 1 0.11387662   72     NA 0.2954461 0.2954461       FALSE       Firmicutes
## 2 0.05388086   69     NA 0.8003070 0.8003070       FALSE   Proteobacteria
## 3 0.05487483   73     NA 0.3851351 0.3851351       FALSE       Firmicutes
## 4 0.02419335   73     NA 0.3851351 0.3851351       FALSE       Firmicutes
## 5 0.02054138   70     NA 0.5594520 0.5594520       FALSE     Bacteroidota
## 6 0.01668738   69     NA 0.4683594 0.4683594       FALSE Actinobacteriota
##                  Genus  ASV Prevalence Abundance
## 1        Anoxybacillus ASV1        45+    611129
## 2 Escherichia-Shigella ASV2        45+    395226
## 3        Ileibacterium ASV3        45+    284515
## 4           Dubosiella ASV4        45+    124657
## 5                 <NA> ASV5        45+    107794
## 6      Bifidobacterium ASV6        45+     90194

The “score” we will leverage is in the column p. The score ranges from 0 to 1, with high scores indicating a better match with a real-sample origin, and low scores a better match with a contaminant origin.

Borrowing some code from the reproducible analyses associated with the decontam manuscript to visualize the distribution of scores:

histo <- ggplot(contam, aes(x=p)) + 
  labs(x = 'decontam Score', y='Number of ASVs') + 
  geom_histogram(binwidth=0.02)

histo + facet_wrap(~Prevalence)
## Warning: Removed 1 rows containing non-finite values (stat_bin).

This distribution of scores provides little to no evidence for the presence of a core microbiome that is distinguishable from contamination. There is no high-score (real-sequence-like) mode in the scores, outside of perhaps the low prevalence (present in 1-10 samples) category. And, more concerning, the high-score fraction seems to decrease as the higher-prevalence ASVs are considered, which is where we would expect to observe the core microbiome if there was one.

Given these results, and previous observations of very low DNA concentration in these samples, is to use the score to identify the best candidate non-contaminants, and investigate those further.

i.top10 <- order(contam$p, decreasing=TRUE)[1:10]
contam[i.top10,]
##             freq prev p.freq    p.prev         p contaminant         Phylum
## 274 0.0007062372   15     NA 0.9939534 0.9939534       FALSE Proteobacteria
## 111 0.0016812982   43     NA 0.9929038 0.9929038       FALSE     Firmicutes
## 206 0.0012225258   14     NA 0.9915629 0.9915629       FALSE Proteobacteria
## 275 0.0006954256   14     NA 0.9915629 0.9915629       FALSE           <NA>
## 360 0.0004252741   14     NA 0.9915629 0.9915629       FALSE           <NA>
## 193 0.0013621147   13     NA 0.9883032 0.9883032       FALSE Proteobacteria
## 381 0.0003861779   13     NA 0.9883032 0.9883032       FALSE Proteobacteria
## 529 0.0001009390   13     NA 0.9883032 0.9883032       FALSE     Firmicutes
## 198 0.0013314009   12     NA 0.9838844 0.9838844       FALSE Proteobacteria
## 326 0.0005174679   12     NA 0.9838844 0.9838844       FALSE Proteobacteria
##                            Genus    ASV Prevalence Abundance
## 274                         <NA> ASV274      11-25      2245
## 111 [Ruminococcus] torques group ASV111      26-45      7970
## 206                         <NA> ASV206      11-25      3503
## 275                         <NA> ASV275      11-25      2227
## 360                         <NA> ASV360      11-25      1439
## 193                         <NA> ASV193      11-25      3868
## 381                         <NA> ASV381      11-25      1346
## 529                         <NA> ASV529      11-25       636
## 198                         <NA> ASV198      11-25      3710
## 326                         <NA> ASV326      11-25      1665

Only one of these top 10 candidate non-contaminant ASVs was classified at the Genus level, and a couple weren’t even classified at the Phylum level.

Inspect the [BLAST against nt] results for the top10 candidates, excluding “Uncultured/environmental sample sequences”.

dada2:::pfasta(sq.filt[contam$ASV[i.top10]], id=contam$Phylum[i.top10])
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCCTGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >Firmicutes
## TACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGACTTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGGTCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGCCATACGATTAACCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTACAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGATACCCCTGTAGTCCCTGTCTCTTATA
## >NA
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCCAGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >NA
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGATACCCCAGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGCCATACGATTAACCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTACAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCTTGTAGTCCCTGTCTCTTATA
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCTGGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA
## >Firmicutes
## TACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTTATGTAAGTCTGGAGTGAAAGCCCGGGGCCCAACCCCGGGACTGCTTTGGAAACTGTGTAACTGGAGTACAGGAGGGGCAGGCGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAACACCGGCGGCGAAGGCGGCCTGCTGGACTGAAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGCCATACGATTAACCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTACAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCCGGTAGTCCCTGTCTCTTATA
## >Proteobacteria
## CGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCAGCCGCGGTCATACGATTAGCCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAGAAAAAAATATACTAAAGTTAAATTTTAACTAGGCCGTAGAAAGCTAGAGTTAACATAAAAATACAGCACGAAAGTAACTTTAACACCTCCGACTACACGACAGCTAAGACCCAAACTGGGATTAGAAACCCTTGTAGTCCCTGTCTCTTATACACATCTCCGAGCCCACGA

On Dec 7, 2022, the best hits for these top 10 candidate non-contaminant ASVs were:

  1. Felis catus Senzu DNA, chromosome: D2, American Shorthair breed
  2. [Ruminococcus] torques ATCC 27756 partial 16S rRNA gene
  3. Felis catus voucher N22b 12S ribosomal RNA gene, partial sequence; mitochondrial
  4. Felis catus Senzu DNA, chromosome: D2, American Shorthair breed
  5. Felis catus Senzu DNA, chromosome: D2, American Shorthair breed
  6. Felis catus mitochondrion, complete genome
  7. Felis catus Senzu DNA, chromosome: D2, American Shorthair breed
  8. Tyzzerella sp. strain 1XD42-41 16S ribosomal RNA gene, partial sequence
  9. Felis catus voucher N22b 12S ribosomal RNA gene, partial sequence; mitochondrial
  10. Felis catus Senzu DNA, chromosome: D2, American Shorthair breed

Candidates 2 and 8 are high-quality (100% coverage, >97%) hits to genuine bacterial sequences, but all others are clearly derived from the cat itself, likely off-target amplification of the mitochondria.

The prevalence score from decontam can be interpreted as a p-value. Let’s do that here, and perform Benjamini-Hochberg false-discovery rate correction to get an idea how strong the statistical evidence is supporting these top candidates.

sort(p.adjust(1-contam$p, method="BH"))[1:10]
##  [1] 0.6612035 0.6612035 0.6612035 0.6612035 0.6612035 0.6612035 0.6612035
##  [8] 0.6612035 0.6612035 0.6612035

After FDR correction, the “adjusted p-value” is far from standard significance thresholds.

In sum, the totality of these results provides little to no evidence non-contaminant ASVs in these samples that can’t be explained by off-target amplification of cat DNA. Thus, we conclude that a core microbiome does not exist in feline bile.

That said, this analysis does not rule out transient or patchy opportunistic bacterial colonization of feline bile.