Load packages.
library(decontam); packageVersion("decontam")
## [1] '1.18.0'
library(ggplot2); packageVersion("ggplot2")
## [1] '3.4.1'
library(phyloseq); packageVersion("phyloseq")
## [1] '1.42.0'
Note, the imported data has already been curated in the companion
scripts FelineBileCuration.Rmd
and
FelineBileCulture.Rmd
.
load(file="../RDS/bile_curated.rda")
if(!identical(rownames(tax.filt), colnames(st.filt))) stop("Table mismatch: tax.filt, st.filt")
if(!identical(rownames(df), rownames(st))) stop("Mismatched samples in df and st.")
if(!identical(rownames(df), rownames(st.filt))) stop("Mismatched samples in df and st.")
ft.filt <- sweep(st.filt, 1, rowSums(st.filt), "/")
In terms of the specific taxa detected by culture, is there any concordance with the frequencies of those same taxa as measured by the 16S community sequencing approach? We’ll start this investigation by augmenting the sample metadata with the proportion of each of the taxa tested for by culture – E. coli, Clostridium spp., Enterococcus, Streptococcus and Peptostreptococcus – as measured by 16S sequencing. Since each of these is effectively a genus-level taxon we will use the Silva taxonomic assignments at the genus level to identify the corresponding ASVs. E. coli appears as the genus Escherichia-Shigella in these taxonomic assignments. The true taxonomic range of the culture assay for Clostridium spp. is rather unclear, but we will associate it with the genus designation of Clostridium sensu stricto 1.
is.ecoli <- tax.filt[,"Genus"] %in% "Escherichia-Shigella"
is.clost <- tax.filt[,"Genus"] %in% "Clostridium sensu stricto 1"
is.entero <- tax.filt[,"Genus"] %in% "Enterococcus"
is.strep <- tax.filt[,"Genus"] %in% "Streptococcus"
is.pepto <- tax.filt[,"Genus"] %in% "Peptostreptococcus"
c(Ecoli=sum(is.ecoli), Clost=sum(is.clost), Entero=sum(is.entero), Strep=sum(is.strep), Pepto=sum(is.pepto))
## Ecoli Clost Entero Strep Pepto
## 2 3 2 6 0
All of these genera except for Peptostreptococcus were detected in the 16S sequencing results. With a relatively small number of ASVs representing each genus. Inspecting the distribution of the 16S-measured proportions of these genera across the samples, and relative to the culture results.
df$EcoliProportion <- rowSums(ft.filt[,is.ecoli])
df$ClostProportion <- rowSums(ft.filt[,is.clost])
df$EnteroProportion <- rowSums(ft.filt[,is.entero])
df$StrepProportion <- rowSums(ft.filt[,is.strep])
df$PeptoProportion <- rowSums(ft.filt[,is.pepto])
ggplot(data=df[order(df$EcoliProportion),],
aes(x=seq(nrow(df)), y=EcoliProportion, color=Ecoli)) +
geom_point() +
theme(axis.title.x=element_blank()) +
ylab("Ecoli Proportion (16S)") +
labs(color="Ecoli Present (Culture)")
Strong concordance between culture positivity for E. coli and measured 16S proportion. One outlier sample.
ggplot(data=df[order(df$ClostProportion),],
aes(x=seq(nrow(df)), y=ClostProportion, color=Clost)) +
geom_point() +
theme(axis.title.x=element_blank()) +
ylab("Clostridium Proportion (16S)") +
labs(color="Clostridium Present (Culture)")
Agreement again, albeit there was only one sample in which Clostridium was detected by culture.
ggplot(data=df[order(df$EnteroProportion),],
aes(x=seq(nrow(df)), y=EnteroProportion, color=Enter)) +
geom_point() +
theme(axis.title.x=element_blank()) +
ylab("Enterococcus Proportion (16S)") +
labs(color="Enterococcus Present (Culture)")
No concordance here, although again only one culture-positive sample.
ggplot(data=df[order(df$StrepProportion),],
aes(x=seq(nrow(df)), y=StrepProportion, color=Strep)) +
geom_point() +
theme(axis.title.x=element_blank()) +
ylab("Streptococcus Proportion (16S)") +
labs(color="Streptococcus Present (Culture)")
No concordance, but just one culture-positive sample.
And as a reminder, Peptostreptococcus was not identified in the 16S data.
Overall, the culture positivity results and the 16S relative abundance results are largely concordant, especially for E. coli, the only taxon for which multiple samples were positive by culture.
We now consider the community-level 16S results, and compare them to our culture results. We start by looking at “alpha-diversity” in the 16S sequencing data. Our expectation is that most of the samples are just contamination, and will have a relatively consistent and high level of diversity reflecting the spectrum of contaminants, but that some samples with a legitimate infection will be lower in diversity because they are dominated by one or a small number of real species infecting the bile.
Calcluate Shannon and Simpson diversities for all our samples.
if(!identical(rownames(df), rownames(ft.filt))) stop("Table mismatch: df, ft")
df$Shannon <- apply(ft.filt, 1, function(x) sum(-x*log(x), na.rm=TRUE))
df$Simpson <- apply(ft.filt, 1, function(x) 1-sum(x*x))
ggplot(data=df, aes(x=Shannon, y=Simpson, color=NegativeControl)) + geom_point()
As expected there is a lot of correlation between the two diversity measures, especially in the low-diversity samples that seem to be separating from the negative control samples (and the bulk of the bile samples).
Looking into the relationship between diversity and culture positivity, while also plotting library size to check on any influence from uncertainly measured (low depth) samples.
ggplot(data=df[seq(nrow(df), 1, -1),], aes(x=Shannon, y=LibrarySize, color=Culture_Positive)) + geom_point() + xlab("Shannon Diversity (16S)")
That is strikingly clear. Save for two outliers, alpha-diversity separates samples into two modes: A low-diversity mode that is entirely culture-positive, and a high-diversity mode that is entirely (save the one exception) culture-negative and that includes all of the negative control samples. Library size is not confounding this signal.
Let’s check on those two outlier culture-positive samples.
df[df$Culture_Positive %in% TRUE & df$Shannon > 2,
c("SampleType", "Organisms", "Bactibilia", "Culture_Positive", "Ecoli", "Shannon", "LibrarySize")]
## SampleType Organisms Bactibilia Culture_Positive Ecoli
## 719-515 Positive Peptostreptococcus Nobactibilia TRUE FALSE
## 720-513 Positive Staphepidermis Nobactibilia TRUE FALSE
## Shannon LibrarySize
## 719-515 4.655519 94239
## 720-513 4.609725 72306
As a reminder, Bactibilia and E. coli culture positivity are fully concordant, and there are two samples which were culture positive for some bacteria, but not for E. coli:
table(Ecoli_Positive=df$Ecoli, Bactibilia_Positive=df$Bactibilia_Positive)
## Bactibilia_Positive
## Ecoli_Positive FALSE TRUE
## FALSE 48 0
## TRUE 0 6
table(Ecoli_Positive=df$Ecoli, Culture_Positive=df$Culture_Positive)
## Culture_Positive
## Ecoli_Positive FALSE TRUE
## FALSE 47 2
## TRUE 0 6
Thus, these two low-diversity “culture positive” outliers are exactly those two samples that were culture-positive samples but no E. coli was detected by culture, and Bactibilia was not detected by microscopy. Or, put another way, there is full concordance in identifying the same set of six samples by microscopic identification of bactibilia, by culture-based detection of E. coli, and by 16S-based identification of low-diversity samples.
For the record, the six samples are:
sam.infected <- df$SampleID[df$Ecoli %in% TRUE]
sam.infected
## [1] "716-516" "716-520" "716-521" "718-515" "720-517" "721-520"
Creating the publication formatted and labelled figure.
df$Fig3 <- "Bile Samples"
df$Fig3[df$NegativeControl] <- "Negative Control Samples"
fig3 <- ggplot(data=df[seq(nrow(df), 1, -1),],
aes(x=Shannon, y=LibrarySize, color=Ecoli)) +
geom_point() +
facet_wrap(~Fig3) +
xlab("Shannon Diversity (16S)") + ylab("Sequencing Depth") +
theme_bw()
fig3
ggsave("../Figures/Figure3.pdf", fig3, width=6, height=4, units="in")
We’ll also take a look at how the samples organize using ordination and the Bray-Curtis community dissimilarity metric.
ps <- phyloseq(sample_data(df),
otu_table(ft.filt, taxa_are_rows=FALSE),
tax_table(tax.filt))
ord <- ordinate(ps, method="MDS", distance="bray")
f <- plot_ordination(ps, ord, color="Ecoli") + facet_wrap(~SampleType)
f
The six samples identified by culture, microscopy and alpha-diversity
also stand out in the Bray-Curtis ordination. It is also clear that the
negative control samples intermingle with the rest of the bile
(Positive
) samples, supporting the idea that those samples
are mostly just off-target amplification/contamination, with little to
no real signal.
Now combining this into one panel with publication formatting and labeling.
df$Fig2 <- "Negative Control" # Technical negative controls, all types
df$Fig2[df$Patient.Control %in% "Control"] <- "Control case" # Real samples, from healthy "control" cats
df$Fig2[df$Patient.Control %in% "Patient"] <- "Clinical case" # Real samples, from clinical cases
df$Fig2[df$Patient.Control %in% "Patient" & df$Ecoli] <- "Clinical case (E. coli culture positive)" # Real samples, from clinical cases, that were E. coli positive
df$Fig2 <- factor(df$Fig2, levels=c("Negative Control", "Control case", "Clinical case", "Clinical case (E. coli culture positive)"))
# Sanity check
table(df$Fig2, df$Patient.Control, useNA="ifany")
##
## Control ControlTechnical Patient
## Negative Control 0 2 0
## Control case 17 0 0
## Clinical case 0 0 34
## Clinical case (E. coli culture positive) 0 0 6
##
## PatientTechnical Technical <NA>
## Negative Control 2 9 4
## Control case 0 0 0
## Clinical case 0 0 0
## Clinical case (E. coli culture positive) 0 0 0
table(df$Fig2, df$NegativeControl, useNA="ifany")
##
## FALSE TRUE
## Negative Control 0 17
## Control case 17 0
## Clinical case 34 0
## Clinical case (E. coli culture positive) 6 0
Sanity check passed.
scale.fig2 <- scale_color_manual(values=c("Negative Control"="black",
"Control case"="dodgerblue",
"Clinical case"="darkorange3",
"Clinical case (E. coli culture positive)"="chartreuse3"))
ps.fig2 <- phyloseq(sample_data(df),
otu_table(ft.filt, taxa_are_rows=FALSE),
tax_table(tax.filt))
ord.fig2 <- ordinate(ps.fig2, method="MDS", distance="bray")
fig2 <- plot_ordination(ps.fig2, ord.fig2, color="Fig2") +
scale.fig2 +
theme_bw() +
theme(legend.title = element_blank(),
legend.position = c(0.7, 0.84),
legend.spacing.y = unit(0.0, 'in'),
legend.background = element_blank(),
legend.key = element_blank(),
legend.box.spacing = unit(0.0, "in"))
fig2
ggsave("../Figures/Figure2.pdf", fig2, width=6, height=4, units="in") ## uncomment to regenerate figure
Finally, what is in the putatitive contaminant background? And what is in those six standout samples?
ps.contam <- subset_samples(ps, !Ecoli %in% TRUE)
ps.infect <- subset_samples(ps, Ecoli %in% TRUE)
genera.tot <- sort(tapply(colSums(ft.filt), tax.filt[,"Genus"], sum), decreasing=TRUE)
genera.contam <- sort(tapply(colSums(ft.filt[!df$Ecoli %in% TRUE,]), tax.filt[,"Genus"], sum), decreasing=TRUE)
genera.infect <- sort(tapply(colSums(ft.filt[df$Ecoli %in% TRUE,]), tax.filt[,"Genus"], sum), decreasing=TRUE)
head(genera.contam, n=20)
## Anoxybacillus Bacteroides
## 10.8275051 4.9363788
## Ileibacterium Dubosiella
## 4.1332786 1.9949405
## Bifidobacterium Ligilactobacillus
## 1.8639262 1.1732673
## Coriobacteriaceae UCG-002 Lachnospiraceae NK4A136 group
## 1.1013964 1.0996733
## Helicobacter Alistipes
## 1.0381067 1.0172288
## Paenibacillus Enorma
## 0.9505098 0.9422189
## Brevibacillus Geobacillus
## 0.8971653 0.8959039
## Enterorhabdus Mucispirillum
## 0.8687622 0.8290046
## Rikenellaceae RC9 gut group Subdoligranulum
## 0.7703743 0.7572878
## Olsenella Anaerobacillus
## 0.7552610 0.7255287
Anoxybacillus is the most abundant across the “non-infected” samples, followed by a ong tail of other genera.
head(genera.infect, n=10)
## Escherichia-Shigella Bacillus
## 3.734797210 0.517202487
## Rothia Bacteroides
## 0.446614995 0.413100951
## Actinomyces Clostridium sensu stricto 1
## 0.337290579 0.269699840
## Streptococcus Enterococcus
## 0.139063342 0.044901842
## Subdoligranulum Bifidobacterium
## 0.002805228 0.002041338
These samples look very different. As might be expected (since all
six were culture-positive for E. coli),
Esherichia-Shigella
is the most abundant genus. This is
followed by 6-7 other genera, before dropping off dramatically to
minimal levels.
Create a bar plot of the composition in the six “infected” samples, keeping just the genera with >0.5% average abundance:
genera.infect.keep <- names(genera.infect)[genera.infect/6 > 0.005]
ps.infect.top <- subset_taxa(ps.infect, Genus %in% genera.infect.keep)
plot_bar(ps.infect.top, fill="Genus")
Interestingly, while E. coli is the largest contributor to
most of these samples, it is often appearing as what might be a
co-infection with 1-3 other bacterial genera. And in sample
716-520
, it is not E. coli that dominates the
sample, but rather a mixture of Bacillus and
Rothia.
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.0.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] phyloseq_1.42.0 ggplot2_3.4.1 decontam_1.18.0
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.58.0 sass_0.4.5 jsonlite_1.8.4
## [4] splines_4.2.3 foreach_1.5.2 bslib_0.4.2
## [7] highr_0.10 stats4_4.2.3 GenomeInfoDbData_1.2.9
## [10] yaml_2.3.7 pillar_1.9.0 lattice_0.20-45
## [13] glue_1.6.2 digest_0.6.31 XVector_0.38.0
## [16] colorspace_2.1-0 htmltools_0.5.5 Matrix_1.5-3
## [19] plyr_1.8.8 pkgconfig_2.0.3 zlibbioc_1.44.0
## [22] scales_1.2.1 tibble_3.2.1 mgcv_1.8-42
## [25] generics_0.1.3 farver_2.1.1 IRanges_2.32.0
## [28] cachem_1.0.7 withr_2.5.0 BiocGenerics_0.44.0
## [31] cli_3.6.1 survival_3.5-5 magrittr_2.0.3
## [34] crayon_1.5.2 evaluate_0.20 fansi_1.0.4
## [37] nlme_3.1-162 MASS_7.3-58.3 vegan_2.6-4
## [40] textshaping_0.3.6 tools_4.2.3 data.table_1.14.8
## [43] lifecycle_1.0.3 stringr_1.5.0 Rhdf5lib_1.20.0
## [46] S4Vectors_0.36.2 munsell_0.5.0 cluster_2.1.4
## [49] Biostrings_2.66.0 ade4_1.7-22 compiler_4.2.3
## [52] jquerylib_0.1.4 GenomeInfoDb_1.34.9 systemfonts_1.0.4
## [55] rlang_1.1.0 rhdf5_2.42.0 grid_4.2.3
## [58] RCurl_1.98-1.12 iterators_1.0.14 rhdf5filters_1.10.1
## [61] biomformat_1.26.0 rstudioapi_0.14 igraph_1.4.1
## [64] bitops_1.0-7 labeling_0.4.2 rmarkdown_2.21
## [67] gtable_0.3.3 codetools_0.2-19 multtest_2.54.0
## [70] DBI_1.1.3 reshape2_1.4.4 R6_2.5.1
## [73] knitr_1.42 dplyr_1.1.1 fastmap_1.1.1
## [76] utf8_1.2.3 ragg_1.2.5 permute_0.9-7
## [79] ape_5.7-1 stringi_1.7.12 parallel_4.2.3
## [82] Rcpp_1.0.10 vctrs_0.6.1 tidyselect_1.2.0
## [85] xfun_0.38