Which microbes go with each other?

Created: 2018-04-08 ; Updated: 2018-04-08

Microbes in the body are always part of a community, an ecology of interdependent organisms, some rising, some falling as the environment changes. By watching the shifts over time, I would expect to see some patterns: microbes that consume that same kinds of nutrients may go up and down together as the amounts of those nutrients change. Similarly, a microbe that depends on some waste product secreted by another organism should go up or down depending on the abundance levels of the other organism.

To find patterns, I’ll line up the abundances of every sample I’ve taken, then run a simple correlation analysis to see which microbe levels are most highly-correlated.

Prerequisites: The following example is written in R and I’ve written an R package, actino that converts between uBiome raw data files and Phyloseq, an excellent microbiome analysis tool developed at Stanford’s Bioconductor program.

I start with a Phyloseq object called gut.best that contains the normalized abundances of all my gut microbes. If you follow the actino directions, you should be able to create a similar object for your own data.

Then simply turn that Phyloseq object into a single matrix:

gut.mat <- as.matrix(otu_table(gut.best))

But some taxa are quite rare, occuring in just a few out of hundreds of samples. Let’s ignore any sample where a taxa has fewer than 10 reads; finally, of those that remain, let’s arbitrarily eliminate any taxa that occur fewer than five times. Here we show the resulting total number of samples.

g <- apply(gut.mat,1,function(x) ncol(gut.mat)-sum(x<10)) >= 5
gut.mat <- gut.mat[g,]
nrow(gut.mat)

## [1] 175

Now run the correlations, showing the top 10…

matCorrs<-cor(t(gut.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

mCorr<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
                   row = dimnames(matCorrs)[[1]][ind[,1]] ,
                   val = matCorrs[ ind ] )

#head(mCorr[order(mCorr$val, decreasing = TRUE),],10)

mCorr %>% arrange(desc(val)) %>% head(10) %>% knitr::kable()

col	row	val
Aquabacterium	Methylobacterium	0.9995393
Methylobacterium	Bradyrhizobium	0.9993668
Aquabacterium	Bradyrhizobium	0.9992332
Methylobacterium	Phyllobacterium	0.9991739
Aquabacterium	Phyllobacterium	0.9986580
Bradyrhizobium	Phyllobacterium	0.9985602
Weissella	Veillonella	0.9951920
Veillonella	Peptostreptococcus	0.9942449
Weissella	Peptostreptococcus	0.9907539
Veillonella	Campylobacter	0.9838069

…and the bottom:

tail(mCorr[order(mCorr$val, decreasing = TRUE),],10) %>% knitr::kable()

	col	row	val
1354	Butyricimonas	Collinsella	-0.4043263
1315	Blautia	Akkermansia	-0.4108318
767	Akkermansia	Pseudobutyrivibrio	-0.4135541
12254	Fusicatenibacter	Acidaminococcus	-0.4140983
776	Akkermansia	Dorea	-0.4193126
6814	Papillibacter	Collinsella	-0.4442487
769	Akkermansia	Collinsella	-0.4580287
8413	Anaerovorax	Collinsella	-0.4769236
1880	Terrisporobacter	Oscillibacter	-0.5099724
747	Akkermansia	Roseburia	-0.5741032

For fun, let’s run the same correlation on other people. I have a private collection of a few hundred samples that others have sent me, stored in the Phyloseq object people.norm. Let’s run the above calculations on those samples to see if the results are similar

people.gut <- subset_samples(people.norm, Site == "gut" & Reads > 10000 & Condition == "Healthy")
people.best <- prune_taxa(taxa_sums(people.gut)>42,people.gut) 
people.mat <- as.matrix(otu_table(people.best))

g <- apply(people.mat,1,function(x) ncol(people.mat)-sum(x<10)) >= 5
people.mat <- people.mat[g,]

nrow(people.mat)

## [1] 154

matCorrs<-cor(t(people.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

mCorr.people<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
                   row = dimnames(matCorrs)[[1]][ind[,1]] ,
                   val = matCorrs[ ind ] )

# head(mCorr.people[order(mCorr$val, decreasing = TRUE),],10)
# tail(mCorr.people[order(mCorr$val, decreasing = TRUE),],10)

Here are the most and least-correlated taxa for all people:

mCorr.people %>% arrange(desc(val)) %>% head() %>% knitr::kable()

col	row	val
Aerococcus	Actinobaculum	0.9991366
Aerococcus	Solobacterium	0.9970899
Ochrobactrum	Delftia	0.9966713
Actinobaculum	Solobacterium	0.9962508
Pyramidobacter	Actinobaculum	0.9959956
Aerococcus	Pyramidobacter	0.9955293

mCorr.people %>% arrange(desc(val)) %>% tail() %>% knitr::kable()

	col	row	val
11776	Erysipelatoclostridium	Sarcina	-0.3354350
11777	Blautia	Sarcina	-0.3383840
11778	Sarcina	Bacteroides	-0.3400012
11779	Terrisporobacter	Blautia	-0.3431375
11780	Peptococcus	Blautia	-0.3572107
11781	Subdoligranulum	Bacteroides	-0.3743589

There are more unique taxa in the sample of people than there are in me. That makes sense, since you’d expect more diversity amount lots of people. Here are the taxa that are in me but not in people.best:

setdiff(rownames(gut.mat),rownames(people.mat))

##  [1] "Oligella"           "Achromobacter"      "Stenotrophomonas"  
##  [4] "Ralstonia"          "Shinella"           "Neisseria"         
##  [7] "Pasteurella"        "Rothia"             "Johnsonella"       
## [10] "Phyllobacterium"    "Acinetobacter"      "Planomicrobium"    
## [13] "Pediococcus"        "Tissierella"        "Bradyrhizobium"    
## [16] "Methylobacterium"   "Methanosphaera"     "Sphingomonas"      
## [19] "Aquabacterium"      "Pelomonas"          "Anaerobacter"      
## [22] "Azospira"           "Trueperella"        "Parasporobacterium"
## [25] "Raoultella"         "Hafnia"             "Rahnella"          
## [28] "Sedimentibacter"    "Tessaracoccus"      "Fretibacterium"    
## [31] "Caldicoprobacter"   "Geobacillus"        "Anaerobacterium"   
## [34] "Coprobacillus"      "Desulfitibacter"    "Enorma"            
## [37] "Proteiniphilum"

setdiff(rownames(people.mat),rownames(gut.mat))

##  [1] "Senegalimassilia" "Eremococcus"      "Negativicoccus"   "Dermabacter"     
##  [5] "Butyricicoccus"   "Olsenella"        "Howardella"       "Acetivibrio"     
##  [9] "Syntrophococcus"  "Megamonas"        "Parvibacter"      "Cellulosilyticum"
## [13] "Actinobaculum"    "Alloscardovia"    "Anaeroglobus"     "Aerococcus"

Let’s see if a few common taxa are similarly correlated:

Here are the microbes that are most and least correlated with Blautia

in all people:

mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()

col	row	val
Blautia	Anaerostipes	0.2092081
Blautia	Pseudobutyrivibrio	0.1837263
Blautia	Dorea	0.1725876
Blautia	Anaerotruncus	0.1682866
Blautia	Klebsiella	0.1620616
Blautia	Marvinbryantia	0.1250385

mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()

col	row	val
Blautia	Sarcina	-0.3383840
Blautia	Oscillibacter	-0.3260838
Blautia	Odoribacter	-0.2945652
Blautia	Fibrobacter	-0.2704612
Blautia	Victivallis	-0.2434587
Blautia	Oscillospira	-0.2403851

and just in me:

mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()

col	row	val
Blautia	Dorea	0.6234608
Blautia	Collinsella	0.5392529
Blautia	Anaerostipes	0.5102585
Blautia	Pseudobutyrivibrio	0.4512732
Blautia	Hespellia	0.4249490
Blautia	Roseburia	0.3655861

mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()

col	row	val
Blautia	Akkermansia	-0.4108318
Blautia	Barnesiella	-0.2996382
Blautia	Thalassospira	-0.2952619
Blautia	Alistipes	-0.2675867
Blautia	Bacteroides	-0.2160841
Blautia	Odoribacter	-0.2121982

Interestingly, at least among the top microbes, there does seem to be some agreement (Blautia - Dorea, Blautia Anaerostipes). Just a coincidence? Hmm..

If I can think of a better way to present this information – or if you have any suggestions for me – I’ll update this post.