9.3 Activity - Exploring 16S rRNA Data with phyloseq
9.3.1 Purpose
To analyze bacterial diversity based on 16S rRNA gene sequencing, using data from the study “Impact of a 7-day homogeneous diet on interpersonal variation in human gut microbiomes and metabolomes” by Guthrie et al., 2023. This study is also referred to as the MISO study, for “Microbiome Individuality and Stability Over Time”, because it aims to understand the variation (or stability) of microbiomes across individuals.
This activity uses the following R packages to analyze microbial diversity:
- phyloseq: A popular package for taxonomy profiling
- DESeq2: A package originally designed for differential expression which we will use for differential abundance
- ggplot2: A tidyverse package that produces high-quality figures
9.3.2 Learning objectives
- Use phyloseq to cluster samples based on the similarity of their microbial compositions using multidimensional scaling methods (NMDS or PCoA)
- Use phyloseq to profile alpha diversity (Shannon and Simpson’s indices)
- Use DESeq2 to perform differential abundance analysis
9.3.3 Introduction
16S data can be manipulated and visualized in a variety of ways. In Google Sheets, we explored the data manually to gain an understanding at the level of 1-10 ASVs. With phyloseq, we collapsed ASVs with the same taxonomic annotation to survey microbial diversity and performed the same exploration across the entire dataset. In this activity, we will use more advanced functions of phyloseq and DESeq2. These analyses would be extremely difficult and tedious to do manually, and they will give us deeper insights into our dataset. We will cluster samples based on the similarity of their ASV counts, survey alpha diversity, and search for ASVs that are differentially abundant between groups.
9.3.4 Activity 1 – Analyze 16S rRNA Data with phyloseq tutorial
Estimated time: 35 min
Activity 1. Explore a phyloseq object through the “Analyze 16S rRNA Data with phyloseq” tutorial on SciServer.
- Access the C-MOOR Tutorials
If you are using SciServer, log into SciServer, click on compute and open your “C-MOOR LearnR” container. Visit SciServer Guides and FAQs if you need to jog your memory on how to do this.
If you are using AnVIL, log into AnVIL, navigate to your class Workspace, start up an RStudio Cloud Environment, and open RStudio. Visit the AnVIL Guides and FAQs if you need to jog your memory on how to do this. This module can be found in the “2-analyze-ps” folder of the “16s” curriculum folder.
If you are using an alternative setup, follow the instructions provided by your instructor.
- Start the “Analyze 16S rRNA Data with phyloseq” tutorial. Visit SciServer Guides and FAQs if you need assistance accessing the tutorial.
- To move through the activities click “Continue” at the bottom of the screen. When you are done with a topic, click “Next Topic” to move on.
- This tutorial has small boxes in which you can enter and run short lines of code to analyze the data.
- As you work through the tutorial, take snapshots of your work and paste your answers in the grey boxes below:
| 1-1. Alpha diversity section: Take a snapshot and paste the code for an alpha diversity plot (Simpson) from the quiz question: What male subject has the data point for the LOWEST alpha diversity? HINT: Use subset_samples() to subset males, and specify individuals (subject) on x-axis of the alpha diversity plot |
|---|
| 1-2. PCoA section: Take a snapshot and paste the code for a PCoA plot from the quiz question: In a PCoA with only data from ASVs with the class Bacteroidia, what is the percent of variance in the dataset explained by principal coordinate 1? HINT: You will need to change the code subsetting phylum and Firmicutes |
|---|
| 1-3. Differential abundance: Take a snapshot and paste the code for the differential abundance plot for the quiz question: Which of the following Phylum have ASVs that are differentially abundant between the subject S02 and subject S03? |
|---|
9.3.5 Activity 2 – Try it out questions
Estimated time: 60 min
With your group, perform some exploratory data analysis by selecting one of the four questions below or by coming up with your own question. Each question has its own section below, with code templates for you to use. Select a question here, then go to the page specific to your question.
9.3.5.1 Question 1. How sensitive is microbial diversity to variables like diet, age and gender?
Alpha diversity summarizes the diversity of microbes within a single sample. Comparing it across the levels of a variable (or metadata category) shows how the distribution of microbes changes with that variable. Alpha diversity reflects both richness (the number of different organisms or ASVs) and evenness (how evenly these organisms are distributed in terms of their abundance). Using the “MISO” study dataset, we will use Simpson (or specifically, Gini-Simpson) alpha diversity to evaluate changes in microbial diversity in individuals due to different metadata variables.
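To make richness and evenness concrete, here is a minimal sketch in base R that computes the Gini-Simpson and Shannon indices by hand. The two toy communities and the helper names `gini_simpson()` and `shannon()` are made up for illustration and are not part of the MISO dataset or of phyloseq.

```r
# Two hypothetical communities with the same richness (4 ASVs)
# but very different evenness.
even_counts   <- c(ASV1 = 25, ASV2 = 25, ASV3 = 25, ASV4 = 25)
uneven_counts <- c(ASV1 = 97, ASV2 = 1,  ASV3 = 1,  ASV4 = 1)

# Gini-Simpson index: 1 minus the sum of squared relative abundances.
gini_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Shannon index: -sum(p * ln(p)), skipping ASVs with zero counts.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

gini_simpson(even_counts)    # 0.75
gini_simpson(uneven_counts)  # 0.0588
shannon(even_counts)         # log(4), about 1.386
shannon(uneven_counts)       # much lower than log(4)
```

Both communities contain 4 ASVs, yet the even one scores far higher. This is why a shift in alpha diversity can come from a change in richness, in evenness, or in both.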
Approach: Plot Simpson alpha diversity using the plot_richness() command in phyloseq and assess the impact of different study variables on microbial diversity. Identify variables that impact alpha diversity. Visible shifts in alpha diversity suggest a shift in microbial diversity, and a higher alpha diversity value indicates an increase in alpha diversity (either richness or evenness).
- Evaluate the impact of diet on alpha diversity by plotting alpha diversity against the “diet” variable.
- Evaluate the impact of individuality on alpha diversity by plotting alpha diversity against the “subject” variable.
- Evaluate the impact of age on alpha diversity by plotting alpha diversity against the “age” variable.
- Evaluate the impact of gender on alpha diversity by plotting alpha diversity against the “gender” variable.
- Evaluate the impact of metabolite levels on alpha diversity by plotting alpha diversity against the levels of the 5 metabolites in the study (Creatinine, PCS, IS, HIPP, PAG).
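The approach above could be sketched as follows. This is a template under stated assumptions, not the tutorial's own code: the object `ps` is a small made-up phyloseq object built so that the example runs on its own, and the column names `subject`, `diet`, and `age` are assumed to match the variables named above. Replace `ps` with the MISO phyloseq object from your tutorial environment.

```r
library(phyloseq)
library(ggplot2)

# Hypothetical toy data standing in for the MISO phyloseq object.
set.seed(1)
counts <- matrix(
  rnbinom(30 * 12, mu = 20, size = 0.5), nrow = 30,
  dimnames = list(paste0("ASV", 1:30), paste0("sample", 1:12))
)
meta <- data.frame(
  subject   = rep(c("S01", "S02", "S03", "S04"), each = 3),
  diet      = rep(c("BD", "HD", "WO"), times = 4),
  age       = rep(c(28, 41, 57, 66), each = 3),
  row.names = colnames(counts)
)
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), sample_data(meta))

# Simpson alpha diversity grouped by a categorical variable.
p_diet <- plot_richness(ps, x = "diet", measures = "Simpson", color = "diet") +
  geom_boxplot(alpha = 0.3)

# The same measure against a numeric variable gives a scatter plot.
p_age <- plot_richness(ps, x = "age", measures = "Simpson")

p_diet
p_age
```

To look at another variable, change the value of `x` to its column name. A metabolite column such as Creatinine would be plotted the same way as `age`, assuming it is stored as a numeric column in the sample data.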
9.3.5.2 Question 2. Do diet, age, gender and levels of metabolites correlate with microbe variation between individuals?
A PCoA (principal coordinate analysis) plot represents the similarity between samples (sample microbiomes in our case). Using the “MISO” study dataset, we will use a PCoA plot to summarize individuals based on their ASVs and to plot the resulting relationships between individuals. Based on how well color-coding by the different variables matches the distribution of samples on the PCoA plot, we will aim to explain potential sources of sample similarity.
Approach: Perform multidimensional scaling (classical multidimensional scaling is also known as principal coordinate analysis, or PCoA) to establish a relationship between the samples given multivariate data (ASV counts). Using a PCoA plot (via the commands ordinate() and plot_ordination() in phyloseq), you will condense the original high-dimensional data into a low-dimensional representation by converting the data to a distance matrix and projecting it onto the 2 dimensions, x and y, that best explain the variability in your data. From your PCoA plot, you will assess the contribution of different study variables to sample diversity and identify variables that correlate with sample diversity. In a PCoA plot, samples with similar microbial profiles will be plotted close together and may appear as groups.
- Make a PCoA plot, ordinate on the entire dataset (all ASVs) and color by individual. Investigate the shape of the resulting PCoA plot.
- Correlate PCoA plot shape with metadata variables by coloring the PCoA plot with different variables including diet, subject, age and gender. Do any of the variables correlate with the shape of PCoA plot and data groupings?
- Correlate PCoA plot shape with the levels of metabolites in the study (Creatinine, PCS, IS, HIPP, PAG). From the color-coding pattern, identify which variables potentially help explain the data groups formed in the PCoA plot.
- Subset the “HD” samples, ordinate on the “HD” subset, and make a new PCoA plot.
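The steps above could look like the sketch below. It rests on several assumptions: `ps` is a made-up toy object so that the code runs on its own, the column names `subject`, `diet`, and `Creatinine` are assumed to match the study variables, and Bray-Curtis is used as the distance only because it needs no phylogenetic tree. Your tutorial may use a different distance.

```r
library(phyloseq)

# Hypothetical toy data standing in for the MISO phyloseq object.
set.seed(2)
counts <- matrix(
  rnbinom(30 * 18, mu = 20, size = 0.5), nrow = 30,
  dimnames = list(paste0("ASV", 1:30), paste0("sample", 1:18))
)
meta <- data.frame(
  subject    = rep(paste0("S0", 1:6), each = 3),
  diet       = rep(c("BD", "HD", "WO"), times = 6),
  Creatinine = round(runif(18, 0.5, 2.0), 2),
  row.names  = colnames(counts)
)
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), sample_data(meta))

# Ordinate once on all ASVs, then recolor the same ordination
# by different variables.
ord <- ordinate(ps, method = "PCoA", distance = "bray")
p_subject <- plot_ordination(ps, ord, color = "subject")
p_creat   <- plot_ordination(ps, ord, color = "Creatinine")

# Relative eigenvalues of the first two axes; the percentages in the
# axis labels of the plot are derived from these.
ord$values$Relative_eig[1:2]

# Subset one diet group, then ordinate again on the subset only.
ps_hd  <- subset_samples(ps, diet == "HD")
ord_hd <- ordinate(ps_hd, method = "PCoA", distance = "bray")
p_hd   <- plot_ordination(ps_hd, ord_hd, color = "subject")
```

The ordination is recomputed after subsetting because the axes depend on which samples are included. Recoloring a plot by another variable does not require a new ordination.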
9.3.5.3 Question 3. What microbes (ASVs) differ between males and females, and does age have an impact?
Approach: Perform DESeq2 differential abundance analysis between females and males and determine how many microbes are differentially abundant between the sexes. Then examine whether age has a further impact on the differential abundance of microbes between the sexes.
- Perform DESeq2 analysis between females and males and identify differences.
- Test if younger age contributes to differential microbe abundance between females and males by subsetting younger (< 50 years old) individuals.
- Test if older age contributes to differential microbe abundance between females and males by subsetting older (> 50 years old) individuals.
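One way to organize these three comparisons is sketched below. Everything here is an assumption made for illustration: the simulated object `ps` (with the first 10 ASVs deliberately made more abundant in males), the helper `run_sex_comparison()`, the column names `gender` and `age`, the level labels "F" and "M", and the 0.05 cutoff. Check the actual level names in the MISO sample data before adapting it.

```r
library(phyloseq)
library(DESeq2)

# Hypothetical simulated data standing in for the MISO phyloseq object.
set.seed(42)
n_asv <- 200
meta <- data.frame(
  gender    = factor(rep(c("F", "M"), times = 8)),
  age       = rep(c(25, 30, 35, 45, 55, 60, 65, 70), each = 2),
  row.names = paste0("sample", 1:16)
)
# Mean abundance per ASV; ASVs 1-10 are made 8-fold higher in males.
mu <- matrix(2^runif(n_asv, 5, 10), nrow = n_asv, ncol = nrow(meta))
mu[1:10, meta$gender == "M"] <- mu[1:10, meta$gender == "M"] * 8
counts <- matrix(
  rnbinom(length(mu), mu = mu, size = 1 / (0.1 + 4 / mu)),
  nrow = n_asv,
  dimnames = list(paste0("ASV", 1:n_asv), rownames(meta))
)
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), sample_data(meta))

# Hypothetical helper: males vs. females, sorted by adjusted p-value.
run_sex_comparison <- function(physeq) {
  dds <- phyloseq_to_deseq2(physeq, design = ~ gender)
  dds <- DESeq(dds, quiet = TRUE)
  res <- results(dds, contrast = c("gender", "M", "F"))
  res[order(res$padj), ]
}

res_all <- run_sex_comparison(ps)

# Repeat the same comparison within each age group.
ps_young  <- subset_samples(ps, age < 50)
ps_old    <- subset_samples(ps, age > 50)
res_young <- run_sex_comparison(ps_young)
res_old   <- run_sex_comparison(ps_old)

# Number of ASVs with adjusted p-value below 0.05 in each comparison.
sapply(
  list(all = res_all, young = res_young, old = res_old),
  function(res) sum(res$padj < 0.05, na.rm = TRUE)
)
```

Keep in mind that each age subset has fewer samples than the full dataset, so fewer significant ASVs in a subset can reflect reduced statistical power rather than a true age effect.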
9.3.5.4 Question 4. Is there an interaction between diet and age, and what microbes (ASVs) correlate with changes in the diet-age interaction?
Variables and conditions can interact with each other in complex ways. Here we will try to tease apart the relationship between the gut microbiome, diet, and age. What does an interaction look like? In this example, we’re looking at whether or not a change in diet affects people of different ages in different ways. We might imagine for example that a young person’s gut is more resilient to change than an older person’s or vice versa. Changes at one age group may be different in the other.
Approach: Use an alpha diversity measure and DESeq2 to answer this question. Using alpha diversity, determine if there is an interaction between diet and age. Then use DESeq2 to see if any ASVs are associated with changes in the diet-age interaction. Using the Simpson alpha diversity measure (or specifically, Gini-Simpson), evaluate how microbial diversity changes with age and diet in general, or with age and the BD, HD and WO diets specifically. Then, use DESeq2 to evaluate whether younger or older age changes the ASVs associated with diet.
- Plot alpha diversity based on age for the population in general, and then for individuals subset by the BD, HD and WO diets. Look for shifts in alpha diversity with change in diet.
- Perform DESeq2 analysis based on the diet for the population in general, establishing baseline differential abundance between dietary groups HD and BD.
- Perform DESeq2 analysis based on the diet for the younger (<= 50 years old) and older (>= 50 years old) individuals, evaluating age-associated changes in differential abundance between dietary groups HD and BD.
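For the alpha diversity part of this question, one possible sketch is to tabulate mean Simpson diversity for each combination of diet and age group and draw the means as an interaction plot. The toy object `ps`, the column names `diet` and `age`, the `age_group` labels, and the exact cutoff used in the code are assumptions for illustration. Adjust the cutoff to the one your group chooses.

```r
library(phyloseq)
library(ggplot2)

# Hypothetical toy data standing in for the MISO phyloseq object.
set.seed(3)
counts <- matrix(
  rnbinom(30 * 18, mu = 20, size = 0.5), nrow = 30,
  dimnames = list(paste0("ASV", 1:30), paste0("sample", 1:18))
)
meta <- data.frame(
  subject   = rep(paste0("S0", 1:6), each = 3),
  diet      = rep(c("BD", "HD", "WO"), times = 6),
  age       = rep(c(24, 31, 38, 56, 62, 69), each = 3),
  row.names = colnames(counts)
)
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), sample_data(meta))

# Per-sample Simpson diversity joined to the sample metadata by sample name.
alpha    <- estimate_richness(ps, measures = "Simpson")
meta_df  <- data.frame(sample_data(ps))
alpha_df <- cbind(meta_df, Simpson = alpha[rownames(meta_df), "Simpson"])

# Assumed cutoff for illustration: under 50 is "younger".
alpha_df$age_group <- ifelse(alpha_df$age < 50, "younger", "older")

# Mean diversity for each diet and age group combination.
means <- aggregate(Simpson ~ diet + age_group, data = alpha_df, FUN = mean)
means

# Interaction plot: one line per age group across the diets.
p_int <- ggplot(means, aes(x = diet, y = Simpson,
                           color = age_group, group = age_group)) +
  geom_line() +
  geom_point()
p_int
```

Roughly parallel lines suggest that diet shifts diversity similarly in both age groups. Lines that cross or diverge hint at a diet-age interaction worth following up with DESeq2.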
9.3.6 Grading criteria
- Download the assignment to your local computer as a .docx, complete it, and upload the assignment to your LMS (Blackboard, Canvas, Google Classroom).