8.3 Activity - Exploring 16S rRNA Data with phyloseq

8.3.0.1 Purpose

To explore bacterial diversity based on 16S rRNA gene sequencing, using data from the study “Impact of a 7-day homogeneous diet on interpersonal variation in human gut microbiomes and metabolomes” by Guthrie et al., 2023. This study is also referred to as the MISO study, for “Microbiome Individuality and Stability Over Time”, because it aims to understand the variation (or stability) of microbiomes across individuals.

This activity aims to explore and profile the metagenomic diversity using the R/Bioconductor package phyloseq, which leverages various R-based tools to produce publication-quality taxonomy profiling and analysis graphics.

8.3.1 Learning objectives

  1. Use phyloseq to explore data associated with Amplicon Sequence Variants (ASVs).
  2. In phyloseq, subset data based on metadata and taxonomy.
  3. In phyloseq, profile and plot taxonomy.

8.3.2 Introduction

The most popular sequencing technique for the analysis of bacterial diversity is targeted sequencing: a specific gene, or a region of a gene (e.g. a hypervariable region of the bacterial 16S ribosomal RNA (rRNA) gene), is amplified using polymerase chain reaction (PCR) to create sequences called amplicons. Sequence variation in the resulting amplicons gives rise to amplicon sequence variants (ASVs). Amplicons that differ by as little as a single nucleotide are defined as separate ASVs. ASVs can be further clustered into OTUs (Operational Taxonomic Units) based on sequence similarity; e.g. ASVs within 1% sequence difference of each other can be clustered into the same species/OTU.
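
As a rough illustration of how ASVs relate to OTUs, the base R sketch below builds four made-up, pre-aligned sequences, measures the fraction of mismatched positions between each pair, and groups sequences that lie within 1% of each other. The sequences, the helper names `mutate_at` and `frac_diff`, and the use of complete-linkage clustering are all assumptions made for this example; real pipelines rely on dedicated alignment and clustering tools.

```r
# Toy example: four made-up, pre-aligned amplicon sequences of 200 nt each.
base_seq <- paste(rep("ACGTTGCA", 25), collapse = "")

# Hypothetical helper: substitute the nucleotide at each given position
# (A->C, C->G, G->T, T->A), so every listed position becomes a mismatch.
mutate_at <- function(s, positions) {
  chars <- strsplit(s, "")[[1]]
  chars[positions] <- chartr("ACGT", "CGTA", chars[positions])
  paste(chars, collapse = "")
}

asvs <- c(
  ASV1 = base_seq,
  ASV2 = mutate_at(base_seq, 10),                               # 1 mismatch vs ASV1
  ASV3 = mutate_at(base_seq, c(20, 40, 60, 80, 100, 120)),      # 6 mismatches vs ASV1
  ASV4 = mutate_at(base_seq, c(20, 40, 60, 80, 100, 120, 150))  # 1 mismatch vs ASV3
)

# Every distinct sequence counts as its own ASV, even after a single change.
n_asvs <- length(unique(asvs))

# Hypothetical helper: fraction of aligned positions that differ.
frac_diff <- function(a, b) {
  mean(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Pairwise distance matrix between all ASVs.
n <- length(asvs)
d <- matrix(0, n, n, dimnames = list(names(asvs), names(asvs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- frac_diff(asvs[[i]], asvs[[j]])
  }
}

# Complete-linkage clustering cut at 1% difference: every pair of ASVs
# inside one OTU differs at no more than 1% of positions.
otu <- cutree(hclust(as.dist(d), method = "complete"), h = 0.01)
otu
```

In this toy data, ASV1 and ASV2 differ at 0.5% of positions and fall into one OTU, and ASV3 and ASV4 form a second OTU, so four ASVs collapse into two OTUs.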

8.3.3 Activity 1 – Explore 16S rRNA Data with phyloseq tutorial

Estimated time: 25 min

Explore a phyloseq object through the “Explore 16S rRNA Data with phyloseq” tutorial on SciServer.

Log into SciServer, click on compute, and create a new C-MOOR LearnR container. Then work through the following steps:

  1. Access the C-MOOR Tutorials
  • If you are using SciServer, log into SciServer, click on compute and open your “C-MOOR LearnR” container. Visit SciServer Guides and FAQs if you need to jog your memory on how to do this.

  • If you are using AnVIL, log into AnVIL, navigate to your class Workspace, start up an RStudio Cloud Environment, and open RStudio. Visit the AnVIL Guides and FAQs if you need to jog your memory on how to do this. This module can be found in the “1-explore-ps” folder of the “16s” curriculum folder.

  • If you are using an alternative setup, follow the instructions provided by your instructor.

  2. Start the “Explore 16S rRNA Data with phyloseq” tutorial. Visit SciServer Guides and FAQs if you need assistance accessing the tutorial.
  3. To move through the activities, click “Continue” at the bottom of the screen. When you are done with a topic, click “Next Topic” to move on.
  4. This tutorial has small boxes in which you can enter and run short lines of code to analyze the data.
  5. As you work through the tutorial, take snapshots of your work and paste your answers in the grey boxes below:
1-1. OTU_table section – Take a snapshot and paste your code and the output for the following question in the tutorial: “What is the normalized count for ASV115 in sample 10?”


1-2. Visualizing Taxonomy section – Take a snapshot and paste your code and the output for the following question in the tutorial: “What timepoint has the lowest proportion of phylum Verrucomicrobia?”


1-3. Using subset_taxa() section – Take a snapshot and paste your code and the output for the following question in the tutorial: “What are the 2 most abundant orders in the class Gammaproteobacteria?”
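
The tutorial questions above rely on a few recurring phyloseq operations: looking up a value in the OTU table, normalizing counts, and subsetting by taxonomy. The sketch below shows these on a tiny made-up phyloseq object, not on the MISO data. The counts, sample names, and the `timepoint` and `subject` variables are invented for illustration, and normalizing to per-sample proportions is an assumption; the tutorial may normalize differently.

```r
library(phyloseq)

# Toy count table: rows are ASVs, columns are samples (made-up numbers).
counts <- matrix(
  c(10,  0,  5,
    30, 20, 15,
    60, 80, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2", "S3"))
)

# Toy taxonomy table with three ranks.
taxa <- matrix(
  c("Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
    "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales",
    "Firmicutes",     "Clostridia",          "Oscillospirales"),
  nrow = 3, byrow = TRUE,
  dimnames = list(rownames(counts), c("Phylum", "Class", "Order"))
)

# Toy sample metadata (hypothetical variable names).
meta <- data.frame(
  timepoint = c("1", "2", "3"),
  subject   = c("A", "A", "A"),
  row.names = colnames(counts)
)

ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  tax_table(taxa),
  sample_data(meta)
)

# Normalize: convert each sample's counts to proportions of that sample's total.
ps_rel <- transform_sample_counts(ps, function(x) x / sum(x))

# Look up the normalized value for one ASV in one sample.
rel_matrix <- as(otu_table(ps_rel), "matrix")
asv1_in_s1 <- rel_matrix["ASV1", "S1"]

# Keep only the taxa that belong to one class.
ps_gamma <- subset_taxa(ps_rel, Class == "Gammaproteobacteria")

# Rank the orders within that class by their total abundance across samples.
orders <- as(tax_table(ps_gamma), "matrix")[, "Order"]
order_totals <- sort(tapply(taxa_sums(ps_gamma), orders, sum), decreasing = TRUE)
order_totals
```

With the real data you would replace `ps` with the phyloseq object provided in the tutorial and use the sample and taxon names that appear there.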


8.3.4 Activity 2 – Try it out questions

Estimated time: 60 min

With your group, perform some exploratory data analysis by selecting one of the four questions below or by coming up with your own question. Each question has its own section below with code templates for you to use. Select a question here, then go to the page specific to your question.

8.3.4.1 Question 1. What has more impact on human microbial variation, diet or individuality?

Diet is suggested to play an important role in shaping the human gut microbiome. However, other factors that are specific to an individual, such as their physical fitness, metabolism, and genotype, are also suspected to play a strong role. Here we will explore the impact of diet and individuality on the human gut microbiome.

Approach: Examine high-level (Phylum) taxonomy for 5 timepoints and 21 individuals, and evaluate whether the change of diet or inter-individual differences have more impact on microbial variation.

  1. Evaluate the effect of diet – Plot the three dietary groups BD, HD, and WO corresponding to 5 experimental timepoints (where timepoints 1 and 2 are BD, timepoints 3 and 4 are HD and timepoint 5 is WO).
  2. Evaluate the effect of individuality – Plot the microbiome for all 21 individuals but focus on just the HD timepoint group (where diet is the same).
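
The two steps above can be sketched on toy data as follows. This is a minimal sketch, not the activity's code template: the random counts, the placeholder genus labels, and the sample variables `diet` and `subject` are assumptions, and the real MISO object may use different variable names.

```r
library(phyloseq)

set.seed(1)

# Toy metadata: 3 subjects, each sampled once per diet group (hypothetical names).
meta <- expand.grid(
  subject = c("A", "B", "C"),
  diet    = c("BD", "HD", "WO"),
  stringsAsFactors = FALSE
)
rownames(meta) <- paste(meta$subject, meta$diet, sep = "_")

# Toy counts for 4 ASVs drawn at random (made-up data).
counts <- matrix(
  rpois(4 * nrow(meta), lambda = 50) + 1,
  nrow = 4,
  dimnames = list(paste0("ASV", 1:4), rownames(meta))
)
taxa <- matrix(
  c("Firmicutes",    "GenusA",
    "Firmicutes",    "GenusB",
    "Bacteroidetes", "GenusC",
    "Bacteroidetes", "GenusD"),
  nrow = 4, byrow = TRUE,
  dimnames = list(rownames(counts), c("Phylum", "Genus"))
)

ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  tax_table(taxa),
  sample_data(meta)
)

# Collapse ASVs to the phylum level, then convert counts to proportions.
ps_phylum <- tax_glom(ps, taxrank = "Phylum")
ps_rel <- transform_sample_counts(ps_phylum, function(x) x / sum(x))

# Step 1: effect of diet. Samples in the same diet group are stacked, so each
# bar's height equals the number of samples in that group (3 here).
p_diet <- plot_bar(ps_rel, x = "diet", fill = "Phylum")

# Step 2: effect of individuality. Keep only the HD samples, one bar per subject.
ps_hd <- subset_samples(ps_rel, diet == "HD")
p_subject <- plot_bar(ps_hd, x = "subject", fill = "Phylum")
```

Printing `p_diet` and `p_subject` displays the two plots. Comparing how much the bars differ between diet groups with how much they differ between subjects under the same diet is one way to approach the question.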

8.3.4.2 Question 2. What are some of the most abundant microbes when viewed at different taxonomic resolutions?

Microbes within the core microbiome - the most common and abundant species across samples in a given group - are likely to be involved in key functions of the holobiont. Here we will explore some of the most common microbes at increasing levels of taxonomic resolution.

Approach: Examine the microbial composition of the gut of the 21 individuals, focusing on the most abundant taxon at each taxonomic rank. To do so, you will progressively subset the data to the most abundant taxon, working from Phylum down to Species.

  1. Plot all the Phyla and identify the most abundant Phylum
  2. Subset the most abundant Phylum, and plot all the Orders
  3. Subset the most abundant Order, and plot all the Families
  4. Subset the most abundant Family, and plot all the Genera
  5. Subset the most abundant Genus, and plot all the Species
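
The progressive subsetting described above can be sketched as a small loop. The toy counts and the helper functions `top_taxon` and `keep_taxon` are hypothetical and written only for this example. With a taxon name typed in directly, `subset_taxa()` does the same job; the sketch uses `prune_taxa()` because the name is held in a variable.

```r
library(phyloseq)

# Toy data: 4 ASVs in 2 samples (made-up numbers).
counts <- matrix(
  c(50, 40,   # ASV1
    30, 35,   # ASV2
    15, 20,   # ASV3
     5,  5),  # ASV4
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ASV", 1:4), c("S1", "S2"))
)
taxa <- matrix(
  c("Firmicutes",    "Clostridiales",   "Lachnospiraceae",
    "Firmicutes",    "Clostridiales",   "Ruminococcaceae",
    "Firmicutes",    "Lactobacillales", "Streptococcaceae",
    "Bacteroidetes", "Bacteroidales",   "Bacteroidaceae"),
  nrow = 4, byrow = TRUE,
  dimnames = list(rownames(counts), c("Phylum", "Order", "Family"))
)
meta <- data.frame(subject = c("A", "B"), row.names = colnames(counts))

ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  tax_table(taxa),
  sample_data(meta)
)

# Hypothetical helper: name of the taxon with the largest total count at a rank.
top_taxon <- function(physeq, rank) {
  ids <- taxa_names(physeq)
  labels <- as(tax_table(physeq), "matrix")[ids, rank]
  totals <- tapply(taxa_sums(physeq)[ids], labels, sum)
  names(totals)[which.max(totals)]
}

# Hypothetical helper: keep only the ASVs assigned to one taxon at a rank.
keep_taxon <- function(physeq, rank, name) {
  ids <- taxa_names(physeq)
  labels <- as(tax_table(physeq), "matrix")[ids, rank]
  prune_taxa(ids[!is.na(labels) & labels == name], physeq)
}

# Walk down the ranks, keeping the most abundant taxon at each step.
current <- ps
path <- character()
for (rank in c("Phylum", "Order")) {
  winner <- top_taxon(current, rank)
  path[rank] <- winner
  current <- keep_taxon(current, rank, winner)
}
path

# Plot the next rank down within the final subset.
p_family <- plot_bar(current, fill = "Family")
```

In the toy data the loop selects Firmicutes and then Clostridiales, leaving two families to plot. With the real data the rank vector would continue through Family and Genus, as in the steps above.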

8.3.4.3 Question 3. Does gender have an impact on the human gut microbiome?

The biological sex of the host has been suggested to help shape or influence its gut microbiome. Some candidate microbes have been implicated in sex differences. Here we will explore the potential gender-based differences in the human gut microbiome.

Approach: Test whether phylum and/or species composition varies with gender. First you will survey the differences between male and female subjects in their phylum-level and species-level composition. Then you will select some candidate microbes of interest to examine in further detail.

  1. Survey differences in phylum and species composition by gender.
  2. Plot differences in species for a high abundance phylum by gender.
  3. Plot differences in species for a low abundance phylum by gender.
  4. Plot two candidate microbes reported in the literature to vary with gender.
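
One way to examine a candidate microbe by gender is to convert the phyloseq object to a long data frame with `psmelt()` and summarize it by group. The sketch below uses made-up counts, placeholder taxon names (`GenusA`, `GenusB`, `PhylumX`, `PhylumY`), and a hypothetical `gender` variable; it is an illustration under those assumptions, not the activity's template.

```r
library(phyloseq)

# Toy metadata: 2 female and 2 male subjects (hypothetical variable names).
meta <- data.frame(
  gender  = c("F", "F", "M", "M"),
  subject = c("A", "B", "C", "D"),
  row.names = paste0("S", 1:4)
)

# Toy counts (made-up numbers); each sample totals 100 reads.
counts <- matrix(
  c(10, 20, 40, 50,   # ASV1, placeholder GenusA
    90, 80, 60, 50),  # ASV2, placeholder GenusB
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2"), rownames(meta))
)
taxa <- matrix(
  c("PhylumX", "GenusA",
    "PhylumY", "GenusB"),
  nrow = 2, byrow = TRUE,
  dimnames = list(rownames(counts), c("Phylum", "Genus"))
)

ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  tax_table(taxa),
  sample_data(meta)
)

# Convert counts to per-sample proportions.
ps_rel <- transform_sample_counts(ps, function(x) x / sum(x))

# Long format: one row per ASV per sample, with metadata and taxonomy attached.
df <- psmelt(ps_rel)

# Mean relative abundance of one candidate genus in each gender group.
candidate <- df[df$Genus == "GenusA", ]
mean_by_gender <- tapply(candidate$Abundance, candidate$gender, mean)
mean_by_gender

# One bar per subject, split into a panel per gender.
p_gender <- plot_bar(ps_rel, x = "subject", fill = "Genus", facet_grid = ~gender)
```

A difference in group means on its own does not show that gender drives the difference, especially with few subjects per group, so treat any pattern as a starting point for discussion.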

8.3.4.4 Question 4. Does age have an impact on human gut microbiomes?

The human gut microbiome has been associated with age-related disease states, immune-system changes, and metabolic function. Here we will explore the potential microbiome changes associated with aging.

Approach: Test whether phylum and/or species composition differs by age across the dataset. First you will plot the relationship between age and phylum; you will see that age is a more challenging variable to work with than the categorical variables. You will then collapse and transform the data to fix this issue and create a final plot for your interpretation.

  1. Plot all Phyla based on age.
  2. Normalize ages that are shared by multiple individuals to avoid overestimating abundance (e.g., the ages 27, 46, and 58 are each shared by 2 individuals, and the age 54 is shared by 3 individuals).
  3. Profile the candidate phylum Firmicutes based on age.
  4. Profile the candidate phylum Bacteroidetes based on age.
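
The normalization in step 2 matters because adding up relative abundances for an age shared by several individuals inflates the total for that age. The sketch below contrasts summing with averaging on a toy dataset in which two subjects are both 27. The counts and the variable names `age` and `subject` are assumptions made for illustration, and averaging per age is just one reasonable way to collapse the data.

```r
library(phyloseq)

# Toy metadata: two of the four subjects share the same age (27).
meta <- data.frame(
  age     = c(27, 27, 46, 54),
  subject = c("A", "B", "C", "D"),
  row.names = paste0("S", 1:4)
)

# Toy counts (made-up numbers); each sample totals 100 reads.
counts <- matrix(
  c(60, 40, 70, 20,   # ASV1, Firmicutes
    40, 60, 30, 80),  # ASV2, Bacteroidetes
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2"), rownames(meta))
)
taxa <- matrix(
  c("Bacteria", "Firmicutes",
    "Bacteria", "Bacteroidetes"),
  nrow = 2, byrow = TRUE,
  dimnames = list(rownames(counts), c("Kingdom", "Phylum"))
)

ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  tax_table(taxa),
  sample_data(meta)
)

# Convert counts to per-sample proportions, then melt to a long data frame.
ps_rel <- transform_sample_counts(ps, function(x) x / sum(x))
df <- psmelt(ps_rel)

# Summing per age: ages with more individuals get larger totals.
summed <- aggregate(Abundance ~ age + Phylum, data = df, FUN = sum)

# Averaging per age: each age contributes a composition that sums to 1.
averaged <- aggregate(Abundance ~ age + Phylum, data = df, FUN = mean)

# Profile one candidate phylum across age using the averaged values.
firmicutes <- averaged[averaged$Phylum == "Firmicutes", ]
p_age <- ggplot2::ggplot(firmicutes, ggplot2::aes(x = age, y = Abundance)) +
  ggplot2::geom_point() +
  ggplot2::geom_line()
```

In the toy data, the summed abundances at age 27 total 2 across the two phyla, while the averaged abundances total 1, which makes ages with different numbers of individuals comparable.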

8.3.5 Grading criteria

  • Download the assignment to your local computer as a .docx, complete it, and upload the assignment to your LMS (Blackboard, Canvas, Google Classroom).

8.3.6 Footnotes

8.3.6.2 Contributions and affiliations

  • Valeriya Gaysinskaya, Johns Hopkins University
  • Gauri Paul, Clovis Community College
  • Frederick Tan, Johns Hopkins University
  • Sayumi York, Notre Dame of Maryland University