7.3 Activity - Introducing 16S rRNA Data Using Spreadsheets
7.3.1 Purpose
To explore bacterial diversity based on 16S rRNA gene sequencing using data from the “Impact of a 7-day homogeneous diet on interpersonal variation in human gut microbiomes and metabolomes” by Guthrie et al., 2023. This study is also referred to as the MISO study for “Microbiome Individuality and Stability Over Time” because the study aims to understand variation (or stability) of microbiomes in individuals.
This spreadsheet activity aims to orient you towards understanding the data and metadata content of this study. In subsequent activities, you will explore and analyze this data using the R/Bioconductor package phyloseq. Google Sheets and R are both useful for doing different analyses, so keep in mind the tools at your disposal and what will be the most efficient way to answer your research questions.
Please note that this activity takes place in Google Sheets. It is possible to do the same analysis in Microsoft Excel although the individual steps you will take are different. To help make things easier for everyone, we strongly recommend everyone uses Google Sheets; all our instructions will be specific to Sheets. Students who do not have a Google-associated email address can sign up for a free Gmail account and gain access to Google Sheets this way.
A Google Sheets cheatsheet that covers the necessary steps for this activity is available in our Spreadsheets 101 guide. Students who are not experienced using spreadsheet software should note that the instructions build on prior steps.
7.3.2 Learning objectives
- Explore Amplicon Sequence Variants (ASVs) through the taxonomy profile of select ASVs, and sample metadata for a subset of ASVs.
- Ask research questions based on data and metadata
- How many different microbes (ASVs) are there in a given sample?
- Is there an even abundance of microbes (ASVs) in a sample or are there a few that dominate?
- How varied are ASVs and their abundance across subjects?
- How varied are ASVs and their abundance across diets?
7.3.3 Introduction
The most popular sequencing technique for the analysis of bacterial diversity is targeted sequencing, or sequencing of a specific gene (or region of a gene, e.g. a hypervariable region of the bacterial 16S ribosomal rRNA gene) using Polymerase Chain Reaction (PCR) to create sequences called amplicons. Sequence variation in the resulting amplicons creates Amplicon Sequence Variants (ASVs). ASVs varying from as little as one single nucleotide are defined as separate ASVs and as little as 1% difference in ASV sequence can be associated with different species. In this activity you will be exploring ASVs generated with sequencing 250 nucleotides (nt) Illumina sequencing, where Amplicon sequence variants (ASVs) were identified.
7.3.4 Activity 1 – Quick data overview based on 1 sample
Estimated time: 10 min
Although the full dataset contains 105 samples from 21 subjects and 5 timepoints (representing 3 dietary conditions), in this activity we will familiarize ourselves with the data by exploring data from 1 sample only.
7.3.4.1 Activity 1-1. Explore data contents using one sample only, namely MISO1-S02-1 (subject 2, timepoint 1).
Access ‘MISO1-S02-1’ file and open it with Google Sheets.
- Rows = ASVs
- Columns = Samples and taxonomy:
- Col A = ASV id
- Cols B-H = Taxonomic information
- Col I = ASV counts (for sample MISO1-SO2-1)
| 1-1.1 How many microbes (ASVs) are there? This is the # of rows there are in the spreadsheet. Scroll down to the bottom of the sheet and look at the row number or select column A and see the count of all the cells in a row at the bottom right of the window. Remember to not count the header row. |
|---|
| 1-1.2 How many microbes are there in this sample? In other words, what is the total number of ASVs per sample? |
|---|
We can count cells with a specific condition using the following formula: =COUNTIF(CELLS, “<>0”), where CELLS is replaced by the range of cells you want to use in the calculation.
For example, for this sample (MISO1-S02-1) in cell J1 enter the formula =COUNTIF(I2:I2185, “<>0”). This formula says, count cells in the range between I2 and I2185 that are not equal to 0. Note that we do not include I1 in the calculation as that is the name of the sample name.
| # Microbes: |
| 1-1.3 What taxonomy is associated with ASV1? Include all taxonomic ranks, even those that are listed as NA. |
|---|
| Kingdom: |
| Phylum: |
| Class: |
| Order: |
| Family: |
| Genus: |
| Species: |
| 1-1.4 What taxonomy is associated with ASV2? Include all taxonomic ranks, even those that are listed as NA. |
|---|
| Kingdom: |
| Phylum: |
| Class: |
| Order: |
| Family: |
| Genus: |
| Species: |
| 1-1.5 For sample 1 (MISO1-S02-1) what is the count of ASV 1? |
|---|
| 1-1.6 For sample 1 (MISO1-S02-1) what is the count of ASV 2? |
|---|
| 1-1.7 Are ASVs equally abundant throughout the sample? Plot the ASV count distribution (range of ASV abundance) for sample MISO1-S02-1. |
|---|
| A) Make a bar graph of ASV distribution. Your chart should have the same format as the example. |
|---|
To make a bar graph with this data, hold the Command key on a Mac or Ctrl on Windows to select data that are not immediately next to each other and then click on columns A (“ASV”, which contains ASV ids) and I ( “MISO1-S02-1”, which contains counts)
In the menu at the top of the screen, select Insert > Chart.
Make sure your graph is a bar plot: click on the 3 dots in the top right corner of the plot → edit chart and in the Chart editor, under Chart type, selecting Column Chart.
Copy the bar graph into a box below: click on the 3 dots in the top right corner of the plot, select copy chart, then, paste it below.
Your graph should look similar to the example graph with the ASVs on the x-axis and counts on the y-axis (although the Y axis is titled the same name of the MISO sample, since this is the name of the column).
| Insert graph here: |
| B) What is the ASV with the highest abundance (count) in sample 1? To find this answer hover over the tallest bar in your ASV distribution plot from activity at its highest point. |
|---|
| C) Describe the pattern of abundance in ASVs for this sample. Is the abundance of ASVs evenly distributed (left in the example graph) where each ASV is approximately the same abundance (count) or is the abundance of ASVs unevenly distributed (right in the example graph), where some ASVs have very high and/or very low counts? |
|---|
7.3.5 Activity 2 – Explore variation between individuals
Estimated time: 10 min
One aspect of the Guthrie et al. study is the extent to which diet contributes to interpersonal microbiome variation (variation in microbiome between individuals). How does what you eat affect your gut microbiome?
To this end, authors standardized the diet for all individuals with the same diet to evaluate how much of an effect the diet has on the microbiome. If people all eat the same diet, will their gut microbiome become more similar to each other? The study period was 28 days long and had 3 dietary conditions:
- BD (Baseline diet): which lasted 14 days (2 time points collected) and has participants eating what they usually eat.
- HD (Homogenized diet): which lasted 7 days (2 time points collected), and has all participants on the same diet
- WO (washout diet): which lasted for 7 days where participants choose their own diet again
To explore variation between individuals, here we simplify the dietary variation to a single condition, WO.
For this activity, the data file MISO_WO contains samples pre-filtered for WO condition only (and excludes samples corresponding to BD and HD).
Since there is only 1 WO datapoint per individual, each WO sample corresponds to a different individual, so you should see 21 samples.
7.3.5.1 Activity 2-1 - What are the most common ASVs between individuals?
IMPORTANT: An ASV is associated with an individual if it is NOT a zero. Zero means that particular ASV was not found in the individual, so you need to exclude zeroes from your calculations.
| A) Calculate the number of samples associated with each ASV. Record first 5 values below. |
|---|
- Create an empty new first column for temporary calculations by right clicking column A and selecting “Insert 1 column left”.
- In cell A1, label this new column “Occurrence”.
- We will use the formula =COUNTIF(range, “<>0”) again to count how many cells have a non-0 value. For example, for ASV1 enter the formula = COUNTIF(C3:W3, “<>0”) into cell A3.
- If the formula and its placement are correct, cell A3 should calculate to 17 (so 17 of 21 WO samples have ASV1). Confirm your calculation is working correctly.
- Apply the calculation to all cells in the column by copy pasting the formula. For a reminder on how to do this, see the Google Sheets cheatsheet.
| ASV1: |
| ASV2: |
| ASV3: |
| ASV4: |
| ASV5: |
| B) Calculate % of samples associated with each ASV and record the first top 5. record the first top 5. |
|---|
- Create another new column by right clicking on column A and selecting “Insert 1 column left”.
- In cell A1, give the new column a name, “% of samples with this ASV”.
- Calculate % by dividing a value you obtained in part A) by 21 (samples) and multiplying by 100. To do this, starting with cell A3 (corresponding to ASV1), use the formula =100*(B3/21).
- Apply the calculation to all cells in the column by copy pasting the formula. For a reminder on how to do this, see the Google Sheets cheatsheet.
| ASV1: |
| ASV2: |
| ASV3: |
| ASV4: |
| ASV5: |
| C) How many ASVs are present in all 21 individuals and what are their ids? |
|---|
To do so, find the number of ASVs that have a % of samples with this ASV = 100%.
- In cell A2 use the formula = COUNTIF(A3:A2186, “=100”). This formula says, count a cell if it is equal to 100.
7.3.5.2 Activity 2-2. Explore the taxonomy of the most common ASVs
| 2-2.1 Which ASV is found in 100% of samples? This ASV has a % of samples with this ASV = 100%. Hint: This ASV is found within the first 20 ASVs. |
|---|
| 2-2.2 What is the taxonomy assigned to the most common ASV you found in Activity above? |
|---|
- Return to your copy of ‘MISO1-S02-1’.
- Columns B-H correspond to taxonomic information.
| Kingdom: |
| Phylum: |
| Class: |
| Order: |
| Family: |
| Genus: |
| Species: |
| 2-2.3 This ASV was found in every subject. What might this mean about the function this ASV plays in the gut microbiome? |
|---|
7.3.6 Activity 2-3. Plot individual variation in abundance of the most common ASV.
The most common ASV is represented in all samples, but does each sample have the same abundance of this ASV?
| A) Make a bar graph of most common ASV distribution across 21 individuals |
|---|
1.Return to your version of the MISO_WO spreadsheet.
Hold the Command key on a Mac or Ctrl on Windows to select data that are not immediately next to each other to select the header row (D2:X2, sample names) and the row corresponding to the most common ASV you identified above.
In the menu at the top of the screen, select Insert > Chart. Make sure your graph is a bar plot: click on the 3 dots in the top right corner of the plot → edit chart and in the Chart editor, under Chart type, selecting Column Chart.
Copy the bar graph into a box below: click on the 3 dots in the top right corner of the plot, select copy chart, then, paste it below. Your graph should look similar to the example graph below, with multicolored bars that each represent one sample’s count of the ASV. The example shows a different ASV plotted with 5 samples.
Your chart will have more bars and the legend may not be in the same place; do not worry if there is no legend or the legend text is not visible as in the example.
| Insert chart here |
| B) Is the abundance of this ASV similar or dissimilar across samples? Each bar represents the count (abundance) of this ASV for a given subject. Ignore the axis title, which only has 1 sample name. Compare your graph to the example figure. Which is more like the graph you made? |
|---|
| C) What might this mean for the function of this ASV in the gut microbiome? |
|---|
7.3.7 Activity 3 – Explore variation within a single individual associated with diet
Estimated time: 10 min
Although the full dataset contains 105 samples from 21 subjects and 5 timepoints (representing 3 dietary conditions), in this activity we will zoom into ASV counts data for 1 individual (comprised of 5 samples representing the 5 timepoints collected from that individual).
- Timepoints 1 and 2: BD (Baseline diet) prior to normalization with homogenized diet
- Timepoints 3 and 4: HD (Homogenized diet): normalized to be same for all
- Timepoint 5: WO (Washout diet), subjects can choose to eat what they want again
7.3.7.1 Activity 3-1 Explore differences between dietary changes in one subject
| 3-1.1 Calculate variation in ASV abundance within individual 1 (MISO1-S02) and record the first 5 values below. |
|---|
How varied is the ASV abundance across timepoints?
- This involves plotting counts for ASV 1-10 for MISO1-SO2 timepoints 1 through 5.
- To calculate variation within a sample, we will use the VAR.S function in google sheets to calculate the variance of a sample. For more information on variance, see the Google Sheets cheatsheet.
- Access ‘MISO_Subject1’
- Label an empty Column G by putting “Variance” in cell G1. Use this column to calculate variance.
- Starting with the first ASV use the following formula: =VAR.S(range) to calculate ASV abundance variation in the 5 samples/timepoints collected for individual MISO1-SO2. For example, for ASV1 enter formula =VAR.S(B2:F2) in cell G2.
- If imputed correctly, cell G2 should calculate to 14947547.8. Confirm this.
- Apply the calculation to all cells in the column by copy pasting the formula. For a reminder on how to do this, see the Google Sheets cheatsheet.
- Record the first 5 variance values below.
| ASV1 Variance: |
| ASV2 Variance: |
| ASV3 Variance: |
| ASV4 Variance: |
| ASV5 Variance: |
| 3-1.2 Plot ASVs with highest variance. |
|---|
These are the ASVs whose abundance changed the most over the course of the study and could potentially depend heavily on diet. By plotting their counts, we will be able to see how the abundance of a given ASV changed throughout the experiment.
| A) Sort the ASVs by the variance column with the largest variance values at the top. |
|---|
- Click on cell G2.
- Go to Data > Sort sheet > Sort sheet by column G (Z to A)
- Record the top 5 ASVs with the highest Variance and their Variance values below.
| ASV# Variance: |
| ASV# Variance: |
| ASV# Variance: |
| ASV# Variance: |
| ASV# Variance: |
| B) After sorting ASVs based on high to low variance, make a plot of ASV counts corresponding to top 15 ASVs with the highest variance. |
|---|
- To make a bar graph highlight top 16 rows corresponding to sample names and the 15 ASVs with highest variance, for samples MISO1-S02-1 through 5. Do not include the “Variance” column.
- In the menu at the top of the screen, select Insert > Chart.
- Make sure your graph is a bar plot: click on the 3 dots in the top right corner of the plot → edit chart and in the Chart editor, under Chart type, selecting Column Chart.
- Copy the bar graph into a box below: click on the 3 dots in the top right corner of the plot, select copy chart, then, paste it below. Your chart should look similar to the example chart with different ASVs
| Insert chart here |
7.3.7.2 Activity 3-2. Are there any ASVs whose abundance changes with different diets?
Recall that there samples 1 and 2 are from the baseline diet (BD), samples 3 and 4 are on the uniform homogenized diet (HD) and sample 5 is from the washout diet.
- ASV_A in the example graph seems tightly linked to diet - timepoints from the same diet are similar to each other, and timepoints from different diets are different from each other.
- ASV_B shows a pattern that doesn’t seem connected to diet. Samples from timepoints within the same diet are not similar, while samples from different diets are.
| A) List an ASV that changes with diet and one that changes seemingly without the influence of diet: |
|---|
| ASVs that seem to change in abundance based on diet: |
| ASVs that seem to change in abundance that do not appear connected to diet: |
| B) What factors other than diet do you think might change the abundance of a microbe in the gut microbiome? |
|---|
7.3.8 Grading criteria
- Download the assignment to your local computer as a .docx, complete it, and upload the assignment to your LMS (Blackboard, Canvas, Google Classroom).
7.3.9 Footnotes
7.3.9.1 Resources
- Google Doc
- Spreadsheets 101
- ‘MISO1-S02-1’ spreadsheet
- MISO_WO spreadsheet
- ‘MISO_Subject1’ spreadsheet