4.3 Lab Activity: RNA-seq Analysis
4.3.1 Purpose
In this lab, students will complete a tutorial on RNA-seq data and learn how to analyze, graph and interpret the data. In the following lab, we will use these skills to compare two RNA-seq data sets to investigate gene expression patterns.
4.3.2 Learning Objectives
- Use R to analyze HTSeq files
- Create and analyze histograms from HTSeq files
4.3.3 Introduction
Today’s lab will investigate how scientists use computer science to analyze RNA-seq data. In general, the sequences are first aligned to a reference genome. For RNA-seq, the sequences will align to exons of the expressed genes. The data you will look at today has already been processed using a program called HTSeq. This program takes the sequences that have been aligned to the reference genome and counts how many align to each gene, producing files known as HTSeq files. The more sequences that align to a gene, the higher the expression level of that gene. The following tutorial will walk you through how to analyze an HTSeq file using the programming language R.
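As a rough illustration of the idea that more aligned sequences means higher expression, the sketch below builds a tiny HTSeq-style table and ranks its genes by read count. The gene names and counts are invented for illustration and do not come from the lab data.

```r
# Toy HTSeq-style data: gene ID in the first column, read count in the second.
# These four genes and their counts are made up for this example.
toy_lines <- c("geneA\t12", "geneB\t0", "geneC\t340", "geneD\t57")
toy_counts <- read.table(text = toy_lines, sep = "\t")

# read.table() names columns without a header V1, V2, ...
colnames(toy_counts)

# Rank genes from most to fewest aligned reads.
ranked <- toy_counts[order(toy_counts$V2, decreasing = TRUE), ]
ranked
```

In this toy table, the gene at the top of `ranked` has the most aligned reads and would be read as the most highly expressed of the four.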
The RNA-seq libraries for today’s lab are from: eLife 2013;2:e00886, DOI: 10.7554/eLife.00886. The paper analyzes gene expression in the Drosophila midgut.
4.3.4 Activity 1 - Introduction to RNA-seq Data Tutorial
Estimated time: 20 min
4.3.4.1 Instructions
- Log into SciServer, click on Compute, and open your “C-MOOR LearnR” container.
- Start the “Introduction to RNA-seq Data” tutorial. Visit SciServer Guides and FAQs if you need to jog your memory on how to do this.
- To move through the activities, click “Continue” at the bottom of the screen. When you are done with a topic, click “Next Topic” to move on.
- This tutorial has small boxes in which you can enter and run short lines of code to analyze the data.
- As you work through the tutorial, answer the questions below. When you get to “Try it Out!” move on to Activity 2.
4.3.4.2 Questions
What are the two columns (V1 and V2) in an HTSeq file? What data is stored in each column? |
---|
Explain what readCount and GeneID are. |
---|
Share a screenshot of the row showing the readCount of the lab gene in the “Reproduce Results for a Single Gene” section and explain in your own words what the code in your screenshot is doing. |
---|
4.3.5 Activity 2 - Analyze an HTSeq file
Estimated time: 15-20 min
4.3.5.1 Instructions
- In groups of two, analyze the HTSeq samples assigned to you.
Assigned Sample | |
---|---|
Name | |
Name |
- Use the code blocks on the “Try it out!” page to analyze the data.
- The code block on the “Try it out!” page has this code typed out for you:
  readCounts <- read.table("data/FILENAMEHERE.htseq")
- Change FILENAMEHERE to the filename for your file. Once you have done this, readCounts will have the new HTSeq file loaded into it.
  - Example:
    readCounts <- read.table("data/SRR891602.htseq")
- The code above loads the data set into readCounts. If you run this code alone, nothing will happen. Try requesting some analysis to get a look at the data and answer the questions below.
- Answer the questions below as you analyze the data. Consult the “Cheat Sheet” to figure out which code to use.
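One way to get a first look at the data after loading it is sketched below. So that it runs anywhere, it writes a small made-up file to a temporary location instead of using a real file from the data folder; the gene names and counts are invented, and the commands in the tutorial’s “Cheat Sheet” may differ.

```r
# Stand-in for a real file such as data/SRR891602.htseq.
# The contents are invented so this sketch does not need the lab data.
toy_file <- tempfile(fileext = ".htseq")
writeLines(c("geneA\t12", "geneB\t0", "geneC\t340", "geneD\t57"), toy_file)

# Same loading step as in the instructions, pointed at the toy file.
readCounts <- read.table(toy_file)

head(readCounts)  # first few rows of the table
dim(readCounts)   # number of rows (genes) and number of columns
str(readCounts)   # column names and the type of data in each column
```

With a real sample, only the filename inside `read.table()` changes; the inspection commands stay the same.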
4.3.5.2 Questions
Determine the total reads across all genes, and the mean, median and max read counts for a single sample. Each student reports on one of the samples analyzed.
Assigned Sample | |
---|---|
Total Reads | |
Mean Read Count | |
Median Read Count | |
Max Read Count |
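A minimal sketch of these four summaries, using base R functions on a made-up table, might look like the following. It assumes the read counts sit in the second column under the default name `V2`; if the tutorial has renamed the columns (for example to `readCount`), use that name instead.

```r
# Invented counts for four hypothetical genes, for illustration only.
toy_lines <- c("geneA\t12", "geneB\t0", "geneC\t340", "geneD\t57")
readCounts <- read.table(text = toy_lines, sep = "\t")

total_reads  <- sum(readCounts$V2)     # total reads across all genes
mean_reads   <- mean(readCounts$V2)    # average reads per gene
median_reads <- median(readCounts$V2)  # middle value of the read counts
max_reads    <- max(readCounts$V2)     # highest read count for any gene

c(total = total_reads, mean = mean_reads,
  median = median_reads, max = max_reads)
```

In RNA-seq data a few genes often have very large counts, so the mean and median can differ considerably; comparing the two is a quick way to see that.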
Look up the GeneID of the gene you presented on in the Biological Databases Lab. Use the filter command to find the readCount for that gene in both samples assigned to your group.
How many reads does the gene have in your assigned dataset? |
---|
Share a screenshot. |
How many reads does the gene have in your partner’s dataset? |
---|
Share a screenshot. |
Compare this number to the mean; is it average, high or low? |
---|
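The single-gene lookup and the comparison to the mean can be sketched as below. This version uses base R row selection on invented data; `geneC` is a hypothetical ID standing in for your gene’s GeneID. The tutorial’s filter command is assumed here to be `filter()` from the dplyr package, and the equivalent call is shown in a comment.

```r
# Invented counts for four hypothetical genes, for illustration only.
toy_lines <- c("geneA\t12", "geneB\t0", "geneC\t340", "geneD\t57")
readCounts <- read.table(text = toy_lines, sep = "\t")

gene_of_interest <- "geneC"  # hypothetical ID; replace with your GeneID

# Keep only the row whose gene ID matches.
gene_row <- readCounts[readCounts$V1 == gene_of_interest, ]
gene_row

# Assumed dplyr equivalent, if dplyr is loaded:
#   filter(readCounts, V1 == "geneC")

# Is this gene's count above the sample mean?
above_mean <- gene_row$V2 > mean(readCounts$V2)
above_mean
```

Running the same lookup on your partner’s sample only requires loading the other file into `readCounts` first.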
4.3.6 Footnotes
4.3.6.1 Resources
4.3.6.2 Contributions and Affiliations
- Stephanie R. Coffman, Ph.D., Clovis Community College
- Rosa Alcazar, Ph.D., Clovis Community College
- Katie Cox, Ph.D., Carnegie Institute at Johns Hopkins University
Last Revised: March 2022