4.3 Lab Activity: RNA-seq Analysis

4.3.1 Purpose

In this lab, students will complete a tutorial on RNA-seq data and learn how to analyze, graph and interpret the data. In the following lab, we will use these skills to compare two RNA-seq data sets to investigate gene expression patterns.

4.3.2 Learning Objectives

  1. Use R to analyze HTSeq files
  2. Create and analyze histograms from HTSeq files

4.3.3 Introduction

Today’s lab will investigate how scientists use computer science to analyze RNA-seq data. In general, the sequenced reads are first aligned to a reference genome. For RNA-seq, the reads will align to the exons of expressed genes. The data you will look at today have already been processed using a program called HTSeq. This program takes the aligned reads and counts how many align to each gene, producing files known as HTSeq files. In general, the more reads that align to a gene, the higher the expression level of that gene. The following tutorial will walk you through how to analyze an HTSeq file using the programming language R.

The RNA-seq libraries for today’s lab are from: eLife 2013;2:e00886 DOI: 10.7554/eLife.00886. The paper analyzes gene expression in the Drosophila midgut.

4.3.4 Activity 1 - Introduction to RNA-seq Data Tutorial

Estimated time: 20 min

4.3.4.1 Instructions

  1. Log into SciServer, click on Compute, and open your “C-MOOR LearnR” container.
  2. Start the “Introduction to RNA-seq Data” tutorial. Visit SciServer Guides and FAQs if you need to jog your memory on how to do this.
  3. To move through the activities, click “Continue” at the bottom of the screen. When you are done with a topic, click “Next Topic” to move on.
  4. This tutorial has small boxes in which you can enter and run short lines of code to analyze the data.
  5. As you work through the tutorial, answer the questions below. When you get to “Try it Out!” move on to Activity 2.

4.3.4.2 Questions

What are the two columns (V1 and V2) in an HTSeq file? What data is stored in each column?


Explain what readCount and GeneID are.


Share a screenshot of the row showing the readCount of the lab gene in the “Reproduce Results for a Single Gene” section and explain in your own words what the code in your screenshot is doing.

4.3.5 Activity 2 - Analyze an HTSeq file

Estimated time: 15-20 min

4.3.5.1 Instructions

  1. In groups of two, analyze the HTSeq samples assigned to you.
Assigned Sample
Name
Name
  2. Use the code blocks on the “Try it out!” page to analyze the data.
    1. The code block on the “Try it out!” page has this code typed out for you:
      1. readCounts <- read.table("data/FILENAMEHERE.htseq")
    2. Change FILENAMEHERE to the name of your file. Once you have done this, readCounts will hold the data from your HTSeq file.
      1. Example: readCounts <- read.table("data/SRR891602.htseq")
  3. The code above loads the data set into readCounts. If you run this code alone, nothing will be displayed. Request some analysis of readCounts to get a look at the data and answer the questions below.
  4. Answer the questions below as you analyze the data. Consult the “Cheat Sheet” to figure out which code to use.
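
Loading a file stores the data but does not display anything, so a first look takes a separate command. The sketch below is one possible way to do that in base R. It writes a small made-up file to a temporary location so that it runs anywhere; the gene IDs and counts are invented for illustration, and in the lab you would load your assigned file from the data folder instead.

```r
# Made-up stand-in for an assigned file such as "data/SRR891602.htseq".
# The gene IDs and counts are invented for illustration only.
toyFile <- tempfile(fileext = ".htseq")
writeLines(c("FBgn0000001\t12",
             "FBgn0000002\t0",
             "FBgn0000003\t340",
             "FBgn0000004\t48"), toyFile)

# Loading stores the table in readCounts but displays nothing
readCounts <- read.table(toyFile)

# Ask R to show something about the data
print(head(readCounts))  # the first rows (up to six)
print(dim(readCounts))   # number of rows (genes) and number of columns
str(readCounts)          # column names and the type of each column
```

Because the file has no header line, read.table names the columns V1 and V2 by default.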

4.3.5.2 Questions

Determine the total reads across all genes, and the mean, median and max read counts for a single sample. Each student reports on one of the samples analyzed.

Assigned Sample | Total Reads | Mean Read Count | Median Read Count | Max Read Count
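
The four values requested above can each be computed with a single base R function. The following is a minimal sketch on a made-up data frame; it assumes the columns have been named GeneID and readCount as in the tutorial, and the gene IDs and counts are invented. If your columns still carry the default names, use V2 in place of readCount.

```r
# Made-up read counts for four hypothetical genes (illustration only)
readCounts <- data.frame(
  GeneID    = c("FBgn0000001", "FBgn0000002", "FBgn0000003", "FBgn0000004"),
  readCount = c(12, 0, 340, 48)
)

totalReads  <- sum(readCounts$readCount)     # total reads across all genes
meanReads   <- mean(readCounts$readCount)    # average reads per gene
medianReads <- median(readCounts$readCount)  # middle value when counts are sorted
maxReads    <- max(readCounts$readCount)     # largest count for any single gene

print(c(total = totalReads, mean = meanReads,
        median = medianReads, max = maxReads))
```

In this toy example the mean (100) is much larger than the median (30) because a single highly expressed gene pulls the average up.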


Look up the GeneID of the gene you presented on from the Biological Databases Lab. Use the filter command to find the readCount in both samples assigned to your group.
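
As a rough sketch of this lookup, the example below pulls out the row for one gene and compares its count to the sample mean. It uses base R's subset on a made-up data frame so that it runs without extra packages; the gene IDs, counts, and the column names GeneID and readCount are assumptions for illustration. If the dplyr package is loaded, filter(readCounts, GeneID == targetGene) should return the same row.

```r
# Made-up read counts for four hypothetical genes (illustration only)
readCounts <- data.frame(
  GeneID    = c("FBgn0000001", "FBgn0000002", "FBgn0000003", "FBgn0000004"),
  readCount = c(12, 0, 340, 48)
)

# Hypothetical gene of interest; replace with the GeneID you looked up
targetGene <- "FBgn0000003"

# Keep only the row whose GeneID matches the target
geneRow <- subset(readCounts, GeneID == targetGene)
print(geneRow)

# Compare the gene's read count to the mean across all genes in the sample
aboveMean <- geneRow$readCount > mean(readCounts$readCount)
print(aboveMean)
```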

How many reads does the gene have in your assigned dataset?
Share a screenshot.


How many reads does the gene have in your partner’s dataset?
Share a screenshot.


Compare this number to the mean; is it average, high or low?

4.3.6 Footnotes

4.3.6.1 Resources

4.3.6.2 Contributions and Affiliations

  • Stephanie R. Coffman, Ph.D. Clovis Community College
  • Rosa Alcazar, Ph.D. Clovis Community College
  • Katie Cox, Ph.D. Carnegie Institute at Johns Hopkins University

Last Revised: March 2022