Chapter 5 Taxonomy Profiling

Overview

Purpose

Learning Objectives

Introduction

[slides] [video]

Activity 1 – Trim and QC Reads

Estimated time: 40 min (~25 min computing)

Instructions

1. Import dataset

Confirm Barcode10_Spike2.fastq.gz exists in your history by clicking on the Home button in the top menu

2. Run workflow
  • Open the Trim and QC Reads public workflow
  • Click Run, then Run Workflow, with Barcode10_Spike2.fastq.gz selected as the input dataset

Wait ~25 minutes as the NanoPlot, Porechop, and fastp jobs are scheduled, run, and complete

3. View results
  • Click on the Display icon (eyeball) next to the dataset tagged NanoPlot-Original
  • Compare the number of reads, read lengths, read quality, and other statistics with those in the report tagged NanoPlot-QCed

Questions

You can refer to this completed history to answer these questions while you wait for your jobs to complete.

1A. How many megabases were sequenced? What percentage of these bases was removed by the trimming step?
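The arithmetic behind 1A can be sketched in Python. This is a minimal sketch, assuming you have downloaded the original and QCed FASTQ files from your history and that each record uses the usual four lines (header, sequence, "+", quality) with no wrapped sequences. The NanoPlot reports should already list these totals, so the script is only a way to check the reasoning; the function names and file names are hypothetical.

```python
import gzip
from typing import Iterable


def count_bases(lines: Iterable[str]) -> int:
    """Sum the sequence lengths in FASTQ text (4 lines per record)."""
    total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            total += len(line.strip())
    return total


def count_bases_in_file(path: str) -> int:
    """Open a plain or gzip-compressed FASTQ file and count its bases."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_bases(handle)


def percent_removed(before: int, after: int) -> float:
    """Percentage of bases lost between the original and the QCed reads."""
    return 100.0 * (before - after) / before


# Hypothetical usage (file names are placeholders for your own downloads):
# raw = count_bases_in_file("Barcode10_Spike2.fastq.gz")
# qc = count_bases_in_file("fastp_output.fastq.gz")
# print(f"{raw / 1e6:.1f} Mb sequenced, {percent_removed(raw, qc):.1f}% removed")
```

Dividing the base count by 1,000,000 converts it to megabases.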

1B. What are the median and mean read lengths? Why is the mean longer than the median?
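A toy example shows why the two statistics can differ so much. The read lengths below are invented for illustration and are not taken from Barcode10_Spike2; they mimic a right-skewed length distribution of the kind often seen with Nanopore data, where a few very long reads pull the mean upward while the median only reflects the middle of the sorted list.

```python
from statistics import mean, median

# Hypothetical read lengths (bp): most reads are a few kilobases,
# but two very long reads stretch the right tail of the distribution.
read_lengths = [800, 1200, 1500, 1800, 2000, 2200, 2500, 3000, 25000, 40000]

print(f"median = {median(read_lengths):.0f} bp")  # middle two values: 2000 and 2200
print(f"mean   = {mean(read_lengths):.0f} bp")    # pulled up by the 25 kb and 40 kb reads
```

Here the median is 2100 bp while the mean is 8000 bp: removing the two longest reads would barely move the median but would cut the mean sharply.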

Activity 2 – Taxonomy Profiling

Estimated time: 50 min (~25-35 min computing)

Instructions

1. Run workflow
  • Open the Taxonomy Profiling public workflow
  • Click Run, then Run Workflow, with the following parameters:
    • Dataset: fastp on data 5: Read 1 output
    • kraken_database: Prebuilt Refseq indexes: PlusPF
  • Wait ~35 minutes as the Kraken2, KrakenTools, and Krona jobs are scheduled, run, and complete

2. View results
  • Click on the Display icon (eyeball) next to the dataset tagged taxonomy_profiling_visualization_krona_pie_chart
  • Examine how many reads were classified as Salmonella, Escherichia coli, and Homo sapiens, and how many remained unclassified
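The same counts can also be read from the Kraken2 report rather than the Krona chart. This is a minimal sketch, assuming you download the Kraken2 report from your history as a tab-separated file in Kraken2's standard report layout (percentage of all reads, clade reads, reads assigned directly, rank code, NCBI taxid, indented scientific name). The function name and file name are hypothetical, and the sketch takes the name from the last column so that it also works if the optional minimizer columns are present.

```python
from typing import Dict, Iterable, Tuple


def clade_counts(report_lines: Iterable[str],
                 names: Iterable[str]) -> Dict[str, Tuple[float, int]]:
    """Return (percent of all reads, clade read count) for each requested taxon.

    Assumes a Kraken2-style report: tab-separated, percentage in the first
    column, clade reads in the second, indented scientific name in the last.
    """
    wanted = set(names)
    found: Dict[str, Tuple[float, int]] = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[-1].strip()  # indentation encodes depth; drop it for matching
        if name in wanted:
            found[name] = (float(fields[0]), int(fields[1]))
    return found


# Hypothetical usage (the report file name is a placeholder):
# targets = ["unclassified", "Salmonella", "Escherichia coli", "Homo sapiens"]
# with open("kraken2_report.tsv") as handle:
#     for name, (pct, reads) in clade_counts(handle, targets).items():
#         print(f"{name}: {reads} reads ({pct:.2f}%)")
```

Clade reads include everything assigned to a taxon and its descendants, which matches what the Krona chart shows for each wedge.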

Questions

2A. What percentage of the reads in the sample is unclassified?


2B. What kingdoms are found in the sample?
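To list the top-level groups from the Kraken2 report instead of reading them off the chart, you can filter by rank code. This is a minimal sketch, assuming the Kraken2 report layout in which the rank code is the third column from the end ('D' for domain, 'K' for kingdom, 'P' for phylum, and so on). In the NCBI taxonomy, Bacteria and Eukaryota sit above the kingdom level (rank code D), while groups such as Metazoa are kingdoms (K), so checking both codes may help answer 2B. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def taxa_at_rank(report_lines: Iterable[str], rank_code: str,
                 min_reads: int = 1) -> List[Tuple[str, int]]:
    """List (name, clade reads) for taxa whose rank code matches exactly.

    Assumes a Kraken2-style report: tab-separated, clade reads in the second
    column, rank code third from the end, indented name in the last column.
    Results are sorted from most to fewest reads.
    """
    hits = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank, name, reads = fields[-3], fields[-1].strip(), int(fields[1])
        if rank == rank_code and reads >= min_reads:
            hits.append((name, reads))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical usage (the report file name is a placeholder):
# with open("kraken2_report.tsv") as handle:
#     lines = handle.readlines()
# print(taxa_at_rank(lines, "D"))  # domains / superkingdoms
# print(taxa_at_rank(lines, "K"))  # kingdoms
```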


2C. Where might the eukaryotic DNA come from?


2D. How does the diversity of the phylum Proteobacteria compare with that of the phylum Firmicutes?
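One simple way to compare diversity is species richness: the number of distinct species detected within each phylum. This is a minimal sketch, assuming the Kraken2 report layout (rank code third from the end, names indented by depth) and using richness as a stand-in for diversity; it ignores how evenly reads are spread across species. Depending on the taxonomy version of the database, these phyla may also appear under the newer names Pseudomonadota and Bacillota. The function name is hypothetical.

```python
from typing import Dict, Iterable, Optional


def species_per_phylum(report_lines: Iterable[str],
                       min_reads: int = 1) -> Dict[str, int]:
    """Count species-level taxa (rank code 'S') under each phylum ('P').

    Uses the indentation of the name column to detect when the walk
    through the report has left a phylum's subtree.
    """
    counts: Dict[str, int] = {}
    phylum: Optional[str] = None
    phylum_depth = -1
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        raw_name = fields[-1]
        depth = len(raw_name) - len(raw_name.lstrip(" "))
        rank, reads = fields[-3], int(fields[1])
        if phylum is not None and depth <= phylum_depth:
            phylum = None  # back at or above the phylum level: subtree ended
        if rank == "P":
            phylum, phylum_depth = raw_name.strip(), depth
            counts.setdefault(phylum, 0)
        elif rank == "S" and phylum is not None and reads >= min_reads:
            counts[phylum] += 1
    return counts


# Hypothetical usage (the report file name is a placeholder):
# with open("kraken2_report.tsv") as handle:
#     richness = species_per_phylum(handle)
# print(richness.get("Proteobacteria"), richness.get("Firmicutes"))
```

Raising min_reads filters out species supported by only a handful of reads, which are more likely to be misclassifications.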


2E. What proportion of the reads in the sample is classified as E. coli, and what proportion as Salmonella?

Footnotes

Resources

Contributions and Affiliations

  • Jennifer Kerr, Notre Dame of Maryland University
  • Rosa Alcazar, Clovis Community College
  • Frederick Tan, Johns Hopkins University
  • Based on “Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition” (Galaxy Training Network, GTN)

Last Revised: September 2024