Chapter 5 Taxonomy Profiling

Estimated time: 40 min (~25 min computing)

Confirm Barcode10_Spike2.fastq.gz exists in your history by clicking on the Home button in the top menu

Open the Trim and QC Reads public workflow
Click on Run and then Run Workflow on the Barcode10_Spike2.fastq.gz dataset

Wait ~25 minutes as the NanoPlot, Porechop, and fastp jobs are scheduled, run, and complete

Click on the Display icon (eyeball) next to the dataset tagged NanoPlot-Original
Compare the number of reads, length, quality, etc. with the report tagged NanoPlot-QCed

You can refer to this completed history to answer these questions while you wait for your jobs to complete.

1A. How many megabases were sequenced? What percentage was removed by the trimming step?

1B. What are the median and mean read lengths? Why is the mean longer than the median?
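
If you would like to check your arithmetic for 1A and 1B, the calculations can be sketched in Python. This is only an illustration: the megabase totals and read lengths below are made-up placeholders, not values from the NanoPlot reports, and the function name percent_removed is hypothetical. Substitute the numbers you read from the NanoPlot-Original and NanoPlot-QCed reports.

```python
from statistics import mean, median


def percent_removed(bases_before, bases_after):
    """Percentage of bases lost between the original and QCed reports."""
    return 100.0 * (bases_before - bases_after) / bases_before


# Placeholder totals in megabases (NOT real values from this dataset)
original_mb = 200.0
qced_mb = 180.0
print(f"{percent_removed(original_mb, qced_mb):.1f}% removed by trimming")

# Placeholder read lengths: mostly short reads plus one very long read
read_lengths = [500, 600, 700, 800, 900, 1000, 20000]
print("median read length:", median(read_lengths))
print("mean read length:", mean(read_lengths))
```

Try changing or removing the longest read in read_lengths and note which of the two statistics moves more.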

Estimated time: 50 min (~25-35 min computing)

Open the Taxonomy Profiling public workflow
Click on Run and then Run Workflow with the following parameters
- Dataset: fastp on data 5: Read 1 output
- kraken_database: Prebuilt Refseq indexes: PlusPF
Wait ~35 minutes as the Kraken2, KrakenTools, and Krona jobs are scheduled, run, and complete

Click on the Display icon (eyeball) next to the dataset tagged taxonomy_profiling_visualization_krona_pie_chart
Examine how many reads were classified as Salmonella, Escherichia coli, Homo sapiens, and Unclassified

2A. What percentage of reads in the sample is unclassified?

2B. Which kingdoms are found in the sample?

2C. Where might the eukaryotic DNA come from?

2D. How does the diversity of the phylum Proteobacteria compare with that of the phylum Firmicutes?

2E. How much E. coli and Salmonella are present in the sample?
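
The counts shown in the Krona chart can also be tallied with a short script if you download a Kraken2 report from your history. The sketch below assumes the standard six-column, tab-separated Kraken2 report layout (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name). The sample lines are invented and abbreviated for illustration; they are not results from Barcode10_Spike2, and the function name parse_report is hypothetical.

```python
import io

# Invented, abbreviated example in the standard Kraken2 report layout:
# percent, reads in clade, reads assigned directly, rank code, taxid, name
SAMPLE_REPORT = (
    "20.00\t200\t200\tU\t0\tunclassified\n"
    "80.00\t800\t10\tR\t1\troot\n"
    "70.00\t700\t5\tD\t2\t    Bacteria\n"
    "30.00\t300\t300\tG\t590\t              Salmonella\n"
    "25.00\t250\t250\tS\t562\t                Escherichia coli\n"
    "5.00\t50\t50\tS\t9606\t                Homo sapiens\n"
)


def parse_report(handle):
    """Return {name: (percent, clade_reads, rank_code)} from a Kraken2 report."""
    taxa = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent, clade_reads, _direct, rank, _taxid, name = fields[:6]
        # Names are indented to show tree depth, so strip the padding
        taxa[name.strip()] = (float(percent), int(clade_reads), rank)
    return taxa


taxa = parse_report(io.StringIO(SAMPLE_REPORT))
for name in ("unclassified", "Salmonella", "Escherichia coli", "Homo sapiens"):
    percent, reads, _rank = taxa[name]
    print(f"{name}: {reads} reads ({percent:.2f}%)")

# Rank code "D" marks domain-level rows such as Bacteria or Eukaryota
domains = [n for n, (_p, _r, rank) in taxa.items() if rank == "D"]
print("domain-level taxa:", domains)
```

To use it on your own data, replace io.StringIO(SAMPLE_REPORT) with an open file handle for the downloaded report.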

Jennifer Kerr, Notre Dame of Maryland University
Rosa Alcazar, Clovis Community College
Frederick Tan, Johns Hopkins University
Based on “Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition” (GTN)

Last Revised: September 2024