3.4 Project: Microbial Genomes

3.4.1 Purpose

Explore information about newly reported bacterial metagenome-assembled genomes (MAGs) from Xue et al. (2023), Scientific Data (pubmed.gov/37563185).

3.4.2 Learning Objectives

  1. Explore and better understand MAGs and contigs by following up on one MAG and one contig in NCBI.
  2. Apply what you have learned by navigating MAG information with previously introduced tools such as GenBank, Sequence Browser, BV-BRC, Taxonomy Browser, and Lifemap.
  3. Go deeper: find out what you can about the organism represented by your chosen MAG and about one gene from that MAG, using any tools available to you (PubMed, Google, AI, BV-BRC, other databases).

3.4.3 Activity 1 – MAGs and Taxonomy

Estimated time: 40 min

3.4.3.1 Part 1 - Explore a MAG

3.4.3.2 Instructions

  1. In the study by Xue et al. (2023), 17 of the 103 MAGs recovered from ballast water or sediment were of very high quality and completeness. You will follow up on one of them!
  2. Choose one of these 17 MAGs (see their GenBank IDs below) to follow up on in this activity.
  3. Use GenBank (ncbi.nlm.nih.gov/nucleotide) to answer the questions below.
GenBank IDs for the 17 high-quality MAGs:

  1. GCF_030147545.1
  2. GCF_030148435.1
  3. GCF_030149225.1
  4. GCF_030148195.1
  5. GCF_030148855.1
  6. GCF_030148245.1
  7. GCF_030149045.1
  8. GCF_030148385.1
  9. GCF_030147715.1
  10. GCF_030149085.1
  11. GCF_030149145.1
  12. GCF_030148125.1
  13. GCF_030149425.1
  14. GCF_030149235.1
  15. GCF_030148515.1
  16. GCF_030147875.1
  17. GCF_030149465.1
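
If you would rather check an ID before pasting it into a search box, the sketch below shows one way to do so in Python. It validates the general shape of an NCBI assembly accession (a `GCA_` or `GCF_` prefix, nine digits, and a version number) and builds a search URL for NCBI's E-utilities `esearch` endpoint. The helper names `is_assembly_accession` and `assembly_search_url` are made up for this illustration, and the choice of the `assembly` database is an assumption; the activity itself only requires the GenBank web page.

```python
import re
from urllib.parse import urlencode

# Assembly accessions follow the pattern GCA_ or GCF_, nine digits, a dot, and a version.
# (GCA_ marks GenBank assemblies and GCF_ marks RefSeq assemblies.)
ACCESSION_PATTERN = re.compile(r"GC[AF]_\d{9}\.\d+")

# Base URL of the NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def is_assembly_accession(text: str) -> bool:
    """Return True if the text has the shape of an assembly accession."""
    return ACCESSION_PATTERN.fullmatch(text.strip()) is not None


def assembly_search_url(accession: str) -> str:
    """Build an esearch URL for one accession (assumes the 'assembly' database)."""
    cleaned = accession.strip()
    if not is_assembly_accession(cleaned):
        raise ValueError(f"Not an assembly accession: {accession!r}")
    query = urlencode({"db": "assembly", "term": cleaned, "retmode": "json"})
    return f"{EUTILS_ESEARCH}?{query}"


if __name__ == "__main__":
    # The example ID used throughout this activity.
    print(assembly_search_url("GCF_030147545.1"))
```

The script only constructs the URL; it does not contact NCBI. Opening the printed URL in a browser returns a small JSON record of matching entries, if any.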

3.4.3.3 Questions

1. Record the GenBank ID for one of the 17 MAGs that you will research.

2. Record the GENOME name for the MAG assembly associated with the GenBank ID you entered.

Hints:

  1. In GenBank (ncbi.nlm.nih.gov/nucleotide), enter the GenBank ID of your MAG (e.g., GCF_030147545.1) in the Nucleotide search box and click Search.
  2. Note the GENOME name shown for that MAG. For example, for the first MAG (GenBank ID GCF_030147545.1), the GENOME name is “Alcanivorax sp. genome ASM3014754v1”.

For the following questions, click on the GENOME name of your MAG to explore genome assembly summary information.

3. What is the Taxon of your MAG?

4. What is the genome size of your MAG?

5. How many contigs contributed to your MAG assembly?

6. How many genes were annotated?

7. How comparable is your MAG genome to the E. coli genome from the microbial-genomes-pre-lab, in terms of genome size and number of genes?

8. As you can see, MAGs are made up of Contigs. Based on your lecture and reading material, and this exercise, in your own words define MAGs and Contigs below.
MAGs:
Contigs:
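
For question 7, the comparison with E. coli comes down to two ratios and a gene density. The sketch below is one way to compute them. All numbers are placeholders: the MAG values are invented, and the E. coli values are rough figures for strain K-12 MG1655 (about 4.64 Mb; the gene count is approximate), which may not be the strain used in your pre-lab. Replace both with the values you actually recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeSummary:
    name: str
    size_bp: int      # total genome size in base pairs
    gene_count: int   # number of annotated genes


def compare(mag: GenomeSummary, reference: GenomeSummary) -> dict:
    """Return size and gene-count ratios (MAG / reference) and gene densities."""
    return {
        "size_ratio": mag.size_bp / reference.size_bp,
        "gene_ratio": mag.gene_count / reference.gene_count,
        "mag_genes_per_kb": mag.gene_count / (mag.size_bp / 1000),
        "reference_genes_per_kb": reference.gene_count / (reference.size_bp / 1000),
    }


if __name__ == "__main__":
    # Placeholder values: substitute the numbers from your own answers.
    my_mag = GenomeSummary("my MAG (hypothetical values)", 3_500_000, 3_200)
    e_coli = GenomeSummary("E. coli K-12 MG1655 (approximate)", 4_641_652, 4_400)

    for key, value in compare(my_mag, e_coli).items():
        print(f"{key}: {value:.2f}")
```

A size ratio near 1 means the two genomes are of similar length; gene density (genes per kb) is often a more telling comparison than raw counts, because it does not depend on genome size.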

3.4.3.4 Part 2 - Explore a Contig

3.4.3.5 Instructions

  1. MAGs are made up of contigs. To see which contigs make up your MAG, go back to GenBank (ncbi.nlm.nih.gov/nucleotide) and search for your MAG using its GenBank ID (e.g., GCF_030147545.1).
    • For example, you will see that MAG GCF_030147545.1 is composed of 41 contigs.
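
The questions that follow ask for a contig larger than 75 kb. If you copy the contig list into a script, picking candidates is a filter followed by a sort, as in this minimal sketch. The contig names and lengths are invented for illustration and do not come from any of the 17 MAGs.

```python
from typing import Iterable, List, NamedTuple


class Contig(NamedTuple):
    accession: str
    length_bp: int


def large_contigs(contigs: Iterable[Contig], min_length_bp: int = 75_000) -> List[Contig]:
    """Return contigs strictly longer than the threshold, longest first."""
    selected = [c for c in contigs if c.length_bp > min_length_bp]
    return sorted(selected, key=lambda c: c.length_bp, reverse=True)


if __name__ == "__main__":
    # Hypothetical contigs; replace with accessions and lengths from your search results.
    example = [
        Contig("contig_A", 182_340),
        Contig("contig_B", 12_905),
        Contig("contig_C", 75_000),
        Contig("contig_D", 96_411),
    ]
    for contig in large_contigs(example):
        print(f"{contig.accession}\t{contig.length_bp / 1000:.1f} kb")
```

Note that a contig of exactly 75,000 bp is excluded, because the threshold is "greater than 75 kb".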

3.4.3.6 Questions

1. Choose a contig of reasonably large size (> 75 kb), click on the contig, then find and record below the Contig’s ID (Accession number/GenBank ID) for further examination.
Tip: You can sort the entries by length using ‘Sort by Sequence Length’ at the top of the results list!

2. Record 7 core taxonomy ranks for your Contig.
Kingdom:
Phylum:
Class:
Order:
Family:
Genus:
Species:

Hints:

  1. For the contig you chose above, under Related Information on the right, click on Taxonomy and then click again on the provided link.
  2. Find the Lineage information. The full lineage contains the 7 core taxonomy ranks (Kingdom, Phylum, Class, Order, Family, Genus, and Species) plus any additional classification ranks. To see only the 7 core lineage names, click the Lineage link to switch to the abbreviated lineage, or simply hover over the lineage names.
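
Separating the 7 core ranks from a full lineage is a simple lookup, sketched below. The function takes a list of (rank, name) pairs, as you would read them off the Taxonomy Browser by hovering over each lineage name, and keeps only the core ranks. The example lineage is illustrative only: the rank labels and names are an assumption and may differ from what the Taxonomy Browser shows for your contig.

```python
from typing import Dict, List, Optional, Tuple

CORE_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def core_ranks(lineage: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Map each core rank to its name, or None if the lineage does not assign it."""
    found: Dict[str, Optional[str]] = {rank: None for rank in CORE_RANKS}
    for rank, name in lineage:
        key = rank.strip().lower()
        # Keep the first name seen for each core rank; ignore all other ranks.
        if key in found and found[key] is None:
            found[key] = name
    return found


if __name__ == "__main__":
    # Illustrative lineage (an assumption); replace with the lineage of your own contig.
    example_lineage = [
        ("domain", "Bacteria"),
        ("kingdom", "Pseudomonadati"),
        ("phylum", "Pseudomonadota"),
        ("class", "Gammaproteobacteria"),
        ("order", "Oceanospirillales"),
        ("family", "Alcanivoracaceae"),
        ("genus", "Alcanivorax"),
    ]
    for rank, name in core_ranks(example_lineage).items():
        print(f"{rank.capitalize()}: {name if name else '(not assigned)'}")
```

Ranks that the lineage does not assign, such as Species for a contig classified only to genus level, are reported as not assigned rather than guessed.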

3.4.3.7 Part 3 - Visualize contig in a tree of life

3.4.3.8 Instructions

  1. Use Lifemap to visually explore the contig taxonomy in the context of the tree of life. Go to lifemap-ncbi.univ-lyon1.fr and enter the lowest taxonomy rank observed for your contig (most likely the species or genus level, but can also correspond to order or family).
  2. On the tree map, use the plus and minus buttons to zoom in and out and see where your contig sits relative to other organisms on the map. Zoom in and find the nodes corresponding to the higher taxonomic ranks. For example, if your contig is classified only to genus level, you will not be able to identify species-level information, but you will be able to identify the corresponding Family, Order, Class, and Phylum.
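
The rule in step 1, "enter the lowest taxonomy rank observed", can be written out as a short search from Species upward. The sketch below returns the lowest assigned rank, which is the search term for Lifemap, and the higher ranks you should then be able to find on the map. The function names and the example classification are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CORE_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def lowest_assigned_rank(ranks: Dict[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for the most specific rank that has a name, or None."""
    for rank in reversed(CORE_RANKS):
        name = ranks.get(rank)
        if name:
            return rank, name
    return None


def higher_ranks(ranks: Dict[str, Optional[str]], rank: str) -> List[str]:
    """List the assigned ranks above the given rank, from broadest to narrowest."""
    position = CORE_RANKS.index(rank)
    return [r for r in CORE_RANKS[:position] if ranks.get(r)]


if __name__ == "__main__":
    # Hypothetical classification that stops at genus level.
    my_contig = {
        "phylum": "Pseudomonadota",
        "class": "Gammaproteobacteria",
        "order": "Oceanospirillales",
        "family": "Alcanivoracaceae",
        "genus": "Alcanivorax",
        "species": None,
    }
    lowest = lowest_assigned_rank(my_contig)
    if lowest is not None:
        rank, name = lowest
        print(f"Search Lifemap for: {name} ({rank})")
        print("Then look for these higher ranks:", ", ".join(higher_ranks(my_contig, rank)))
```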

3.4.3.9 Questions

1. Record 7 core taxonomy ranks for your Contig.
Kingdom:
Phylum:
Class:
Order:
Family:
Genus:
Species:

2. What are some other members of the Genus to which your Contig belongs?

3. What are some other members of the Family to which your Contig belongs?

4. What are some other members of the Order to which your Contig belongs?

5. What are some other members of the Class to which your Contig belongs?

6. What are some other members of the Phylum to which your Contig belongs?

3.4.4 Activity 2 – Genomes, Genes, and Databases

Estimated time: 20 min

3.4.4.1 Instructions

  1. In GenBank (ncbi.nlm.nih.gov/nucleotide), for the contig you chose in Activity 1, click on Graphics to explore the genome browser and the genes.
  2. Select genes of interest. Many genes in the contigs have no ‘familiar’ short symbol and instead carry a long alphanumeric name. Such genes are uncharacterized, hypothetical, or have functional or structural similarity to known genes/proteins that has not been confirmed. Some contig genes, however, are annotated with a ‘familiar’ short gene symbol that matches a known gene. Please use the genes with short symbols for this activity, since the ‘other’ genes will not be found in the databases.
  3. For your genes of interest, use BV-BRC (bv-brc.org) to find more information.
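
The distinction in the instructions above between ‘familiar’ short symbols and long alphanumeric names can be approximated with a pattern match. The sketch below assumes the common bacterial naming convention of three lowercase letters, optionally followed by one capital letter and digits (for example, recA or dnaK). This is a rough heuristic rather than a rule used by GenBank or BV-BRC, and the long identifier in the example is invented.

```python
import re

# Heuristic for short bacterial gene symbols: three lowercase letters,
# then an optional capital letter and optional digits (e.g. recA, dnaK, tuf).
SHORT_SYMBOL = re.compile(r"[a-z]{3}[A-Z]?\d*")


def has_short_symbol(label: str) -> bool:
    """Return True if the label looks like a short gene symbol."""
    return SHORT_SYMBOL.fullmatch(label.strip()) is not None


if __name__ == "__main__":
    # "QQX12_00345" is a made-up long identifier used only for illustration.
    labels = ["recA", "dnaK", "rpoB", "QQX12_00345", "hypothetical protein"]
    for label in labels:
        verdict = "short symbol" if has_short_symbol(label) else "long or descriptive name"
        print(f"{label}: {verdict}")
```

The heuristic will miss some valid symbols and may accept a few strings that are not gene names, so treat its output as a first pass and confirm each gene in the genome browser.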

3.4.4.2 Questions

1. Record 3 genes of interest for your Contig.
Gene1:
Gene2:
Gene3:

2. For each gene of interest, record one organism (Genome Name) in which the gene is available.
Gene: Genome Name
Gene1:
Gene2:
Gene3:

Hint:

In the bv-brc.org search box:

  1. From the dropdown menu, select Pathways.
  2. Type in the gene name and press Enter. This may return many gene entries for different organisms.
  3. Note one host species/strain (Genome Name).

3. Record the gene Product associated with each of your genes of interest.
Gene: Product
Gene1:
Gene2:
Gene3:

4. Record the Pathway Name associated with each of your genes of interest.
Gene: Pathway Name
Gene1:
Gene2:
Gene3:

3.4.5 Activity 3 – Go Deeper

Estimated time: 30 min

3.4.5.1 Instructions

Use any tools at your disposal to follow up on your MAG and gene of interest from Activities 1 and 2 above. Suggested tools include PubMed, Google, AI, MBGD, BV-BRC, and BacDive.

3.4.5.2 Questions

  1. For the taxon you identified for your chosen MAG in Activity 1, what can you learn about this organism (its species or genus, for example) in 15 minutes using any tools at your disposal?
a. What did you learn?
b. What tools did you use?
  2. For one of the genes you identified for your chosen contig in Activity 2, what can you learn about this gene in 15 minutes using any tools at your disposal?
a. What did you learn?
b. What tools did you use?

3.4.6 Grading Criteria

  • Download as a Microsoft Word (.docx) file and upload it to Canvas.

3.4.7 Footnotes

Resources

Contributions and Affiliations

  • Valeriya Gaysinskaya, Johns Hopkins University
  • Gauri Paul, Clovis Community College
  • Frederick Tan, Johns Hopkins University

Last Revised: January 2026