Download File
Click here to download a zip file containing RNA-seq read count data for 30 samples from the Drosophila midgut.
Raw data are available on SRA; see NCBI BioProject PRJNA207813.
Introduction
Why study regionalization of the gut?
Often, we think of our intestines as long, uniform tubes, and we assume that the beginning of the tube is similar to the middle and the end of the tube. Sometimes we treat diseases (like cancers or Crohn’s disease) by cutting out part of the tube and sewing the healthy parts back together. But what if different regions of the tube have different jobs? If this is true, we would like to learn more about what regions exist, where the boundaries are, and what the job of each region is. This would help us have a better understanding of healthy digestion, and could lead to better understanding and treatment of digestive diseases.
Why use fruit-flies?
While we would love to be able to study these processes in humans, we can’t exactly ask hundreds of people to let us dissect their intestines! Instead, we use model organisms: we study the same process in another organism to learn how it might work in humans.
Fruit-flies (Drosophila melanogaster) are a useful model organism because they breed very quickly (a new generation takes about 2 weeks!) and are easy to care for. Many human genes have matching genes in fruit-flies. Scientists have been using fruit-flies to study genetics for more than a century, so we know a lot about them. This makes it easier to understand any new information that we learn.
In this case, we can ask whether there are different regions in the fruit-fly gut, and if so, what do these different regions do? In future experiments, we could then look at what happens when these different regions are damaged in flies, and try to understand the outcomes. As we build an understanding of these processes in fruit-flies, we can start asking whether humans are similar - do they have matching genes?
What can you investigate?
Drs. Marianes and Spradling showed that there are differences between the regions in this dataset, and that these differences seem to match up with other scientific research about the fruit-fly midgut. But there are thousands of genes in these samples that we can use for further investigations. Here are a few suggestions for questions you could investigate:
What nutrient processing pathways dominate in different regions of the gut?
- Sterol metabolism?
- Lipid metabolism?
- Carbohydrate metabolism?
- Protein metabolism?
- Other, smaller nutrient categories, such as salts, metals, or vitamins?
You can use databases such as FlyBase and BioMart to identify interesting genes or groups of genes to investigate.
Are there other groups of genes that dominate in particular regions?
This is a more open-ended, “unbiased” question - rather than looking for specific categories of genes, look at the data as a whole and see what stands out. You might find something surprising!
- What are the most highly expressed genes in different regions and what are their functions? Are they expressed across the whole midgut or just in specific regions?
- What are the most differentially expressed genes between regions?
- What genes behave similarly to each other across regions (heatmaps, clustering)?
- What groups or categories of genes are significantly different between regions, or behave similarly across all regions (Gene Set Analysis of differential expression or clustering results)?
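As a starting point for the "most highly expressed genes" question, here is a minimal Python sketch. It is only an illustration, not the method used in the paper: the counts, gene names (geneA, geneB, geneC), sample names, and region names (regionX, regionY) are all made up, and counts-per-million (CPM) is just one simple way to make samples with different sequencing depths comparable.

```python
from statistics import mean

# Made-up read counts for illustration only: {sample: {gene: count}}.
toy_counts = {
    "s1": {"geneA": 900, "geneB": 50, "geneC": 50},
    "s2": {"geneA": 800, "geneB": 100, "geneC": 100},
    "s3": {"geneA": 100, "geneB": 700, "geneC": 200},
    "s4": {"geneA": 200, "geneB": 600, "geneC": 200},
}
# Hypothetical sample-to-region assignment for the toy data.
toy_regions = {"s1": "regionX", "s2": "regionX", "s3": "regionY", "s4": "regionY"}


def counts_per_million(sample_counts):
    """Scale one sample's counts so that they sum to one million."""
    total = sum(sample_counts.values())
    return {gene: count * 1_000_000 / total for gene, count in sample_counts.items()}


def mean_cpm_by_region(counts, regions):
    """Average the CPM of every gene over the replicate samples of each region."""
    cpm = {sample: counts_per_million(c) for sample, c in counts.items()}
    grouped = {}
    for sample, region in regions.items():
        grouped.setdefault(region, []).append(cpm[sample])
    return {
        region: {gene: mean(s[gene] for s in samples) for gene in samples[0]}
        for region, samples in grouped.items()
    }


def top_genes(region_means, n=1):
    """Return the n genes with the highest mean CPM in each region."""
    return {
        region: sorted(genes, key=genes.get, reverse=True)[:n]
        for region, genes in region_means.items()
    }


region_means = mean_cpm_by_region(toy_counts, toy_regions)
top = top_genes(region_means)
print(top)
```

For a real analysis of differential expression, a dedicated statistical package is a better choice than raw CPM ranking, because it also accounts for variability between replicates.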
Talk to us!
Join our Slack community to talk to other people working on C-MOOR datasets, and meet other scientists, data analysts and learners. E-mail us for an invitation. We’d love to hear about what you’re working on!
Dataset Details
Samples
Here is an image from the Marianes and Spradling (2013) paper showing 3 large regions (top) and 7 smaller subregions (bottom) of the Drosophila midgut.
There are a total of 30 read count files from 30 different samples from the Drosophila midgut.
Each midgut region (or group of regions) has 3 replicate samples (3 data files, containing RNA-seq data from different flies). We collect multiple samples from each region to make sure genes behave consistently. If a particular gene is highly expressed in a1 in one sample, we’re not sure if that’s normal or not. If the same gene is highly expressed in 3 different samples of a1 (from different flies), then we’re much more confident in the data.
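One simple way to put a number on "consistent across replicates" is the coefficient of variation (standard deviation divided by the mean) of a gene's counts in the 3 replicates. The sketch below uses made-up counts, and the function name replicate_summary is our own invention. With real data you would normally correct for sequencing depth first, since raw counts also depend on how many reads each sample has.

```python
from statistics import mean, stdev


def replicate_summary(counts):
    """Return (mean, coefficient of variation) for one gene's replicate counts."""
    avg = mean(counts)
    # The coefficient of variation is undefined when the mean is zero.
    cv = stdev(counts) / avg if avg > 0 else float("nan")
    return avg, cv


# Made-up counts for one gene in three replicates of the same region.
consistent = replicate_summary([980, 1000, 1020])
inconsistent = replicate_summary([20, 1000, 1980])

print(consistent)    # small coefficient of variation: replicates agree
print(inconsistent)  # large coefficient of variation: replicates disagree
```

Both toy genes have the same mean count, but only the first one gives the same answer in every replicate, which is why the mean alone is not enough.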
The first 9 samples (am1-am9) are large sections of the midgut:
- The three anterior regions (a1_3)
- The middle regions (CuLFCFe)
- The four posterior regions (p1_4)
The remaining 21 samples (am10-am30) are individual regions (or pairs of regions, if it was not possible to dissect out a single region):
- Anterior region 1 (a1)
- Anterior regions 2 and 3 (a2_3)
- The copper region (Cu)
- The large flat cell and iron regions (LFCFe)
- The iron region alone (Fe)
- Posterior region 1 (p1)
- Posterior regions 2 through 4 (p2_4)
You can read more about these different regions and subsections in the paper.
The table below shows each sample, the name of the file containing data for that sample, and the region the sample came from.
sample | filename | condition |
---|---|---|
am1 | SRR891601.htseq | a1_3 |
am2 | SRR891602.htseq | a1_3 |
am3 | SRR891603.htseq | a1_3 |
am4 | SRR891604.htseq | CuLFCFe |
am5 | SRR891605.htseq | CuLFCFe |
am6 | SRR891606.htseq | CuLFCFe |
am7 | SRR891607.htseq | p1_4 |
am8 | SRR891608.htseq | p1_4 |
am9 | SRR891609.htseq | p1_4 |
am10 | SRR891610.htseq | a1 |
am11 | SRR891611.htseq | a1 |
am12 | SRR891612.htseq | a1 |
am13 | SRR891613.htseq | a2_3 |
am14 | SRR891614.htseq | a2_3 |
am15 | SRR891615.htseq | a2_3 |
am16 | SRR891616.htseq | Cu |
am17 | SRR891617.htseq | Cu |
am18 | SRR891618.htseq | Cu |
am19 | SRR891619.htseq | LFCFe |
am20 | SRR891620.htseq | LFCFe |
am21 | SRR891621.htseq | LFCFe |
am22 | SRR891622.htseq | Fe |
am23 | SRR891623.htseq | Fe |
am24 | SRR891624.htseq | Fe |
am25 | SRR891625.htseq | p1 |
am26 | SRR891626.htseq | p1 |
am27 | SRR891627.htseq | p1 |
am28 | SRR891628.htseq | p2_4 |
am29 | SRR891629.htseq | p2_4 |
am30 | SRR891630.htseq | p2_4 |
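Because the sample names and file names in the table follow a regular pattern, the table can be rebuilt in a few lines of Python. The sketch below also includes a small reader for the count files. The reader rests on an assumption: that each file is htseq-count style output, meaning one gene ID and one count per line separated by a tab, followed by summary rows such as no_feature or ambiguous (newer HTSeq versions prefix these with two underscores). Please check one of the downloaded files before relying on it. The function names and the demo text are made up for illustration.

```python
import io
from collections import Counter

# Conditions in the order they appear in the table; each has 3 replicates.
CONDITIONS = ["a1_3", "CuLFCFe", "p1_4", "a1", "a2_3",
              "Cu", "LFCFe", "Fe", "p1", "p2_4"]
REPLICATES = 3

# Summary rows that htseq-count appends after the gene rows (assumed format).
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}


def build_sample_sheet():
    """Return one dict per sample, mirroring the sample/filename/condition table."""
    rows = []
    for i in range(len(CONDITIONS) * REPLICATES):
        number = i + 1
        rows.append({
            "sample": f"am{number}",
            "filename": f"SRR8916{number:02d}.htseq",
            "condition": CONDITIONS[i // REPLICATES],
        })
    return rows


def parse_htseq(handle):
    """Read tab-separated 'gene<TAB>count' lines, skipping the summary rows."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        gene, value = line.split("\t")
        if gene.lstrip("_") in SUMMARY_ROWS:
            continue
        counts[gene] = int(value)
    return counts


sample_sheet = build_sample_sheet()
per_condition = Counter(row["condition"] for row in sample_sheet)

# Made-up file content, used here instead of a real download.
demo = io.StringIO("gene1\t12\ngene2\t0\n__no_feature\t345\nambiguous\t6\n")
demo_counts = parse_htseq(demo)
print(len(sample_sheet), dict(per_condition), demo_counts)
```

To read a real file, you could pass an open file handle instead of the demo text, for example `with open(row["filename"]) as handle: counts = parse_htseq(handle)`, assuming the files have been unzipped into the working directory.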