You are looking for regions of statistically significant differential expression, and you consider two distinct ways of approaching the problem. In the first, you look at all windows of length 10 kb. In the second, you consider only the 20,000 annotated protein-coding genes. Give the pros and cons of these two approaches, being sure to comment on how the desired statistical cutoff is influenced by testing multiple regions. (Recall that the human genome is 3.0 × 10^9 bp.)
Ans. The goal is to identify differentially expressed regions, and the two approaches differ in which regions are tested: either every 10 kb window across the genome, or only the 20,000 annotated protein-coding genes.
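The two approaches are complementary, and using both gives a more complete picture of expression. Each has its own pros and cons, and they differ most in the statistical cutoff they require: the more regions are tested, the more false positives are expected by chance, so the stricter the per-test cutoff has to be.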
Pros of looking at windows of length 10 kb: The scan is unbiased and covers the whole genome, so it does not depend on existing annotation. It can detect differential expression in unannotated or poorly annotated regions, such as novel transcripts and non-coding RNAs.
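Cons: A 3.0 × 10^9 bp genome contains about 300,000 non-overlapping 10 kb windows, so far more tests are run and the cutoff must be much stricter. With a Bonferroni correction at an overall significance level of 0.05, for example, each window would need p < 0.05/300,000 ≈ 1.7 × 10^-7, which reduces the power to detect real differences. Fixed windows also do not line up with gene boundaries, so a window can split one gene or merge the signal from neighbouring genes, and the signal from a gene much shorter than 10 kb can be diluted.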
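Pros of 20,000 annotated protein-coding genes: Only 20,000 tests are run, so the multiple-testing correction is milder and statistical power is higher. With a Bonferroni correction at an overall significance level of 0.05, for example, each gene would need p < 0.05/20,000 = 2.5 × 10^-6, about 15 times less strict than the cutoff needed for roughly 300,000 windows of 10 kb. Each tested region is also a known gene, so significant results are directly interpretable.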
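Cons: The analysis is limited to what is already annotated, so differential expression in unannotated transcripts, non-coding RNAs, and intergenic regions is missed, and the results are only as good as the annotation. Gene lengths also vary widely, so the tested regions are not of uniform size.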
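To make the effect of multiple testing concrete, here is a minimal sketch of the per-test cutoff under each approach. It assumes a Bonferroni correction, an overall significance level of 0.05, and non-overlapping windows; none of these is specified in the question, and the function name `bonferroni_cutoff` is only illustrative.

```python
# Genome size, window size, and gene count are taken from the question.
GENOME_BP = 3_000_000_000
WINDOW_BP = 10_000
N_GENES = 20_000

# Assumption: overall (family-wise) significance level, not given in the question.
ALPHA = 0.05


def bonferroni_cutoff(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: windows do not overlap, so the genome is tiled exactly once.
n_windows = GENOME_BP // WINDOW_BP

window_cutoff = bonferroni_cutoff(ALPHA, n_windows)
gene_cutoff = bonferroni_cutoff(ALPHA, N_GENES)

print(f"10 kb windows: {n_windows} tests, cutoff p < {window_cutoff:.2e}")
print(f"annotated genes: {N_GENES} tests, cutoff p < {gene_cutoff:.2e}")
print(f"window cutoff is {gene_cutoff / window_cutoff:.0f}x stricter")
```

Overlapping (sliding) windows would increase the number of tests further, although those tests would be correlated. A false discovery rate procedure such as Benjamini-Hochberg is a common, less conservative alternative to Bonferroni, but the same trend holds: more tested regions means a stricter effective cutoff.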