(a) Explain what overdispersion is in RNA-seq data.
(b) Explain why small indels often appear as SNP-dense regions after an initial read mapping.
(c) For each of these problems, which technique is best, nascent transcription or RNA-seq, and why?
i. Identifying the immediate transcriptional targets of a perturbation
ii. Identifying isoforms utilized.
iii. Detecting alternative 3’ end (cleavage site) usage.
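a] Overdispersion means that the variance of a gene's read counts across replicate samples is larger than their mean.
RNA sequencing is generally a more reliable way to quantify gene expression than microarrays. In RNA-seq experiments, the expression level of a gene is measured by the number of short reads that map to the gene region. If those counts reflected only the random sampling of reads, they would follow a Poisson distribution, in which the variance equals the mean. In practice, biological variation between replicates (plus some technical variation) adds extra variance, especially for moderately and highly expressed genes, so the counts are overdispersed. For this reason, differential expression tools such as DESeq2 and edgeR model counts with a negative binomial distribution, whose variance is μ + αμ² (μ = mean, α = dispersion), instead of a Poisson distribution. Ignoring overdispersion underestimates the variance and produces too many false positives.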
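Overdispersion is usually judged against a Poisson model, where pure sampling noise gives a variance equal to the mean. The short simulation below is only an illustration: it generates simulated counts (not real data) for one hypothetical gene, once from a Poisson distribution and once from a gamma-Poisson mixture (one standard way to produce negative binomial counts), and compares the variance-to-mean ratio of each. The mean `mu = 50`, size parameter `size = 5` (dispersion α = 1/size = 0.2), replicate count and seed are arbitrary assumptions.

```python
import math
import random
import statistics


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson count with Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def negative_binomial_sample(mu: float, size: float, rng: random.Random) -> int:
    """Gamma-Poisson mixture: each replicate gets its own expression rate around mu."""
    # Gamma(shape=size, scale=mu/size) has mean mu and variance mu**2 / size.
    rate = rng.gammavariate(size, mu / size)
    return poisson_sample(rate, rng)


def dispersion_index(counts: list[int]) -> float:
    """Variance-to-mean ratio: about 1 for Poisson counts, above 1 when overdispersed."""
    return statistics.variance(counts) / statistics.mean(counts)


def simulate(mu: float = 50.0, size: float = 5.0, n: int = 10_000, seed: int = 1):
    """Return (Poisson index, negative binomial index, theoretical NB index)."""
    rng = random.Random(seed)
    poisson_counts = [poisson_sample(mu, rng) for _ in range(n)]
    nb_counts = [negative_binomial_sample(mu, size, rng) for _ in range(n)]
    # Var = mu + mu**2 / size, so Var / mean = 1 + mu / size.
    theoretical_nb_index = 1 + mu / size
    return dispersion_index(poisson_counts), dispersion_index(nb_counts), theoretical_nb_index


if __name__ == "__main__":
    pois_idx, nb_idx, theory = simulate()
    print(f"Poisson: {pois_idx:.2f}  negative binomial: {nb_idx:.2f}  (theory: {theory:.1f})")
```

With these settings the Poisson ratio should land close to 1, while the gamma-Poisson counts should land near the theoretical 1 + mu/size = 11; the exact values depend on the seed.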
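b] Indels (insertions and deletions) are an important source of genetic as well as phenotypic divergence and diversity. SNPs (single nucleotide polymorphisms) are single-base substitutions at one position in the genome; they can fall within the coding sequence of genes, in non-coding parts of genes, or in intergenic regions (i.e., regions between genes). During an initial read mapping, many aligners either do not allow gaps or penalize opening a gap more heavily than a few mismatches. A read that spans a small indel is therefore often placed with an ungapped alignment: every base after the indel is shifted by the indel length relative to the reference, which creates a run of mismatches downstream of the true event. This is most likely when the indel lies near the end of the read, where only one or a few mismatches are needed. Because many overlapping reads carry the indel at different positions within the read, their mismatches pile up into a dense cluster of apparent SNPs around a single indel. Local realignment around candidate indels (or gapped, assembly-based variant calling) resolves such clusters back into the indel.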
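When a read spans a small deletion but is aligned without a gap, every base after the deletion is shifted against the reference. The sketch below reproduces this by hand with a made-up 18-base reference and a 12-base read carrying a 1-bp deletion, and compares an ungapped (mismatch-only) alignment with a gapped one. The scoring values (match +1, mismatch -4, gap open -6, gap extension -1 per base) are illustrative assumptions, not the settings of any particular aligner, and soft-clipping is ignored.

```python
# Hypothetical reference sequence used only for this illustration.
REFERENCE = "ACGTACGGTCAGTTCAGC"

# Illustrative scoring values (assumptions, not a specific aligner's defaults).
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 1, 4, 6, 1


def read_with_deletion(ref: str, start: int, read_len: int, del_pos: int, del_len: int) -> str:
    """Simulate a read of read_len bases starting at ref[start] that skips
    del_len reference bases after its first del_pos bases."""
    before = ref[start:start + del_pos]
    after = ref[start + del_pos + del_len:start + read_len + del_len]
    return before + after


def ungapped_mismatches(read: str, ref_window: str) -> list[int]:
    """Read offsets where an ungapped (substitution-only) alignment disagrees."""
    return [i for i, (a, b) in enumerate(zip(read, ref_window)) if a != b]


def compare_alignments(ref: str, start: int, read_len: int, del_pos: int, del_len: int):
    """Return (read, mismatch offsets, ungapped score, gapped score)."""
    read = read_with_deletion(ref, start, read_len, del_pos, del_len)
    mismatches = ungapped_mismatches(read, ref[start:start + read_len])
    n_mm = len(mismatches)
    ungapped = (read_len - n_mm) * MATCH - n_mm * MISMATCH
    # The gapped alignment matches every read base but pays for one gap.
    gapped = read_len * MATCH - (GAP_OPEN + GAP_EXTEND * del_len)
    return read, mismatches, ungapped, gapped


if __name__ == "__main__":
    for del_pos in (8, 11):
        read, mm, ungapped, gapped = compare_alignments(REFERENCE, 0, 12, del_pos, 1)
        better = "ungapped (apparent SNPs)" if ungapped > gapped else "gapped (indel)"
        print(f"deletion after read base {del_pos}: read={read} mismatches at {mm}; "
              f"ungapped={ungapped}, gapped={gapped} -> {better}")
```

With these scores, the deletion in the middle of the read (after base 8) is recovered as a gap, although an ungapped placement of that read would show four adjacent mismatches, exactly the kind of cluster reported by a mapper that does not try gapped alignment. The same deletion one base from the read end scores higher as a single mismatch, so that read reports a spurious substitution instead of the indel.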
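c] [i] Nascent transcription, because it measures RNA polymerase activity directly, so changes in the transcription of primary (immediate) targets are visible within minutes of the perturbation. RNA-seq reveals the presence and quantity of RNA in a sample at a given moment, but that steady-state level lags behind changes in transcription by an amount that depends on each RNA's half-life; by the time a change is detectable, secondary (indirect) effects have also accumulated.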
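[ii] RNA-seq, because it sequences mature, processed transcripts. Reads that span exon-exon junctions show which exons are joined, so the isoforms present (and their relative abundance) can be identified. Nascent transcription reads come mostly from unspliced pre-mRNA and introns, and nascent methods capture splicing intermediates (e.g., the 5' exon before it is joined to the next exon) rather than completed isoforms.
[iii] RNA-seq, because the 3' end of a mature transcript is its cleavage and polyadenylation site. A drop in read coverage at a proximal versus a distal poly(A) site, or reads that contain the non-templated poly(A) tail, shows directly which cleavage site was used. Nascent transcription is a poor fit: RNA polymerase II keeps transcribing well past the cleavage site before it terminates, so nascent reads do not stop at the site of cleavage. NET-seq does sequence the 3' ends of nascent transcripts, but those ends mark where the polymerase is, not where the mature RNA was cleaved.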
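How quickly each assay can register a change in transcription, the timing question behind part [i], can be estimated with a simple kinetic model. This sketch assumes first-order RNA decay at a constant rate and a step change in synthesis at time zero (both simplifications). If the RNA level M follows dM/dt = k_syn - k_deg·M with k_deg = ln 2 / half-life, then after synthesis is multiplied by a factor s, total RNA rises as M(t)/M(0) = s - (s - 1)·e^(-k_deg·t), whereas a nascent-transcription signal, which tracks k_syn, changes by the full factor s at once. The half-lives used below are hypothetical.

```python
import math


def steady_state_fold_change(t_hours: float, half_life_hours: float, synthesis_fold: float) -> float:
    """Fold change in total RNA t hours after synthesis jumps by synthesis_fold,
    assuming first-order decay with the given half-life."""
    k_deg = math.log(2) / half_life_hours
    return synthesis_fold - (synthesis_fold - 1) * math.exp(-k_deg * t_hours)


def nascent_fold_change(synthesis_fold: float) -> float:
    """The nascent signal tracks the synthesis rate itself, so it changes at once."""
    return synthesis_fold


if __name__ == "__main__":
    # Hypothetical transcripts with short and long half-lives; synthesis doubles at t = 0.
    for half_life in (0.5, 10.0):
        rna = steady_state_fold_change(0.5, half_life, 2.0)
        print(f"half-life {half_life} h: RNA-seq fold after 30 min = {rna:.2f}, "
              f"nascent fold = {nascent_fold_change(2.0):.2f}")
```

Under these assumptions, a transcript with a 10-hour half-life has changed by only about 3% in total RNA 30 minutes after its transcription doubles, while the nascent signal has already doubled; a transcript with a 30-minute half-life reaches 1.5-fold by then.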