You and your lab mate, Eugene Yous, are performing expression-profiling experiments using RNA-Seq. You have extracted mRNA from a mouse liver. Both you and Eugene profile the same exact mRNA sample, but you decide to use polyT primer to make your cDNA whereas Eugene decides to use random priming. You obtain the exact same results across the genome except at one locus, the gene Ipt25. You find 250 reads map to Ipt25 whereas Eugene finds 45,000 reads mapping to Ipt25.
a. Propose an explanation for the discrepancy.
b. After scaling both data sets so that the total number of reads are identical in both yours and Eugene's experiments (e.g. RPKM), what will be the effect of the difference in Ipt25 expression on the observed expression of all the OTHER genes?
c. Suggest an alternative normalization scheme that is more appropriate for this problem
Conversion of mRNA to cDNA requires the enzyme reverse transcriptase and a primer, from which the enzyme extends the new DNA strand.
The primer chosen determines the specificity of the reverse transcriptase reaction, that is, which RNA molecules are copied. The type of primer also determines how much of each mRNA is transcribed into cDNA.
a.
In this experiment, two types of primers were used to generate cDNA separately from the same mRNA sample.
1. Poly T primer: Poly T primers consist of a stretch of deoxythymidine (dT) nucleotides, also called oligo(dT). They anneal to the poly adenosine (poly A) tail of mRNA. They are suited for two-step cDNA synthesis, because they are specific to mRNA.
However, if there are other poly A stretches within the mRNA, the poly T primer may bind to them. This leads to mis-priming, so a full-length cDNA is not generated and the sequence between the internal poly A stretch and the 3’ end is not copied.
Moreover, since poly T primers initiate cDNA synthesis from the 3’ end, only the 3’-proximal part of a transcript may be copied. The secondary structure of the mRNA may also prevent complete cDNA synthesis.
2. Random primer: Random primers are short oligonucleotides (usually hexamers). They contain a mixture of all possible sequences, and these primers can bind to different complementary sequences throughout the entire length of the mRNA.
They can prime cDNA synthesis from RNA with or without a poly A tail. The ratio of random primers to reverse transcriptase may be adjusted to obtain complete cDNA.
However, random primers can also bind to sequences in other RNA species, such as rRNA or tRNA, in addition to mRNA.
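These differences in primer properties can explain the discrepancy between the two experiments. Poly T primers require a poly A tail, whereas random primers do not, so a likely explanation is that the Ipt25 transcript lacks a poly A tail (or has one that is not accessible to the primer). It is therefore captured poorly by poly T priming (250 reads) but efficiently by random priming (45,000 reads).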
b. RNA-Seq data expressed as RPKM (reads per kilobase per million mapped reads) are normalized for transcript length and for the total number of mapped reads, so that counts for the same sequence can be compared between data sets.
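The normalization is done by dividing the read count of each gene by the total number of mapped reads in millions (the RPM value), followed by division by the exon length of the gene in kilobases. Only reads that map to exons are counted. Because Eugene's data set contains 45,000 reads for Ipt25 instead of 250, his total read count is larger by 44,750 reads. After both data sets are scaled to the same total, every other gene is expressed as a share of a larger total in Eugene's data. The other genes will therefore appear slightly lower in Eugene's experiment than in yours, even though their raw read counts are identical. The values reflect reads per million mapped reads, not the exact number of reads for each sequence.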
RPKM = (number of reads mapped to the gene / exon length of the gene in kb) x (1,000,000 / number of mapped reads in the sample).
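The effect on the other genes can be illustrated with a minimal Python sketch. The gene names GeneA and GeneB, all exon lengths, and the read counts for the other genes are invented for illustration; only the Ipt25 counts (250 and 45,000) come from the question.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return read_count / (exon_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


# Hypothetical toy data: exon lengths and non-Ipt25 counts are assumptions.
exon_length_bp = {"GeneA": 2_000, "GeneB": 1_000, "Ipt25": 1_500}
poly_t_counts = {"GeneA": 600_000, "GeneB": 399_750, "Ipt25": 250}
random_counts = {"GeneA": 600_000, "GeneB": 399_750, "Ipt25": 45_000}


def rpkm_table(counts):
    """RPKM for every gene, using the sample's own total mapped reads."""
    total = sum(counts.values())
    return {gene: rpkm(n, exon_length_bp[gene], total) for gene, n in counts.items()}


poly_t_rpkm = rpkm_table(poly_t_counts)
random_rpkm = rpkm_table(random_counts)

for gene in exon_length_bp:
    print(gene, round(poly_t_rpkm[gene], 1), round(random_rpkm[gene], 1))
```

In this toy example GeneA and GeneB have identical raw counts in both samples, yet their RPKM values are lower in the random-primed sample, because the extra Ipt25 reads enlarge the total used for scaling.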
c. An alternative normalization method is transcripts per million (TPM). This method first normalizes the read counts by gene length and then normalizes by sequencing depth.
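The order of the two TPM steps can be sketched in Python as follows. As before, GeneA, GeneB, the exon lengths, and the non-Ipt25 counts are hypothetical values chosen only to make the example runnable.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then depth-normalize."""
    # Step 1: reads per kilobase for each gene.
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    # Step 2: scale so that the per-kilobase values sum to one million.
    scale = sum(per_kb.values()) / 1_000_000
    return {gene: value / scale for gene, value in per_kb.items()}


# Hypothetical toy data: exon lengths and non-Ipt25 counts are assumptions.
lengths_bp = {"GeneA": 2_000, "GeneB": 1_000, "Ipt25": 1_500}
poly_t_counts = {"GeneA": 600_000, "GeneB": 399_750, "Ipt25": 250}
random_counts = {"GeneA": 600_000, "GeneB": 399_750, "Ipt25": 45_000}

poly_t_tpm = tpm(poly_t_counts, lengths_bp)
random_tpm = tpm(random_counts, lengths_bp)

for gene in lengths_bp:
    print(gene, round(poly_t_tpm[gene], 1), round(random_tpm[gene], 1))
```

The TPM values of each sample sum to one million, which puts both samples on a common scale. Since that total is fixed, the large Ipt25 difference still lowers the values of the other genes slightly in the random-primed sample of this toy data.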