Sequencing Techniques
What do the terms ‘assembly’ and ‘mapping’ mean in the context of sequencing experiments? How do Sanger and Illumina sequencing work (just in general terms)? What sample preparation steps are generally shared among Next Generation Sequencing technologies?
ANSWER
Assembly means constructing a longer sequence from shorter sequence reads. Computer algorithms piece the sequence reads back together into longer continuous stretches of sequence (contigs); when this is done without a reference sequence, the process is known as de novo assembly. For correct assembly, it is important that there is sufficient overlap between the sequence reads at each position in the genome, which requires high sequencing coverage (or read depth). Naturally, for longer sequence reads, more overlap can be expected, reducing the required raw read depth. Usually, longer fragments (several hundred base pairs) are sequenced from both ends (paired-end sequencing) to provide additional information on correct read placement in the assembly.
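As a rough illustration of the overlap idea, the Python sketch below merges reads greedily by their longest suffix-to-prefix overlap and computes mean coverage as (number of reads × read length) / genome length. It is a toy under stated assumptions (error-free reads, one strand only, no repeats). The names overlap, greedy_assemble and mean_coverage are made up for this example, and real assemblers use far more elaborate graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap.

    Assumption: reads are error-free and come from a single strand.
    Returns the contigs left when no overlap of at least min_len remains.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


def mean_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of reads covering each position of the genome."""
    return n_reads * read_len / genome_len


# Three overlapping reads taken from the sequence ATGGCGTACGTTAGC
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))          # a single contig if the overlaps suffice
print(mean_coverage(1000, 150, 50000)) # 1000 reads of 150 bp on a 50 kb genome
```

With too few reads, or with min_len set higher than the available overlaps, the function returns several separate contigs instead of one, which is the practical reason coverage matters.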
Mapping is the process of aligning each read to a reference genome. Each read yields one alignment, or more than one if it matches several locations in the genome. When studying an organism for which a reference sequence is available, it is possible to infer which transcripts are expressed (in RNA sequencing) by mapping the reads to the reference genome (genome mapping) or transcriptome (transcriptome mapping). Mapping reads to the genome requires no prior knowledge of the set of transcribed regions or the way in which exons are spliced together. This approach allows the discovery of new, unannotated transcripts.
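The idea of one read producing one or more alignments can be sketched in Python with a simple seed-and-extend search. This is a minimal illustration, not how production aligners work: it assumes ungapped alignments on one strand, and the names build_index and map_read, the seed length k, and the mismatch threshold are all choices made for this example.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Record every position at which each k-mer occurs in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 1) -> list[int]:
    """Return all reference positions where the read aligns without gaps.

    The first k bases of the read are used as an exact seed; each seed hit
    is then extended and accepted if it has at most max_mismatches.
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


reference = "ACGTTGCAACGTAGGCTA"
index = build_index(reference, k=4)
print(map_read("ACGTAG", reference, index, k=4))                    # two candidate positions
print(map_read("ACGTAG", reference, index, k=4, max_mismatches=0))  # exact matches only
```

Allowing one mismatch gives two alignments for the same read, while requiring an exact match leaves one. Real mappers resolve such ambiguity with alignment scores and mapping qualities.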
ANSWER
Sanger sequencing, also known as the “chain termination method”, is a method for determining the nucleotide sequence of DNA. The method was developed by two-time Nobel Laureate Frederick Sanger and his colleagues in 1977, hence the name Sanger sequencing. It is a targeted sequencing technique that uses oligonucleotide primers to seek out specific DNA regions. Sanger sequencing begins with denaturation of the double-stranded DNA. The single-stranded DNA is then annealed to oligonucleotide primers and elongated by a DNA polymerase using a mixture of deoxynucleotide triphosphates (dNTPs), which provide the adenine (A), cytosine (C), thymine (T), and guanine (G) nucleotides needed to build the new strand. In addition, a small quantity of chain-terminating dideoxynucleotide triphosphates (ddNTPs) for each nucleotide is included. Each strand continues to extend with dNTPs until a ddNTP is incorporated, which stops elongation because a ddNTP lacks the 3′-hydroxyl group needed to attach the next nucleotide. Because a ddNTP can be incorporated in place of the matching dNTP at any position, the products terminate at varying lengths. Each ddNTP (ddATP, ddGTP, ddCTP, ddTTP) carries a different fluorescent marker, so every terminated fragment is labeled according to its final base. The fragments are separated by size using capillary electrophoresis, and a laser in the automated sequencer excites the dye on each fragment as it passes the detector; the fluorescence intensity is recorded as a “peak”. By convention, A is indicated by green fluorescence, T by red, G by black, and C by blue. When a heterozygous variant occurs within a sequence, the position shows two overlapping peaks of different colors and roughly equal intensity. When a homozygous variant is present, the expected peak is replaced completely by a peak in the new base’s color.
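The logic of reading a sequence from chain-terminated fragments can be sketched in a few lines of Python. This is a simplified model, assuming exactly one terminated product of every possible length, no primer, and no noise. The function names are hypothetical, and the color table simply follows the convention stated above.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
DYE_COLOR = {"A": "green", "T": "red", "G": "black", "C": "blue"}


def terminated_fragments(template: str) -> list[str]:
    """Return one chain-terminated product for every possible length.

    The template is written 5'->3'. The new strand is its reverse
    complement, and the last base of each fragment is the labeled ddNTP.
    """
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template))
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_by_size(fragments: list[str]) -> tuple[str, list[str]]:
    """Sort fragments by length (as electrophoresis does) and call each
    terminal base from its dye color."""
    ordered = sorted(fragments, key=len)
    calls = [fragment[-1] for fragment in ordered]
    return "".join(calls), [DYE_COLOR[base] for base in calls]


fragments = terminated_fragments("ATGC")
random.Random(0).shuffle(fragments)  # fragments are mixed together in the reaction tube
sequence, colors = read_by_size(fragments)
print(sequence)  # sequence of the newly synthesized strand
print(colors)    # the order of peak colors on the chromatogram
```

The shuffle shows that the order of fragments in the reaction does not matter: size separation alone recovers the base order.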
Illumina dye sequencing is a technique used to determine the series of base pairs in DNA, also known as DNA sequencing. It was developed by Shankar Balasubramanian and David Klenerman of Cambridge University, who subsequently founded Solexa, a company later acquired by Illumina. The method is based on reversible dye terminators that enable the identification of single bases as they are incorporated into DNA strands. The basic procedure is as follows. DNA molecules are first attached to primers on a slide (the flow cell) and amplified so that local clusters of identical molecules are formed. The four types (A/T/C/G) of reversible terminating nucleotides are added; each is fluorescently labeled with a different color and carries a blocking group that prevents further extension. The four nucleotides compete for the binding site on the template DNA to be sequenced, so only one base is incorporated per strand in each cycle, and nonincorporated molecules are washed away. A laser then excites the dyes and the clusters are imaged; the fluorescent color detected at each cluster identifies the base that was incorporated. The blocking group and the dye are then chemically removed, which allows the next cycle to begin. The process is repeated for a set number of cycles, which determines the read length.
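The cycle structure described above (incorporate one blocked base, image, unblock, repeat) can be mimicked with a short Python loop. This is only a schematic for a single cluster: the color assignments are arbitrary placeholders rather than Illumina's actual dye scheme, the function name is made up, and the template is given in the order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Placeholder colors, one per base (an assumption for illustration only)
DYE_COLOR = {"A": "green", "T": "red", "G": "yellow", "C": "blue"}
BASE_FOR_COLOR = {color: base for base, color in DYE_COLOR.items()}


def sequence_by_synthesis(template: str, n_cycles: int) -> str:
    """Simulate one cluster: one base is incorporated and read per cycle.

    template is listed in the order the polymerase reads it. The number
    of cycles, not the template length, sets the read length.
    """
    read = []
    for cycle in range(min(n_cycles, len(template))):
        # 1. Only the terminator complementary to the template base is
        #    incorporated; its blocking group prevents a second addition.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Imaging: the cluster emits the color of the incorporated dye.
        observed_color = DYE_COLOR[incorporated]
        # 3. Base calling from the observed color.
        read.append(BASE_FOR_COLOR[observed_color])
        # 4. Dye and blocking group are removed; the loop moves on.
    return "".join(read)


print(sequence_by_synthesis("TACGGA", n_cycles=4))  # a 4-base read of a 6-base template
```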
ANSWER
THE SAMPLE PREPARATION STEPS FOR NEXT GENERATION SEQUENCING INCLUDE
EXTRACTION OF DNA
The first step is extracting the genetic material (DNA or RNA) from cells and tissues. Extraction entails breaking down the extracellular matrix and opening the cell membranes using enzymes, solvents, or surfactants. The DNA in the resulting mixture must then be isolated. The traditional gold standard in DNA isolation is phenol-based extraction. Phenol is an organic solvent that denatures and dissolves proteins, removing them from the DNA-containing aqueous phase. Spin columns that specifically bind DNA provide an alternative and are an easy-to-use, but expensive, method to wash away the debris. Chloroform-based extraction, another alternative, enables you to isolate high-quality DNA without phenol, and commercial kits can include a resin that minimizes the risk of contamination.
AMPLIFICATION (OPTIONAL)
Amplification after extraction is optional, depending on your application and sample size. For example, whole genome sequencing (WGS) with 2 µg of starting material does not necessarily require further amplification. But with nanograms, or even picograms, of starting material, amplification becomes essential to obtain sufficient coverage for reliable sequence calls. Isothermal amplification and polymerase chain reaction (PCR) are two common methods to increase the amount of input DNA. PCR uses generic primers to amplify the starting material in a highly uniform manner, but tends to be more error-prone than multiple displacement amplification (MDA), an isothermal method.
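To get a feel for the scale involved, the Python sketch below estimates how many PCR cycles it would take to go from a small input to a target amount. It assumes idealized exponential amplification in which each cycle multiplies the DNA by (1 + efficiency). Real reactions plateau and rarely reach perfect efficiency, so treat the result as a lower bound. The function name and parameters are hypothetical.

```python
def pcr_cycles_needed(start_ng: float, target_ng: float,
                      efficiency: float = 1.0) -> int:
    """Number of cycles until start_ng grows to at least target_ng.

    Assumption: every cycle multiplies the amount by (1 + efficiency),
    where efficiency = 1.0 means perfect doubling.
    """
    if start_ng <= 0 or target_ng <= 0:
        raise ValueError("amounts must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    cycles = 0
    amount = start_ng
    while amount < target_ng:
        amount *= 1 + efficiency
        cycles += 1
    return cycles


# From 1 pg (0.001 ng) of input to 2 µg (2000 ng), assuming perfect doubling
print(pcr_cycles_needed(0.001, 2000))
# The same target with a less efficient reaction needs more cycles
print(pcr_cycles_needed(0.001, 2000, efficiency=0.8))
```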
LIBRARY PREPARATION
Most NGS platforms analyze DNA in uniform, bite-size pieces, created by DNA fragmentation. This process generates a ‘library’ of fragments with a narrow length distribution that is optimal for the sequencing platform.
Both mechanical fragmentation (shearing) and enzymatic methods are suitable for NGS. Mechanical methods enable random shearing to produce a variety of overlapping fragments for any given region of the genome. This is ideal for de novo assembly.
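Why random shearing yields overlapping fragments can be seen in a small Python simulation: several copies of the same genome are each cut at different random points, so fragments from different copies overlap one another. The fragment length range, the number of copies, and the function name shear are assumptions chosen for the example.

```python
import random


def shear(genome: str, n_copies: int, min_len: int, max_len: int,
          seed: int = 0) -> list[tuple[int, str]]:
    """Cut n_copies of the genome at random points.

    Returns (start position, fragment) pairs. Fragment lengths are drawn
    uniformly from min_len..max_len; the last piece of each copy may be
    shorter than min_len.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_copies):
        pos = 0
        while pos < len(genome):
            length = rng.randint(min_len, max_len)
            fragments.append((pos, genome[pos:pos + length]))
            pos += length
    return fragments


genome = "ACGTTGCAAGGCTTAACCGGATCGATCGTA" * 3
for start, fragment in shear(genome, n_copies=3, min_len=10, max_len=20):
    print(start, len(fragment))  # start positions differ from copy to copy
```

In a real library the very short and very long pieces would then be removed by size selection to obtain the narrow length distribution the platform needs.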
The fragments generated have single-stranded, ‘sticky’ ends. The next step, end-repair, fills in these sticky ends to create blunt ends, ready for adaptor ligation.
Adaptors are then ligated to both the 5’ and the 3’ ends of the library fragments. They are specific to the sequencing platform, but ultimately all serve to enable in-platform clonal amplification, e.g. Illumina’s bridge amplification or BGI’s rolling circle amplification.
NOTE: These library preparation steps are generally applicable to whole genome sequencing. In amplicon-based target enrichment, the fragmentation and end-repair steps tend to be unnecessary. Pulling the targeted regions out as amplicon fragments with blunt ends enables you to go directly to adaptor ligation.