Answer a)
ENCODE drives home, however, that there are many “genes” out there in which DNA codes for RNA, not a protein, as the end product. The big surprise of the pilot project was that 93% of the bases studied were transcribed into RNA; in the full genome, 76% is transcribed. ENCODE defined 8800 small RNA molecules and 9600 long noncoding RNA molecules, the latter each at least 200 bases long. Thomas Gingeras of Cold Spring Harbor Laboratory in New York has found that different RNAs home in on different cell compartments, as if they have fixed addresses where they operate. Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example. “So there's quite a lot of sophistication in how RNA works,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, U.K., one of the key leaders of ENCODE.
As a result of ENCODE, Gingeras and others argue that the fundamental unit of the genome and the basic unit of heredity should be the transcript—the piece of RNA decoded from DNA—and not the gene. “The project has played an important role in changing our concept of the gene,” Stamatoyannopoulos says.
Answer b)
Throughout the 1990s, various researchers called the idea of junk DNA into question. With the human genome in hand, the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, decided it wanted to find out once and for all how much of the genome was a wasteland with no functional purpose. In 2003, it funded a pilot ENCODE, in which 35 research teams analyzed 44 regions of the genome, 30 million bases in all, or about 1% of the total genome. In 2007, the pilot project's results revealed that much of this DNA sequence was active in some way. The work called into serious question our gene-centric view of the genome, finding extensive RNA-generating activity beyond traditional gene boundaries. But the question remained whether the rest of the genome was like this 1%. “We want to know what all the bases are doing,” says Yale University bioinformatician Mark Gerstein.
Another way to test for functionality of DNA is to evaluate whether specific base sequences are conserved between species, or among individuals in a species. Previous studies have shown that 5% of the human genome is conserved across mammals, even though ENCODE studies implied that much more of the genome is functional. So MIT's Lucas Ward and Manolis Kellis compared functional regions newly identified by ENCODE among multiple humans, sampling from the 1000 Genomes Project. Some DNA sequences not conserved between humans and other mammals were nonetheless very much preserved across multiple people, indicating that an additional 4% of the genome is newly under selection in the human lineage, they report in a paper published online by Science. Two such regions were near genes for nerve growth and the development of cone cells in the eye, which underlie distinguishing traits in humans. On the flip side, they also found that some supposedly conserved regions of the human genome, as highlighted by the comparison with 29 mammals, actually varied among humans, suggesting these regions were no longer functional.
Beyond transcription, DNA's bases function in gene regulation through their interactions with transcription factors and other proteins. ENCODE carried out several tests to map where those proteins bind along the genome. Two of them, DNase-seq and FAIRE-seq, were applied to multiple cell types and gave an overview of the genome, identifying where chromatin, the protein-DNA complex, unwinds so that a protein can hook up with the DNA. ENCODE's DNase-seq found 2.89 million such sites in 125 cell types. Stamatoyannopoulos and his colleagues describe their more extensive DNase-seq studies in Science: His team examined 349 types of cells, including 233 samples of fetal tissue 60 to 160 days old. Each type of cell had about 200,000 accessible locations, and there seemed to be at least 3.9 million regions where transcription factors can bind in the genome. Across all cell types, about 42% of the genome can be accessible, he and his colleagues report. In many cases, the assays were able to pinpoint the specific bases involved in binding.