In: Biology
1. Write notes on the National Center for Biotechnology Information (NCBI) database.
2. Describe how the principle of parsimony is employed in order to infer phylogenetic trees. Illustrate your answer with a hypothetical example.
3. When assembling a genome, what does the N50 score measure?
4. With respect to phylogenetic trees, write notes on bootstrap support values.
Please answer all 4 questions
1. Answer:
NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI):
NCBI is a multidisciplinary research group that serves as a resource for molecular biology information. It was established in 1988 as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). Its facilities are located in Bethesda, Maryland, USA.
NCBI was originally created to aid in understanding the molecular mechanisms that affect human health and disease, with the following goals:
· To create and maintain public databases.
· To develop software for analyzing genomic data.
· To conduct research in computational biology.
Later, as the internet became widespread, NCBI broadened its focus toward basic biological research. Molecular biology became prominent in biomedical research, and NCBI created various specialized databases to complement those that deal directly with human health.
NCBI also provides researchers and the public with access to analysis and computing tools.
NCBI established database standards, such as nomenclature conventions, that are also used by non-NCBI databases. Its best-known resource is GenBank, the model nucleic acid sequence database, which contains sequence information from nearly 2 lakh (200,000) different organisms.
Role:
To maintain databases that are freely available and open to the public.
To keep data persistent, which is one of the key criteria for a biological database.
To keep database access fast and inexpensive.
GenBank, a database containing all publicly available nucleic acid sequences, is one of the members of the “Triple Entente” of sequence databases.
The other two are the European Molecular Biology Laboratory (EMBL) database and the DNA Data Bank of Japan (DDBJ). All three are part of the International Nucleotide Sequence Database Collaboration.
Various methods are used to generate the sequence information found in GenBank.
Around 70% of all sequences in GenBank are ESTs (Expressed Sequence Tags), which are generated by reverse transcribing mRNAs into complementary DNAs (cDNAs) and then performing single-pass sequencing on those cDNAs. ESTs thus represent segments of DNA that are transcribed into mRNA.
Entrez is NCBI's search and retrieval system. Through it, the GenBank nucleotide database is linked to other databases such as PubMed, Protein Sequence, Genomes, Taxonomy, Structure, Population, Online Mendelian Inheritance in Man (OMIM), Books, and 3D Domains. Connections between entries within a single database are called neighbours, and connections between entries in different databases are called hard links.
Other retrieval systems offered by NCBI include LocusLink (since superseded by the Gene database) and the Taxonomy Browser. LocusLink provides descriptive information about genes and is based on curated data. The Taxonomy Browser offers information on the lineage of organisms that have corresponding sequences in GenBank.
2. Answer:
The principle of parsimony states that, among all possible trees, the preferred one is the tree that explains the observed characters with the fewest evolutionary changes.
Hypothetical example:
Consider four species of hummingbirds, all of which are well toned, but only three of which can hum or sing. If we look only at whether each species is well toned, the most parsimonious model is simply that all four species share one common ancestor that had this trait. When we add the second character, the presence or absence of humming or singing, it is more likely that the three humming or singing species share a common ancestor in which the trait arose once than that the trait arose independently along two different evolutionary paths. The most parsimonious tree therefore has a branch linking the three humming or singing species to a single common ancestor, and then links that ancestor to the common ancestor of all four species, the root species.
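The counting behind the hummingbird example can be sketched in a few lines of Python. This is only an illustration, not a full tree-search program: it uses Fitch's small-parsimony counting for one binary character on two hand-written candidate trees. The species labels A to D, the tree shapes, and the non-singing outgroup "Out" are all hypothetical additions. The outgroup encodes the assumption that the ancestral state is "does not sing", without which a single three-versus-one character cannot distinguish between trees.

```python
def fitch_changes(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    states : dict mapping leaf name -> character state
    Returns (set of candidate states at this node, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, states)
    right_set, right_changes = fitch_changes(right, states)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:  # children agree on at least one state: no extra change
        return shared, changes
    # children disagree: one change is needed somewhere below this node
    return left_set | right_set, changes + 1


# 1 = hums or sings, 0 = does not (hypothetical data; "Out" is an assumed outgroup)
sings = {"A": 1, "B": 1, "C": 1, "D": 0, "Out": 0}

# Candidate 1: the three singing species form one group
grouped = (((("A", "B"), "C"), "D"), "Out")
# Candidate 2: singing species A is placed with the non-singing species D
split = ((("A", "D"), ("B", "C")), "Out")

grouped_changes = fitch_changes(grouped, sings)[1]
split_changes = fitch_changes(split, sings)[1]
print("grouped tree:", grouped_changes, "change(s)")
print("split tree:", split_changes, "change(s)")
```

Under these assumptions the grouped tree needs fewer changes than the split tree, so parsimony prefers it. A real analysis would score many characters across all candidate trees and sum the changes per tree.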
3. Answer:
The N50 is defined as follows: when contigs are sorted from longest to shortest, it is the length of the shortest contig in the smallest set of longest contigs that together cover at least 50% of the total assembly length. (A contig is a set of overlapping DNA segments or sequence reads.)
N50 is a measure used to describe the contiguity, and thus the quality, of assembled genomes that are fragmented into contigs of different lengths.
It means that half of the assembled genome sequence is in contigs larger than or equal to the N50 contig size; in other words, the sum of the lengths of all contigs of size N50 or longer contains at least 50 percent of the total assembled sequence.
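As a minimal sketch of the N50 definition above, the calculation could be written in Python as follows. The function name and the contig lengths are made up for illustration. The total used here is the summed length of the contigs themselves, which is an assumption; some tools use an estimated genome size instead and report that as NG50.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:  # at least 50% of the assembly is covered
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical assembly: seven contigs, 300 units in total
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
print(example_n50)  # 80 + 70 = 150, which is half of 300, so the N50 is 70
```

A larger N50 means the assembly is made of longer pieces. On its own it says nothing about whether those pieces were assembled correctly.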
4. Answer:
Bootstrapping is a procedure in which the columns (sites) of the alignment are randomly resampled with replacement to build a pseudo-replicate dataset of the same size, and the phylogenetic analysis is re-run on each replicate. The reported value for a node is the percentage of bootstrap replicates in which that node is recovered.
Bootstrap support values must be interpreted carefully. Most researchers consider 70% or above to be good support, but others consider values as low as 50% to be probably significant. You may obtain higher support if you include more information in your phylogenetic analyses, i.e. more loci or a longer fragment of the same gene. One interesting point is that you can sometimes get bootstrap values below 70%, but when you analyze the same dataset with a different method, for example Bayesian analysis, you get clearly good statistical support (posterior probabilities of 0.95 or higher).
If support remains high when cross-checked with other methods, the node can be regarded as well supported. A bootstrap value of 100 means that the node was recovered in all bootstrap replicates.
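The resampling idea can be sketched in Python as below. This is a toy illustration only: the alignment and species names are invented, and the "analysis" step is replaced by a very simple stand-in (checking whether two chosen species are the strictly closest pair by Hamming distance) instead of real tree inference such as maximum likelihood or parsimony.

```python
import random


def hamming(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def pair_is_closest(alignment, a, b):
    """Stand-in for tree inference: are a and b strictly the closest pair?"""
    names = sorted(alignment)
    target = hamming(alignment[a], alignment[b])
    others = [hamming(alignment[x], alignment[y])
              for i, x in enumerate(names) for y in names[i + 1:]
              if {x, y} != {a, b}]
    return target < min(others)


def bootstrap_support(alignment, a, b, replicates=200, seed=0):
    """Percentage of bootstrap replicates in which the grouping (a, b) is recovered."""
    rng = random.Random(seed)
    hits = sum(pair_is_closest(resample_columns(alignment, rng), a, b)
               for _ in range(replicates))
    return 100.0 * hits / replicates


# Hypothetical four-species alignment (made-up sequences)
toy_alignment = {
    "sp1": "ACGTACGTACGT",
    "sp2": "ACGTACGTACGA",
    "sp3": "TCGAACGGTCGT",
    "sp4": "TGGATCGGTCAA",
}
support = bootstrap_support(toy_alignment, "sp1", "sp2")
print(f"support for (sp1, sp2): {support:.1f}%")
```

Sites are resampled with replacement, so some columns appear several times in a replicate and others not at all. That is what lets the percentage reflect how consistently the data support a grouping.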
Note: This answer is given to the best of my knowledge and may not be 100% correct, but it is relevant to your question. Thank you.