| Problem Statement |
|
You have the mRNA sequence that results from the transcription of the Homo sapiens Hemoglobin subunit beta gene. Knowing that the 5' and 3' ends of the mRNA are processed post-transcriptionally, you know that the start codon and termination codon lie somewhere inside the sequence. A manual inspection of the mRNA sequence should reveal the locations of the start and stop codons, but to ensure you don't miss anything you decide to write a Python script to analyze the mRNA sequence and find the positions of both codons. You have the mRNA sequence, in the 5' to 3' direction, in a text file:
acauuugcuucugacacaacuguguucacuagcaaccucaaacagacaccauggugcauc
From the lecture you know that the canonical start codon is AUG and that the three stop codons are UAA, UAG, and UGA. |
| Requirements |
|
We have covered enough Python to accomplish this task. The basic idea is to store the mRNA sequence as a string value, then take advantage of the string's find() function to locate the start codon and the first stop codon. We haven't covered how to read data from a file in Python yet, but you can copy and paste the sequence into a script. Your script's output should follow the template shown below:

Homo sapiens HBB mRNA:
Translation start: <position of first AUG codon>
Translation Stop: <position of first stop codon after the start codon>
# of amino acids in the HBB protein: <number of amino acids encoded from Translation start to Translation stop>

Commenting
Be sure to comment your code meaningfully! This not only helps you to understand your code, it also helps me understand your thought processes, which is important for awarding partial credit when necessary. Commenting is also one of the rubric items, so if you do not comment your code you will lose points.

Data Storage in the Script
How you store the data inside a script is very important. In general, you want to minimize hardcoding data values, especially if they will be used repeatedly. "Hardcoding" means using the literal data value in your code instead of storing it in a variable. Every place where a data value is hardcoded represents a potential source of error. If that data value has to be changed, and it is hardcoded, every instance of that value in the script must be changed to avoid errors. If you instead store the value in a variable, and use the variable name in the script instead of the data value itself, you only have to change the data value once, where the variable is initialized. With this in mind, you should store the following initial data at the top of your script, to be used later:
Use of Upper vs. Lower Case
Whether you use upper case or lower case for the sequence data and codons is entirely up to you. Just be sure that you are consistent throughout your script.

Displaying the mRNA Sequence
The first item in the output is the display of the mRNA sequence itself. On one line you should display the species, Homo sapiens, followed by the abbreviation ("HBB") of the gene. Below this line the mRNA sequence itself should be displayed at 60 bases per line, which is the same convention used by GenBank. You do not need to include numbering or spaces every 10 bases like GenBank does, however.
Hint: Use a for loop combined with the range function with an increment of 60, and print each line as a slice, or substring, of 60 bases beginning with the current position in the loop.

Finding and Displaying the Position of the Translation Start Codon
The translation start codon will be the first occurrence of the canonical start codon, AUG, as the mRNA is read from left to right. You can use the string's find() function, which we covered in the Module 1 Python lecture, to do this. One important thing to keep in mind is that Python treats strings as 0-based in terms of indexing, meaning the first base in the mRNA is at position 0, not 1. When you display the position of the start codon you must remember to add 1 to the position returned by the find() function, since we read nucleotide sequences as 1-based, with the first base starting at position 1.
Caution: Do not add 1 to the position of the codon when you store it, or you will run the risk of error when you use the position for searches, etc. Only add 1 to the position when you are displaying the codon's position; e.g.:
    print("Translation start:", start_codon_pos + 1)
In the example above, the variable start_codon_pos is not changed; the expression start_codon_pos + 1 is evaluated into a temporary value that is passed as an argument to the print() function, and this temporary value is discarded once the print() function is done.

Finding and Displaying the Position of the Translation Stop Codon
There are 3 possible stop codons, UAA, UAG, and UGA, and any one of these will signal translation to terminate. You can find the stop codon using a similar approach to finding the start codon. There are a few things to bear in mind, however:
Be sure to store the position of the stop codon in a variable so you can display it after you have found it.
Hint: This is another good use of a loop with the range function. The range function should begin at the first codon after the start codon, and use an increment of 3 to read the sequence one codon at a time. Inside the loop use an if-elif-elif block to check for each of the stop codons. The stop codons are stored in a list, so you can use list indexes (0, 1, 2) to access individual stop codons. Once a stop codon is found, use the break statement to terminate the loop immediately. Don't forget to store the position of the stop codon in a variable, since you will need to display it.

Calculating and Displaying the Number of Amino Acids in the HBB Protein
Once you have the positions of both the start and stop codons, you can calculate how many amino acids are encoded by the HBB mRNA. Keep in mind that the difference between the positions of the start and stop codons gives the length of the coding region in bases, not codons, but the number of amino acids will always be equal to the number of codons.
Hint: The math involved here is pretty straightforward, but Python's / operator will give you a result that is a floating point value. To convert the floating point value to an integer, use Python's int() function:
    int_value = int(floating_point_value) |
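The steps described in the requirements can be sketched as follows. This is a minimal illustration under stated assumptions, not the expected solution: it runs on a short made-up sequence (`toy_mrna`) rather than the HBB mRNA, every function and variable name is hypothetical, and it checks for stop codons with a list membership test and uses integer division instead of the if-elif-elif block and int() call suggested in the hints.

```python
# Data stored once at the top, so nothing is hardcoded further down.
# toy_mrna is a made-up sequence for illustration, NOT the HBB mRNA.
species = "Homo sapiens"
gene = "HBB"
toy_mrna = "acgaugguucacuuguaagcc"
start_codon = "aug"
stop_codons = ["uaa", "uag", "uga"]
bases_per_line = 60
codon_length = 3


def split_lines(seq, width):
    """Return seq cut into slices of at most width bases."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def find_stop(seq, start_pos, stops):
    """Return the 0-based position of the first in-frame stop codon, or -1."""
    # Begin at the codon after the start codon and step one codon at a time.
    for pos in range(start_pos + codon_length, len(seq), codon_length):
        if seq[pos:pos + codon_length] in stops:
            return pos
    return -1


# find() returns a 0-based index (or -1 if the codon is absent).
start_codon_pos = toy_mrna.find(start_codon)
stop_codon_pos = find_stop(toy_mrna, start_codon_pos, stop_codons)

# Codons from the start codon up to, but not including, the stop codon.
# Integer division (//) avoids the float that / would produce.
num_amino_acids = (stop_codon_pos - start_codon_pos) // codon_length

print(species, gene, "mRNA:")
for line in split_lines(toy_mrna, bases_per_line):
    print(line)
# Add 1 only when displaying, so the stored positions stay 0-based.
print("Translation start:", start_codon_pos + 1)
print("Translation Stop:", stop_codon_pos + 1)
print("# of amino acids in the", gene, "protein:", num_amino_acids)
```

For the toy sequence this reports a start at position 4, a stop at position 16, and 4 amino acids. The sketch does not handle a sequence with no start codon or no in-frame stop codon; a real script would check for -1 before doing the arithmetic.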
In: Computer Science
8. Tuberculosis
The genus Mycobacterium contains over 50 species, including several human pathogens of concern.
Mycobacteria are distinguishable from other types of bacteria by the presence of wax layers and
high molecular weight fatty acids (mycolic acids). This complex, external structure offers
protection from acids, drying and some germicides. In fact, mycobacteria are also referred to as
acid-fast bacilli because acid treatments will not result in decolorization during staining.
One of the mycobacteria species of medical interest is Mycobacterium tuberculosis, the
causative agent of tuberculosis (TB). TB is a disease that affects millions of people worldwide.
In fact, the Centers for Disease Control and Prevention (CDC) estimates that one third of the total world
population is infected with TB.
M. tuberculosis normally attacks the lungs, but can infrequently infect other areas of the body.
An infection with M. tuberculosis can result in latent TB infection or TB disease. Latent TB
infections occur when M. tuberculosis is present but not active. People with latent TB infections
do not exhibit any symptoms, do not feel sick, and are not infectious. If the M. tuberculosis
becomes active and multiplies, the person will develop TB disease.
People with TB disease are infectious. TB is spread primarily by M. tuberculosis becoming
airborne in droplets of respiratory mucus when a person with TB disease coughs, sneezes,
sings or speaks. By breathing in the airborne bacteria, the new person is inoculated. Symptoms
of TB disease include pain in the chest, coughing up blood or sputum, or a bad cough that lasts
three weeks or longer. TB is tested for by either a TB skin test (TST) or by a TB blood test.
Direct identification of acid-fast bacilli in sputum is also used to detect M. tuberculosis. Current
treatments for TB include long term use (6 to 24 months) of a combination of medications.
Questions:
1. What are some of the other symptoms of TB disease not mentioned above?
2. In the U.S., certain populations have a disproportionate rate of TB. Which populations
are these?
3. How have antibiotic resistant strains of M. tuberculosis hindered treatment?
4. Why are many health care workers required to get tested for TB?
5. A chest x-ray is used occasionally to detect lung damage in TB patients. What is the
radiologist looking for in the x-ray?
6. How are giant African rats being used to detect TB?
7. What other sites in the body can be infected by M. tuberculosis?
8. What is the primary habitat for M. tuberculosis?
In: Biology
1. The restriction enzyme Sau3AI recognizes the following sequence: 5'-GATC-3'. On average, how often should this enzyme cleave DNA? In contrast, the restriction enzyme NotI recognizes the following sequence: 5'-GCGGCCGC-3'. On average, how often should this enzyme cleave DNA? Does NotI cleave DNA more frequently than Sau3AI?
2. An uncharacterized plasmid DNA was cleaved using several restriction enzymes individually and in various combinations. The DNA fragment sizes were determined by agarose gel electrophoresis and the restriction enzyme recognition sites were mapped. Subsequently, the DNA was sequenced, and an extra recognition site was found for one of the enzymes. However, all the other mapping data was consistent with sequence data. What are the simplest explanations for this discrepancy? Assume the DNA sequence had no errors.
3. A plasmid was cleaved with several restriction enzymes, individually and in combinations. The following fragment sizes (base pairs) were determined by agarose gel electrophoresis.
Note: There may be some slight discrepancy in summing up the total base pairs. Indicate the distances between sites.
| Digest | Fragment sizes (bp) |
|---|---|
| Eco RI | 4363 |
| Ava I | 2182 |
| Pvu II | 4363 |
| Pst I | 4363 |
| Eco RI - Ava I | 2938, 1425 |
| Eco RI - Pst I | 3609, 754 |
| Ava I - Pvu II | 3722, 641 |
| Ava I - Pst I | 2182 |
| Pvu II - Pst I | 2820, 1543 |
| Eco RI - Pvu II | 2297, 2066 |
Make a restriction map based on this data. Hand draw on separate paper and submit the picture of your drawing map.
4. Why is only one band detected in the Ava I - Pst I co-digest?
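For question 1, the usual back-of-the-envelope estimate assumes a random DNA sequence in which the four bases are independent and equally likely, so a specific n-base site is expected about once every 4^n bases. A small sketch under that assumption (the function name is hypothetical, and real genomes only approximate the equal-frequency assumption):

```python
def expected_cut_spacing(site):
    """Average number of bases between occurrences of site in random DNA.

    Assumes each of the four bases is independent and equally likely.
    """
    return 4 ** len(site)


four_base_site = "GATC"        # recognition sequence from question 1
eight_base_site = "GCGGCCGC"   # recognition sequence from question 1

print("4-base site: about once every", expected_cut_spacing(four_base_site), "bp")
print("8-base site: about once every", expected_cut_spacing(eight_base_site), "bp")
```

Under this model the 4-base site is expected every 256 bp and the 8-base site every 65,536 bp, so the longer recognition sequence cuts far less often.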
In: Biology
AlCl3 and BCl3 are strong Lewis Acids. But, CaCl2 is not a Lewis acid. Explain why. (Hint: Write the reactions of these substances with H2O and explain).
In: Chemistry
The same study reported that the following reaction also tends to go forward:
AsH2⁻ + H2S → AsH3 + HS⁻
Which is the stronger acid, H2S or AsH3? Is this in agreement with the general trends for binary acids? This is an interesting example in that both the trends across the periodic table and down the periodic table are involved. Which has a greater influence, the trend across the table or the trend down the table? Does this fit with what you know about the relative acidities of binary acids of the second row (CH4, NH3, OH2, etc.) and the halogen group (HF, HCl, HBr, etc.)?
In: Chemistry
1. What are the subunits found in carbohydrates? What are the properties of these subunits? How do the properties of these subunits influence the structure and properties of carbohydrates?
2. What are the subunits found in proteins? What are the properties of these subunits? How do the properties of these subunits influence the structure and properties of proteins?
3. What are the subunits found in lipids? What are the properties of these subunits? How do the properties of these subunits influence the structure and properties of lipids?
4. What are the subunits found in nucleic acids? What are the properties of these subunits? How do the properties of these subunits influence the structure and properties of nucleic acids?
In: Biology
Use this data:
234 235 345 234 678 56 21 347 674 231 67 89 876 875 457 357 991 667 643
1. Using the above data, solve for the Population Mean, Geometric Mean, and Population Variance. Also solve for the Mode and Median.
2. Using the first 4 numbers above, solve for
Sample Mean, Sample Variance
3. Using this data:
23 12 4 5 6 7 9 12 34
Solve for Mean, Mode, Median, Population Variance and Geometric Mean
4. Using this data:
44 23 12 16
Solve for the Standard Deviation, Population Variance
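One way to check hand calculations for these four parts is Python's standard `statistics` module. The sketch below is only an illustration: the helper names and variable names are hypothetical, `geometric_mean` requires Python 3.8 or later, and part 4 is assumed to mean the population standard deviation, since the question does not say which kind is wanted.

```python
import statistics

data_1 = [234, 235, 345, 234, 678, 56, 21, 347, 674, 231,
          67, 89, 876, 875, 457, 357, 991, 667, 643]
data_3 = [23, 12, 4, 5, 6, 7, 9, 12, 34]
data_4 = [44, 23, 12, 16]


def population_summary(values):
    """Population statistics: the variance divides by N."""
    return {
        "mean": statistics.mean(values),
        "geometric mean": statistics.geometric_mean(values),
        "population variance": statistics.pvariance(values),
        "mode": statistics.mode(values),
        "median": statistics.median(values),
    }


def sample_summary(values):
    """Sample statistics: the variance divides by N - 1."""
    return {
        "sample mean": statistics.mean(values),
        "sample variance": statistics.variance(values),
    }


for name, value in population_summary(data_1).items():
    print("1.", name, "=", value)
for name, value in sample_summary(data_1[:4]).items():
    print("2.", name, "=", value)
for name, value in population_summary(data_3).items():
    print("3.", name, "=", value)
# Assumption: part 4 asks for the population (not sample) standard deviation.
print("4. population variance =", statistics.pvariance(data_4))
print("4. population standard deviation =", statistics.pstdev(data_4))
```

The only difference between the population and sample variance is the divisor (N versus N - 1), which is why part 2 uses `variance` while the other parts use `pvariance`.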
In: Statistics and Probability
The table t-value associated with 8 degrees of freedom and used to calculate a 99% confidence interval is _______.
Select one:
a. 3.355
b. 1.860
c. 1.397
d. 2.896
Cameron Sinclair, Information Services Manager with Global Financial Service (GFS), is studying employee use of GFS email for non-business communications. He plans to use a 95% confidence interval estimate of the proportion of email messages that are non-business; he will accept a 0.05 error. Previous studies indicate that approximately 30% of employee email is not business related. Cameron should sample _______ email messages.
Select one:
a. 14
b. 323
c. 457
d. 12
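The sample-size question follows the standard formula for estimating a proportion, n = z² · p · (1 − p) / E², rounded up to the next whole number. The sketch below applies it with Python's standard library; the function and parameter names are hypothetical, and it relies on the normal approximation. The t critical value in the first question is normally read from a t table, which the standard library does not provide.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence):
    """Smallest n so a CI for a proportion has the given margin of error.

    Uses n = z^2 * p * (1 - p) / margin^2 with the normal approximation.
    """
    # Two-sided interval: put half of the leftover probability in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Values from the question: prior estimate 30%, error 0.05, 95% confidence.
print(sample_size_for_proportion(p=0.30, margin=0.05, confidence=0.95))
```

With the values from the question this gives 323. Rounding up rather than to the nearest integer keeps the margin of error at or below the target.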
In: Math
How does FRET signaling work? Does the R0 value occur only when the
enzyme efficiency is at 50%, or at 100%?
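The standard Förster relation, E = 1 / (1 + (r / R0)^6), ties the energy transfer efficiency E to the donor-acceptor distance r. A short sketch of that relation (the function name is hypothetical, and the 5.0 nm value for R0 is an arbitrary illustrative number, not taken from the question) shows how the efficiency behaves around r = R0:

```python
def fret_efficiency(distance, forster_radius):
    """Energy transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


r0 = 5.0  # nm, arbitrary illustrative value
for r in (2.5, 5.0, 10.0):
    print(f"r = {r} nm -> E = {fret_efficiency(r, r0):.3f}")
```

By this definition the efficiency is exactly 50% when r equals R0, approaches 100% as the dyes move closer together, and falls off steeply beyond R0 because of the sixth-power dependence.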
In: Biology
Explain how the enzyme AMPK responds to low cellular energy and regulates any one of its target metabolic pathways.
In: Biology