Please write a basic function using Python. Please comment all steps. Thank you!
Experimentally determined molecular structures are stored in the Protein Data Bank. The Protein Data Bank (PDB) format is a standard for files containing atomic coordinates, which are stored in “ATOM” records. Write a Python function to extract the x coordinate of each atom from a given PDB file. Test your function with the provided “1a3d.pdb” file as the example. Also, think carefully about what the proper data type for the x coordinate would be.
Contents of the 1a3d.pdb file:
ATOM 1 N ASN A 1 57.429 52.566 66.234 1.00 11.00 N
ATOM 2 CA ASN A 1 57.401 52.920 64.787 1.00 13.62 C
ATOM 3 C ASN A 1 57.241 51.647 63.993 1.00 12.30 C
ATOM 4 O ASN A 1 57.394 50.528 64.511 1.00 13.98 O
ATOM 5 CB ASN A 1 58.652 53.792 64.352 1.00 11.78 C
ATOM 6 CG ASN A 1 59.934 53.013 64.271 1.00 14.53 C
ATOM 7 OD1 ASN A 1 59.937 51.833 63.961 1.00 17.83 O
ATOM 8 ND2 ASN A 1 61.041 53.653 64.610 1.00 18.58 N
ATOM 9 N LEU A 2 56.891 51.813 62.733 1.00 15.05 N
ATOM 10 CA LEU A 2 56.682 50.689 61.820 1.00 16.69 C
ATOM 11 C LEU A 2 57.715 49.557 61.882 1.00 16.96 C
ATOM 12 O LEU A 2 57.354 48.393 61.884 1.00 16.70 O
ATOM 13 CB LEU A 2 56.635 51.236 60.401 1.00 20.56 C
ATOM 14 CG LEU A 2 56.295 50.233 59.323 1.00 26.35 C
ATOM 15 CD1 LEU A 2 54.885 49.716 59.572 1.00 27.75 C
ATOM 16 CD2 LEU A 2 56.411 50.917 57.970 1.00 27.62 C
ATOM 17 N TYR A 3 59.006 49.894 61.877 1.00 15.94 N
ATOM 18 CA TYR A 3 60.075 48.891 61.913 1.00 16.81 C
ATOM 19 C TYR A 3 60.140 48.155 63.238 1.00 16.30 C
ATOM 20 O TYR A 3 60.425 46.938 63.288 1.00 16.54 O
ATOM 21 CB TYR A 3 61.442 49.547 61.609 1.00 16.00 C
ATOM 22 CG TYR A 3 61.665 49.814 60.140 1.00 15.35 C
ATOM 23 CD1 TYR A 3 62.265 48.869 59.327 1.00 14.53 C
ATOM 24 CD2 TYR A 3 61.317 51.039 59.577 1.00 18.90 C
ATOM 25 CE1 TYR A 3 62.525 49.123 57.992 1.00 14.89 C
ATOM 26 CE2 TYR A 3 61.569 51.315 58.216 1.00 17.39 C
ATOM 27 CZ TYR A 3 62.179 50.348 57.437 1.00 17.30 C
ATOM 28 OH TYR A 3 62.427 50.624 56.104 1.00 17.17 O
ATOM 29 N GLN A 4 59.836 48.870 64.313 1.00 14.44 N
ATOM 30 CA GLN A 4 59.850 48.231 65.615 1.00 15.12 C
ATOM 31 C GLN A 4 58.665 47.258 65.724 1.00 15.52 C
ATOM 32 O GLN A 4 58.781 46.210 66.362 1.00 14.91 O
ATOM 33 CB GLN A 4 59.802 49.295 66.685 1.00 15.14 C
ATOM 34 CG GLN A 4 61.021 50.221 66.666 1.00 17.99 C
ATOM 35 CD GLN A 4 60.921 51.273 67.741 1.00 20.05 C
ATOM 36 OE1 GLN A 4 59.978 52.055 67.719 1.00 18.28 O
ATOM 37 NE2 GLN A 4 61.855 51.274 68.714 1.00 18.47 N
ATOM 38 N PHE A 5 57.530 47.594 65.111 1.00 13.82 N
ATOM 39 CA PHE A 5 56.363 46.691 65.118 1.00 13.83 C
ATOM 40 C PHE A 5 56.764 45.436 64.296 1.00 14.36 C
ATOM 41 O PHE A 5 56.530 44.298 64.693 1.00 15.15 O
ATOM 42 CB PHE A 5 55.148 47.419 64.487 1.00 13.48 C
ATOM 43 CG PHE A 5 53.863 46.590 64.416 1.00 13.88 C
ATOM 44 CD1 PHE A 5 53.558 45.607 65.376 1.00 11.38 C
ATOM 45 CD2 PHE A 5 52.911 46.884 63.410 1.00 12.99 C
ATOM 46 CE1 PHE A 5 52.287 44.920 65.339 1.00 12.60 C
ATOM 47 CE2 PHE A 5 51.645 46.217 63.342 1.00 13.09 C
ATOM 48 CZ PHE A 5 51.335 45.245 64.313 1.00 11.98 C
ATOM 49 N LYS A 6 57.402 45.664 63.154 1.00 15.35 N
ATOM 50 CA LYS A 6 57.867 44.578 62.324 1.00 15.33 C
First, let's understand the 1a3d.pdb file. Every record in it follows the same layout. For example, take the first record:
ATOM 1 N ASN A 1 57.429 52.566 66.234 1.00 11.00 N
Each record in the file consists of 12 whitespace-separated fields, described as follows:
ATOM: Record name
1: Atom serial number
N: Atom name
ASN: Residue name (amino acid)
A: Chain identifier
1: Residue sequence number
57.429: X coordinate
52.566: Y coordinate
66.234: Z coordinate
1.00: Occupancy
11.00: Temperature factor
N: Element symbol
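The breakdown above can be sketched in Python by pairing each whitespace-separated token of the example record with a label. The labels in FIELD_NAMES are hypothetical names chosen for illustration; they are not defined by the PDB format itself.

```python
# hypothetical labels for the 12 fields, in the order they appear in the record
FIELD_NAMES = [
    "record_name", "serial", "atom_name", "residue_name",
    "chain", "residue_number", "x", "y", "z",
    "occupancy", "temp_factor", "element",
]

# the first record of the file, as shown above
record = "ATOM 1 N ASN A 1 57.429 52.566 66.234 1.00 11.00 N"

# split on whitespace and pair every token with its label
fields = dict(zip(FIELD_NAMES, record.split()))

# every token is still a string at this point, so convert before doing math
x_value = float(fields["x"])
print(fields["x"], type(fields["x"]))
print(x_value, type(x_value))
```

The first print shows the raw string token and the second shows the converted float, which makes the need for an explicit conversion visible.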
Hence, the X coordinate is the seventh field of each record, which is index 6 once the line has been split on whitespace. Coordinates are real numbers given to three decimal places, so the appropriate Python data type is float. Python has no separate double type; its float is already double precision.
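Counting whitespace-separated fields works for the records shown here, but the PDB format is defined by fixed character columns, and neighbouring fields can touch when a value fills its whole field (for example an x coordinate of -123.456). As a minimal sketch, assuming the file follows the standard layout in which the x coordinate occupies columns 31 to 38, the value can be sliced out by position instead. The function name extract_x_fixed_width and the hand-built record are hypothetical and only serve the illustration.

```python
def extract_x_fixed_width(lines):
    """Return the x coordinate of every ATOM record as a list of floats."""
    x_coords = []
    for line in lines:
        # only ATOM records carry the coordinates we want
        if line.startswith("ATOM"):
            # columns 31-38 (counting from 1) are the slice [30:38] in Python
            x_coords.append(float(line[30:38]))
    return x_coords


# hypothetical record built field by field so the column widths are explicit
record = (
    "ATOM  "        # columns 1-6: record name
    + "%5d" % 1     # columns 7-11: atom serial number
    + " "           # column 12: blank
    + " N  "        # columns 13-16: atom name
    + " "           # column 17: alternate location indicator
    + "ASN"         # columns 18-20: residue name
    + " A"          # columns 21-22: blank and chain identifier
    + "%4d" % 1     # columns 23-26: residue sequence number
    + "    "        # columns 27-30: insertion code and blanks
    + "%8.3f%8.3f%8.3f" % (57.429, 52.566, 66.234)  # columns 31-54: x, y, z
)

print(extract_x_fixed_width([record]))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly, e.g. inside a with open('1a3d.pdb') as f: block.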
Python code for the provided problem statement
# main
def main():
    # open the data file in read mode; 'with' closes it automatically
    with open('1a3d.pdb', 'r') as file:
        # now, store each line of the data file to the list using readlines
        lines = file.readlines()
    # display headers for the output
    print("X-coordinate", '%20s' % 'Data Type')
    # now, iterate each line one by one from the list
    for line in lines:
        # skip anything that is not an ATOM record (e.g. blank lines)
        if not line.startswith('ATOM'):
            continue
        # now, store the elements of the line to the list
        # this will store the elements as strings
        lst = line.strip().split()
        # we know that in the provided file format, X-coordinate is stored
        # at index 6, hence display that element
        print(lst[6], end='')
        # now, convert string to float
        temp = float(lst[6])
        # and then display the type of temp
        print('%30s' % type(temp))

# program start
main()
Program Output