You shall implement six static methods in a class named BasicBioinformatics. Each of the methods will perform some analysis of data considered to be DNA. DNA shall be represented as arrays of chars containing only the characters A, C, G and T. In addition to the six methods you will implement, six other methods exist in the class, which use Strings instead of char arrays to represent DNA. These other methods simply invoke the methods you are to implement, so all methods of this class will work correctly once your own methods are complete.
Specifically, start with this code for the BasicBioinformatics class. Six methods in this class currently have method bodies containing only a “to-do” comment, indicating work to be completed:
// TODO
Write code for the body of each to-be-completed method that satisfies the descriptions of the method given in the associated comment (also available as an API-style documentation page here).
Testing
I encourage you to create your own main method in the BasicBioinformatics class (or in a separate class, if you'd like) in order to test your methods. Create small arrays/Strings, pass them to your methods, and inspect the return values. Even better, watch your methods execute in a debugger. However, when you turn in your code, do not include main or any other methods beyond the 12 required.
If you want to create arrays with predetermined values at the time of creation, Java supports the following syntax:
char[] testValues = { 'A', 'E', 'I', 'O', 'U' };
Also remember that the String class has a method named toCharArray that you may find useful.
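As a small illustration of the two points above, the sketch below builds the same sequence once with the array-literal syntax and once with toCharArray, then converts it back to a String. The class name CharArrayDemo is hypothetical and is not part of the assignment.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the assignment's BasicBioinformatics.
class CharArrayDemo {
    public static void main(String[] args) {
        // Array literal syntax: the values are fixed when the array is created.
        char[] literal = { 'G', 'T', 'C', 'A' };

        // toCharArray copies the characters of a String into a new char array.
        char[] fromString = "GTCA".toCharArray();

        // Arrays.equals compares contents; == would only compare references.
        System.out.println(Arrays.equals(literal, fromString)); // prints true

        // new String(char[]) converts back, as the String overloads in the class do.
        System.out.println(new String(fromString)); // prints GTCA
    }
}
```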
You could then pass values to your methods and inspect the return values:
String testData1 = "GCCTGTCGTAGCTTATC", testData2 = "GGCTGACGTAGCGTAAC";
int[] baseCounts = nucleotideCounts(testData1);
System.out.printf("Nucleotide counts for %s: A: %d C: %d G: %d T: %d%n", testData1, baseCounts[0], baseCounts[1], baseCounts[2], baseCounts[3]);
System.out.printf("%s <-- complement --> %s%n", testData1, complement(testData1));
System.out.printf("%s <-- reverse complement --> %s%n", testData1, reverseComplement(testData1));
System.out.printf("%s GC-content: %f%n", testData1, gcContent(testData1));
System.out.printf("Hamming distance between %s and %s: %d%n", testData1, testData2, hammingDistance(testData1, testData2));
System.out.printf("Mutation points between %s and %s:%n%s%n", testData1, testData2, Arrays.toString(mutationPoints(testData1, testData2)));
This is the starter code:
public class BasicBioinformatics {
    /**
     * Calculates and returns the complement of a DNA sequence. In DNA sequences, 'A' and 'T' are
     * complements of each other, as are 'C' and 'G'. The complement is formed by taking the
     * complement of each symbol (e.g., the complement of "GTCA" is "CAGT").
     *
     * @param dna a char array representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return a char array representation of the complement of the given sequence
     */
    public static char[] complement(char[] dna) {
        // TODO
    }

    /**
     * Calculates and returns the complement of a DNA sequence. In DNA sequences, 'A' and 'T' are
     * complements of each other, as are 'C' and 'G'. The complement is formed by taking the
     * complement of each symbol (e.g., the complement of "GTCA" is "CAGT").
     *
     * @param dna a String representing a DNA sequence of arbitrary length, containing only the
     *            characters A, C, G and T
     *
     * @return a String representation of the complement of the given sequence
     */
    public static String complement(String dna) {
        return new String(complement(dna.toCharArray())); // We just call the other complement method
    }

    /**
     * The GC-content of a DNA sequence is given by the percentage of symbols in the sequence that are
     * 'C' or 'G'. For example, the GC-content of "AGCTATAG" is .375 (37.5%).
     *
     * @param dna a char array representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return the GC-content of the sequence, to double precision
     */
    public static double gcContent(char[] dna) {
        // TODO
    }

    /**
     * The GC-content of a DNA sequence is given by the percentage of symbols in the sequence that are
     * 'C' or 'G'. For example, the GC-content of "AGCTATAG" is .375 (37.5%).
     *
     * @param dna a String representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return the GC-content of the sequence, to double precision
     */
    public static double gcContent(String dna) {
        return gcContent(dna.toCharArray()); // Let the other gcContent method do this work for us
    }

    /**
     * Calculates and returns the Hamming distance between two DNA sequences of equal length. The
     * Hamming distance between two sequences is the number of points in the sequences where the
     * corresponding symbols differ. For example, the Hamming distance between "ATTATGC" and "ATGATCC"
     * is 2.
     *
     * @param dna1 a char array representing a DNA sequence of arbitrary length (equal to dna2's
     *             length), containing only the characters A, C, G and T
     * @param dna2 a char array representing a DNA sequence of arbitrary length (equal to dna1's
     *             length), containing only the characters A, C, G and T
     *
     * @return the Hamming distance between the two sequences
     */
    public static int hammingDistance(char[] dna1, char[] dna2) {
        // TODO
    }

    /**
     * Calculates and returns the Hamming distance between two DNA sequences of equal length. The
     * Hamming distance between two sequences is the number of points in the sequences where the
     * corresponding symbols differ. For example, the Hamming distance between "ATTATGC" and "ATGATCC"
     * is 2.
     *
     * @param dna1 a String representing a DNA sequence of arbitrary length (equal to dna2's
     *             length), containing only the characters A, C, G and T
     * @param dna2 a String representing a DNA sequence of arbitrary length (equal to dna1's
     *             length), containing only the characters A, C, G and T
     *
     * @return the Hamming distance between the two sequences
     */
    public static int hammingDistance(String dna1, String dna2) {
        return hammingDistance(dna1.toCharArray(), dna2.toCharArray());
    }

    /**
     * Calculates and returns where two DNA sequences of equal lengths differ. For example, given
     * sequences "ATGT" and "GTGA", the result should be array { true, false, false, true }.
     *
     * @param dna1 a char array representing a DNA sequence, containing only the characters A, C, G
     *             and T, with the same length as parameter dna2
     * @param dna2 a char array representing a DNA sequence, containing only the characters A, C, G
     *             and T, with the same length as parameter dna1
     *
     * @return an array of boolean values, of length equivalent to both parameters' lengths,
     *         containing true in each subscript where the parameter strings differ, and false where
     *         they do not differ
     */
    public static boolean[] mutationPoints(char[] dna1, char[] dna2) {
        // TODO
    }

    /**
     * Calculates and returns where two DNA sequences of equal lengths differ. For example, given
     * sequences "ATGT" and "GTGA", the result should be array { true, false, false, true }.
     *
     * @param dna1 a String representing a DNA sequence, containing only the characters A, C, G
     *             and T, with the same length as parameter dna2
     * @param dna2 a String representing a DNA sequence, containing only the characters A, C, G
     *             and T, with the same length as parameter dna1
     *
     * @return an array of boolean values, of length equivalent to both parameters' lengths,
     *         containing true in each subscript where the parameter strings differ, and false where
     *         they do not differ
     */
    public static boolean[] mutationPoints(String dna1, String dna2) {
        return mutationPoints(dna1.toCharArray(), dna2.toCharArray());
    }

    /**
     * Calculates and returns the number of times each type of nucleotide occurs in a DNA sequence.
     *
     * @param dna a char array representing a DNA sequence of arbitrary length, containing only the
     *            characters A, C, G and T
     *
     * @return an int array of length 4, where subscripts 0, 1, 2 and 3 contain the number of 'A',
     *         'C', 'G' and 'T' characters (respectively) in the given sequence
     */
    public static int[] nucleotideCounts(char[] dna) {
        // TODO
    }

    /**
     * Calculates and returns the number of times each type of nucleotide occurs in a DNA sequence.
     *
     * @param dna a String representing a DNA sequence of arbitrary length, containing only the
     *            characters A, C, G and T
     *
     * @return an int array of length 4, where subscripts 0, 1, 2 and 3 contain the number of 'A',
     *         'C', 'G' and 'T' characters (respectively) in the given sequence
     */
    public static int[] nucleotideCounts(String dna) {
        return nucleotideCounts(dna.toCharArray());
    }

    /**
     * Calculates and returns the reverse complement of a DNA sequence. In DNA sequences, 'A' and 'T'
     * are complements of each other, as are 'C' and 'G'. The reverse complement is formed by
     * reversing the symbols of a sequence, then taking the complement of each symbol (e.g., the
     * reverse complement of "GTCA" is "TGAC").
     *
     * @param dna a char array representing a DNA sequence of arbitrary length, containing only the
     *            characters A, C, G and T
     *
     * @return a char array representation of the reverse complement of the given sequence
     */
    public static char[] reverseComplement(char[] dna) {
        // TODO
    }

    /**
     * Calculates and returns the reverse complement of a DNA sequence. In DNA sequences, 'A' and 'T'
     * are complements of each other, as are 'C' and 'G'. The reverse complement is formed by
     * reversing the symbols of a sequence, then taking the complement of each symbol (e.g., the
     * reverse complement of "GTCA" is "TGAC").
     *
     * @param dna a String representing a DNA sequence of arbitrary length, containing only the
     *            characters A, C, G and T
     *
     * @return a String representation of the reverse complement of the given sequence
     */
    public static String reverseComplement(String dna) {
        return new String(reverseComplement(dna.toCharArray()));
    }
}
I've coded main in the same class. Below is the code:
import java.util.Arrays;

public class BasicBioinformatics {
    /**
     * Calculates and returns the complement of a DNA sequence. In DNA sequences,
     * 'A' and 'T' are complements of each other, as are 'C' and 'G'. The complement
     * is formed by taking the complement of each symbol (e.g., the complement of
     * "GTCA" is "CAGT").
     *
     * @param dna
     *            a char array representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return a char array representation of the complement of the given sequence
     */
    public static char[] complement(char[] dna) {
        /* creates a new char array of same length */
        char[] complement = new char[dna.length];
        for (int i = 0; i < dna.length; i++) {
            /* form a complement of dna sequence */
            if (dna[i] == 'G')
                complement[i] = 'C';
            else if (dna[i] == 'C')
                complement[i] = 'G';
            else if (dna[i] == 'T')
                complement[i] = 'A';
            else if (dna[i] == 'A')
                complement[i] = 'T';
        }
        return complement; // return the complemented sequence
    }
    /**
     * Calculates and returns the complement of a DNA sequence. In DNA sequences,
     * 'A' and 'T' are complements of each other, as are 'C' and 'G'. The complement
     * is formed by taking the complement of each symbol (e.g., the complement of
     * "GTCA" is "CAGT").
     *
     * @param dna
     *            a String representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return a String representation of the complement of the given sequence
     */
    public static String complement(String dna) {
        return new String(complement(dna.toCharArray())); // We just call the other complement method
    }
    /**
     * The GC-content of a DNA sequence is given by the percentage of symbols in the
     * sequence that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is
     * .375 (37.5%).
     *
     * @param dna
     *            a char array representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return the GC-content of the sequence, to double precision
     */
    public static double gcContent(char[] dna) {
        /* variables to store the G/C count and the resulting fraction */
        double gcContent;
        int numOfgc = 0;
        for (int i = 0; i < dna.length; i++) {
            if (dna[i] == 'G' || dna[i] == 'C') {
                numOfgc += 1;
            }
        }
        gcContent = (double) numOfgc / dna.length;
        return gcContent; // return the gcContent
    }
    /**
     * The GC-content of a DNA sequence is given by the percentage of symbols in the
     * sequence that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is
     * .375 (37.5%).
     *
     * @param dna
     *            a String representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return the GC-content of the sequence, to double precision
     */
    public static double gcContent(String dna) {
        return gcContent(dna.toCharArray()); // Let the other gcContent method do this work for us
    }
    /**
     * Calculates and returns the Hamming distance between two DNA sequences of
     * equal length. The Hamming distance between two sequences is the number of
     * points in the sequences where the corresponding symbols differ. For example,
     * the Hamming distance between "ATTATGC" and "ATGATCC" is 2.
     *
     * @param dna1
     *            a char array representing a DNA sequence of arbitrary length
     *            (equal to dna2's length), containing only the characters A, C, G
     *            and T
     * @param dna2
     *            a char array representing a DNA sequence of arbitrary length
     *            (equal to dna1's length), containing only the characters A, C, G
     *            and T
     *
     * @return the Hamming distance between the two sequences
     */
    public static int hammingDistance(char[] dna1, char[] dna2) {
        /* variable to store hamming distance */
        int hammingDistance = 0;
        /*
         * loop through both dna's and increment hammingDistance when characters aren't
         * equal
         */
        for (int i = 0; i < dna1.length && i < dna2.length; i++) {
            if (dna1[i] != dna2[i]) {
                hammingDistance += 1;
            }
        }
        return hammingDistance; // return the calculated hamming distance
    }
    /**
     * Calculates and returns the Hamming distance between two DNA sequences of
     * equal length. The Hamming distance between two sequences is the number of
     * points in the sequences where the corresponding symbols differ. For example,
     * the Hamming distance between "ATTATGC" and "ATGATCC" is 2.
     *
     * @param dna1
     *            a String representing a DNA sequence of arbitrary length (equal to
     *            dna2's length), containing only the characters A, C, G and T
     * @param dna2
     *            a String representing a DNA sequence of arbitrary length (equal to
     *            dna1's length), containing only the characters A, C, G and T
     *
     * @return the Hamming distance between the two sequences
     */
    public static int hammingDistance(String dna1, String dna2) {
        return hammingDistance(dna1.toCharArray(), dna2.toCharArray());
    }
    /**
     * Calculates and returns where two DNA sequences of equal lengths differ. For
     * example, given sequences "ATGT" and "GTGA", the result should be array {
     * true, false, false, true }.
     *
     * @param dna1
     *            a char array representing a DNA sequence, containing only the
     *            characters A, C, G and T, with the same length as parameter dna2
     * @param dna2
     *            a char array representing a DNA sequence, containing only the
     *            characters A, C, G and T, with the same length as parameter dna1
     *
     * @return an array of boolean values, of length equivalent to both parameters'
     *         lengths, containing true in each subscript where the parameter
     *         strings differ, and false where they do not differ
     */
    public static boolean[] mutationPoints(char[] dna1, char[] dna2) {
        /* create a boolean array */
        boolean[] mutationPoints = new boolean[dna1.length];
        /*
         * loop through both dna's and form mutation points array containing true in
         * each subscript where the parameter strings differ, and false where they do
         * not differ
         */
        for (int i = 0; i < dna1.length && i < dna2.length; i++) {
            if (dna1[i] != dna2[i])
                mutationPoints[i] = true;
            else
                mutationPoints[i] = false;
        }
        return mutationPoints;
    }
    /**
     * Calculates and returns where two DNA sequences of equal lengths differ. For
     * example, given sequences "ATGT" and "GTGA", the result should be array {
     * true, false, false, true }.
     *
     * @param dna1
     *            a String representing a DNA sequence, containing only the
     *            characters A, C, G and T, with the same length as parameter dna2
     * @param dna2
     *            a String representing a DNA sequence, containing only the
     *            characters A, C, G and T, with the same length as parameter dna1
     *
     * @return an array of boolean values, of length equivalent to both parameters'
     *         lengths, containing true in each subscript where the parameter
     *         strings differ, and false where they do not differ
     */
    public static boolean[] mutationPoints(String dna1, String dna2) {
        return mutationPoints(dna1.toCharArray(), dna2.toCharArray());
    }
    /**
     * Calculates and returns the number of times each type of nucleotide occurs in
     * a DNA sequence.
     *
     * @param dna
     *            a char array representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return an int array of length 4, where subscripts 0, 1, 2 and 3 contain the
     *         number of 'A', 'C', 'G' and 'T' characters (respectively) in the
     *         given sequence
     */
    public static int[] nucleotideCounts(char[] dna) {
        /* create an int array to store the frequency of A, C, G, T */
        int[] nucleotideCounts = new int[4];
        for (int i = 0; i < dna.length; i++) {
            if (dna[i] == 'A')
                nucleotideCounts[0] += 1;
            else if (dna[i] == 'C')
                nucleotideCounts[1] += 1;
            else if (dna[i] == 'G')
                nucleotideCounts[2] += 1;
            else
                nucleotideCounts[3] += 1;
        }
        return nucleotideCounts; // return the int array
    }
    /**
     * Calculates and returns the number of times each type of nucleotide occurs in
     * a DNA sequence.
     *
     * @param dna
     *            a String representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return an int array of length 4, where subscripts 0, 1, 2 and 3 contain the
     *         number of 'A', 'C', 'G' and 'T' characters (respectively) in the
     *         given sequence
     */
    public static int[] nucleotideCounts(String dna) {
        return nucleotideCounts(dna.toCharArray());
    }
    /**
     * Calculates and returns the reverse complement of a DNA sequence. In DNA
     * sequences, 'A' and 'T' are complements of each other, as are 'C' and 'G'. The
     * reverse complement is formed by reversing the symbols of a sequence, then
     * taking the complement of each symbol (e.g., the reverse complement of "GTCA"
     * is "TGAC").
     *
     * @param dna
     *            a char array representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return a char array representation of the reverse complement of the given
     *         sequence
     */
    public static char[] reverseComplement(char[] dna) {
        /* create a char array to store the reverse complement */
        char[] reverseComplement = new char[dna.length];
        int reverseLength = dna.length - 1; // we will store the complement in a reverse fashion
        /* loop through dna sequence and complement it and store in reverse */
        for (int i = 0; i < dna.length; i++) {
            if (dna[i] == 'G') {
                reverseComplement[reverseLength] = 'C';
            } else if (dna[i] == 'C') {
                reverseComplement[reverseLength] = 'G';
            } else if (dna[i] == 'T') {
                reverseComplement[reverseLength] = 'A';
            } else if (dna[i] == 'A') {
                reverseComplement[reverseLength] = 'T';
            }
            reverseLength -= 1; // it is decreased by 1 so as to traverse back
        }
        return reverseComplement;
    }
    /**
     * Calculates and returns the reverse complement of a DNA sequence. In DNA
     * sequences, 'A' and 'T' are complements of each other, as are 'C' and 'G'. The
     * reverse complement is formed by reversing the symbols of a sequence, then
     * taking the complement of each symbol (e.g., the reverse complement of "GTCA"
     * is "TGAC").
     *
     * @param dna
     *            a String representing a DNA sequence of arbitrary length,
     *            containing only the characters A, C, G and T
     *
     * @return a String representation of the reverse complement of the given
     *         sequence
     */
    public static String reverseComplement(String dna) {
        return new String(reverseComplement(dna.toCharArray()));
    }
    public static void main(String[] args) {
        String testData1 = "GCCTGTCGTAGCTTATC", testData2 = "GGCTGACGTAGCGTAAC";
        int[] baseCounts = nucleotideCounts(testData1.toCharArray());
        System.out.printf("Nucleotide counts for %s: A: %d C: %d G: %d T: %d%n", testData1,
                baseCounts[0], baseCounts[1], baseCounts[2], baseCounts[3]);
        System.out.printf("%s <-- complement --> %s%n", testData1,
                String.valueOf(complement(testData1.toCharArray())));
        System.out.printf("%s <-- reverse complement --> %s%n", testData1,
                String.valueOf(reverseComplement(testData1.toCharArray())));
        System.out.printf("%s GC-content: %f%n", testData1, gcContent(testData1.toCharArray()));
        System.out.printf("Hamming distance between %s and %s: %d%n", testData1, testData2,
                hammingDistance(testData1.toCharArray(), testData2.toCharArray()));
        System.out.printf("Mutation points between %s and %s:%n%s%n", testData1, testData2,
                Arrays.toString(mutationPoints(testData1.toCharArray(), testData2.toCharArray())));
    }
}
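The complement and reverseComplement methods above repeat the same four-way if/else chain. One possible way to avoid the repetition, shown here only as a study sketch, is a single per-base helper that both methods call. The class name ComplementSketch and the helper complementOf are assumptions and do not appear in the assignment. Because the assignment allows no methods beyond the 12 required, a helper like this would not belong in the submitted class. Unlike the code above, this sketch rejects characters other than A, C, G and T instead of leaving a default char in the result.

```java
// Hypothetical sketch class; its name and the complementOf helper are assumptions.
class ComplementSketch {

    /** Returns the complement of a single base, rejecting anything but A, C, G, T. */
    static char complementOf(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:
                throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    static char[] complement(char[] dna) {
        char[] result = new char[dna.length];
        for (int i = 0; i < dna.length; i++) {
            result[i] = complementOf(dna[i]);
        }
        return result;
    }

    static char[] reverseComplement(char[] dna) {
        char[] result = new char[dna.length];
        for (int i = 0; i < dna.length; i++) {
            // The complement of the base at position i lands at the mirrored position.
            result[dna.length - 1 - i] = complementOf(dna[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new String(complement("GTCA".toCharArray())));        // prints CAGT
        System.out.println(new String(reverseComplement("GTCA".toCharArray()))); // prints TGAC
    }
}
```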
Please test with various inputs.
To format the code in the Eclipse IDE, press Ctrl+A, then Ctrl+Shift+F.
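Two inputs worth trying fall outside the documented contract. With an empty sequence, the gcContent code above divides 0.0 by 0 and returns NaN. With sequences of different lengths, hammingDistance and mutationPoints compare only the positions both sequences have. The assignment does not say what should happen in either case, so the sketch below shows just one possible choice: returning 0.0 for an empty sequence and throwing an exception on a length mismatch. The class name EdgeCaseSketch and both guard behaviours are assumptions.

```java
// Hypothetical sketch; the class name and both guard behaviours are assumptions.
class EdgeCaseSketch {

    static double gcContent(char[] dna) {
        if (dna.length == 0) {
            return 0.0; // assumed convention; the unguarded division would yield NaN
        }
        int gc = 0;
        for (char base : dna) {
            if (base == 'G' || base == 'C') {
                gc++;
            }
        }
        return (double) gc / dna.length;
    }

    static int hammingDistance(char[] dna1, char[] dna2) {
        if (dna1.length != dna2.length) {
            // Assumed behaviour: fail loudly rather than compare a common prefix.
            throw new IllegalArgumentException("Sequences must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < dna1.length; i++) {
            if (dna1[i] != dna2[i]) {
                distance++;
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        System.out.println(gcContent("AGCTATAG".toCharArray())); // prints 0.375
        System.out.println(gcContent(new char[0]));              // prints 0.0
        System.out.println(hammingDistance("ATTATGC".toCharArray(),
                "ATGATCC".toCharArray()));                       // prints 2
    }
}
```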