Question

In: Computer Science

In this problem you will design and implement C++ code that identifies overlap in strings. Specifically,...

In this problem you will design and implement C++ code that identifies overlap in strings. Specifically, design and implement a C++ program that does the following:
1. Asks a user to input a filename and then opens that file. If the file open fails, then print the message “Unable to open file” and terminate the program using exit(1).
2. Reads the file contents, in order, into an array of strings. (See the file format explanation below.)
3. Computes the string overlapping order described below, and then prints the strings out, one per line, in that order.
4. Closes the file.
File Format: The data file consists of a number of strings, each on its own line. You do not know in advance how many strings will be in the file, other than there will be no more than 30 strings. Assume (i) there is at least one string in the file, (ii) the strings do not have any whitespace in their interior, and (iii) the strings consist entirely of alphabetic, upper case characters. Here is an example of a data file containing four strings:
AGGTGTGGA AAAATTA AATTGTCGCTGA GGAAAA
Overlapping Order: Here is the explanation of the string overlapping your program is to find in Step (3) above. Suppose we have the strings above read, in order, into an array of strings. We start with the first string in the array, AGGTGTGGA. What we want to determine is which of the other strings’ beginning overlaps the most with the end of the this first string. Note this is a nontrivial problem since we do not know without further analysis what the size of the overlapping substring will be. For example, if we just look at the last character of AGGTGTGGA, the A, there are two other strings that begin with A. If we look at the last two characters, GA, there are no strings starting with GA. If we
1
look at the last three, GGA, there is one other string starting at GGA. If we continue along this line, we notice that the largest overlap is the three characters GGA, and the overlap is with the fourth string. So move the fourth string into the second position in the array by exchanging it with the current second string:
AGGTGTGGA GGAAAA AATTGTCGCTGA AAAATTA
Now which of the remaining strings (i.e., the third and fourth ones), has the largest overlap at its beginning with the end of GGAAAA? Both of the remaining strings have an overlap of one character (A), and of two characters (AA). But only one has a larger overlap, the fourth string, with maximum overlap AAAA. So move the fourth string into the third position by exchanging it with the third string:
AGGTGTGGA GGAAAA AAAATTA AATTGTCGCTGA
Then we look for the maximum overlap between AAAATT and the remaining strings. However, since there is only one string left, the final string trivially has the maximum overlap. So the strings are now ordered as desired, and the program’s output should be the strings printed one per line in the order immediately above. Additional clarification, requirements, and advice: • If more than one string has the maximum overlap (e.g., two string overlap with the current string in a substring of 3 characters) choose the one that occurs first in the current array. • It is possible for the maximum overlap to be of length 0. • Your program should work on a data file with any name, and with any reasonable number of strings, as long as the number of strings is at most 30. • Your program just needs to print the strings in the final order. It does not need to print any additional information such as the amount of overlap. • Make sure you understand how the overlap works here with the end of the “current” string overlapping with the beginning of the other strings. • Finding the length of overlap between any two strings is not a simple task, but should not be extremely lengthy or difficult either. Think about the problem carefully, and remember you can use string functions as useful. Moreover, to organize your program, write a function int findOverlap(string s1, string s2) that finds the number of characters of overlap between the beginning of s2 and end of s1, and returns the length of this overlap. For example, when s1 = AGGTGTGGA and s2 = GATTTAG the function should return two since the largest overlap is two characters long, namely GA.
2
Here’s a complete example using another data file. Suppose the file is the following:
AGCTG GCGGACGGG GGG TGAGCGGA GGGGC
Then here is the program input and output. As usual, user input is underlined, and there is a single space between the input prompt and the input file name. And — also as usual — follow the input and output example carefully. Input filename: testData.dat AGCTG TGAGCGGA GCGGACGGG GGG GGGGC
And here is an example of an invalid filename: Input filename: testData.cat Unable to open file

Solutions

Expert Solution

So basically we have to read some strings from a file and then find overlaps and give output . I've created a code file which is explained with the comments which will really help you . code:)-

#include<iostream>
#include<fstream>
#include<stdlib.h>
using namespace std;
int findoverlaps(string s1,string s2) // this will check maximum overlap length
{
   int l1=s1.length();
   int l2=s2.length();
   int count=0;
   int len=min(l1,l2);
   for(int i=1;i<=len;i++) //this loop run for checking overlap length which can go from 1 to minimum length
   {
       int q=0;
       int c=0;
       while(q<i && s1[l1-(i-q)]==s2[q])
       {
           q++;
           c++;
       }
       if(q==i) // if i length is a overlap
       {
           count=q;

       }
   }
   return count;

}
int main()
{
   fstream file;
   string s[31];
   char file1[50];
   cout<<"enter the file name:";
   cin>>file1;                       //this will read the filename
   file.open(file1,ios::in);
   if(file==NULL)
   {
       cout<<"unable to open the file";
       exit(0);
   }
   int len=0;
   while(file)
   {
       file>>s[len++];           //this read a string from the file
      
   }
   for(int i=0;i<len;i++)   // here we go for overlap at every position
   {
       int m=0;
       int k=i+1;
       for(int j=i+1;j<len;j++) // this loop check all possible overlap with the remaining strings
       {
           int overlap=findoverlaps(s[i],s[j]);
           if(overlap>m)
           {
               m=overlap;
               k=j;
           }
       }
       if(k!=i+1)
       {
           swap(s[i+1],s[k]);
       }
   }
   for(int i=0;i<len;i++)
   {
       cout<<s[i]<<"\n";
   }

}

Here is the input file:


Related Solutions

For this program you will implement the following utility functions to test mastery of C strings....
For this program you will implement the following utility functions to test mastery of C strings. *******you MUST use these these function***** void removeBlanks(char *src, char *dest); void replaceChar(char *src, char oldChar, char newChar); char *flipCase(const char *src); Please read the description of these functions carefully. In the removeBlanks function you will implement a routine that takes a string in as src and outputs the same string into dest but removing any blank space character encountered. For example, if the...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can assume that any char array passed into the functions will contain valid, null-terminated data. Your functions must have the signatures listed below. 1. This function returns the last index where the target char can be found in the string. it returns -1 if the target char does not appear in the string. For example, if s is “Giants” and target is ‘a’ the function...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can assume that any char array passed into the functions will contain valid, null-terminated data. Your functions must have the signatures listed below. 1. This function returns the last index where the target char can be found in the string. it returns -1 if the target char does not appear in the string. For example, if s is “Giants” and target is ‘a’ the function...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can assume that any char array passed into the functions will contain valid, null-terminated data. Your functions must have the signatures listed below. 1. This function returns the last index where the target char can be found in the string. it returns -1 if the target char does not appear in the string. For example, if s is “Giants” and target is ‘a’ the function...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can...
A C++ question: Implement the following functions. Each function deals with null terminated C-strings. You can assume that any char array passed into the functions will contain valid, null-terminated data. Your functions must have the signatures listed below. 1. This function returns the last index where the target char can be found in the string. it returns -1 if the target char does not appear in the string. For example, if s is “Giants” and target is ‘a’ the function...
Problem 2 (C++): Design and implement a class complexType, that can be used to process complex...
Problem 2 (C++): Design and implement a class complexType, that can be used to process complex numbers. A number of the form a +ib, in which i2 = -1 and a and b are real numbers, is called a complex number. We call a the real part and b the imaginary part of a + ib. Complex numbers can also be represented as ordered pairs (a, b). The class you will design should have the following features. Constructor Your class...
Code in C Instructions For this programming assignment you are going to implement a simulation of...
Code in C Instructions For this programming assignment you are going to implement a simulation of Dijkstra’s solution to the Dining Philosophers problem using threads, locks, and condition variables. Dijkstra’s Solution Edsgar Dijkstra’s original solution to the Dining Philosophers problem used semaphores, but it can be adapted to use similar mechanisms: • Each philosopher is in one of three states: THINKING, HUNGRY, or EATING. • Every philosopher starts out in the THINKING state. • When a philosopher is ready to...
Write C++ code that prompts the user for a username and password (strings), and it calls...
Write C++ code that prompts the user for a username and password (strings), and it calls the login() function with the username and password as arguments. You may assume that username and password have been declared elsewhere. Your code should print error messages if login() generates an exception: LoginException: "Username not found or password incorrect" ServerException: "The login server is busy and you cannot log in" AccessException: "Your account is forbidden from logging into this server" Any other exception: "Unknown...
Write C++ code that prompts the user for a username and password (strings), and it calls...
Write C++ code that prompts the user for a username and password (strings), and it calls the login() function with the username and password as arguments. You may assume that username and password have been declared elsewhere. Use virtual functions, pure virtual functions, templates, and exceptions wherever possible. Please explain the code. Your code should print error messages if login() generates an exception: LoginException: "Username not found or password incorrect" ServerException: "The login server is busy and you cannot log...
DEVELOP A C++ PROGRAM IN PROCEDURAL CODE that will recursively alphabetize a set of strings in...
DEVELOP A C++ PROGRAM IN PROCEDURAL CODE that will recursively alphabetize a set of strings in a user-specified file using the tree sort algorithm explained in the lectures while maintaining a balanced binary tree at all times. The following will test your program with an input file containing: Max Hank Jet Frisky Chata Richard Nan Sam Thomas Karen Gerri Ingrid Alan Dana When done print out the contents of the tree with inorder traversal in a tabular format similar to...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT