Question

In: Computer Science


Data preprocessing is an important step for ensuring quality input data. The four methods discussed are data cleaning, data integration, data transformation, and data reduction.
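As a rough illustration of the four methods, here is a minimal Python sketch. It rests on stated assumptions: the records, the field names (`id`, `age`, `income`, `region`), and the helper functions are all hypothetical. Each step shows only one simple variant of its technique: duplicate removal with mean imputation for cleaning, a key-based join for integration, min-max normalization for transformation, and attribute selection for reduction. Sampling, aggregation, and PCA are other common forms of reduction.

```python
from statistics import mean

# Hypothetical toy records from two sources; field names are illustrative only.
sales = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},  # missing value
    {"id": 2, "age": None, "income": 61000},  # exact duplicate of the row above
    {"id": 3, "age": 45, "income": 75000},
]
regions = {1: "North", 2: "South", 3: "East"}  # second source, keyed by id


def clean(records):
    """Data cleaning: drop exact duplicates, then fill missing ages with the mean age."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))  # copy so the input list is not mutated
    fill = mean(r["age"] for r in unique if r["age"] is not None)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique


def integrate(records, lookup):
    """Data integration: join a second source on the shared 'id' key."""
    return [{**r, "region": lookup.get(r["id"], "Unknown")} for r in records]


def min_max(records, field):
    """Data transformation: rescale a numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


def reduce_attributes(records, keep):
    """Data reduction: keep only the attributes needed for the analysis."""
    return [{k: r[k] for k in keep} for r in records]


result = reduce_attributes(
    min_max(integrate(clean(sales), regions), "income"),
    ["age", "income", "region"],
)
for row in result:
    print(row)
```

Running it prints three rows: the duplicate record is dropped, the missing age is filled with 39.5 (the mean of 34 and 45), and income is rescaled so that 52000 maps to 0.0 and 75000 maps to 1.0.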

a. Explain each of these methods in two or three paragraphs. (16 points)

b. Suppose we were to develop software to implement each of these techniques. Discuss how easy or difficult it would be to develop software for each technique. Give this some thought and write a well-thought-out answer: imagine that you had to write such software; how easy or hard would it be? (7 points)

c. After the software has been developed, we run the programs. For each of the four programs, explain how much CPU processing time is required. I am not looking for an actual number; rather, I am looking for your analysis of how much CPU processing each technique needs. (7 points)
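One way to reason about CPU cost without timing anything is to count basic operations. The Python sketch below uses hypothetical helper names and toy integer keys; it contrasts a pairwise duplicate check, which compares every record with every earlier one, against a hash-set pass that does one lookup per record. The same quadratic-versus-roughly-linear gap appears in cleaning (duplicate detection) and integration (nested-loop versus hash-based matching of records across sources).

```python
# Hypothetical helpers that count key comparisons/lookups instead of wall-clock time.


def pairwise_dedup_cost(keys):
    """Compare each record with every earlier one: always n*(n-1)/2 comparisons."""
    comparisons, duplicates = 0, 0
    for j in range(len(keys)):
        is_dup = False
        for i in range(j):
            comparisons += 1
            if keys[i] == keys[j]:
                is_dup = True
        if is_dup:
            duplicates += 1
    return comparisons, duplicates


def hashed_dedup_cost(keys):
    """One set lookup per record: n lookups, each constant time on average."""
    seen, lookups, duplicates = set(), 0, 0
    for k in keys:
        lookups += 1
        if k in seen:
            duplicates += 1
        else:
            seen.add(k)
    return lookups, duplicates


keys = [i % 900 for i in range(1000)]  # 1000 records; the last 100 repeat earlier keys
print(pairwise_dedup_cost(keys))  # (499500, 100)
print(hashed_dedup_cost(keys))    # (1000, 100)
```

Both approaches find the same 100 duplicates, but doubling the number of records roughly quadruples the pairwise work while only doubling the hashed work, at the price of extra memory for the set.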

