AP7.2 (LO 2) Planning an ADA application. Timothy Steele, a recent college graduate and new audit staff member, is having lunch with Michael Watts, an audit senior. Both are working on the audit engagement of a retailer that has operations in North America and Europe. Timothy says to Michael, “I have been doing some reading about audit data analytics, and there is something I don't understand. While I understand the importance of internal controls to the reliability of the client's data, I also keep reading that the auditor needs to clean the data before it can be analyzed. I don't understand what people mean when they talk about ‘clean’ data. Is there a difference between ‘clean’ and ‘dirty’ data? Can you explain this to me with a practical illustration? I just don't get what the discussion of ‘clean’ data is about.”
Required
Answer Timothy's questions. As Timothy asks, explain the concept of “clean” versus “dirty” data with a practical illustration.
Dirty data is data that is inaccurate, incomplete, or inconsistent. It can contain duplicate records, punctuation errors, or spelling errors. More broadly, it is any data that undermines the integrity of the data set.
Dirty data can be cleaned (or cleansed) through a data cleaning process, which converts dirty data into clean data. Data cleaning is the process of detecting inaccurate or corrupt records in a record set and correcting them by replacing, modifying, or deleting the dirty or coarse data. Various data cleansing tools exist, and cleansing can be performed interactively with data wrangling tools or as batch processing.
The result of data cleansing is clean, high-quality data that is valid, accurate, complete, consistent, and uniform.
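The three corrective actions named above (modifying, replacing, and deleting) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the customer records, the field names (`name`, `state`, `postal_code`), and the `clean` helper are all hypothetical and are not taken from the problem or from any particular audit tool.

```python
# Hypothetical customer records; all field names and values are illustrative.
raw_records = [
    {"name": "  Alice Smith ", "state": "ny", "postal_code": "10001"},
    {"name": "Alice Smith", "state": "NY", "postal_code": "10001"},  # same customer
    {"name": "Bob Jones", "state": "TX", "postal_code": ""},         # blank postal code
]


def clean(records):
    """Return cleaned copies of the records; the input list is left untouched."""
    seen = set()
    cleaned = []
    for rec in records:
        # Modify: collapse stray whitespace and standardize letter case.
        name = " ".join(rec["name"].split())
        state = rec["state"].strip().upper()
        # Replace: turn an empty string into an explicit missing-value marker.
        postal = rec["postal_code"].strip() or None
        # Delete: skip a record that duplicates one already kept.
        key = (name.lower(), state, postal)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "state": state, "postal_code": postal})
    return cleaned


clean_records = clean(raw_records)
for rec in clean_records:
    print(rec)
```

In this sketch the blank postal code is marked as missing rather than filled in, since in an audit setting a missing value would normally be followed up with the client rather than guessed.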
A few examples of dirty data:
Dates that do not adhere to a standard day/month/year format.
Incomplete customer records in which the postal code or gender code is missing or incorrectly entered.
Inaccurate data in which the state code and city are mismatched.
Duplicate records for the same customer caused by spelling mistakes in the customer's name.
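As a practical illustration, the checks below flag each of the four kinds of dirty data listed above. This is a sketch under assumptions, not a definitive procedure: the accepted date formats, the city-to-state lookup table, the sample records, and the 0.85 similarity threshold are all hypothetical choices made for the example. It uses only the Python standard library (`datetime.strptime` and `difflib.SequenceMatcher`).

```python
from datetime import datetime
from difflib import SequenceMatcher

# Assumed date formats. An ambiguous date resolves to the first format that matches.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]

# Hypothetical reference table: the state code expected for each city.
CITY_TO_STATE = {"Chicago": "IL", "Houston": "TX"}


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def find_issues(rec):
    """List the data-quality problems found in a single record."""
    issues = []
    if normalize_date(rec["order_date"]) is None:
        issues.append("unrecognized date format")
    if not rec["postal_code"]:
        issues.append("missing postal code")
    expected_state = CITY_TO_STATE.get(rec["city"])
    if expected_state is not None and expected_state != rec["state"]:
        issues.append("state/city mismatch")
    return issues


def likely_duplicates(names, threshold=0.85):
    """Pair up names that are similar enough to suggest a misspelled duplicate."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ratio = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
            if ratio >= threshold:
                pairs.append((names[i], names[j]))
    return pairs


# Hypothetical sample records.
records = [
    {"name": "Jon Smith", "order_date": "2023-01-15", "city": "Chicago",
     "state": "IL", "postal_code": "60601"},
    {"name": "John Smith", "order_date": "01/15/2023", "city": "Chicago",
     "state": "TX", "postal_code": ""},
    {"name": "Maria Lopez", "order_date": "15 January 23", "city": "Houston",
     "state": "TX", "postal_code": "77002"},
]

for rec in records:
    print(rec["name"], find_issues(rec))
print(likely_duplicates([rec["name"] for rec in records]))
```

The similarity check only suggests candidate duplicates. Whether "Jon Smith" and "John Smith" are really the same customer is a judgment that still needs other evidence, such as a matching address or account number.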