Question


Recommend appropriate data analytic techniques for security prevention at IMC. [5 marks]



Expert Solution

The main reason to apply data analytics to security is that many internal control systems have serious weaknesses. For example, the approach currently used by many law enforcement agencies to detect companies involved in potential security breaches relies on circumstantial evidence or complaints from whistleblowers; as a result, many security cases remain undetected and unprosecuted. To effectively test, detect, validate, correct errors in, and monitor control systems against fraudulent activities, business entities and organizations rely on specialized data analytics techniques such as data mining, data matching, the ‘sounds like’ function, regression analysis, clustering analysis and gap analysis. Techniques used for security prevention fall into two primary classes: statistical techniques and artificial intelligence techniques.

Statistical techniques

Examples of statistical data analysis techniques are:

  • Data preprocessing techniques for detecting, validating and correcting errors, and for filling in missing or incorrect data.
  • Calculation of various statistical parameters such as averages, quantiles, performance metrics, probability distributions, and so on. For example, the averages may include average length of call, average number of calls per month and average delays in bill payment.
  • Models of various business activities, expressed either in terms of their parameters or as probability distributions.
  • Computing user profiles.
  • Time-series analysis of time-dependent data.
  • Clustering and classification to find patterns and associations among groups of data.
  • Data matching: compares two sets of collected data, either with matching algorithms or with programmed loops that match records against each other or compare complex data types. It is used to remove duplicate records and to identify links between two data sets for marketing, security or other purposes.
  • The ‘sounds like’ function finds values that sound similar. Phonetic similarity is one way to locate possible duplicate values or inconsistent spellings in manually entered data. The function converts the compared strings to four-character American Soundex codes, which consist of the first letter of each string followed by three digits encoding the consonants after it.
  • Regression analysis examines the relationship between two or more variables of interest by estimating how a dependent variable varies with one or more independent variables. It helps identify relationships among variables and predict outcomes.
  • Gap analysis determines whether business requirements are being met and, if not, which steps should be taken to meet them.
  • Matching algorithms detect anomalies in the behavior of transactions or users by comparing them with previously known models and profiles. Further techniques are needed to eliminate false alarms, estimate risks, and predict the future behavior of current transactions or users.
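The ‘sounds like’ item above can be sketched in Python. This is a minimal implementation of the standard American Soundex rules (keep the first letter, map later consonants to digits, drop vowels, merge adjacent identical digits, pad to four characters), plus a hypothetical helper `sounds_like_groups` that groups entries sharing a code as candidate duplicates. Commercial audit tools may apply additional normalization, so treat this as an illustration rather than any particular product's behavior.

```python
from collections import defaultdict

# American Soundex digit groups; vowels, Y, H and W receive no digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]               # the first letter is kept as-is
    prev = CODES.get(letters[0], "")  # its digit still suppresses an identical next digit
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:          # merge adjacent letters with the same digit
                result += code
            prev = code
        elif ch not in "HW":
            prev = ""                 # a vowel separates two identical digits
        # H and W are skipped without resetting prev
        if len(result) == 4:
            break
    return result.ljust(4, "0")       # pad short codes with zeros


def sounds_like_groups(names):
    """Hypothetical helper: group names by Soundex code, keeping codes shared by 2+ names."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return {code: members for code, members in groups.items() if len(members) > 1}
```

For example, `sounds_like_groups(["Robert", "Rupert", "Rubin", "Smith", "Smyth"])` pairs Robert with Rupert (R163) and Smith with Smyth (S530), the kind of near-duplicates that manual data entry tends to produce.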

Some forensic accountants specialize in forensic analytics, the procurement and analysis of electronic data to reconstruct, detect, or otherwise support a claim of financial fraud. The main steps in forensic analytics are data collection, data preparation, data analysis, and reporting. For example, forensic analytics may be used to review an employee's purchasing card activity to assess whether any of the purchases were diverted or divertible for personal use.
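The four forensic analytics steps can be illustrated on the purchasing card example with a short Python sketch. Everything specific here is an assumption for illustration: the record layout (`CardTxn`), the set `PERSONAL_CATEGORIES`, and the two red-flag rules (weekend purchases and personal-looking merchant categories) are hypothetical, not an established audit standard.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical merchant categories treated as "divertible" for personal use.
PERSONAL_CATEGORIES = {"grocery", "clothing", "entertainment"}


@dataclass
class CardTxn:
    employee: str
    day: date
    merchant_category: str
    amount: float


def collect(raw_rows):
    """Step 1, collection: parse raw (employee, ISO date, category, amount) tuples."""
    return [CardTxn(e, date.fromisoformat(d), c, float(a)) for e, d, c, a in raw_rows]


def prepare(txns):
    """Step 2, preparation: normalize category names and drop non-positive amounts."""
    return [CardTxn(t.employee, t.day, t.merchant_category.strip().lower(), t.amount)
            for t in txns if t.amount > 0]


def analyze(txns):
    """Step 3, analysis: flag weekend purchases and personal-looking categories."""
    flagged = []
    for t in txns:
        reasons = []
        if t.day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            reasons.append("weekend")
        if t.merchant_category in PERSONAL_CATEGORIES:
            reasons.append("personal category")
        if reasons:
            flagged.append((t, reasons))
    return flagged


def report(flagged):
    """Step 4, reporting: one readable line per flagged transaction."""
    return [f"{t.employee} {t.day} {t.merchant_category} {t.amount:.2f}: {', '.join(r)}"
            for t, r in flagged]
```

Chaining `report(analyze(prepare(collect(rows))))` keeps each step separate, so an investigator can inspect the cleaned data before trusting any flags.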

Artificial intelligence techniques

Fraud detection is a knowledge-intensive activity. The main AI techniques used for fraud detection include:

  • Data mining to classify, cluster, and segment the data and automatically find associations and rules in the data that may signify interesting patterns, including those related to fraud.
  • Expert systems to encode expertise for detecting fraud in the form of rules.
  • Pattern recognition to detect approximate classes, clusters, or patterns of suspicious behavior either automatically (unsupervised) or to match given inputs.
  • Machine learning techniques to automatically identify characteristics of fraud.
  • Neural nets to independently generate classifications, clusters, generalizations, and forecasts that can then be compared against conclusions raised in internal audits or in formal financial documents such as the 10-Q quarterly report.
  • Other techniques such as link analysis, Bayesian networks, decision theory, and sequence matching are also used for fraud detection. A newer technique, the system properties approach, has also been employed wherever rank data is available.
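The expert-system item above (fraud expertise encoded as rules) can be sketched as a small weighted rule base in Python. The `Transaction` fields, the four rules, their weights and the alert threshold are all hypothetical values chosen for illustration; a real system would derive them from investigators' knowledge and tune them against known cases.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transaction:
    amount: float
    country: str
    home_country: str
    hour: int          # 0-23, local time of the transaction
    new_payee: bool


@dataclass
class Rule:
    name: str
    weight: int
    condition: Callable[[Transaction], bool]


# Hypothetical rules an analyst might encode; thresholds are illustrative only.
RULES = [
    Rule("large amount", 3, lambda t: t.amount > 10_000),
    Rule("foreign location", 2, lambda t: t.country != t.home_country),
    Rule("unusual hour", 1, lambda t: t.hour < 6),
    Rule("first payment to new payee", 2, lambda t: t.new_payee),
]


def evaluate(txn: Transaction, rules=RULES, threshold: int = 4):
    """Fire every matching rule and raise an alert if the total weight reaches the threshold."""
    fired = [r for r in rules if r.condition(txn)]
    score = sum(r.weight for r in fired)
    return score >= threshold, score, [r.name for r in fired]
```

Returning the names of the fired rules, not just a yes/no decision, mirrors a key property of expert systems: every alert comes with an explanation that an auditor can check.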

Statistical analysis of research data is the most comprehensive method for determining if data fraud exists. Data fraud as defined by the Office of Research Integrity (ORI) includes fabrication, falsification and plagiarism.

Machine learning and data mining


Early data analysis techniques were oriented toward extracting quantitative and statistical data characteristics. These techniques facilitate useful data interpretations and can give better insight into the processes behind the data. Although traditional data analysis techniques can indirectly lead us to knowledge, that knowledge is still created by human analysts.

To go further, a data analysis system has to be equipped with a substantial amount of background knowledge and be able to perform reasoning tasks involving that knowledge and the data provided. In an effort to meet this goal, researchers have turned to ideas from the machine learning field. This is a natural source of ideas, since the machine learning task can be described as turning background knowledge and examples (input) into knowledge (output).

If data mining discovers meaningful patterns, data turns into information. Information or patterns that are novel, valid and potentially useful are not merely information but knowledge. One speaks of discovering knowledge that was previously hidden in the huge amount of data but is now revealed.

Machine learning and artificial intelligence solutions may be classified into two categories: 'supervised' and 'unsupervised' learning. These methods look for accounts, customers, suppliers, etc. that behave 'unusually' and output suspicion scores, rules or visual anomalies, depending on the method.

Whether supervised or unsupervised methods are used, the output gives only an indication of the likelihood of fraud. No standalone statistical analysis can guarantee that a particular object is fraudulent, although such analyses can flag likely cases with high accuracy.
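An unsupervised suspicion score can be illustrated with a deliberately simple stand-in: instead of a learned model, this Python sketch profiles each account by the mean and standard deviation of its past amounts (the "user profiles" and averages listed under statistical techniques) and scores each new amount by its absolute z-score. The function names and sample data are hypothetical. Consistent with the caution above, a high score only marks a transaction for review; it does not prove fraud.

```python
import statistics


def build_profiles(history):
    """Map each account to a (mean, stdev) profile of its past amounts."""
    profiles = {}
    for account, amounts in history.items():
        if len(amounts) >= 2:        # statistics.stdev needs at least two values
            profiles[account] = (statistics.mean(amounts), statistics.stdev(amounts))
    return profiles


def suspicion_score(profiles, account, amount):
    """Absolute z-score of a new amount against the account's own profile."""
    if account not in profiles:
        return None                  # no history: the account cannot be scored yet
    mean, sd = profiles[account]
    if sd == 0:
        return 0.0 if amount == mean else float("inf")
    return abs(amount - mean) / sd


def rank(profiles, new_txns):
    """Score (account, amount) pairs and sort them from most to least unusual."""
    scored = [(acct, amt, suspicion_score(profiles, acct, amt)) for acct, amt in new_txns]
    return sorted((s for s in scored if s[2] is not None), key=lambda s: s[2], reverse=True)
```

With a history such as `{"A": [100, 110, 90, 105, 95], "B": [20, 25, 22, 18, 30]}`, a new payment of 95 on account B ranks far above 104 on account A, because it is unusual for B even though it is smaller in absolute terms.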

