Question

Business Case

Your company needs to conduct an analysis of a large dataset that is 10 times bigger than the size of a hard drive of your most powerful computer. This dataset contains data about ATM transactions. The security team has provided a smaller dataset of suspicious transactions. Your task is to identify transactions similar to suspicious ones.

Directions

Explain which parallel computing techniques would be appropriate for this case, and why. Explain which component of the Hadoop ecosystem may be used in this case, and why.

Examples of parallel computing techniques include distributed calculations, MapReduce, and others.

Components of the Apache Hadoop environment include Hive, Spark, Pig, and others.

Solutions

Expert Solution

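The appropriate approach is machine learning and data mining, run as a parallel, distributed computation: because the dataset is 10 times larger than any single hard drive, it has to be split across several machines that each process their own portion.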

Data mining discovers meaningful patterns in data, turning raw records into information. Patterns that are potentially useful become a source of knowledge: what was previously hidden in a huge volume of data is revealed and can be acted on.

With either supervised or unsupervised learning, one can build a model of suspicious transactions. Supervised learning requires a labeled training set up front (here, the suspicious transactions supplied by the security team); once the model is trained, any transaction that matches the learned suspicious patterns can be reported. Unsupervised learning is better suited to anomaly detection: any activity that deviates from normal behavior is reported.
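As a rough illustration of "similar to suspicious ones", the sketch below flags any transaction that lies within a distance threshold of a known suspicious example. It is only a minimal sketch, not the solution's prescribed method: the two features (amount and hour of day, both pre-scaled to the range 0 to 1), the sample values, and the threshold are all hypothetical assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flag_similar(transactions, suspicious, threshold):
    """Return the transactions within `threshold` of any suspicious example."""
    return [t for t in transactions
            if any(distance(t, s) <= threshold for s in suspicious)]

# Hypothetical, pre-scaled features: (amount, hour of day), both in [0, 1].
suspicious = [(0.90, 0.10), (0.95, 0.05)]
transactions = [(0.10, 0.50), (0.92, 0.08), (0.50, 0.90)]

flagged = flag_similar(transactions, suspicious, threshold=0.1)
print(flagged)
```

Because the suspicious set is small, it can be copied to every machine, and each machine can then run this check independently on its own slice of the large dataset.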

MapReduce, a component of the Hadoop ecosystem, can be used in this case for the following reasons:

It is Hadoop's core processing component and supplies the processing logic. With MapReduce, we can write applications that process large datasets using parallel, distributed algorithms. The data is split into blocks stored across many machines, and each machine processes its own blocks, so the dataset never has to fit on a single hard drive. This parallel processing is what makes MapReduce suitable for Big Data analysis. MapReduce has two main functions:

Map: converts the input dataset into another dataset in which individual elements are broken down into key-value pairs (tuples).

Reduce: takes the output of the map function as its input. Its main job is to combine the values that share a key into an aggregated, summarized result.
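The two functions can be sketched in plain Python as follows. This is a single-process simulation under stated assumptions, not the Hadoop API: the names `map_phase`, `shuffle`, and `reduce_phase`, the record fields, and the rule "amount within 5 of a suspicious amount" are all hypothetical and chosen only to show the flow of key-value pairs.

```python
from collections import defaultdict

def map_phase(record, suspicious, threshold):
    """Map: emit (suspicious_id, transaction_id) for each example the record resembles."""
    for sid, s_amount in suspicious.items():
        if abs(record["amount"] - s_amount) <= threshold:
            yield sid, record["id"]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: summarize all matches for one suspicious example as a count."""
    return key, len(values)

# Hypothetical data: suspicious amounts, and the big dataset split into two chunks.
suspicious = {"s1": 500, "s2": 20}
chunks = [
    [{"id": "t1", "amount": 505}, {"id": "t2", "amount": 60}],
    [{"id": "t3", "amount": 498}, {"id": "t4", "amount": 22}],
]

# Each chunk could be mapped on a different machine; here they run in sequence.
pairs = [kv for chunk in chunks for rec in chunk
         for kv in map_phase(rec, suspicious, threshold=5)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the framework handles the chunking and the shuffle step; only the map and reduce logic is written by the user.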

