Question

In: Operations Management


Hadoop is one of the feasible and affordable solutions for big data analytics. Its success rests on the numerous add-on products in its 4 functional areas. Describe what each of the following add-on products is and how it can help in big data analytics, in around 50 to 100 words each.

1) Pig

2) Spark

3) Storm

4) Atlas

5) Flume

6) Solr

7) HBase

8) Oozie

Solutions

Expert Solution

Hadoop-Pig

Pig is a high-level scripting platform used with Apache Hadoop. It lets data workers write complex data transformations without knowing Java. Its scripting language, Pig Latin, resembles SQL in some respects and appeals to developers already familiar with scripting languages and SQL. Pig lets people focus on analysing bulk data sets and spend less time writing MapReduce programs. Just as pigs eat anything, the Pig language is designed to work on any kind of data, hence the name.
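To make the "less time writing MapReduce programs" point concrete, here is a minimal pure-Python sketch of the map and reduce steps behind a word count. It is only an illustration of the pattern Pig generates for you, not Pig or Hadoop code; the function names are made up for this example.

```python
from collections import defaultdict

# For comparison, roughly the same job in Pig Latin (illustrative only):
#   lines   = LOAD 'input.txt' AS (line:chararray);
#   words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   grouped = GROUP words BY word;
#   counts  = FOREACH grouped GENERATE group, COUNT(words);

def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def word_count(lines):
    return reduce_phase(map_phase(lines))

print(word_count(["pig eats data", "Pig eats anything"]))
```

In Pig the grouping and counting are two short statements, and the platform turns them into distributed jobs on the cluster.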

Apache Spark

Apache Spark is a fast cluster computing framework. It builds on the ideas of Hadoop MapReduce and extends the MapReduce model to support more types of computation efficiently, including interactive queries and stream processing. It is an open source framework written mainly in Scala, with APIs for Scala, Java, Python and R. It is a general-purpose distributed computing engine for processing and analysing large amounts of data, and it keeps intermediate results in memory where possible, which is a main source of its speed. Spark is still maturing and lacks some important enterprise-grade features.

Apache Storm

Apache Storm is a free and open source distributed real-time computation system. It makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. It has been used at Twitter as a real-time processing pipeline; other reported users include FullContact and Lookout (both US companies).
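The idea of processing an unbounded stream can be sketched with Python generators. This is not Storm's API; the names `sentence_spout`, `split_bolt` and `count_bolt` are hypothetical stand-ins for a stream source and two processing steps, and the stream is cut off after a fixed number of results only so the example terminates.

```python
import itertools

def sentence_spout():
    """Endless source of sentences (stand-in for a stream source)."""
    sentences = ["storm processes streams", "streams never end"]
    for sentence in itertools.cycle(sentences):
        yield sentence

def split_bolt(stream):
    """Split each incoming sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    """Keep a running count per word and emit it after every update."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]

def run_topology(limit):
    """Wire the steps together and take the first `limit` results."""
    pipeline = count_bolt(split_bolt(sentence_spout()))
    return list(itertools.islice(pipeline, limit))

print(run_topology(4))
```

Each word is handled as soon as it arrives, so the counts are always current, which is the contrast with a batch job that only produces results after reading all of its input.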

Atlas Hadoop

Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to meet their compliance requirements within Hadoop effectively and efficiently, and allows integration with the whole enterprise data ecosystem. It is a data governance tool that facilitates gathering, processing and maintaining metadata. Unlike spreadsheets and wiki documents, it has functioning components that can monitor your data processes, data stores, files and updates in a metadata repository. Other popular data governance tools include IBM's data governance offerings, Talend and Collibra.

Apache Flume

Apache Flume is a system used for moving massive quantities of streaming data into HDFS. A common use case is collecting log data from log files on web servers and aggregating it in HDFS for analysis. It is distributed, reliable and available software for efficiently collecting, aggregating and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
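A Flume agent is built from a source, a channel and a sink. The sketch below imitates that flow in plain Python under simplifying assumptions: the channel is an in-memory queue, and a list of batches stands in for files written to HDFS. The function name `run_agent` and the `batch_size` parameter are made up for this example.

```python
from collections import deque

def run_agent(log_lines, batch_size=2):
    """Move log lines from a source, through a channel, to a sink."""
    channel = deque()   # buffer between source and sink
    hdfs_files = []     # each inner list stands in for one file in HDFS

    # Source: wrap each raw log line as an event and buffer it.
    for line in log_lines:
        channel.append({"body": line.strip()})

    # Sink: drain the channel in batches and "write" each batch out.
    while channel:
        size = min(batch_size, len(channel))
        batch = [channel.popleft()["body"] for _ in range(size)]
        hdfs_files.append(batch)
    return hdfs_files

print(run_agent(["GET /index\n", "GET /about\n", "POST /login\n"]))
```

The channel decouples the two ends, so a slow sink does not block the source from accepting new events.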

Solr Hadoop

Apache Solr is an open source search platform built upon a Java library called Lucene. Solr is a popular search platform for websites because it can index and search multiple sites and return recommendations for related content based on the search query's taxonomy. Solr can be used along with Hadoop. As Hadoop handles a large amount of data, Solr helps us find the required information in such a large source.
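Lucene-style search rests on an inverted index, which maps each term to the documents that contain it. The following is a toy sketch of that idea only; it does not use Solr or Lucene, and the document IDs and function names are invented for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    "d1": "Hadoop stores large data sets",
    "d2": "Solr searches large indexes",
    "d3": "Solr runs alongside Hadoop",
}
index = build_index(docs)
print(search(index, "solr hadoop"))
```

Because the lookup goes from term to documents, a query touches only the entries for its own terms rather than scanning every document.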

Hadoop-HBase

Apache HBase is the Hadoop database. It is a distributed, scalable, big data store. It began as a subproject of Apache Hadoop and is now a top-level Apache project, used to provide real-time read and write access to your big data. HBase is called the Hadoop database because it is a NoSQL database that runs on top of Hadoop. It combines the scalability of Hadoop, by running on the Hadoop Distributed File System (HDFS), with real-time data access as a key/value store and the deep analytic capabilities of MapReduce.
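The key/value access pattern can be pictured with a small in-memory table: each row key maps to a set of columns, single rows are read directly by key, and ranges are scanned in row-key order. This is a simplified sketch, not the HBase client API; `TinyTable`, its methods and the sample keys are hypothetical.

```python
class TinyTable:
    """Toy row-key -> {column: value} store, scanned in row-key order."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        """Write one cell; the row is created on first write."""
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Read every column of a single row (empty dict if absent)."""
        return dict(self.rows.get(row_key, {}))

    def scan(self, start, stop):
        """Return rows with start <= row_key < stop, sorted by row key."""
        return [(key, dict(self.rows[key]))
                for key in sorted(self.rows) if start <= key < stop]

table = TinyTable()
table.put("user#002", "info:name", "Bo")
table.put("user#001", "info:name", "Ana")
table.put("user#001", "info:city", "Lima")
table.put("video#001", "info:title", "Intro")

print(table.get("user#001"))
print(table.scan("user#", "user$"))
```

Choosing row keys with a shared prefix, as with `user#` here, keeps related rows next to each other so a range scan can fetch them together.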

Hadoop-Oozie

Oozie is a Java web application and workflow scheduler system used to schedule and manage Apache Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work. It is integrated with the Hadoop stack, with YARN as its architectural centre, and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive and Apache Sqoop. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is a scalable, reliable and extensible system.
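A workflow expressed as a DAG of actions can be ordered so that every action runs only after the actions it depends on. The sketch below shows that ordering with Python's standard `graphlib` module (Python 3.9 or later). It is not Oozie's workflow definition format, and the action names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the actions it depends on.
workflow = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "index": {"clean"},
    "report": {"aggregate", "index"},
}

def execution_order(dag):
    """Return one valid run order in which dependencies always come first."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(workflow)
print(order)
```

"aggregate" and "index" do not depend on each other, so either may come first. The graph must be acyclic for such an order to exist at all.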

