Question

ARTICLE: Big data and big intelligence

“The big data revolution is not about the data. It’s about the analytics that we can come up with and that we now have to be able to understand what these data say.” – Gary King, Professor and director of the Institute for Quantitative Social Science, Harvard University

Companies need to collect, aggregate and analyze data to make better business decisions. With the help of business-intelligence tools and methodologies, companies can now analyze large volumes of data quickly and cost effectively.

In addressing the challenges of analyzing large amounts of data, companies need to know what data they have and how it can be effectively stored and subsequently accessed. Data integrity also becomes increasingly important as the reliance on data for business decisions increases. Concepts such as data-classification schemes, taxonomies and the use of metadata should be considered. As the volume of data swells, there will be a greater need for storage and a commensurate increase in storage costs.

Description

Big Data

The term “big data” is more than a term used to describe “a lot of data.” Big data encapsulates tools (e.g., Hadoop, Cassandra, etc.) for processing information at high volumes, high velocity, and high variety in a way that improves insight and decision making. According to IBM, big data can be broken down into four dimensions: volume, velocity, variety, and veracity.

Businesses are confronted with the paradox of being data rich but information poor. They need more effective means of capturing the informational value this data represents. They have to find more effective means of storing, archiving, managing and retrieving that data. As more and more data is collected, organizations are seeking more effective ways to extract timely information upon which to base ever more complex decisions.

Big Data anD Business intelligence 2

Business Intelligence (BI)

Business intelligence encompasses the processes, tools and techniques designed to harvest insights from the large volumes of structured data within the organization. Today BI benefits from increasingly user-friendly technology and the degree to which analysis has migrated from the central IT department to the control of revenue-generating businesses. Retailers can leverage BI tools to assess how product placement will drive higher sales. For example, placing the salsa in the aisle with tortillas may result in more sales than placing it with the sauces.

Importance

New sources of information are being developed and new techniques are needed to fully benefit from them. The growth in unstructured information (e.g., videos, blog posts, tweets, sensor readings from Internet of Things devices, etc.) already outpaces the traditional sources of transaction data. Big data is part of a larger trend of data-driven decision making. While organizations are keenly aware of the need to have some type of analytics, many organizations face challenges that include the lack of in-house expertise. Consequently, CPAs in their respective roles need to consider the applicability and potential value of investing in analytics. As mentioned above, the ultimate value proposition (i.e., insights into customers, products, services, etc.) is consistent irrespective of whether it is BI or big data analytics.

Business Benefits and Considerations

Big data can provide enormous value and benefits to organizations. These benefits include:

  • increased effectiveness of business initiatives and promotions

  • improved understanding of customer behaviour and market conditions

  • better time-to-insight

  • cost savings.

Tesco, a U.K. grocery and general merchandise retailer, links its supply chain management to weather, date and geographic data to identify and predict trends. An example of a trend the company has extracted through big data analytics is that a “16 degrees sunny Saturday in late April will cause a spike in sales. Exactly the same conditions a couple of weeks later will not, as people have had their first BBQ of the season.” In terms of overall benefits to the company, “Big Data projects deliver huge returns at Tesco; improving promotions to ensure 30% fewer gaps on shelves, predicting the weather and behaviour to deliver £6million less food wastage in the summer, £50million less stock in warehouses, optimising store operations to give £30million less wastage.” 1

1 http://cloudofdata.com/2012/10/tesco-uses-data-for-more-than-just-loyalty-cards.

While the use of big data and BI may bring many benefits to an organization, there are some important risk areas to consider:

Risk area: Risk practitioners or other gatekeepers in the organization may opt out of big data because of general fears and uncertainties around the technologies or other doubts.

Risk mitigation strategy: When identifying the net new risks with big data, it is important to understand how the underlying technology differs from the standard control processes around transactional systems. A good place to start is the Cloud Security Alliance’s Top Ten Big Data Security and Privacy Challenges.2 For example, the publication walks the reader through how the technology disassembles a data set and then processes the individual “chunks” of data. It points out that if the process to read the chunk of data is unauthorized then there is a risk the analytic produced will be incorrect. Those familiar with IT General Controls3 will recognize this as an application development control which requires the underlying code to be tested and authorized.

Risk area: Applying privacy controls that were applied to traditional database technologies may not be adequate for a big-data environment.

Risk mitigation strategy: Complying with privacy in a big-data environment requires an understanding of how the risk profile has changed in terms of what data is now being used. For example, big-data analytics can include the use of pictures. Although facial recognition technology is in its infancy, companies, such as iOmniscient, have been advertising for a few years how their technology can be used in conjunction with big-data analytics.4 Within such a context, organizations need to explore what is sufficient notice and consent to be compliant with privacy regulations such as the Personal Information Protection and Electronic Documents Act (PIPEDA). For example, privacy experts need to assess what is sufficient notice from a video recording perspective. Consequently, it may or may not be enough “notice” to post a sign informing customers they are being recorded and that the recording could be used for the purposes of analytics. Because Google already has the capability to search for images, it is not too much of a stretch to see how in-store cameras uploaded with Google Image Search can identify who is in the store and feed that information to sales persons. The Federal Trade Commission (FTC) has put together a best practices document that explores the privacy issues of facial recognition, citing how Kraft Foods plans to use such technology in the supermarket.5 Beyond compliance, organizations need to assess the privacy sensitivities of their customers. For example, Benetton faced a backlash from its customers for planning to use RFID tags to track its inventory. Some customers feared the technology would be used to invade their privacy.

3 This is the term used by the PCAOB for “SOX Testing”.

4 See www.iomniscient.com/Media/PR/git2012-Face_Recognition_in_a_crowd.pdf. The company advertises how facial recognition can be correlate

Risk area: Data used in the big-data analytical model is not fit for purpose or contains significant errors that would lead to erroneous decisions.

Risk mitigation strategy: Without good controls over data quality, the inclusion of “dirty data” will result in poor analysis. Ultimately, this could result in analytics that are “materially misstated”. For example, a mining company that performed analytics using poor quality data set up an oil rig at a dry well instead of a productive well, resulting in millions of wasted euros. Therefore, it is important to verify the integrity of the data available before moving on to discussions around big-data solutions. This is explored in the publication “A Framework for Information Integrity Controls”: ensuring the integrity of information requires a multi-domain approach by exploring the controls from a content domain (e.g., accuracy of metadata), a processing domain (e.g., ensuring accuracy of underlying program logic manipulating the information), and the information system-environment domain (e.g., logical access to the information).

An important precursor to a big-data exercise is to ensure the data has been cleansed of errors and is fit for purpose. Basic cleansing exercises should ensure fields exclude data that does not belong there (e.g., invalid states or provinces, alpha characters in numeric fields, invalid postal codes, etc.). However, trying to ensure the data is fit for purpose can be a more difficult exercise. For example, an investigation by ProPublica found software used to predict criminality was racially biased.7 In other words, systemic racism was programmed into the software. In such a situation, care must be taken to ensure only non-biased data is used within the big-data predictive model.
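The basic cleansing checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`province`, `amount`, `postal_code`), the list of province codes, and the simplified Canadian postal-code pattern are all hypothetical choices and do not come from the article.

```python
import re

# Hypothetical reference data; a real project would load this from an
# authoritative source rather than hard-coding it.
VALID_PROVINCES = {"AB", "BC", "MB", "NB", "NL", "NS", "NT",
                   "NU", "ON", "PE", "QC", "SK", "YT"}
# Simplified postal-code shape (letter-digit-letter digit-letter-digit).
POSTAL_CODE = re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$")


def find_errors(record):
    """Return a list of field-level problems found in one record (a dict)."""
    errors = []
    if record.get("province") not in VALID_PROVINCES:
        errors.append("invalid province")
    # A numeric field must not contain alpha characters.
    try:
        float(record.get("amount", ""))
    except (TypeError, ValueError):
        errors.append("non-numeric amount")
    if not POSTAL_CODE.match(str(record.get("postal_code", "")).upper()):
        errors.append("invalid postal code")
    return errors


def cleanse(records):
    """Split records into clean rows and rejected rows with their reasons."""
    clean, rejected = [], []
    for record in records:
        errors = find_errors(record)
        if errors:
            rejected.append((record, errors))
        else:
            clean.append(record)
    return clean, rejected
```

Keeping the rejected rows together with their reasons, rather than silently dropping them, lets someone review why data was excluded. Checks like these catch format errors only; they do not address the harder fit-for-purpose question, such as bias in the data.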

  

Conclusion

With the decreasing cost of data storage and the rising popularity of connected devices, there is no shortage of data for businesses to use. But this data needs to be cleaned, analyzed and interpreted to provide the greatest value to businesses. With the help of BI tools and knowledge of risk areas around big data, businesses can begin to apply proper analytics to their data to discover valuable insights.

ANSWER THE FOLLOWING

1. Describe the kinds of big data collected by the organizations.

2. Hadoop and Cassandra are two tools mentioned in the article. Research and describe one other tool that can be used.

3. What kinds of organizations are most likely to need big data management and analytical tools? Why?

Solutions

Expert Solution

1.

The big data collected by organizations can be divided into three types:

a. Structured data

b. Unstructured data

c. Semi-structured data

a. Structured data:

Structured data is organized data that can be easily stored, processed and retrieved. It is kept in databases in an ordered manner, typically in rows and columns. It comes from two sources: human actions, such as explicitly entering information, and machines, such as usage statistics and GPS data.



b. Unstructured data:

Unstructured data has no predefined format and is simply stored as it is produced. Examples include social media posts, text, audio, video and emails. Such data is difficult and time-consuming to process and analyze.

c. Semi-structured data:

Semi-structured data sits between the two types above. It does not fit the fixed tables of a relational database, but it carries properties such as tags or keys (for example, in JSON or XML files) that allow the information to be processed.
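To make the three types concrete, here is a small Python sketch. All of the sample values (customer IDs, tags, the review text) are made up for illustration and are not taken from the article.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields.
structured = io.StringIO("customer_id,amount\n101,19.99\n102,5.00\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but fields may vary between records.
semi_structured = json.loads('{"customer_id": 101, "tags": ["bbq", "summer"]}')

# Unstructured: free text with no predefined fields. It needs extra
# processing (here, a naive keyword count) before it can be analyzed.
unstructured = "Loved the salsa, but the salsa aisle was hard to find."
mentions = unstructured.lower().count("salsa")
```

The structured rows can be queried by column name straight away, the JSON record can be navigated by key, and the free text yields nothing until some analysis step is applied to it.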

2.

Hadoop:

It’s an open-source software framework used to store data and run applications on clusters of machines. HDFS (the Hadoop Distributed File System) stores the data across the cluster, and YARN schedules jobs and manages the cluster resources used to process it. Its benefits include scalability, low cost and flexibility. The wider Hadoop ecosystem includes tools for data extraction, analysis, mining and storage, which together help process big data efficiently.
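The idea of splitting a data set into chunks, processing each chunk independently and then combining the results can be sketched in plain Python. This is not Hadoop code and uses no Hadoop API; the function names and the word-count task are assumptions chosen only to illustrate the pattern on a single machine.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, chunk_size):
    """Break the data set into fixed-size chunks, much as a cluster
    spreads blocks of a file across its nodes."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Process one chunk independently: count the words it contains."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def word_count(lines, chunk_size=2):
    """Combine the per-chunk partial counts into one total."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(lines, chunk_size)]
    return reduce(lambda a, b: a + b, partials, Counter())
```

Because each chunk is processed without reference to the others, the per-chunk step could run on different machines at the same time, which is what makes the approach scale.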

Cassandra:

It’s an open-source NoSQL database management tool. It is used for social media applications and other workloads that involve real-time data, and it runs on less powerful (commodity) hardware. It is fault tolerant and has its own simple query language, CQL.

One of the many other tools that are used for big data is:

MongoDB:

This tool is a NoSQL document database written mainly in C++, and the source code of its Community edition is publicly available. It is used for real-time data and for both structured and unstructured data, and it provides high performance and availability. It stores data in JSON-like documents. It became a popular tool for big data due to its power, speed and flexibility.
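The document model can be illustrated without a database server. The sketch below uses plain Python dictionaries and a hypothetical `find` helper; it is not the MongoDB API, and the sample documents are made up.

```python
import json

# Two hypothetical documents in one collection; they need not share fields.
collection = [
    {"_id": 1, "name": "Ana", "orders": [{"item": "salsa", "qty": 2}]},
    {"_id": 2, "name": "Ben", "email": "ben@example.com"},
]


def find(documents, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Documents serialize naturally to JSON text.
as_json = json.dumps(find(collection, name="Ana")[0])
```

The two documents have different fields and one nests a list of orders, which is the flexibility that a fixed relational table does not offer.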

3.

Organizations such as Amazon, Netflix, Starbucks, Oracle and IBM use big data and its tools. So do organizations that handle huge amounts of data, such as immigration services, police forces and streaming platforms. Any organization that has to maintain a lot of data and process it as needed, or that serves a large number of people, needs data management and analytical tools to process that data more easily. Nowadays many companies rely on these tools to be successful.

Why? With the amount of data increasing across the world, it's necessary to keep track of it and extract useful knowledge from it. As the article notes, data is abundant, but organizations often struggle to turn it into usable information. Hence organizations need such tools and techniques to make better decisions.

Comment if you have any queries!

