ARTICLE: Big Data and Business Intelligence
“The big data revolution is not about the data. It’s about the analytics that we can come up with and that we now have to be able to understand what these data say.” – Gary King, Professor and Director of the Institute for Quantitative Social Science, Harvard University
Companies need to collect, aggregate and analyze data to make better business decisions. With the help of business-intelligence tools and methodologies, companies can now analyze large volumes of data quickly and cost effectively.
In addressing the challenges of analyzing large amounts of data, companies need to know what data they have and how it can be effectively stored and subsequently accessed. Data integrity also becomes increasingly important as the reliance on data for business decisions increases. Concepts such as data-classification schemes, taxonomies and the use of metadata should be considered. As the volume of data swells, there will be a greater need for storage and a commensurate increase in storage costs.
Description
Big Data
The term “big data” is more than a term used to describe “a lot of data.” Big data encapsulates tools (e.g., Hadoop, Cassandra, etc.) for processing information at high volume, high velocity, and high variety in a way that improves insight and decision making. According to IBM, big data can be broken down into four dimensions: volume, velocity, variety, and veracity.
Businesses are confronted with the paradox of being data rich but information poor. They need more effective means of capturing the informational value this data represents. They have to find more effective means of storing, archiving, managing and retrieving that data. As more and more data is collected, organizations are seeking more effective ways to extract timely information upon which to base ever more complex decisions.
Business Intelligence (BI)
Business intelligence encompasses the processes, tools and techniques designed to harvest insights from the large volumes of structured data within the organization. Today BI benefits from increasingly user-friendly technology and from the degree to which analysis has migrated from the central IT department to the control of revenue-generating businesses. Retailers can leverage BI tools to assess how product placement will drive higher sales. For example, placing the salsa in the aisle with the tortillas may result in more sales than placing it with the sauces.
Importance
New sources of information are being developed, and new techniques are needed to fully benefit from them. The growth in unstructured information (e.g., videos, blog posts, tweets, sensor readings from Internet of Things devices, etc.) already outpaces the traditional sources of transaction data. Big data is part of a larger trend of data-driven decision making. While organizations are keenly aware of the need for some type of analytics, many face challenges that include a lack of in-house expertise. Consequently, CPAs in their respective roles need to consider the applicability and potential value of investing in analytics. As mentioned above, the ultimate value proposition (i.e., insights into customers, products, services, etc.) is the same irrespective of whether it is BI or big data analytics.
Business Benefits and Considerations
Big data can provide enormous value and a range of benefits to organizations.
1 http://cloudofdata.com/2012/10/tesco-uses-data-for-more-than-just-loyalty-cards
While the use of big data and BI may bring many benefits to an organization, there are some important risk areas to consider:
Risk practitioners or other gatekeepers in the organization may opt out of big data because of general fears and uncertainties around the technologies or other doubts. Privacy controls that were applied to traditional database technologies may also not be adequate for a big-data environment. When identifying the net new risks that come with big data, it is important to understand how the underlying technology differs from the standard control processes around transactional systems. A good place to start is the Cloud Security Alliance’s Top Ten Big Data Security and Privacy Challenges.2 For example, the publication walks the reader through how the technology disassembles a data set and then processes the individual “chunks” of data. It points out that if the process that reads a chunk of data is unauthorized, there is a risk the resulting analytic will be incorrect. Those familiar with IT General Controls3 will recognize this as an application development control, which requires the underlying code to be tested and authorized.
Complying with privacy requirements in a big-data environment requires an understanding of how the risk profile has changed in terms of what data is collected and how it is used. For example, a retailer was criticized by its customers for planning to use RFID tags to track its inventory; some customers feared the technology would be used to invade their privacy.
Data used in the big-data analytical model may not be fit for purpose or may contain significant errors that would lead to erroneous decisions. Without good controls over data quality, the inclusion of “dirty data” will result in poor analysis. Ultimately, this could result in analytics that are “materially misstated.” For example, a mining company that performed analytics using poor-quality data set up an oil rig at a dry well instead of a productive well, resulting in millions of wasted euros. Therefore, it is important to verify the integrity of the data available before moving on to discussions around big-data solutions. As explored in the publication “A Framework for Information Integrity Controls,” ensuring the integrity of information requires a multi-domain approach, examining controls from a content domain (e.g., accuracy of metadata), a processing domain (e.g., ensuring accuracy of the underlying program logic manipulating the information), and the information system-environment domain (e.g., logical access to the information).
An important precursor to a big-data exercise is to ensure the data has been cleansed of errors and is fit for purpose. Basic cleansing exercises should ensure fields exclude data that does not belong there (e.g., invalid states or provinces, alpha characters in numeric fields, invalid postal codes, etc.); a minimal validation sketch appears after the Conclusion below. However, trying to ensure the data is fit for purpose can be a more difficult exercise. For example, an investigation by ProPublica found that software used to predict criminality was racially biased.7 In other words, systemic racism was programmed into the software. In such a situation, care must be taken to ensure only non-biased data is used within the big-data predictive model.
Conclusion
With the decreasing cost of data storage and the rising popularity of connected devices, there is no shortage of data for businesses to use. But this data needs to be cleaned, analyzed and interpreted to provide the greatest value. With the help of BI tools and an awareness of the risk areas around big data, businesses can begin to apply proper analytics to their data and discover valuable insights.
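To make the basic cleansing checks mentioned above concrete, here is a minimal Python sketch of field-level validation. The field names, the set of valid provinces, and the postal-code pattern are illustrative assumptions, not taken from the article.

```python
import re

# Illustrative validation rules; field names and valid values are assumptions.
VALID_PROVINCES = {"ON", "QC", "BC", "AB", "MB", "SK", "NS", "NB", "NL", "PE"}
POSTAL_CODE_RE = re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$")  # Canadian-style postal code

def validate_record(record: dict) -> list[str]:
    """Return a list of data-quality problems found in one record."""
    problems = []
    if record.get("province") not in VALID_PROVINCES:
        problems.append(f"invalid province: {record.get('province')!r}")
    amount = str(record.get("amount", ""))
    if not amount.replace(".", "", 1).isdigit():
        problems.append(f"non-numeric amount: {amount!r}")  # alpha characters in a numeric field
    if not POSTAL_CODE_RE.match(str(record.get("postal_code", ""))):
        problems.append(f"invalid postal code: {record.get('postal_code')!r}")
    return problems

# Example: flag dirty records before they reach the analytical model.
records = [
    {"province": "ON", "amount": "19.99", "postal_code": "M5V 2T6"},
    {"province": "XX", "amount": "12a.4", "postal_code": "99999"},
]
for i, rec in enumerate(records):
    for problem in validate_record(rec):
        print(f"record {i}: {problem}")
```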
ANSWER THE FOLLOWING
1. Describe the kinds of big data collected by organizations.
2. Hadoop and Cassandra are two tools mentioned in the article that can be used. Research and describe one other tool that can be used.
3. What kinds of organizations are most likely to need big data management and analytical tools? Why?
1.
The big data collected by organizations can be divided into three types:
a. Structured data
b. Unstructured data
c. Semi-structured data
a. Structured data:
It refers to organized data that can be easily stored, processed and retrieved. It is kept in databases in an ordered manner. It can be further divided into two categories based on where the data comes from: human-generated data, such as information that people enter and store explicitly, and machine-generated data, such as usage statistics, GPS readings, etc. Because it follows a predefined structure, it is straightforward to understand and query.
b. Unstructured data:
It refers to data that has no predefined format or schema and is simply stored as it is produced. Social media posts, free text, audio, video and emails are all stored in an unstructured form. Such data is difficult and time-consuming to process and analyze.
c. Semi-structured data:
It sits between the two types described above. The data appears unstructured, but it carries some properties, such as tags or markup (e.g., XML or JSON), that make it possible to process. It is not stored in a traditional relational database, yet it can still be organized, so it falls between the other two types.
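To illustrate the difference, here is a short Python sketch contrasting a structured record (fixed fields, as in a relational table) with a semi-structured one (a self-describing JSON document). The field names and values are made up for illustration.

```python
import json
import sqlite3

# Structured: a fixed schema enforced by a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
conn.execute("INSERT INTO sales (product, amount) VALUES (?, ?)", ("salsa", 3.49))
print(conn.execute("SELECT product, amount FROM sales").fetchall())

# Semi-structured: a self-describing JSON document; fields can vary per record.
doc = json.loads("""
{
  "product": "salsa",
  "amount": 3.49,
  "tags": ["grocery", "promo"],
  "review": {"stars": 4, "text": "good with tortillas"}
}
""")
print(doc["review"]["stars"])

# Unstructured data (free text, audio, video) has no such field structure at all
# and needs additional processing (e.g., text mining) before it can be analyzed.
```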
2.
Hadoop:
It is open-source software used to store data and run applications on clusters of commodity hardware. HDFS (the Hadoop Distributed File System) stores data across the cluster, while YARN schedules and manages the processing of that data. Its benefits include scalability, low cost and flexibility. The Hadoop ecosystem also includes tools for data extraction, analysis, mining and storage, which together make processing big data efficient.
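As a rough illustration of how Hadoop processes data in parallel, here is a minimal word-count sketch written for Hadoop Streaming in Python. The file name, HDFS paths and cluster setup are assumptions for the example; the same script can also be tested locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming word-count sketch (assumes a running Hadoop cluster).

Example invocation (illustrative paths):
  hadoop jar hadoop-streaming.jar \
    -input /data/in -output /data/out \
    -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
    -file wordcount.py
"""
import sys

def map_phase():
    # Emit one (word, 1) pair per word; Hadoop sorts these pairs by key
    # before they reach the reducer.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reduce_phase():
    # Input arrives sorted by key, so all counts for a word are contiguous.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word == current_word:
            count += int(value)
        else:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    map_phase() if sys.argv[1:] == ["map"] else reduce_phase()
```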
Cassandra:
It is an open-source, distributed NoSQL database. It is commonly used for social media workloads and other real-time data, and it runs on clusters of relatively low-powered commodity hardware. It is fault tolerant through data replication and has its own simple query language, CQL (Cassandra Query Language).
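A minimal sketch of using Cassandra from Python with the DataStax driver (pip install cassandra-driver) is shown below. The contact point, the keyspace "demo" and the table "page_views" are illustrative assumptions, not taken from the article.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # contact point for a local, single-node cluster
session = cluster.connect()

# Keyspace with a replication factor of 1 (development setup only).
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.set_keyspace("demo")
session.execute(
    "CREATE TABLE IF NOT EXISTS page_views "
    "(page text, ts timestamp, views int, PRIMARY KEY (page, ts))"
)

# CQL looks like SQL, but queries are organized around the partition key (page).
session.execute(
    "INSERT INTO page_views (page, ts, views) VALUES (%s, toTimestamp(now()), %s)",
    ("home", 42),
)
for row in session.execute("SELECT page, views FROM page_views WHERE page = %s", ("home",)):
    print(row.page, row.views)

cluster.shutdown()
```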
One of the many other tools used for big data is:
MongoDB:
This tool is an open-source NoSQL database written in C, C++ and JavaScript. It is used for real-time workloads and for both structured and unstructured data, and it provides high performance and high availability. It stores data as JSON-like documents. It has become a popular big-data tool because of its power, speed and flexibility.
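A minimal sketch of working with MongoDB from Python via PyMongo (pip install pymongo) follows. The connection string, database name "demo" and collection "events" are illustrative assumptions.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["demo"]["events"]

# Documents are stored as JSON-like (BSON) documents and need no fixed schema.
events.insert_one({"user": "alice", "action": "view", "page": "/salsa", "ms": 112})
events.insert_one({"user": "bob", "action": "buy", "items": ["salsa", "tortillas"]})

# Query documents by field value.
for doc in events.find({"action": "view"}):
    print(doc["user"], doc["page"])

# Aggregate: count documents per action type.
for row in events.aggregate([{"$group": {"_id": "$action", "n": {"$sum": 1}}}]):
    print(row["_id"], row["n"])

client.close()
```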
3.
Organizations such as Amazon, Netflix, Starbucks, Oracle and IBM already use big data and related tools. Organizations that handle very large volumes of data, such as immigration and police agencies or streaming platforms, are also strong candidates. In general, any organization that must maintain a large amount of data and process it to meet its needs, or that serves a very large number of people, requires data management and analytical tools to handle that data efficiently. Many companies now rely on these tools and practices to stay competitive.
Why? With the amount of data increasing across the world, it is necessary to keep track of it and extract useful knowledge from it. Data is abundant, but much of it goes unused. Organizations therefore need such techniques to turn raw data into better decisions and solutions.
comment for queries!