The term "Big Data" is used so often today, but some people still don't have a basic understanding of it. This discussion aims to provide a simple definition of the term that you can use in everyday discussion. Once you've read this posting, share your thoughts about big data with us. Is it good? Is it evil? What potential benefits do you see? Is there any ethical responsibility that goes along with the use of this data? Happy learning and posting!
BIG DATA IS BIG!
First, watch the following video that briefly describes big data and some of its benefits.
Big data is new and “ginormous” and scary –very, very scary. No, wait. Big data is just another name for the same old data marketers have always used, and it’s not all that big, and it’s something we should be embracing, not fearing. No, hold on. That’s not it, either. What I meant to say is that big data is as powerful as a tsunami, but it’s a deluge that can be controlled . . . in a positive way, to provide business insights and value. Yes, that’s right, isn’t it?
Over the past few years, I have heard big data defined in many, many different ways, and so, I’m not surprised there’s so much confusion surrounding the term. So to get things started, let's converge on a single definition of the term:
Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.
Some people like to constrain big data to digital inputs like web behavior and social network interactions; however, I believe that we can't exclude traditional data derived from product transaction information, financial records and interaction channels, such as the call center and point-of-sale. All of that is big data, too, even though it may be dwarfed by the volume of digital data that's now growing at an exponential rate.
In defining big data, it’s also important to understand the mix of unstructured and multi-structured data that comprises the volume of information.
Unstructured data comes from information that is not organized or easily interpreted by traditional databases or data models, and typically, it’s text-heavy. Metadata, Twitter tweets, and other social media posts are good examples of unstructured data.
Multi-structured data refers to a variety of data formats and types and can be derived from interactions between people and machines, such as web applications or social networks. A great example is web log data, which includes a combination of text and visual images along with structured data like form or transactional information. As digital disruption transforms communication and interaction channels, and as marketers enhance the customer experience across devices, web properties, face-to-face interactions and social platforms, multi-structured data will continue to evolve.
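To make the idea of multi-structured web log data more concrete, here is a minimal Python sketch. It assumes logs written in the Apache "combined" log format (an assumption; the text does not name a format) and uses a made-up sample line. The hypothetical `parse_log_line` function pulls structured fields out of the raw text (IP address, timestamp, status code, and query-string form parameters), while the user-agent string is kept as free text.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed format: Apache "combined" log format, i.e. IP, identity, user,
# [timestamp], "request line", status, size, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Split one multi-structured log line into structured fields."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # the line does not follow the assumed format
    fields = match.groupdict()
    url = urlsplit(fields.pop("target"))
    fields["path"] = url.path
    # Query-string parameters (e.g. submitted form data) become a dict.
    fields["params"] = {k: v[0] for k, v in parse_qs(url.query).items()}
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    # fields["agent"] stays as unstructured free text.
    return fields

# Hypothetical sample line for illustration only.
line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /checkout?item=42&qty=2 HTTP/1.1" 200 512 '
        '"-" "Mozilla/5.0 (X11; Linux x86_64)"')
record = parse_log_line(line)
print(record["path"], record["params"], record["status"])
```

Lines that do not match the expected pattern return `None`; real log pipelines have to cope with many more variants than this single pattern.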
Industry leaders like the global analyst firm Gartner use phrases like "volume" (the amount of data), "velocity" (the speed of information generated and flowing into the enterprise) and "variety" (the kind of data available) to begin to frame the big data discussion. Others have focused on additional V's, such as big data's "veracity" and "value."
What can you do with big data? Well, first, since this is a BUSINESS class, let's think of the potential for business. Watch this brief video about the potential of getting to know your customer better through big data.
One question that arises when thinking about the vast amounts of data that companies now have access to is the question of consumer data privacy. Where does the ethical responsibility regarding use or misuse of such data lie?
Due to the increase in computing power and the advent of the Internet and mobile networks, the rate at which data is generated has increased exponentially over the past two decades. Conventionally, organizations represented data in a structured format using relational database management systems (RDBMS). Today, however, data is generated by many different devices and platforms (computers, laptops, mobile phones, digital cameras, CCTV cameras, social media, etc.), and most of it is unstructured, i.e., it does not follow any particular structure or order. This unstructured data, generated in huge quantities by these different sources, is called Big Data.
According to Gartner, Big Data is data that is:
1. High in quantity (Volume)
2. Generated at a high rate (Velocity)
3. Of many different kinds (Variety)
Researchers and scientists later extended the Big Data definition with the following features:
4. Its correctness or accuracy may be inconsistent (Veracity), and
5. Its usefulness to the organization varies, since not all of it yields insight (Value).
Is Big Data Good?
Yes, Big Data is good if the data available within an organization is studied and analyzed by experts and with artificial intelligence techniques such as machine learning. Effective analysis can reveal deep insights into customer buying patterns, profit opportunities, waste, and overheads, so that the company can take appropriate corrective or improvement action.
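As a very small illustration of what finding a "customer buying pattern" can mean, the Python sketch below counts how often pairs of products appear in the same purchase, which is the first step of market-basket analysis. The transaction data and the `co_purchase_counts` function are hypothetical, and simple counting stands in here for the machine-learning techniques mentioned above; it is not a full analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction log: each basket is the set of products one
# customer bought in a single purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "bread", "chips"},
]

def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        # sorted() gives each pair a fixed order, so ("bread", "milk") and
        # ("milk", "bread") are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            pairs[pair] += 1
    return pairs

counts = co_purchase_counts(transactions)
for (a, b), n in counts.most_common(3):
    print(f"{a} + {b}: bought together {n} times")
```

On real data, pairs that co-occur far more often than chance could inform decisions such as product placement or bundle offers.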
Is Big Data Evil?
Big Data may turn evil if the organization does not analyze it or learn from it; it then only increases the organization's backup storage without yielding any insight. Further, if the data falls into the wrong hands, it may be misused, leading to privacy violations for customers, who may then lose trust in and respect for the organization.
Ethical Responsibility in Using Big Data
Utmost care should be taken to remove customers' names and other personal details from the data before it reaches data analysts or data scientists. Customers' privacy should be preserved and never compromised at any point. It would be advisable to avoid human involvement in collecting and processing personal data, so that the system automatically removes customers' personal details before the data is analyzed.
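The automatic removal step described above can be sketched in Python. This is a minimal illustration under assumptions: the record layout, the field names in `PERSONAL_FIELDS`, the `deidentify` function, and the secret key are all hypothetical. The technique shown is pseudonymization: direct identifiers are dropped and the customer ID is replaced with a keyed HMAC-SHA256 hash. Pseudonymization is weaker than full anonymization, because the remaining fields can sometimes still identify a person.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers; a real organization would
# define this from its own privacy policy.
PERSONAL_FIELDS = {"name", "email", "phone", "address"}

def deidentify(record, secret_key):
    """Return a copy of record with personal details removed.

    Direct identifiers are dropped, and the customer ID is replaced by a
    keyed hash (HMAC-SHA256) so that records from the same customer can
    still be linked without revealing who the customer is.
    """
    cleaned = {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}
    if "customer_id" in cleaned:
        digest = hmac.new(secret_key, str(cleaned["customer_id"]).encode(),
                          hashlib.sha256).hexdigest()
        cleaned["customer_id"] = digest[:16]  # shortened pseudonym
    return cleaned

# Hypothetical key; in practice it would be kept outside the analytics system.
key = b"keep-this-key-outside-the-analytics-system"
raw = {"customer_id": 1017, "name": "Jane Doe", "email": "jane@example.com",
       "product": "laptop", "amount": 899.0}
print(deidentify(raw, key))
```

Because the same key always produces the same pseudonym, analysts can still group purchases by customer, while only whoever holds the key could link a pseudonym back to a known customer ID.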
Customers' personal details should be stored in encrypted form to protect them from mishandling.
Failing to meet this ethical responsibility can lead to leaks of sensitive information such as a patient's illness, a user's password, credit or debit card details, and personal communications. Care must be taken to keep customer data safe.