One of the characteristics of Big Data is the variety of data. Explain why this characteristic has resulted in the need for languages other than SQL for processing Big Data.
Variety is one of the defining characteristics of Big Data. Big Data comes from a wide range of sources and generally falls into one of three types: structured, semi-structured, and unstructured data.
1) Structured
Structured data is data that can be stored, processed, and retrieved in a fixed format. It is highly organized information, typically arranged in rows and columns, that can be readily stored in and queried from a database using simple search algorithms.
2) Semi-structured
Semi-structured data sits between structured and unstructured data (described next). It does not conform to the fixed schema of a database table, yet it contains tags or markers that separate the individual elements within the data. JSON and XML documents are common examples.
3) Unstructured
Unstructured data has no predefined form or structure, which makes it difficult and time-consuming to process and analyze. Emails, photos, videos, PDFs, audio files, and data from monitoring devices are typical examples of unstructured data.
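The three types above can be illustrated with a minimal Python sketch. The table, field names, and sample values are made up for illustration and are not taken from any real dataset. SQLite stands in for a relational database, a JSON string for semi-structured data, and an email body for unstructured text.

```python
import json
import re
import sqlite3

# 1) Structured: a fixed schema, queried by column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds"), (3, "Chen", "Pune")],
)
pune_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
    )
]

# 2) Semi-structured: keys tag each element, but records differ in shape.
raw_json = """[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0101", "555-0102"]},
  {"id": 3, "name": "Chen", "address": {"city": "Pune"}}
]"""
records = json.loads(raw_json)
all_fields = sorted({key for record in records for key in record})

# 3) Unstructured: free text with no tags; any structure must be inferred.
email_body = "Hi team, Chen moved to Pune last week. Call me on 555-0199 if needed."
phone_numbers = re.findall(r"\b\d{3}-\d{4}\b", email_body)

print(pune_names)
print(all_fields)
print(phone_numbers)
```

The structured query needs only a column name. The semi-structured records have to be inspected to learn which fields exist. The unstructured text yields a phone number only because a pattern was written by hand to find it.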
This variety, and unstructured data in particular, creates problems for storing, mining, and analyzing data. The growth of semi-structured and unstructured data is therefore what drives the need for languages other than SQL for processing data.
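SQL assumes a fixed schema of tables, rows, and columns, so it handles structured data well and semi-structured data only to a limited extent, but it is poorly suited to unstructured data. For this reason, other tools are used, such as NoSQL databases (Cassandra, MongoDB, DynamoDB, etc.), which provide their own query languages and APIs.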
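The schema limitation can be sketched in Python as follows. This is only an illustration under stated assumptions: the `events` table, the `insert_sql` helper, and the sample records are hypothetical, and a plain Python list stands in for a schema-less document store rather than any real NoSQL product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, username TEXT)")

def insert_sql(record):
    """Try to insert a dict as a row; return False if the schema rejects it."""
    # Column names come from trusted sample data in this sketch only.
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    try:
        conn.execute(
            f"INSERT INTO events ({columns}) VALUES ({placeholders})",
            tuple(record.values()),
        )
        return True
    except sqlite3.OperationalError:
        # Raised when the record has a field the table has no column for.
        return False

incoming = [
    {"id": 1, "username": "asha"},
    {"id": 2, "username": "ben", "device": "mobile"},  # extra, unplanned field
]

# Relational table: the second record does not fit the fixed schema.
sql_results = [insert_sql(record) for record in incoming]

# Stand-in document store: each record is kept with whatever fields it has.
documents = []
for record in incoming:
    documents.append(dict(record))
mobile_users = [d["username"] for d in documents if d.get("device") == "mobile"]

print(sql_results)
print(mobile_users)
```

The relational table accepts the second record only after its schema is altered. The document-style store accepts it as is, which is the flexibility that varied data calls for.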