Explain how RAM can be exploited by Big Data databases to increase overall performance.
Explain the differences between the two types of in-memory databases: the in-memory database (IMDB) and the in-memory data grid (IMDG).
The main way a Big Data database can exploit RAM to increase overall performance is the significantly higher access speed that RAM offers compared with disk. This also leads to quicker data analysis. However, it is not only the reduced fetch time that optimizes data analysis. In-memory DBs make the evaluation of structured and unstructured data possible from any system. Until now, companies and software solutions have faced the challenge of storing and processing large amounts of unstructured data, such as texts, images, or audio and video files.
By using a distributed data infrastructure, unstructured data can be stored in an in-memory database in which several processing units (computers, processors, etc.) work on a common task in parallel, with the work distributed across different server clusters. This results in higher storage capacity, faster processing, and better transfer speeds for the unstructured data.
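The idea of several processing units working on a common task in parallel can be sketched as follows. This is only an illustration under stated assumptions: the function names (`partition`, `count_words`, `parallel_word_count`) are hypothetical, the shards are plain Python lists standing in for each server's RAM, and threads stand in for separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_nodes):
    """Split items round-robin into num_nodes in-memory shards."""
    shards = [[] for _ in range(num_nodes)]
    for i, item in enumerate(items):
        shards[i % num_nodes].append(item)
    return shards


def count_words(shard):
    """Work done locally on one shard: count the words in its documents."""
    counts = Counter()
    for text in shard:
        counts.update(text.lower().split())
    return counts


def parallel_word_count(documents, num_nodes=3):
    """Process every shard concurrently, then merge the partial results."""
    shards = partition(documents, num_nodes)
    # Threads are an assumption standing in for separate servers.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


documents = [
    "RAM is fast",
    "disk is slow",
    "RAM is volatile",
    "disk is persistent",
]
totals = parallel_word_count(documents)
print(totals.most_common(3))
```

In a real cluster each shard would live on a different machine, and only the small partial counts would cross the network, not the raw documents.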
The popularity of NoSQL databases has increased because of the need for:
(1) processing vast amounts of data faster than relational database management systems can, by taking advantage of a highly scalable architecture,
(2) a flexible (schema-free) data structure, and
(3) low latency and high performance. Although memory usage is not a major criterion for evaluating the performance of algorithms,
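The schema-free point in (2) can be illustrated with a minimal sketch. It assumes a plain Python dict as a stand-in for an in-memory key-value store; the keys, fields, and the helper `keys_with_field` are hypothetical and do not come from any particular NoSQL product.

```python
# Hypothetical in-memory key-value store: a plain dict held in RAM.
store = {}

# Records of the same kind need not share a schema.
store["user:1"] = {"name": "Ada", "email": "ada@example.com"}
store["user:2"] = {
    "name": "Lin",
    "roles": ["admin"],
    "avatar": {"format": "png", "size_bytes": 20480},
}


def keys_with_field(store, field):
    """Return the keys of all documents that contain the given top-level field."""
    return sorted(key for key, doc in store.items() if field in doc)


with_email = keys_with_field(store, "email")
with_avatar = keys_with_field(store, "avatar")
print(with_email, with_avatar)
```

A relational table would need a schema change to add the `avatar` column; here the second record simply carries the extra field.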
The differences between the two types of in-memory databases, the in-memory database (IMDB) and the in-memory data grid (IMDG), are as follows:
The distinction between an IMDG and an IMDB is fairly technical. Data grids tend to use key-value stores as embedded objects on a Java Virtual Machine (JVM), while in-memory databases are often optimized for columnar storage. In essence, IMDGs offer a memory fabric that can be used from popular platforms, such as Java or .NET, to support a wide range of application needs. IMDBs, on the other hand, store frequently accessed data or all application data in memory to support database applications and other workloads.
An In-Memory Data Grid (IMDG) is a technology designed to handle data-intensive processing applications. Think of an IMDG as a mechanism that seamlessly gives your applications access to the random-access memory (RAM) of multiple computers. It is as if you simply added many more RAM modules to your application's computer, while also gaining the ability to run multiple applications in parallel on the same data to handle large processing tasks. This makes application development much easier, since you do not have to constantly read and save small chunks of data as you would with a database.
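A toy model may make the IMDG idea more concrete. The class below is a hypothetical sketch, not the API of any real data grid product: each dict plays the role of one server's RAM, keys are assigned to nodes by a CRC32 hash (an assumption chosen for simplicity), and `aggregate` runs the computation where the data lives.

```python
import zlib


class ToyDataGrid:
    """Hypothetical stand-in for an IMDG: each dict models one server's RAM."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def _owner(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)

    def aggregate(self, local_fn, combine_fn):
        """Run local_fn on each node's own data, then combine the partial results."""
        partials = [local_fn(node) for node in self.nodes]
        return combine_fn(partials)


grid = ToyDataGrid(num_nodes=4)
for i in range(100):
    grid.put(f"order:{i}", {"amount": i})

# Each node sums its own orders; only the four partial sums are combined.
total = grid.aggregate(
    local_fn=lambda node: sum(order["amount"] for order in node.values()),
    combine_fn=sum,
)
print(total)
```

The application sees one logical map, while the data is spread over several nodes and the summation runs next to the data instead of pulling every record back to the caller.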
An In-Memory Database (IMDB), by comparison, is a system that lets you store and read data entirely in memory. Compared with traditional disk-based databases, IMDBs have the obvious advantage of reading and writing data much faster, since both operations go through RAM. Compared with IMDGs, IMDB applications generally process smaller blocks of data at a time, since the applications have to read data from the IMDB and then write it back once the processing is done. IMDBs also typically run separately from the applications, so network communication between the application and the IMDB is usually required. If you choose to implement an IMDB, you are often looking to replace your existing databases, and since these are normally a core part of a legacy system, replacing them is non-trivial. On the upside, only limited changes to the application layer are required. IMDBs can also be used as caches, like IMDGs, but typically with the disadvantage of having to transfer data over the network to the application.
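The read, process, and write-back pattern of an IMDB can be sketched with SQLite's in-memory mode from the Python standard library. This is an assumption made for illustration: SQLite here runs inside the application process, so the network hop described above is not reproduced, and the `orders` table and its rows are made up.

```python
import sqlite3

# ":memory:" asks SQLite to keep the whole database in RAM, with no file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 4.5)],
)
conn.commit()

# The application reads through SQL, as it would with a disk-based database.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)
```

Because the interface is still SQL, code written against a disk-based database needs few changes, which matches the point about limited changes to the application layer.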