MapR is a complete enterprise-grade
distribution for Apache Hadoop. The
MapR Converged Data Platform has been engineered
to improve Hadoop's reliability, performance, and
ease of use. You can use MapR with Apache
Hadoop, HDFS, and MapReduce APIs.
The main functions of the MapR Data Platform
include storage, management, processing, and analysis of data for
AI and analytics applications. It also provides increased
reliability and security for mission-critical information.
MapR is built for organizations with demanding
production needs.
Advantages:
Disadvantages:
- MapR's management console is not as good as Cloudera's.
- It is more expensive.
- MapR rewrote HDFS and HBase to be more performant, but some
companies prefer the Apache code base, which is open source and
used in all the other distributions. Staying on the Apache code
base can make integration with other tools easier, as more
documentation and support are available from a broader community.
MapR advantages from the user, developer, administrator, and
risk perspectives:
- Easy data ingestion: Copying data to and from
the MapR cluster is as simple as copying data to a standard file
system using the Direct Access NFS™ capabilities of the MapR
Converged Data Platform. Applications can therefore ingest data
directly into the MapR cluster in real time without the need for
staging areas or redundant clusters just to ingest data.
- Existing applications work: Because the
POSIX-compliant MapR Distributed File and Object Store is integrated
into the MapR Converged Data Platform, existing applications can run
directly on MapR without code changes. Existing tools,
scripts, custom utilities and applications are good to go on day
one.
- Multi-tenancy: Support multiple user groups,
any and all enterprise data sets, and multiple applications in the
same cluster. Data modelers, developers and analysts can all work
in unison on the same cluster without stepping on each other's
toes.
- Business continuity: The MapR Converged Data
Platform provides integrated high availability (HA), data
protection, and disaster recovery (DR) capabilities to protect
against both hardware failure and site-wide failure.
- Global scale: Scalability is central to the MapR
Converged Data Platform, so analytics can operate on both
data at rest and data in motion. MapR provides the only data
platform that scales to trillions of files, millions of event
streams and petabytes of raw data without compromising
performance.
- High performance: The MapR Converged Data
Platform was designed for high performance with respect to both
high throughput and low latency for Apache Hadoop and Apache Spark
applications. In addition, the MapR Platform requires significantly
fewer servers than other big data platforms, leading to
architectural simplicity and lower capital and operational
expenses.
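
The "Easy data ingestion" and "Existing applications work" points above
say that, once the cluster is mounted over NFS, ingestion is ordinary
file copying. The sketch below shows what that could look like using
only Python's standard file operations. It is illustrative, not a MapR
API: the helper name ingest_file, the landing/raw directory, and the
mount path /mapr/my.cluster.com are assumptions, and the demo uses a
temporary directory as a stand-in for a real mount so it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def ingest_file(source: Path, mount_root: Path, dest_dir: str) -> Path:
    """Copy `source` into `dest_dir` under an NFS-mounted cluster path.

    Hypothetical helper: it uses only ordinary file system calls, which
    is the point of a POSIX-style NFS mount. No cluster-specific client
    library is involved.
    """
    target_dir = mount_root / dest_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(source, target)  # plain byte-for-byte copy
    return target


def demo() -> str:
    """Run the copy against a temporary directory standing in for the mount."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Assumption: on a real system this would be the NFS mount point,
        # for example Path("/mapr/my.cluster.com").
        mount_root = tmp_path / "mapr" / "my.cluster.com"
        source = tmp_path / "events.csv"
        source.write_text("id,value\n1,42\n")
        target = ingest_file(source, mount_root, "landing/raw")
        return target.read_text()


if __name__ == "__main__":
    print(demo())
```

On a real deployment, the same function would be called with the actual
mount point as mount_root. Permissions, mount options, and cluster
naming depend on the installation and are not covered here.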