| Vendor / Software | Snowflake | IBM Db2 | Google BigQuery | Rubrik | Vertica |
| --- | --- | --- | --- | --- | --- |
| Functionality | A core architecture that enables many types of data workloads, including a single platform for developing modern data applications. | Holds databases to a higher standard, making it easy to deploy your data wherever it's needed, fluidly adapting to your changing needs and integrating with multiple platforms, languages, and workloads. | Google's fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics; BigQuery is serverless. | Delivers instant application availability to hybrid cloud enterprises for recovery, search, cloud, and development. | Combines a high-performance, massively parallel processing SQL query engine with advanced analytics and machine learning to unlock the potential of your data without limits. |
| ETL process | Supports transformation both during loading (ETL) and after loading (ELT). | The ETL process can be as simple as transferring data between two tables on the same system, or as complex as taking data from an entirely different system thousands of miles away and rearranging and reformatting it to fit the target. | With cloud data warehouses such as BigQuery, transformation is commonly performed in the warehouse itself, so ETL becomes ELT (Extract-Load-Transform); this can greatly simplify loading data into BigQuery, since transformations need not be applied up front. | Provides instant clones to developers, accelerating application development, testing, and ETL (extract, transform, and load) workflows without any impact on production environments. | ETL processes extract data from Vertica system tables in the V_CATALOG and V_MONITOR schemas and load it into the dimension and fact tables in the VHist schema; populating VHist is a two-step process. |
| Platform | Works with a wide range of data integration tools, including Informatica, Talend, Tableau, Matillion, and others. | Can be deployed on IBM Cloud Pak for Data, a fully integrated data and AI platform built on the Red Hat OpenShift Container Platform. | Accessible through the Cloud Console or the classic web UI, the bq command-line tool, or the BigQuery REST API via client libraries such as Java, .NET, or Python; a variety of third-party tools can also be used to visualize or load data. | Cloud Data Management platform. | On-premise, in the clouds (AWS, Azure, and GCP), on Apache Hadoop, or as a hybrid model. |
| Best feature | Instant, secure, and governed access to an entire network of data. | The EXPORT and IMPORT utilities move data from a host or iSeries server database to a file on the DB2 Connect workstation, and the reverse; such data movement can be a large part of populating a big data store. | All data operates on a cloud platform with a serverless architecture that scales analytics automatically, letting you focus on the most critical insights you want to uncover. | Mobilize applications, automate protection policies, recover from ransomware, and search and analyze application data at scale. | Column-oriented storage, which increases query performance; a standard SQL interface with built-in advanced analytics such as time series, pattern matching, event series joins, machine learning, and geospatial; compression, which reduces storage costs and I/O bandwidth. |
| Support | Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. | Linux, Unix, and Windows operating systems. | Spark, SQL, and DataFrames (e.g. spotify/spark-bigquery). | The leading operating systems, databases, hypervisors, clouds, and SaaS applications. | Standard programming interfaces ODBC, JDBC, ADO.NET, and OLE DB; Hadoop integration with the ability to run analytics directly on ORC and Parquet files. |
Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
Hadoop's robust architecture and economical design make it a good fit for storing huge amounts of data. The Hadoop ecosystem contains many components, such as MapReduce, Hive, HBase, ZooKeeper, and Apache Pig, which together serve a broad spectrum of applications.
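The MapReduce component mentioned above can be illustrated with a toy word count in pure Python that mimics the map, shuffle, and reduce phases on a single machine. This is only a sketch of the programming model; real Hadoop distributes each phase across a cluster and persists intermediate results in HDFS.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in an input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    # Reduce phase: combine the grouped values for one key.
    return word, sum(counts)

lines = ["hadoop stores big data", "hadoop processes big data"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(w, c) for w, c in shuffle(mapped).items())
print(counts)
```

Because each mapper sees only its own input split and each reducer sees only one key's values, the same three functions scale from this single-process sketch to thousands of nodes.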
Challenges