Question

  1. Research at least five data warehousing software products. Compare and contrast them in terms of:
    1. Vendor
    2. Functionality
    3. ETL process
    4. Platform
    5. Best feature
    6. Support

  2. What is Hadoop? Why is it important? What are the challenges of using Hadoop?

Solutions

Expert Solution

Vendor and software: Snowflake, IBM Db2, Google BigQuery, Rubrik, Vertica

Functionality
  • Snowflake: A core architecture that enables many types of data workloads, including a single platform for developing modern data applications.
  • IBM Db2: Makes it easy to deploy data wherever it is needed, adapting to changing needs and integrating with multiple platforms, languages and workloads.
  • Google BigQuery: Google's fully managed, serverless, petabyte-scale, low-cost enterprise data warehouse for analytics.
  • Rubrik: Delivers instant application availability to hybrid cloud enterprises for recovery, search, cloud, and development.
  • Vertica: Combines a high-performance, massively parallel processing (MPP) SQL query engine with advanced analytics and machine learning.

ETL process
  • Snowflake: Supports transformation either during loading (ETL) or after loading (ELT).
  • IBM Db2: The ETL process can be as simple as transferring data from one table to another on the same system, or as complex as taking data from an entirely different system thousands of miles away and rearranging and reformatting it to fit a very different system.
  • Google BigQuery: With cloud data warehouses such as Google BigQuery, transformation is more commonly performed in the warehouse itself, so the ETL approach becomes ELT (Extract-Load-Transform). This can greatly simplify loading data into BigQuery, because the transformations do not have to be worked out before the data is loaded.
  • Rubrik: Provides instant clones to developers, accelerating application development, testing, and ETL (extract, transform and load) workflows without any impact on production environments.
  • Vertica: ETL processes extract data from Vertica system tables in the V_CATALOG and V_MONITOR schemas and load it into the dimension tables and fact tables in the VHist schema.

Populating VHist is a two-step process:

  1. Source to stage—The source tables are loaded into staging tables. Each staging table is a replica of its source table, with one difference: the addition of BATCH_ID, an integer column that holds the identity of the ETL batch load.
  2. Stage to star—The staging tables are transformed into a star schema.
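
The two-step load described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for a warehouse, and the table names (src_queries, stg_queries, dim_user, fact_query) and their columns are hypothetical, not the actual V_CATALOG, V_MONITOR or VHist tables.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_queries (user_name TEXT, duration_ms INTEGER);
    -- Staging table: a replica of the source plus a batch_id column.
    CREATE TABLE stg_queries (user_name TEXT, duration_ms INTEGER, batch_id INTEGER);
    -- Star schema: one dimension table and one fact table (hypothetical names).
    CREATE TABLE dim_user (user_key INTEGER PRIMARY KEY, user_name TEXT UNIQUE);
    CREATE TABLE fact_query (user_key INTEGER, duration_ms INTEGER, batch_id INTEGER);
""")
conn.executemany(
    "INSERT INTO src_queries VALUES (?, ?)",
    [("alice", 120), ("bob", 300), ("alice", 80)],
)


def source_to_stage(conn, batch_id):
    """Step 1: copy source rows into staging, tagging each row with the batch id."""
    conn.execute(
        "INSERT INTO stg_queries SELECT user_name, duration_ms, ? FROM src_queries",
        (batch_id,),
    )


def stage_to_star(conn, batch_id):
    """Step 2: split one staged batch into dimension rows and fact rows."""
    conn.execute(
        "INSERT OR IGNORE INTO dim_user (user_name) "
        "SELECT DISTINCT user_name FROM stg_queries WHERE batch_id = ?",
        (batch_id,),
    )
    conn.execute(
        "INSERT INTO fact_query "
        "SELECT d.user_key, s.duration_ms, s.batch_id "
        "FROM stg_queries AS s JOIN dim_user AS d ON d.user_name = s.user_name "
        "WHERE s.batch_id = ?",
        (batch_id,),
    )


source_to_stage(conn, batch_id=1)
stage_to_star(conn, batch_id=1)

dim_count = conn.execute("SELECT COUNT(*) FROM dim_user").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM fact_query").fetchone()[0]
total_ms = conn.execute("SELECT SUM(duration_ms) FROM fact_query").fetchone()[0]
print(dim_count, fact_count, total_ms)
```

Because every staged row carries the batch identifier, a failed or repeated load can be traced and reprocessed one batch at a time.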

Platform
  • Snowflake: Works with a wide range of data integration tools, including Informatica, Talend, Tableau, Matillion and others.
  • IBM Db2: Can be deployed on IBM Cloud Pak for Data, a fully integrated data and AI platform built on the Red Hat OpenShift Container Platform.
  • Google BigQuery: Accessible through the Cloud Console or the classic web UI, through the bq command-line tool, or through calls to the BigQuery REST API using client libraries for languages such as Java, .NET, or Python. A variety of third-party tools can also interact with BigQuery, for example to visualize or load data.
  • Rubrik: Cloud Data Management platform.
  • Vertica: On premises, in the clouds (AWS, Azure and GCP), on Apache Hadoop, or as a hybrid model.

Best feature
  • Snowflake: Instant, secure, and governed access to an organization's entire network of data.
  • IBM Db2: Data movement from one DB2 database to another can be a large part of the task of populating a large data store. The DB2 EXPORT and IMPORT utilities move data from a host or iSeries server database to a file on the DB2 Connect workstation, and the reverse.
  • Google BigQuery: All data resides on a cloud platform with a serverless architecture, so analytics scale automatically and you can focus on the insights you want to uncover.
  • Rubrik: Mobilize applications, automate protection policies, recover from ransomware, and search and analyze application data at scale.
  • Vertica: Column-oriented storage organization, which increases query performance; a standard SQL interface with built-in advanced analytics such as time series, pattern matching, event series joins, machine learning and geospatial; and compression, which reduces storage costs and I/O bandwidth.

Support
  • Snowflake: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.
  • IBM Db2: Linux, Unix, and Windows operating systems.
  • Google BigQuery: Support for Spark, SQL, and DataFrames via spotify/spark-bigquery.
  • Rubrik: Supports the leading operating systems, databases, hypervisors, clouds, and SaaS applications.
  • Vertica: Standard programming interfaces (ODBC, JDBC, ADO.NET, and OLEDB), plus integration with Hadoop, including analytics performed directly on ORC and Parquet files.
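
The column-oriented storage and compression listed among Vertica's best features can be illustrated with a toy Python example. It is a conceptual sketch only: the sample rows and the helper names (to_columns, run_length_encode) are made up for illustration, and run-length encoding is just one simple compression scheme, not a description of Vertica's actual storage format.

```python
from itertools import groupby

# Hypothetical sales records in row-oriented form (one dict per row).
rows = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 20},
    {"region": "EU", "product": "A", "amount": 30},
    {"region": "US", "product": "A", "amount": 40},
    {"region": "US", "product": "B", "amount": 50},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one column reads only that column, not every full row.
total = sum(columns["amount"])

# Values stored together by column often repeat, so they compress well.
encoded = run_length_encode(columns["region"])

print(total, encoded)
```

The aggregate touches a single list instead of every field of every row, which is the intuition behind faster analytic queries and lower I/O in a column store.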
  2. What is Hadoop? Why is it important? What are the challenges of using Hadoop?

Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.

Its robust architecture and low cost make Hadoop well suited to storing huge amounts of data. The Hadoop ecosystem contains many components, such as MapReduce, Hive, HBase, ZooKeeper and Apache Pig, which together serve a broad spectrum of applications.
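
To give a feel for the MapReduce component mentioned above, here is a minimal word-count sketch in plain Python. It is a single-process simulation of the map, shuffle and reduce phases, not the Hadoop API; the function names and sample documents are assumptions made for illustration. On a real cluster, the map and reduce tasks would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; in Hadoop each document could be a block on a different node.
documents = ["big data big clusters", "data moves to the clusters"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Because each map call looks at one document and each reduce call at one key, the work can be split across machines without the tasks needing to coordinate with each other.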

Challenges

  • Lack of performance and scalability.
  • Lack of flexible resource management.
  • Lack of application deployment support.
  • Lack of quality of service.
  • Lack of multiple data source support.
