Question

Select two PaaS cloud service providers and analyze each provider's capability to provide big-data cloud services using the following framework:

Preprocessing of unstructured data: Describe the solution. Does the capability exist within the offering or is the capability the customer’s responsibility?

Social graphs, API, and visualization tools: What tools exist within the offering, if any?

Data Analytics software tools: What tools exist? And what are their capabilities?

Machine learning: What capabilities exist, if any?

Data governance and security: What processes and tools are embedded in the offering? How are responsibilities identified between vendor and customer?

Solutions

Expert Solution

Hello Learner,

Thanks for your question.

I will compare Google Cloud with AWS.


1. Preprocessing Unstructured Data:
Google: Cloud Dataprep

Google provides a serverless solution for preprocessing unstructured data called Cloud Dataprep, which is operated by Trifacta. The tool helps users structure, cleanse, and blend data through UI input alone, without writing a single line of code. Once you have provided the data to Dataprep, it uses Cloud Dataflow to process the unstructured data. Being a serverless solution, Cloud Dataprep can scale on demand.

AWS: Kinesis Analytics

Kinesis Analytics reads the unstructured data and creates a schema with a single column. However, we can use AWS Lambda to create a schema for the unstructured data based on our requirements. We need a schema because Kinesis Analytics uses SQL to analyze the data.
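
As a rough illustration of the Lambda step, the sketch below shows a preprocessing handler that turns raw text lines into JSON so that a multi-column schema can be inferred. It assumes the Kinesis Analytics preprocessing contract (base64-encoded `data`, a `recordId`, and a `result` of `Ok`, `Dropped`, or `ProcessingFailed`). The log format, the regular expression, and the output field names are hypothetical and would need to match your own data.

```python
import base64
import json
import re

# Hypothetical log format, e.g. "2021-03-04 12:00:01 ERROR payment failed".
LINE_PATTERN = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def lambda_handler(event, context):
    """Convert raw text records into JSON records with named fields."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"]).decode("utf-8").strip()
        match = LINE_PATTERN.match(raw)
        if match is None:
            # Lines that do not fit the assumed format are reported as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        timestamp, level, message = match.groups()
        structured = {"event_time": timestamp, "level": level, "message": message}
        encoded = base64.b64encode(json.dumps(structured).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": encoded})
    return {"records": output}
```

With records in this shape, the SQL side can work with three columns instead of one opaque text column.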

So Google provides a dedicated application for data preprocessing, whereas AWS provides very limited built-in preprocessing capability; it is the customer's responsibility to create a schema based on the requirements.

2. Social graphs, API, and visualization tools:

Google - Data Studio:

Data Studio is the tool provided by Google for visualizing BigQuery data. We can use it to see trends in the data and make business decisions. To visualize the data, we connect Data Studio to a BigQuery source and select the data source. Once the data is loaded into Data Studio, we can select the chart type.

AWS - QuickSight:

QuickSight is Amazon's tool for visualizing data and getting insights from it anytime, on any device. QuickSight can take data from different sources such as MS Excel, CSV files, or SaaS applications. It is a general-purpose Business Intelligence solution.

3. Data Analytics software tools

Google - Google Cloud BI solution:

Google Cloud BI solution is the data analytics offering provided by Google. We can load data from any source into BigQuery, which then handles data processing and cataloging. We can then use tools such as BigQuery (SQL interface), Data Studio, and Google Sheets to perform ad-hoc analysis, advanced analytics, visualization, and reporting.

AWS- QuickSight:

QuickSight is the BI tool provided by Amazon that helps users and organizations build visualizations, perform ad-hoc analysis, and quickly get business insights from their data, anytime, on any device. We can upload data from different sources such as Excel, CSV files, or SaaS applications. QuickSight helps organizations scale business analytics to a large number of users through a robust in-memory engine called SPICE (Super-fast, Parallel, In-memory Calculation Engine). SPICE supports a rich set of calculations to help derive valuable insights from data, and data in SPICE is persisted.

4. Machine learning:

Google- TensorFlow:

TensorFlow is an end-to-end platform for machine learning provided by Google. It has a flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications. TensorFlow simplifies machine learning by letting users train models through the high-level Keras API. TensorFlow Extended can be used for a full production ML pipeline.

AWS: Amazon SageMaker:

Amazon SageMaker is an end-to-end machine learning platform that enables users to quickly build, train, and host machine learning models at any scale of data. SageMaker has three components: Authoring, Model Training, and Model Hosting. To build a machine learning solution, we first create a Jupyter notebook (with all the functionality built in), then use the various algorithms (supervised or unsupervised) to train the model, and then host the model behind HTTPS endpoints to get real-time inferences.

5. Data governance and security:

Google:
Data governance and security are at the core of any storage provider. Google has various processes to provide data security.

1. Data Encryption: Data is encrypted both in transit and at rest, so only authorized users can view the stored data.
2. Cloud Key Management Service (KMS): Google manages cryptographic keys and rotates them frequently.
3. Cloud Identity and Access Management (IAM): This tool helps administrators authorize access to specific resources, giving them full control and visibility to manage cloud resources centrally.
4. Data Backup: Data is also automatically replicated and encrypted for backup and disaster recovery.
5. Data Deletion: When data is ready to be deleted, it is first marked as "scheduled for deletion," and then it is removed in accordance with service-specific policies.

AWS:

AWS uses a de-identified data lake (DIDL) to provide data security in the cloud. The DIDL architecture helps provide data privacy by de-identifying and protecting sensitive information while it is in transit. DIDL solutions help enterprises get to the root cause of the risk associated with their data architectures and protect PII. A DIDL on AWS can help to discover, identify, catalog, monitor, and protect your data. It removes personally identifiable information before it enters your data lake.
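
To make the de-identification idea concrete, here is a minimal sketch of replacing PII values with keyed hashes before a record is written to a data lake. This is not the DIDL implementation itself; the field names in `PII_FIELDS`, the `deidentify` function, and the choice of HMAC-SHA256 are assumptions made purely for illustration.

```python
import hashlib
import hmac

# Hypothetical field names; a real pipeline would take these from a data catalog.
PII_FIELDS = {"name", "email", "ssn"}


def deidentify(record, secret_key):
    """Return a copy of the record with PII values replaced by keyed hashes."""
    cleaned = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            # The same input and key always give the same token, so joins still work.
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()
        else:
            cleaned[field] = value
    return cleaned
```

For example, `deidentify({"name": "Alice", "purchase": 42}, b"demo-key")` keeps `purchase` as it is and replaces `name` with a 64-character hex token. In practice the key would be held in a key management service rather than in code.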

Please let me know if you need any further information; I will be more than happy to help.

