1. List the 3 categories of Data Management and the 3 layers of Data Architecture.
Explain the link between each of them; you can include drawings of the Data Integration Flow to show that.
2. List the 5 categories of requirements for a BI project.
Provide 1 example of each requirement category for a BI system for any business.
3. Discuss the differences between structured and unstructured data.
Provide 3 examples of each type of data.
4.
Also explain your answer
id | name | salary | start_date | dept
1 | Rick | 623.3 | 1/01/2012 | IT
2 | Dan | 515.2 | 23/09/2013 | Operations
3 | Michelle | 611 | 15/11/2014 | IT
4 | Ryan | 729 | 11/05/2014 | HR
5 | Gary | 843.25 | 27/03/2015 | Finance
6 | Nina | 578 | 21/05/2013 | IT
7 | Simon | 632.8 | 30/07/2013 | Operations
8 | Guru | 722.5 | 17/06/2014 | Finance
price <- 10
if (category =='A'){
cat('A vat rate of 8% is applied.','The total price is',price *1.08)
} else{
cat('A vat rate of 10% is applied.','The total price is',price *1.10)
}
ifelse(a %% 2 == 0,"even","odd")
1. Data Management:
A database is a collection of data or records, and a database management system (DBMS) is the software designed to manage it. A DBMS uses a standard method to store and organize data. Data can be added, updated, deleted, or retrieved using queries, most commonly written in SQL.
Types of Database Management Systems:
There are several types of database management systems. Here is a list of seven common database management systems:
Database architecture
3-tier architecture is an extension of the 2-tier architecture. It has the following layers: the presentation layer (client), the application layer (business logic), and the database layer (data storage).
The goal of three-tier architecture is:
An example of three-tier architecture is any large website on the internet.
2. BI Requirements:
1. Functional Requirements:
Some functionalities, like projects or workspaces, help teams or departments work more effectively, together or apart. Collaboration tools such as messaging, comment threads, email or Slack integrations make it easy to start important conversations and keep them going.
Globalization Support
Projects or Workspaces
Collaboration and Information Sharing
Decentralized Analytics Environment
Write to Transactional Applications
2. Dashboarding and Data Visualization:
Dashboards are a staple of business intelligence frankly because they work: they reveal the underlying value of data in a format that people can look at and understand in seconds. It’s no surprise then that data visualization is one of the most important requirements of BI software; by translating insights into a visual medium, data visualization turns complex results into easily understandable conclusions for the user to interpret, customize and share with others.
Dashboards
Storyboarding
Interactive Data Visualizations
Filtering
Drill-Down and Drill-Up Capabilities
Auto-Charting
Geospatial Visualizations and Maps
Animations
Advanced Visualizations using Python and R
Auto-refresh and Real-Time Updates
Pre-Built Templates
Web Accessibility and Embeddability
3. Data Source Connectivity:
With broad data source connectivity, you can import all your data into the platform, whether it lives in Excel files, a cloud storage system, an on-premises server, or a combination of these. Doing so ensures that your BI tool delivers full visibility into all your operations and processes.
Standard Files (e.g. Excel, CSV, XML, JSON, PDF and more)
Statistical Files
Relational and NoSQL Databases
JDBC, ODBC and Parameterized Connections
Big Data Ecosystems
Enterprise BI and ERP Platforms
CRM, Customer Success and Marketing Platforms
E-Commerce and Accounting Platforms
Social Media, SEO and Web Analytics Platforms
Cloud File Storage Systems
Project Management and Enterprise Messaging Platforms
SFTP/FTP Support
4. Data Management:
Data management features help users prepare, collect and organize data to ensure greater visibility and more accurate results overall.
Data Exploration
Data Modeling
Data Preparation
Data Blending
Extract, Transform, Load (ETL) Tool
Metadata Management and Data Catalog
OLAP and Multi-Dimensional Analysis
Data Governance
Advanced Data Preparation using Python and R
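As a rough illustration of the data preparation and ETL items listed above, here is a minimal sketch in base R. The inline CSV text reuses three rows from the employee table in the question; the `start_year` column and the temporary target file are assumptions made only for this example, not part of any particular BI tool.

```r
# Extract: read raw CSV text (inline here so the sketch is self-contained).
raw_csv <- "id,name,salary,start_date,dept
1,Rick,623.3,1/01/2012,IT
2,Dan,515.2,23/09/2013,Operations
3,Michelle,611,15/11/2014,IT"
staff <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Transform: parse day/month/year strings into Date values and
# derive a start-year column (hypothetical, for illustration).
staff$start_date <- as.Date(staff$start_date, format = "%d/%m/%Y")
staff$start_year <- as.integer(format(staff$start_date, "%Y"))

# Load: write the cleaned table to a target file (a temporary file here).
target <- tempfile(fileext = ".csv")
write.csv(staff, target, row.names = FALSE)

# Read the loaded file back to check the result.
loaded <- read.csv(target, stringsAsFactors = FALSE)
print(loaded)
```

In a real BI platform the load step would usually target a data warehouse rather than a file, but the extract, transform, load order is the same.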
5. Data Querying:
A query is a request for data from a database, written in a special syntax, often Structured Query Language (SQL). It extracts information and formats it for consumption and analysis. Data querying can perform calculations, automate tasks, or dig deeper through data mining, which uncovers hidden trends and relationships between data points. Though more specialized for data science and big data than for business intelligence specifically, it is a feature worth considering depending on your business needs.
Query Multiple Data Sources
Complex Queries
Scheduled Queries
Readable and Modifiable SQL
Multi-pass SQL
Batch Updates
Visual Querying
In-Memory Analysis
Live Connection
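To make the idea of a query concrete, here is a small sketch in base R that mirrors two simple SQL statements. The rows are taken from the employee table in the question; the SQL in the comments is only an approximate equivalent, and the variable names are chosen for this example.

```r
# A few rows from the employee table in the question.
employees <- data.frame(
  name = c("Rick", "Dan", "Michelle", "Ryan"),
  salary = c(623.3, 515.2, 611, 729),
  dept = c("IT", "Operations", "IT", "HR"),
  stringsAsFactors = FALSE
)

# Rough equivalent of:
#   SELECT dept, AVG(salary) FROM employees GROUP BY dept;
avg_by_dept <- aggregate(salary ~ dept, data = employees, FUN = mean)

# Rough equivalent of:
#   SELECT name FROM employees WHERE salary > 600 ORDER BY salary DESC;
high_paid <- employees[employees$salary > 600, ]
high_paid <- high_paid[order(-high_paid$salary), "name"]

print(avg_by_dept)
print(high_paid)
```

The first query performs a calculation (an average per group), and the second filters and sorts, which are the two most common things a BI user asks of a query layer.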
3. Structured data vs unstructured data:
Structured data is highly specific and is stored in a predefined format, whereas unstructured data is a conglomeration of many varied types of data that are stored in their native formats. This means that structured data uses schema-on-write and unstructured data uses schema-on-read.
Structured data is commonly stored in data warehouses and unstructured data is stored in data lakes. Both have cloud-use potential, but structured data requires less storage space and unstructured data requires more.
The last difference could potentially have the most impact.
Unstructured data is most often categorized as qualitative data, and it cannot be processed and analyzed using conventional tools and methods.
Examples of unstructured data include text, video, audio, mobile activity, social media activity, satellite imagery, surveillance imagery – the list goes on and on.
Unstructured data is difficult to deconstruct because it has no predefined model, meaning it cannot easily be organized in relational databases. Instead, non-relational (NoSQL) databases are the best fit for managing unstructured data.
Another way to manage unstructured data is to have it flow into a data lake, allowing it to be in its raw, unstructured format.
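The schema-on-write versus schema-on-read distinction can be sketched in base R as follows. The `orders` table, the free-text notes, and the "order N" pattern are all hypothetical and exist only to illustrate the idea.

```r
# Schema-on-write: the structure (column names and types) is fixed
# before any data is stored, so every row must fit it.
orders <- data.frame(
  order_id = integer(),
  amount = numeric(),
  stringsAsFactors = FALSE
)
orders <- rbind(orders, data.frame(order_id = 1L, amount = 19.99))

# Schema-on-read: raw text is stored as-is, and structure is imposed
# only when the data is read for analysis.
raw_notes <- c(
  "Customer called about order 1, paid 19.99",
  "Great service!! will buy again",
  "refund requested for order 7"
)

# Impose a structure at read time: extract an order number if present.
has_order <- grepl("order [0-9]+", raw_notes)
matches <- regmatches(raw_notes, regexpr("order [0-9]+", raw_notes))
order_ids <- rep(NA_integer_, length(raw_notes))
order_ids[has_order] <- as.integer(sub("order ", "", matches))

str(orders)
print(order_ids)
```

The structured table can be filtered and aggregated directly, while the notes need an extraction step first, and some of them yield no value at all.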
4. Queries:
i. Get the maximum salary
# Create a data frame.
data <- read.csv("input.csv")
# Get the max salary from the data frame.
sal <- max(data$salary)
print(sal)
Output:
[1] 843.25
ii. Get the details of the person with the maximum salary
# Create a data frame.
data <- read.csv("input.csv")
# Get the max salary from the data frame.
sal <- max(data$salary)
# Get the details of the person with the max salary.
retval <- subset(data, salary == max(salary))
print(retval)
When we execute the above code, it produces the following result:
id name salary start_date dept
5 5 Gary 843.25 2015-03-27 Finance
iii. Get all the people working in the IT department
# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, dept == "IT")
print(retval)
When we execute the above code, it produces the following result:
id name salary start_date dept
1 1 Rick 623.3 2012-01-01 IT
3 3 Michelle 611.0 2014-11-15 IT
6 6 Nina 578.0 2013-05-21 IT
iv. Get the persons in the IT department whose salary is greater than 600
# Create a data frame.
data <- read.csv("input.csv")
info <- subset(data, salary > 600 & dept == "IT")
print(info)
When we execute the above code, it produces the following result:
id name salary start_date dept
1 1 Rick 623.3 2012-01-01 IT
3 3 Michelle 611.0 2014-11-15 IT
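Since input.csv itself is not shown, the four queries can also be run without the file. The sketch below builds the data frame directly from the table in the question; it assumes input.csv would hold exactly these rows and that start_date is written as day/month/year, as the table suggests.

```r
# Employee records copied from the table in the question.
# Assumption: input.csv holds exactly these rows.
data <- data.frame(
  id = 1:8,
  name = c("Rick", "Dan", "Michelle", "Ryan", "Gary", "Nina", "Simon", "Guru"),
  salary = c(623.3, 515.2, 611, 729, 843.25, 578, 632.8, 722.5),
  start_date = as.Date(
    c("1/01/2012", "23/09/2013", "15/11/2014", "11/05/2014",
      "27/03/2015", "21/05/2013", "30/07/2013", "17/06/2014"),
    format = "%d/%m/%Y"  # assumed day/month/year order
  ),
  dept = c("IT", "Operations", "IT", "HR", "Finance", "IT", "Operations", "Finance"),
  stringsAsFactors = FALSE
)

# i. Maximum salary.
max_salary <- max(data$salary)

# ii. Row(s) for the highest-paid person.
top_earner <- subset(data, salary == max(salary))

# iii. Everyone in the IT department.
it_staff <- subset(data, dept == "IT")

# iv. IT staff earning more than 600.
it_above_600 <- subset(data, dept == "IT" & salary > 600)

print(max_salary)
print(top_earner)
print(it_staff)
print(it_above_600)
```

Explanation: max() returns the largest value in the salary column, and subset() keeps only the rows for which the logical condition is TRUE, so combining two conditions with & returns rows that satisfy both.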