1 a) Imagine a real-world example (application, situation, or business) that can benefit from Data Analytics. Discuss what sorts of data can be available in your example (not necessarily only one dataset), making sure that some of the data is structured and some is unstructured.
You need to be specific about the data details, such as: where it comes from, how it can be collected, what it looks like (give a few specific samples), and most importantly: how the data analytics tasks/questions can benefit the users or stakeholders in your given example.
b) Discuss whether you agree or disagree with the following statement: “The ultimate goal of data analytics is to understand the causal relationships among variables in a dataset”.
I am giving the example of a credit card problem. One of the biggest risks associated with credit cards is the risk of default by the customer, which is also known as credit risk.
In this example, we are solving the problem for a credit card company that is launching a new credit card for businesses and has decided to offer this new card to its existing cardholders. The company has to assign every existing customer to a low type or a high type based on the history associated with that customer, and this classification will help the company assign a credit limit.
Going ahead, let's understand the attributes that will be of use to the company and how it can access the data.
1. Yearly business revenues - This variable shows the income source that will be used to repay the amount owed.
2. Number of years the business has been operating - This reflects the reliability of the customer's income.
3. Average utilization of the credit line on a half-yearly basis, expressed as a percentage (the credit line is the credit limit set by the company).
4. Average balance over 6 months (the amount that the customer has to pay).
5. Average amount paid over 6 months.
6. Number of cards - This is an indicator of the reliability of the customer.
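To make attributes 3 to 5 concrete, here is a minimal sketch of how they could be turned into numbers. It assumes that utilization is the average balance divided by the credit limit, which is a common convention but is not spelled out above. The function names and the repayment ratio are hypothetical and only for illustration.

```python
def utilization_pct(avg_balance, credit_limit):
    """Average utilization of the credit line as a percentage.

    Assumption: utilization = average balance / credit limit.
    """
    if credit_limit <= 0:
        raise ValueError("credit_limit must be positive")
    return 100.0 * avg_balance / credit_limit


def payment_ratio(avg_paid, avg_balance):
    """Share of the 6-month average balance that was actually paid.

    Hypothetical helper: a customer with nothing to pay is treated
    as fully paid (ratio 1.0).
    """
    if avg_balance == 0:
        return 1.0
    return avg_paid / avg_balance


# Illustrative values only, not real customer data.
print(utilization_pct(avg_balance=2500, credit_limit=10000))
print(payment_ratio(avg_paid=1500, avg_balance=2500))
```

A low repayment ratio combined with a high utilization would suggest that the customer is relying heavily on the credit line.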
All the above variables can be collected from the internal database of the company. Now, let's focus on the external information that can be of use to the company.
The company will contact the consumer credit bureau, which can provide data on the borrowing history of the customer.
7. Number of cards apart from the company's card
8. Average balance and the amount paid over 6 months on those other cards
Then we can collect the credit rating associated with the customer.
Based on the above information, the company can assign each customer to a low type or a high type.
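As a rough sketch of how this assignment could work, the snippet below counts simple warning signs and labels the customer accordingly. Everything specific here is an assumption made for illustration: the `Customer` fields, the numeric credit rating scale, every threshold, and the reading of "high type" as higher credit risk. A real company would more likely fit a classification model on its historical default data.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical record combining internal and bureau data.
    yearly_revenue: float
    years_in_business: float
    utilization_pct: float   # average utilization over 6 months, in %
    avg_balance_6m: float
    avg_paid_6m: float
    num_cards_total: int     # company cards plus cards reported by the bureau
    credit_rating: int       # assumed numeric scale, higher is better


def assign_type(c: Customer) -> str:
    """Return "high" or "low" (assumed to mean high or low credit risk).

    All thresholds below are illustrative assumptions, not company policy.
    """
    flags = 0
    if c.utilization_pct > 80:
        flags += 1  # credit line is almost fully used
    if c.avg_balance_6m > 0 and c.avg_paid_6m / c.avg_balance_6m < 0.5:
        flags += 1  # less than half of the balance is being repaid
    if c.years_in_business < 2:
        flags += 1  # short track record, less reliable income
    if c.credit_rating < 650:
        flags += 1  # weak external credit rating
    return "high" if flags >= 2 else "low"


# Made-up customers for illustration only.
established = Customer(500_000, 10, 30.0, 4_000, 3_800, 2, 720)
stretched = Customer(80_000, 1, 92.0, 9_000, 2_000, 6, 600)
print(assign_type(established), assign_type(stretched))
```

The resulting label could then feed into the credit limit decision, for example by offering a lower limit to customers labeled "high".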
b) Not always. Causation is an important subset of the problems, but sometimes we are interested in classification, as in the credit card example above, and sometimes finding insights in the data is the only objective of the analysis.