Question

Scenario

Universal Bank is a relatively young bank growing rapidly in terms of overall customer acquisition. The majority of these customers are liability customers (depositors) with varying sizes of relationship with the bank. The customer base of asset customers (borrowers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business. In particular, it wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors).

A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise smarter campaigns with better target marketing. The goal is to use k-NN to predict whether a new customer will accept a loan offer. This will serve as the basis for the design of a new campaign.

The dataset UniversalBank.csv below contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer’s relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.

  • UniversalBank.csv

With all this information in mind, and using R, your job is to:

  • Partition the dataset into 60% training and 40% validation sets considering the information on the following customer:
    Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education_1 = 0, Education_2 = 1, Education_3 = 0, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1, and Credit Card = 1.

  • Perform a k-NN classification with all predictors except ID and ZIP code using k = 1. Remember to transform categorical predictors with more than two categories into dummy variables first.

  • Specify the success class as 1 (loan acceptance), and use the default cutoff value of 0.5. How would this customer be classified?

  • Tell me, what is a choice of k that balances between overfitting and ignoring the predictor information?

  • Show the confusion matrix for the validation data that results from using the best k. Then,

  • Consider the following customer:
    Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education_1 = 0, Education_2 = 1, Education_3 = 0, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1 and Credit Card = 1.

  • Classify the above customer using the best k.

  • Repartition the data, this time into training, validation, and test sets (50% : 30% : 20%).

  • Apply the k-NN method with the k chosen above.

  • Compare the confusion matrix of the test set with that of the training and validation sets.

  • Comment on the differences and their reason.
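The data-preparation steps in the list above (dummy coding of Education, dropping ID, and a 60% : 40% partition) can be sketched in R as follows. This is only a minimal sketch on a small synthetic data frame: the column names and values are assumptions standing in for UniversalBank.csv, which would also need its ZIP code column dropped.

```r
# Toy stand-in for UniversalBank.csv; column names and values are assumptions.
set.seed(1)
n <- 20
bank <- data.frame(
  ID            = 1:n,
  Age           = sample(25:65, n, replace = TRUE),
  Income        = sample(20:200, n, replace = TRUE),
  Education     = sample(1:3, n, replace = TRUE),
  Personal.Loan = rbinom(n, 1, 0.3)
)

# Replace the three-level Education code with one 0/1 dummy per level.
for (lev in 1:3) {
  bank[[paste0("Education_", lev)]] <- as.integer(bank$Education == lev)
}
bank$Education <- NULL
bank$ID <- NULL   # ID is not used as a predictor

# 60% training / 40% validation split by random row index.
train_idx <- sample(seq_len(n), size = round(0.6 * n))
train <- bank[train_idx, ]
valid <- bank[-train_idx, ]

dim(train)
dim(valid)
```

Fixing the seed with set.seed() makes the partition reproducible, which matters when results for different k are compared later.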

Solutions

Expert Solution

Market basket analysis covers two methods: association analysis and sequence analysis. Both are useful for finding frequent patterns among the variables. The association method identifies which variables occur together and creates a rule accordingly.

The rule is developed by counting how often a variable emerges alone and in combination in the data. In addition to the connection between variables and their probability, sequencing also considers the order in which the relationships occur. Thus, it includes a timing element in the analysis.

Overall, market basket analysis is useful for finding the probability that variables appear together. Unfortunately, this analysis does not give us any results, for two reasons. First, no significant association at a confidence level of 5% could be created, for unknown reasons. Second, there is no time element, which is necessary for performing sequence discovery.

Memory-based reasoning uses the k-nearest-neighbour (k-NN) method to make predictions for new data. For a binary target variable, the method finds the predefined number k of nearest neighbours of the new record and assigns the record to the majority class among them; with k = 1, this is simply the class of the single closest neighbour.
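The neighbour search described above can be sketched in base R. This is an illustration only: the six training records, their labels, and the helper name knn_predict are hypothetical, and only two of the predictors (Income and CCAvg) are used so the distances are easy to follow. The new record takes the scenario customer's Income = 84 and CCAvg = 2.

```r
# Hypothetical training records and labels (1 = accepted the loan).
train_x <- data.frame(Income = c(40, 60, 85, 120, 150, 90),
                      CCAvg  = c(0.5, 1.0, 2.1, 4.0, 6.0, 1.8))
train_y <- c(0, 0, 0, 1, 1, 0)
new_x   <- data.frame(Income = 84, CCAvg = 2)

# Standardize with the training mean and sd so no predictor dominates the distance.
mu  <- sapply(train_x, mean)
sdv <- sapply(train_x, sd)
z_train <- scale(train_x, center = mu, scale = sdv)
z_new   <- scale(new_x,   center = mu, scale = sdv)

# knn_predict is a hypothetical helper, not a library function.
knn_predict <- function(z_train, y, z_new, k, cutoff = 0.5) {
  # Euclidean distance from the new record to every training record.
  d <- sqrt(rowSums(sweep(z_train, 2, as.numeric(z_new))^2))
  nearest <- order(d)[seq_len(k)]
  prob <- mean(y[nearest])   # share of the k neighbours in class 1
  list(prob = prob, class = as.integer(prob >= cutoff))
}

res_k1 <- knn_predict(z_train, train_y, z_new, k = 1)
res_k3 <- knn_predict(z_train, train_y, z_new, k = 3)
res_k1
res_k3
```

The estimated probability is the fraction of the k neighbours that belong to class 1, and the record is assigned to class 1 only when that fraction reaches the cutoff of 0.5.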

In terms of our target, the disadvantage is again that this is a predictive method and does not help to find the important variables that explain the characteristics of loan takers. Text mining is used to detect patterns in articles or other written documents and is therefore of no use for the Universal Bank data set.

class  prob  neighbour  age  experience  income  family  CCAvg
0      0     1          40   10          84      2       2

edu_1  edu_2  Mortgage  securities acct  cd account  online  credit card
1      0      0         0                0           1       1

From the output, we conclude that the above customer is classified as belonging to the "loan not accepted" group (class 0).

A choice of k that balances between overfitting and ignoring the predictor information would be k = 6. This value is chosen, after testing various values of k, because it minimizes the validation error. According to the validation error log for different k, the best k is 6, where the training error is 7.4% and the validation error is 8.75%.

Validation error log for different k

k   Training error (%)   Validation error (%)
1   0                    10
2   5.83                 13.75
3   6.67                 11.25
4   7.5                  18.75
5   6.67                 12.5
6   7.5                  16.5
7   10                   12.5
8   9.17                 12.5
9   8.33                 11.25
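An error log like the one above can be produced by fitting k-NN for each candidate k and recording the misclassification rate on the training and validation sets. The sketch below uses knn() from the class package on synthetic data; the two predictors, the rule generating the labels, and all variable names are assumptions, so its numbers will not match the log above.

```r
library(class)   # provides knn()

set.seed(42)
n <- 200
# Synthetic predictors and labels; class 1 is more likely at high income.
x <- data.frame(Income = rnorm(n, 75, 40), CCAvg = rnorm(n, 2, 1.5))
y <- factor(as.integer(x$Income + 10 * x$CCAvg + rnorm(n, 0, 20) > 120),
            levels = c(0, 1))

# 60% / 40% partition, standardized with training-set statistics only.
train_idx <- sample(seq_len(n), size = round(0.6 * n))
mu  <- sapply(x[train_idx, ], mean)
sdv <- sapply(x[train_idx, ], sd)
z   <- scale(x, center = mu, scale = sdv)
z_train <- z[train_idx, ];  y_train <- y[train_idx]
z_valid <- z[-train_idx, ]; y_valid <- y[-train_idx]

# Record training and validation error (%) for k = 1, ..., 9.
error_log <- data.frame(k = 1:9, train_err = NA_real_, valid_err = NA_real_)
for (k in error_log$k) {
  pred_train <- knn(z_train, z_train, cl = y_train, k = k)
  pred_valid <- knn(z_train, z_valid, cl = y_train, k = k)
  error_log$train_err[k] <- 100 * mean(pred_train != y_train)
  error_log$valid_err[k] <- 100 * mean(pred_valid != y_valid)
}
print(error_log)

# Pick the k with the lowest validation error (first one in case of ties).
best_k <- error_log$k[which.min(error_log$valid_err)]

# Confusion matrix for the validation set at the selected k.
pred_best <- knn(z_train, z_valid, cl = y_train, k = best_k)
conf_mat  <- table(Actual = y_valid, Predicted = pred_best)
conf_mat
```

With k = 1 every training record is its own nearest neighbour, so the training error is 0 by construction; this is why k is chosen from the validation error rather than the training error.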

The value of k that balances between overfitting and ignoring the predictor information is 9.

Validation Data scoring - Summary Report (for k=1)

Cutoff probability value: 0.5

class  prob  neighbour  age  experience  income  family  CCAvg
0      0     1          40   10          84      2       2

edu_1  edu_2  Mortgage  securities acct  cd account  online  credit card
0      1      0         0                0           1       1
