The following code generates a decision tree. Run the code and explain
the tree. Once you have the tree, explain why it is drawn the way
it is.
install.packages("rpart.plot") # install package rpart.plot
##########################################
# section 7.1.1 Overview of a Decision Tree
##########################################
library("rpart")
library("rpart.plot")
# Read the data
setwd("c:/data/")
banktrain <- read.table("bank-sample-test.csv",header=TRUE,sep=",")
## drop a few columns to simplify the tree
drops <- c("age", "balance", "day", "campaign", "pdays",
           "previous", "month")
banktrain <- banktrain[,!(names(banktrain) %in% drops)]
summary(banktrain)
# Make a simple decision tree by only keeping the categorical variables
fit <- rpart(subscribed ~ job + marital + education + default +
             housing + loan + contact + poutcome,
             method="class",
             data=banktrain,
             control=rpart.control(minsplit=1),
             parms=list(split='information'))
summary(fit)
# Plot the tree
rpart.plot(fit, type=4, extra=2, clip.right.labs=FALSE,
varlen=0, faclen=3)
Hi,
Please see the code I used for the decision tree below. In my dataset the target column is named "y" instead of "subscribed", so substitute the column name if you use your own dataset.
The dataset I used is on GitHub: https://github.com/just4jin/bank-marketing-prediction/blob/master/data/bank.csv
Note that the link points to bank.csv while the code reads bank_full.csv, so adjust the file name in read.table to match your local copy.
The code:
install.packages("rpart.plot") # install package rpart.plot
##########################################
# section 7.1.1 Overview of a Decision Tree
##########################################
library("rpart")
library("rpart.plot")
# Read the data
setwd("c:/data/")
banktrain <- read.table("bank_full.csv",header=TRUE,sep=";")
## drop a few columns to simplify the tree
drops<-c("age", "balance", "day", "campaign", "pdays",
"previous", "month")
banktrain <- banktrain [,!(names(banktrain) %in% drops)]
summary(banktrain)
# Make a simple decision tree by only keeping the categorical variables
fit <- rpart(y ~ job + marital + education + default + housing +
             loan + contact + poutcome,
             method="class",
             data=banktrain,
             control=rpart.control(minsplit=1),
             parms=list(split='information'))
summary(fit)
# Plot the tree
rpart.plot(fit, type=4, extra=2, clip.right.labs=FALSE, varlen=0,
faclen=3)
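As for why the tree is drawn the way it is: with parms=list(split='information'), rpart picks at each node the split that most reduces an entropy-based impurity of the class labels, so the most informative variable ends up at the top. The sketch below illustrates that idea on a small made-up data frame. The helper names entropy and info_gain and the toy rows are my own assumptions for illustration. This is not rpart's internal code, and the numbers have nothing to do with the bank dataset.

```r
# Entropy (in bits) of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                 # drop empty classes so log2(0) never occurs
  -sum(p * log2(p))
}

# Information gain of a binary split; `left` is a logical vector
# marking the rows that go to the left child
info_gain <- function(labels, left) {
  w_left <- sum(left) / length(labels)
  children <- w_left * entropy(labels[left]) +
    (1 - w_left) * entropy(labels[!left])
  entropy(labels) - children
}

# Hypothetical toy data (NOT taken from the bank dataset)
toy <- data.frame(
  poutcome = c("success", "success", "success", "failure",
               "failure", "unknown", "unknown", "unknown"),
  housing  = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
  y        = c("yes", "yes", "yes", "no", "no", "no", "no", "no")
)

# Compare two candidate splits of the root node
gain_poutcome <- info_gain(toy$y, toy$poutcome == "success")
gain_housing  <- info_gain(toy$y, toy$housing == "yes")
print(c(poutcome = gain_poutcome, housing = gain_housing))
```

In this toy data the split on poutcome separates the two classes perfectly, so its gain is larger than the gain for housing and it would be chosen first. The same comparison is repeated inside each child node, which is how the tree grows downward. When reading the plot itself: as far as I know from the rpart.plot documentation, type=4 labels every node (not just the leaves), and extra=2 shows in each node the number of correctly classified observations over the total number of observations in that node.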