Predicting the Amount of Money Spent on Insured Customers
For this assignment, we analyze data on the insured customers of an insurance company.
Based on a sample data set containing the profiles of insured customers, we want to predict the dollar amount the insurance company spends on each insured customer.
Insured Customers' Data
The insured customers' data is in a CSV file. It contains the following columns:
1. age
2. sex (female, male)
3. bmi
4. children
5. smoker (yes, no)
6. region (northeast, northwest, southeast, southwest)
7. expenses
The value we want to predict is expenses.
The necessary files are on OneDrive:
https://1drv.ms/u/s!Al0FoC_cg4VI3r5Y-ORAr_DjO5etwQ
https://1drv.ms/u/s!Al0FoC_cg4VI3r5X-v6AWSBI2zapLw
I wrote R code for this problem. Before running it, first copy the data (including the header row) from Excel to the clipboard, then run the code.
The R code is:
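# Read the data copied from Excel (the "clipboard" connection works on Windows)
b <- read.table("clipboard", header = TRUE)
head(b, 10)
attach(b)
# Convert the categorical columns to integer codes (levels sorted alphabetically).
# factor() is needed because R >= 4.0 reads text columns as character, not factor.
x1 <- as.numeric(factor(sex))     # female = 1, male = 2
x2 <- as.numeric(factor(smoker))  # no = 1, yes = 2
x3 <- as.numeric(factor(region))  # northeast = 1, northwest = 2, southeast = 3, southwest = 4
# Fit the multiple linear regression and print the summary
l <- lm(expenses ~ age + x1 + bmi + children + x2 + x3)
summary(l)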
And the output is:
> summary(l)
Call:
lm(formula = expenses ~ age + x1 + bmi + children + x2 + x3)
Residuals:
   Min     1Q Median     3Q    Max
-11340  -2811  -1021   1407  29740
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -35152.71    1174.34 -29.934  < 2e-16 ***
age            257.27      11.89  21.646  < 2e-16 ***
x1            -131.15     332.80  -0.394  0.69359
bmi            332.64      27.72  12.000  < 2e-16 ***
children       479.56     137.64   3.484  0.00051 ***
x2           23819.32     411.83  57.838  < 2e-16 ***
x3            -353.49     151.92  -2.327  0.02013 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6060 on 1331 degrees of freedom
Multiple R-squared: 0.7508, Adjusted R-squared: 0.7496
F-statistic: 668.2 on 6 and 1331 DF, p-value: < 2.2e-16
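Thus the fitted regression equation is:
expenses = -35152.71 + (257.27)*age - (131.15)*x1 + (332.64)*bmi + (479.56)*children + (23819.32)*x2 - (353.49)*x3
Here x1, x2 and x3 are the numeric codes for sex, smoker and region, assigned in alphabetical order: female = 1, male = 2; no = 1, yes = 2; northeast = 1, northwest = 2, southeast = 3, southwest = 4.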
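The model explains about 75% of the variation in expenses (R-squared = 0.7508). All predictors except sex (p = 0.69) are significant at the 5% level.
Thus we can predict the value of expenses for a customer by substituting that customer's known values into the regression equation.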
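As a rough illustration of that substitution, the sketch below hard-codes the coefficients from the summary output, including the intercept, and evaluates the equation for one customer. The function name predict_expenses and the example customer are hypothetical. The numeric codes assume R's default alphabetical factor levels (female = 1, male = 2; no = 1, yes = 2; northeast = 1 through southwest = 4).

```r
# Coefficients copied from the summary(l) output
coefs <- c(intercept = -35152.71, age = 257.27, x1 = -131.15, bmi = 332.64,
           children = 479.56, x2 = 23819.32, x3 = -353.49)

# Hypothetical helper: evaluate the fitted equation for one customer.
# Categorical inputs are mapped to the same integer codes used in the fit
# (assumed to follow alphabetical order of the category labels).
predict_expenses <- function(age, sex, bmi, children, smoker, region) {
  x1 <- match(sex, c("female", "male"))
  x2 <- match(smoker, c("no", "yes"))
  x3 <- match(region, c("northeast", "northwest", "southeast", "southwest"))
  unname(coefs["intercept"] + coefs["age"] * age + coefs["x1"] * x1 +
           coefs["bmi"] * bmi + coefs["children"] * children +
           coefs["x2"] * x2 + coefs["x3"] * x3)
}

# Made-up customer for illustration, not a row from the data set
pred <- predict_expenses(age = 40, sex = "female", bmi = 30, children = 2,
                         smoker = "no", region = "northeast")
print(round(pred, 2))
```

For this made-up customer the equation gives about 9411.09. If the fitted model object l is still in memory, predict(l, newdata = data.frame(age = 40, x1 = 1, bmi = 30, children = 2, x2 = 1, x3 = 1)) should return essentially the same value, up to the rounding of the printed coefficients.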