Question

The file banking.txt attached to this assignment provides data acquired from banking and census records
for different zip codes in the bank’s current market. Such information can be useful in targeting
advertising for new customers or for choosing locations for branch offices. The data show
- median age of the population (AGE)
- median income (INCOME) in $
- average bank balance (BALANCE) in $
- median years of education (EDUCATION)

e) Use R to fit a regression model to predict balance from age, education and income. Analyze the
model parameters. Which predictors have a significant effect on balance? Use the t-tests on the
parameters for alpha = .05. [2 pts = 1 pt R code + 1 pt answer]
f) If one of the predictors is not significant, remove it from the model and refit the new regression
model. Write the expression of the fitted regression model. [2 pts = 1 pt R code + 1 pt answer]
g) Interpret the value of the parameters for the variables in the model. [1 pt]
h) Report the value of the R² coefficient and describe what it indicates. [1 pt]
i) According to census data, the population for a certain zip code area has median age equal to 34.8
years, median education equal to 12.5 years and median income equal to $42,401.
- Use the final model computed in point (f) to compute the predicted average balance for the zip
code area. [1 pt]
- If the observed average balance for the zip code area is $21,572, what’s the model prediction
error? [1 pt]
j) Conduct a global F-test for overall model adequacy. Write down the test hypotheses and test statistic
and discuss conclusions.

Age         Education   Income   Balance

35.9        14.8        91033    38517

37.7        13.8        86748    40618

36.8        13.8        72245    35206

35.3        13.2        70639    33434

35.3        13.2        64879    28162

34.8        13.7        75591    36708

39.3        14.4        80615    38766

36.6        13.9        76507    34811

35.7        16.1        107935 41032

40.5        15.1        82557    41742

37.9        14.2        58294    29950

43.1        15.8        88041    51107

37.7        12.9        64597    34936

36           13.1        64894    32387

40.4        16.1        61091    32150

33.8        13.6        76771    37996

36.4        13.5        55609    24672

37.7        12.8        74091    37603

36.2        12.9        53713    26785

39.1        12.7        60262    32576

39.4        16.1        111548 56569

36.1        12.8        48600    26144

35.3        12.7        51419    24558

37.5        12.8        51182    23584

34.4        12.8        60753    26773

33.7        13.8        64601    27877

40.4        13.2        62164    28507

38.9        12.7        46607    27096

34.3        12.7        61446    28018

38.7        12.8        62024    31283

33.4        12.6        54986    24671

35           12.7        48182    25280

38.1        12.7        47388    24890

34.9        12.5        55273    26114

36.1        12.9        53892    27570

32.7        12.6        47923    20826

37.1        12.5        46176    23858

23.5        13.6        33088    20834

38           13.6        53890    26542

33.6        12.7        57390    27396

41.7        13           48439    31054

36.6        14.1        56803    29198

34.9        12.4        52392    24650

36.7        12.8        48631    23610

38.4        12.5        52500    29706

34.8        12.5        42401    21572

33.6        12.7        64792    32677

37           14.1        59842    29347

34.4        12.7        65625    29127

37.2        12.5        54044    27753

35.7        12.6        39707    21345

37.8        12.9        45286    28174

35.6        12.8        37784    19125

35.7        12.4        52284    29763

34.3        12.4        42944    22275

39.8        13.4        46036    27005

36.2        12.3        50357    24076

35.1        12.3        45521    23293

35.6        16.1        30418    16854

40.7        12.7        52500    28867

33.5        12.5        41795    21556

37.5        12.5        66667    31758

37.6        12.9        38596    17939

39.1        12.6        44286    22579

33.1        12.2        37287    19343

36.4        12.9        38184    21534

37.3        12.5        47119    22357

38.7        13.6        44520    25276

36.9        12.7        52838    23077

32.7        12.3        34688    20082

36.1        12.4        31770    15912

39.5        12.8        32994    21145

36.5        12.3        33891    18340

32.9        12.4        37813    19196

29.9        12.3        46528    21798

32.1        12.3        30319    13677

36.1        13.3        36492    20572

35.9        12.4        51818    26242

32.7        12.2        35625    17077

37.2        12.6        36789    20020

38.8        12.3        42750    25385

37.5        13           30412    20463

36.4        12.5        37083    21670

42.4        12.6        31563    15961

19.5        16.1        15395    5956

30.5        12.8        21433    11380

33.2        12.3        31250    18959

36.7        12.5        31344    16100

32.4        12.6        29733    14620

36.5        12.4        41607    22340

33.9        12.1        32813    26405

29.6        12.1        29375    13693

37.5        11.1        34896    20586

34           12.6        20578    14095

28.7        12.1        32574    14393

36.1        12.2        30589    16352

30.6        12.3        26565    17410

22.8        12.3        16590    10436

30.3        12.2        9354       9904

22           12           14115    9071

30.8        11.9        17992    10679

35.1        11           7741       6207

Solutions

Expert Solution

e) The following R code fits the regression model; its output is shown below.

----------R command ---------

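# Read the whitespace-delimited data; the first line holds the column names
data <- read.table("banking.txt", header = TRUE)
# Full model: Balance regressed on Age, Education and Income
model1 <- lm(Balance ~ Age + Education + Income, data = data)
summary(model1)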

---------- Output ---------

Call:
lm(formula = Balance ~ Age + Education + Income, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-7722.0 -1547.4   -56.1  1167.9  8480.2

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.540e+03  4.423e+03  -2.157   0.0335 *
Age          3.325e+02  7.234e+01   4.597 1.28e-05 ***
Education    2.887e+02  3.005e+02   0.960   0.3392
Income       3.871e-01  1.748e-02  22.137  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2458 on 98 degrees of freedom
Multiple R-squared: 0.9225, Adjusted R-squared: 0.9201
F-statistic: 388.8 on 3 and 98 DF, p-value: < 2.2e-16

-----------
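
The significance check for part (e) can also be done programmatically rather than by reading the table. The following is a minimal sketch, not the original solution's code: it embeds only the first twelve rows of the data via read.table(text = ...) so that it runs without banking.txt, which means its numbers will differ from the full-data output above. The names sample_txt, bank, fit and p_values are illustrative.

```r
# First 12 rows of banking.txt, embedded so the sketch is self-contained
# (assumption: the real analysis would read the full file instead).
sample_txt <- "Age Education Income Balance
35.9 14.8 91033 38517
37.7 13.8 86748 40618
36.8 13.8 72245 35206
35.3 13.2 70639 33434
35.3 13.2 64879 28162
34.8 13.7 75591 36708
39.3 14.4 80615 38766
36.6 13.9 76507 34811
35.7 16.1 107935 41032
40.5 15.1 82557 41742
37.9 14.2 58294 29950
43.1 15.8 88041 51107"
bank <- read.table(text = sample_txt, header = TRUE)

fit <- lm(Balance ~ Age + Education + Income, data = bank)

# The "Pr(>|t|)" column of the coefficient table holds the t-test p-values
p_values <- coef(summary(fit))[, "Pr(>|t|)"]

# TRUE marks predictors significant at alpha = 0.05 (intercept dropped)
significant <- p_values[-1] < 0.05
print(significant)
```

On the full data set the same comparison flags Age and Income but not Education, matching the p-values in the output above.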

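From this output, the fitted model can be written as

Balance = -9540 + 332.5*Age + 288.7*Education + 0.3871*Income

Looking at the t-tests in the R output, the parameters for Age (p = 1.28e-05) and Income (p < 2e-16) are significant at alpha = .05, while the parameter for Education (p = 0.3392) is not.

-------------------------------------------------------------------

f) As the Education coefficient is insignificant, we can remove the associated variable and build a new model: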

----------R command ---------
# Reduced model without the non-significant Education term
model_final <- lm(Balance ~ Age + Income, data = data)
summary(model_final)

----------Output ---------

Call:
lm(formula = Balance ~ Age + Income, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-7385.1 -1577.9  -119.2  1200.6  8362.9

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.912e+03  2.301e+03  -2.570   0.0117 *
Age          3.227e+02  7.159e+01   4.508  1.8e-05 ***
Income       3.966e-01  1.437e-02  27.600  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2457 on 99 degrees of freedom
Multiple R-squared: 0.9218, Adjusted R-squared: 0.9202
F-statistic: 583.2 on 2 and 99 DF, p-value: < 2.2e-16

-----

Now let's answer the remaining questions.
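
Parts (f) to (h) can be read directly off the second summary output above. The block below is a sketch of the fitted expression using the rounded coefficients from that output; the hat notation is an assumption of this write-up, not part of the original solution.

```latex
\widehat{\text{Balance}} = -5912 + 322.7\,\text{Age} + 0.3966\,\text{Income}
```

Under this model, holding income fixed, each additional year of median age is associated with an average balance higher by about $322.70; holding age fixed, each additional dollar of median income is associated with an average balance higher by about $0.40. The reported R² of 0.9218 indicates that roughly 92% of the variation in average balance across zip codes is explained by age and income.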

  

i) For this part, use the R function predict:

> # NOTE: this session used Age = 34.5; the question states a median age of 34.8
> predict(model1, data.frame(Age = 34.5, Education = 12.5, Income = 42401))
       1
21950.85
> predict(model_final, data.frame(Age = 34.5, Income = 42401))
      1
22038.4
> prediction_error <- abs(21572 - 22038.4)
> print(prediction_error)
[1] 466.4
---------------------------
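
As a cross-check on part (i), the prediction can be recomputed by hand from the rounded final-model coefficients reported in the summary output, here with the median age of 34.8 stated in the question. This is a sketch under that assumption; because the coefficients are rounded to four significant digits, the result can differ from predict() on the fitted model by about a dollar. The variable names are illustrative.

```r
# Rounded coefficients taken from the final-model summary output
b0    <- -5.912e+03  # intercept
b_age <-  3.227e+02  # Age
b_inc <-  3.966e-01  # Income

# Census values from the question for the zip code area
age      <- 34.8
income   <- 42401
observed <- 21572

predicted  <- b0 + b_age * age + b_inc * income
pred_error <- predicted - observed  # positive means over-prediction

cat("predicted:", round(predicted, 1), "\n")
cat("error:    ", round(pred_error, 1), "\n")
```

Keeping the sign of the error, rather than taking abs(), shows whether the model over- or under-predicts the observed balance.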

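Hence, the first (full) model predicts an average balance of 21950.85, while the second (final) model predicts 22038.4.

Given the observed average balance of $21,572 for the zip code area, the prediction error = predicted value - observed value = 22038.4 - 21572 = 466.4 (for the final model).

Note that the session above used Age = 34.5, whereas the question specifies a median age of 34.8. Since the fitted Age coefficient in the final model is about 322.7, using Age = 34.8 would raise the predicted balance by roughly 0.3 * 322.7 ≈ 97, to about 22,135, and the prediction error to about 563.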
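
j) The global F-test checks H0: beta_Age = beta_Income = 0 against Ha: at least one of these coefficients is nonzero. The final-model summary output reports F = 583.2 on 2 and 99 degrees of freedom with p-value < 2.2e-16, so H0 is rejected at alpha = .05 and the model is useful overall. The following sketch reproduces the statistic from the rounded R² and compares it with the critical value; n = 102 is inferred from the 99 residual degrees of freedom, and small differences from 583.2 come from rounding R².

```r
r2 <- 0.9218   # Multiple R-squared of the final model (rounded)
k  <- 2        # number of predictors: Age and Income
n  <- 102      # observations, inferred from 99 residual df + k + 1

# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Critical value and p-value from the F distribution
f_crit  <- qf(0.95, df1 = k, df2 = n - k - 1)
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

reject_h0 <- f_stat > f_crit
print(c(F = f_stat, critical = f_crit))
print(reject_h0)
```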

