Question

Using the data, fit an appropriate regression model to determine whether time spent studying (hours) is a useful predictor of the chance of passing the exam (result: 0 = fail, 1 = pass). Formally assess the overall fit of the model.

DATA three;
INPUT result hours;
/* result=0 is fail; result=1 is pass */
cards;
0 0.8
0 1.6
0 1.4
1 2.3
1 1.4
1 3.2
0 0.3
1 1.7
0 1.8
1 2.7
0 0.6
0 1.1
1 2.1
1 2.8
1 3.4
1 3.6
0 1.7
1 0.9
1 2.2
1 3.1
0 1.4
1 1.9
0 0.4
0 1.6
1 2.5
1 3.2
1 1.7
1 1.9
0 2.2
0 1.3
1 1.5
;
run;

Solution

Because the outcome is binary (pass or fail), logistic regression is the appropriate model. The following Python code fits it with statsmodels.

# Logistic regression of exam result on hours studied, using statsmodels.
import statsmodels.api as sm

# Exam outcome for each of the 31 students: 0 = fail, 1 = pass.
Pass_or_Fail=[0,0,0,1,1,1,0,1,0,1,0,0,1,1,1,1,0,1,1,1,0,1,0,0,1,1,1,1,0,0,1]

# Hours spent studying, in the same order as the outcomes.
Hours=[0.8,1.6,1.4,2.3,1.4,3.2,0.3,1.7,1.8,2.7,0.6,1.1,2.1,2.8,3.4,3.6,1.7,0.9,2.2,3.1,1.4,1.9,0.4,1.6,2.5,3.2,1.7,1.9,2.2,1.3,1.5]

# Note: sm.Logit does not add an intercept on its own, so this call
# fits a model with Hours as the only term (no constant).
logit_model=sm.Logit(Pass_or_Fail,Hours)
result=logit_model.fit()
print(result.summary2())

This code produced the following result:

  

Optimization terminated successfully. Current function value: 0.607547 Iterations 5 Results: Logit ============================================================== Model: Logit Pseudo R-squared: 0.107 Dependent Variable: y AIC: 39.6679 Date: 2020-04-27 17:29 BIC: 41.1019 No. Observations: 31 Log-Likelihood: -18.834 Df Model: 0 LL-Null: -21.083 Df Residuals: 30 LLR p-value: nan Converged: 1.0000 Scale: 1.0000 No. Iterations: 5.0000 ----------------------------------------------------------------- Coef. Std.Err. z P>|z| [0.025 0.975] ----------------------------------------------------------------- x1 0.4313 0.2017 2.1390 0.0324 0.0361 0.8266 ==============================================================

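The pseudo R-squared reported for a logistic regression should not be read like the R-squared of a linear regression. The coefficient is more informative, but it is on the log-odds scale: each additional hour of study raises the log-odds of passing by 0.4313, which multiplies the odds of passing by exp(0.4313) ≈ 1.54. The coefficient is significant at the 5% level (z = 2.139, p = 0.0324), so hours of study is a useful predictor of the pass or fail result. Note that the model above was fitted without an intercept (Df Model is 0 and the LLR p-value is nan), so this output does not provide a formal test of overall fit. That would require adding a constant term and comparing the fitted model with the intercept-only model through a likelihood-ratio test.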

  

