Predicting High/Low Users of Social Networking Sites among Students
A study was conducted to identify the variables that distinguish heavy from light users of social networking sites among students. A questionnaire was designed for the purpose. The social networking sites considered for the study were Facebook, Orkut, Linked-in, Twitter, etc. The online survey was conducted on a sample of 61 students in the age group of 20 to 30. The collected response data is attached in an Excel sheet.
DATA Reference:
Resp. No. | X1 | X2 | X3A | X3B | X3C | X3D | X3E | X3F | X3G | X3H | X3I | X3J | X3K | X3L |
1 | 2 | 3 | 1 | 2 | 3 | 5 | 2 | 3 | 2 | 5 | 4 | 2 | 2 | 1 |
2 | 4 | 3 | 4 | 4 | 5 | 2 | 2 | 3 | 4 | 2 | 2 | 4 | 4 | 1 |
3 | 2 | 3 | 1 | 5 | 3 | 2 | 2 | 5 | 3 | 1 | 1 | 4 | 1 | 1 |
4 | 2 | 3 | 5 | 4 | 5 | 3 | 4 | 5 | 5 | 4 | 5 | 5 | 3 | 3 |
5 | 2 | 2 | 4 | 4 | 5 | 4 | 3 | 3 | 3 | 2 | 3 | 4 | 3 | 2 |
6 | 2 | 2 | 2 | 4 | 5 | 1 | 1 | 2 | 2 | 1 | 2 | 3 | 2 | 1 |
7 | 1 | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 2 | 1 | 1 | 3 | 2 | 1 |
8 | 4 | 1 | 5 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 5 | 5 | 1 |
9 | 2 | 2 | 3 | 5 | 4 | 4 | 4 | 3 | 2 | 4 | 5 | 5 | 3 | 2 |
10 | 1 | 1 | 3 | 5 | 5 | 1 | 3 | 2 | 1 | 1 | 2 | 4 | 2 | 1 |
11 | 1 | 1 | 5 | 1 | 2 | 5 | 3 | 3 | 3 | 4 | 4 | 2 | 5 | 5 |
12 | 4 | 1 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 2 |
13 | 2 | 2 | 5 | 4 | 4 | 2 | 5 | 3 | 3 | 2 | 2 | 4 | 3 | 1 |
14 | 1 | 1 | 5 | 1 | 1 | 5 | 5 | 5 | 3 | 5 | 3 | 3 | 5 | 5 |
15 | 1 | 1 | 3 | 4 | 5 | 1 | 3 | 4 | 4 | 3 | 3 | 4 | 1 | 1 |
16 | 1 | 1 | 5 | 1 | 1 | 2 | 5 | 5 | 5 | 5 | 3 | 5 | 5 | 5 |
17 | 1 | 1 | 1 | 4 | 4 | 4 | 4 | 2 | 2 | 1 | 1 | 4 | 1 | 2 |
18 | 2 | 2 | 3 | 2 | 4 | 1 | 4 | 2 | 4 | 5 | 3 | 5 | 3 | 1 |
19 | 4 | 1 | 2 | 3 | 2 | 2 | 2 | 2 | 3 | 4 | 4 | 3 | 2 | 2 |
20 | 1 | 1 | 5 | 4 | 4 | 2 | 1 | 1 | 1 | 1 | 5 | 4 | 1 | 5 |
21 | 3 | 1 | 3 | 4 | 5 | 4 | 4 | 4 | 3 | 3 | 2 | 2 | 3 | 4 |
22 | 1 | 1 | 2 | 4 | 5 | 1 | 2 | 2 | 2 | 3 | 3 | 5 | 3 | 1 |
23 | 3 | 1 | 4 | 5 | 5 | 5 | 4 | 4 | 3 | 3 | 3 | 5 | 4 | 3 |
24 | 1 | 1 | 3 | 5 | 5 | 4 | 4 | 4 | 5 | 5 | 5 | 5 | 3 | 3 |
25 | 1 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 2 | 1 |
26 | 1 | 1 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 1 | 3 | 4 | 2 | 2 |
27 | 1 | 1 | 4 | 4 | 5 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 5 | 3 |
28 | 3 | 2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 1 |
29 | 2 | 2 | 2 | 3 | 4 | 2 | 3 | 4 | 4 | 4 | 3 | 2 | 2 | 1 |
30 | 1 | 3 | 4 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 5 | 5 | 2 | 4 |
31 | 1 | 1 | 4 | 4 | 5 | 2 | 5 | 3 | 5 | 2 | 5 | 5 | 3 | 1 |
32 | 2 | 2 | 3 | 4 | 4 | 4 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
33 | 1 | 2 | 3 | 1 | 2 | 4 | 4 | 3 | 2 | 2 | 4 | 2 | 1 | 1 |
34 | 1 | 1 | 4 | 4 | 5 | 4 | 3 | 3 | 3 | 4 | 4 | 4 | 3 | 4 |
35 | 1 | 4 | 1 | 4 | 4 | 5 | 4 | 4 | 3 | 4 | 3 | 4 | 4 | 5 |
36 | 2 | 4 | 2 | 3 | 3 | 4 | 4 | 3 | 4 | 3 | 4 | 4 | 3 | 5 |
37 | 2 | 3 | 3 | 2 | 3 | 3 | 4 | 4 | 4 | 5 | 4 | 5 | 2 | 4 |
38 | 2 | 3 | 1 | 3 | 3 | 4 | 3 | 3 | 3 | 4 | 3 | 4 | 3 | 4 |
39 | 1 | 3 | 1 | 4 | 4 | 5 | 2 | 3 | 2 | 2 | 3 | 5 | 3 | 5 |
40 | 2 | 3 | 1 | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 5 | 4 | 4 |
41 | 2 | 3 | 2 | 3 | 4 | 3 | 5 | 5 | 4 | 4 | 5 | 4 | 3 | 3 |
42 | 1 | 3 | 4 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 4 | 2 | 4 | 3 |
43 | 2 | 3 | 2 | 2 | 3 | 4 | 4 | 4 | 3 | 3 | 3 | 4 | 2 | 4 |
44 | 1 | 2 | 2 | 2 | 4 | 3 | 2 | 2 | 3 | 4 | 1 | 4 | 2 | 4 |
45 | 4 | 4 | 4 | 1 | 1 | 2 | 3 | 2 | 1 | 2 | 5 | 3 | 4 | 4 |
46 | 1 | 3 | 1 | 3 | 4 | 3 | 4 | 3 | 4 | 4 | 3 | 4 | 3 | 3 |
47 | 1 | 1 | 1 | 5 | 5 | 5 | 1 | 1 | 1 | 1 | 1 | 5 | 1 | 1 |
48 | 1 | 2 | 1 | 2 | 3 | 3 | 2 | 2 | 3 | 3 | 2 | 3 | 2 | 3 |
49 | 1 | 2 | 1 | 2 | 3 | 3 | 2 | 2 | 3 | 3 | 2 | 4 | 3 | 4 |
50 | 1 | 2 | 1 | 2 | 3 | 4 | 3 | 2 | 3 | 4 | 2 | 4 | 2 | 4 |
51 | 1 | 1 | 5 | 3 | 3 | 4 | 2 | 2 | 3 | 3 | 2 | 4 | 4 | 2 |
52 | 2 | 2 | 4 | 5 | 4 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 5 |
53 | 2 | 1 | 4 | 3 | 4 | 4 | 3 | 2 | 3 | 3 | 3 | 4 | 3 | 3 |
54 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
55 | 4 | 4 | 1 | 5 | 5 | 5 | 5 | 5 | 1 | 5 | 1 | 5 | 1 | 5 |
56 | 4 | 4 | 2 | 4 | 4 | 3 | 3 | 2 | 2 | 5 | 2 | 4 | 2 | 5 |
57 | 1 | 2 | 2 | 3 | 4 | 4 | 2 | 2 | 2 | 3 | 2 | 4 | 4 | 4 |
58 | 1 | 3 | 2 | 4 | 4 | 4 | 2 | 2 | 2 | 4 | 3 | 5 | 4 | 4 |
59 | 1 | 2 | 2 | 4 | 4 | 5 | 2 | 2 | 3 | 4 | 3 | 5 | 3 | 4 |
60 | 1 | 4 | 2 | 4 | 4 | 4 | 1 | 2 | 3 | 4 | 3 | 5 | 3 | 4 |
61 | 1 | 2 | 1 | 5 | 5 | 5 | 2 | 2 | 3 | 4 | 3 | 4 | 2 | 3 |
X1 = Time spent on week days
1 = Less than 1 hr
2 = 1 to less than 3 hours
3 = 3 to less than 5 hours
4 = More than 5 hrs
X2 = Time spent on week ends
1 = Less than 2 hrs
2 = 2 to less than 4 hours
3 = 4-6 hrs
4 = More than 6 hrs
X3A = Linking with Professional
X3B = Messaging
X3C = Networking
X3D = Make new friends
X3E = Promote events
X3F = Blogging
X3G = News Updates
X3H = Games
X3I = Education
X3J = Photo sharing
X3K = Job seeking
X3L = Online dating
For questions X3A to X3L, 1 = least useful and 5 = extremely useful
Questions:
Q1. Divide the sample into two groups: one that uses social networking sites for less than one hour on weekdays (low users) and a second that uses them for one or more hours on weekdays (high users). Run a two-group logistic regression analysis with high/low user as the categorical dependent variable and the variables X3A to X3L as predictor variables, in order to:
a. Compute the percentage of respondents that it is able to classify correctly
b. Determine the statistical significance of the logistic function
c. Identify which of the predictor variables are relatively better in classifying between the two groups.
d. Classify a new respondent into one of the two groups by building a decision rule and a cut-off score.
e. Explain the odds ratios in the SPSS output
Q2. Divide the sample into two groups: one that uses social networking sites for less than four hours on weekends (low users) and a second that uses them for four or more hours on weekends (high users), and repeat the analysis carried out in the first question.
I do not have SPSS, so I used Excel to solve the question. This requires installing the Real Statistics Resource Pack as an Excel add-in, which provides a logistic regression solver.
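The grouping step in Q1 can be sketched in Python as a cross-check on the Excel work. This is only an illustrative sketch: the list `X1` is copied from the data table, and the helper name `dichotomize` is a hypothetical name chosen here, not part of any add-in.

```python
# X1 (weekday usage code) for respondents 1..61, copied from the data table.
X1 = [
    2, 4, 2, 2, 2, 2, 1, 4, 2, 1,
    1, 4, 2, 1, 1, 1, 1, 2, 4, 1,
    3, 1, 3, 1, 1, 1, 1, 3, 2, 1,
    1, 2, 1, 1, 1, 2, 2, 2, 1, 2,
    2, 1, 2, 1, 4, 1, 1, 1, 1, 1,
    1, 2, 2, 4, 4, 4, 1, 1, 1, 1,
    1,
]


def dichotomize(codes, first_high_code):
    """Return 1 (high user) when the code is >= first_high_code, else 0 (low user)."""
    return [1 if code >= first_high_code else 0 for code in codes]


# Q1: code 1 means "less than 1 hr", so codes 2..4 are high users.
high_user = dichotomize(X1, first_high_code=2)
n_high = sum(high_user)
n_low = len(high_user) - n_high
print(n_high, n_low)  # 29 32
```

The counts agree with the column totals of the classification table in part (a), which suggests that "success" in the output corresponds to the high-user group. For Q2 the same helper would be applied to X2 with `first_high_code=3`, since code 3 is the first weekend category of four or more hours.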
a. Compute the percentage of respondents that it is able to classify correctly.
The classification table below shows how many cases were classified correctly and incorrectly.
Classification Table
Predicted | Suc-Obs | Fail-Obs | SUM |
Suc-Pred | 19 | 11 | 30 |
Fail-Pred | 10 | 21 | 31 |
SUM | 29 | 32 | 61 |
The table has values categorized as follows:
Classification Table
Predicted | Suc-Obs | Fail-Obs |
Suc-Pred | TP | FP |
Fail-Pred | FN | TN |
True Positives (TP) = the number of cases which were correctly classified to be positive, i.e. were predicted to be a success and were actually observed to be a success
False Positives (FP) = the number of cases which were incorrectly classified as positive, i.e. were predicted to be a success but were actually observed to be a failure
True Negatives (TN) = the number of cases which were correctly classified to be negative, i.e. were predicted to be a failure and were actually observed to be a failure
False Negatives (FN) = the number of cases which were incorrectly classified as negative, i.e. were predicted to be a failure but were actually observed to be a success.
Percentage of respondents that it is able to classify correctly = ((TP + TN) / Total) * 100 = ((19 + 21) / 61) * 100 = 65.57%
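The accuracy arithmetic can be checked with a few lines of Python. The four counts are taken from the classification table; the variable names are chosen here for illustration.

```python
# Counts from the classification table (rows = predicted, columns = observed).
tp, fp = 19, 11  # predicted success: observed success / observed failure
fn, tn = 10, 21  # predicted failure: observed success / observed failure

total = tp + fp + fn + tn
accuracy_pct = 100.0 * (tp + tn) / total
print(total, round(accuracy_pct, 2))  # 61 65.57
```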
b. Determine the statistical significance of the logistic function
LL0 | -42.2082 |
LL1 | -35.9327 |
Chi-Sq | 12.55094 |
df | 12 |
p-value | 0.402506 |
alpha | 0.05 |
sig | no |
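From the analysis, we get the significance test values given in the table. LL0 = -42.2082 is the log-likelihood of the intercept-only model and LL1 = -35.9327 is the log-likelihood of the model with all twelve predictors. The test statistic is Chi-Sq = 2 * (LL1 - LL0) = 12.55 with 12 degrees of freedom, which gives a p-value of 0.4025. Since the p-value is greater than alpha = 0.05, the logistic function as a whole is not statistically significant.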
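The Chi-Sq and p-value entries can be reproduced from LL0 and LL1 with the standard library alone. The sketch below uses the closed-form chi-square tail probability that holds for an even number of degrees of freedom (here df = 12). The helper name `chi2_sf_even_df` is a hypothetical name introduced for this example.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to even, positive df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


ll0 = -42.2082   # log-likelihood, intercept-only model (from the table)
ll1 = -35.9327   # log-likelihood, model with X3A..X3L (from the table)
df = 12          # one degree of freedom per predictor
alpha = 0.05

chi_sq = 2.0 * (ll1 - ll0)
p_value = chi2_sf_even_df(chi_sq, df)
significant = p_value < alpha
print(round(chi_sq, 3), round(p_value, 4), significant)
```

With a statistics library available, the same tail probability could be taken from its chi-square survival function instead of the hand-written helper.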
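The logistic regression model is given as ln(p / (1 - p)) = b0 + b1*X3A + b2*X3B + ... + b12*X3L, where p is the predicted probability of being a high user, b0 is the intercept and b1 to b12 are the coefficients listed in the table under part (c).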
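As a rough illustration of how the fitted model turns a set of ratings into a group prediction, the sketch below plugs the coefficients from the table in part (c) into the logistic function. The cut-off of 0.5 is an assumption made for this example (the source does not state which cut-off was used), and the function names are hypothetical.

```python
import math

# Coefficients copied from the "coeff b" column of the table in part (c).
INTERCEPT = -0.14963
COEFFS = {
    "X3A": -0.0146, "X3B": 0.559262, "X3C": -0.1649, "X3D": -0.29475,
    "X3E": 0.146831, "X3F": 0.604464, "X3G": -0.36108, "X3H": 0.461482,
    "X3I": -0.07459, "X3J": -0.59604, "X3K": 0.221605, "X3L": -0.31477,
}
CUTOFF = 0.5  # assumed cut-off score, not stated in the source


def high_user_probability(ratings):
    """Predicted probability of being a high user for a dict of X3A..X3L ratings."""
    logit = INTERCEPT + sum(COEFFS[name] * ratings[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-logit))


def classify(ratings, cutoff=CUTOFF):
    """Decision rule: 'high' when the predicted probability reaches the cut-off."""
    return "high" if high_user_probability(ratings) >= cutoff else "low"


# Ratings of respondent 1 from the data table, used as a stand-in for a new respondent.
respondent_1 = dict(zip(COEFFS, [1, 2, 3, 5, 2, 3, 2, 5, 4, 2, 2, 1]))
prob = high_user_probability(respondent_1)
label = classify(respondent_1)
print(round(prob, 3), label)
```

Respondent 1 reported X1 = 2, so the "high" label produced by this rule agrees with the observed group. Because the overall model is not significant, such predictions should be treated with caution.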
c. Identify which of the predictor variables are relatively better in classifying between the two groups.
Variable | coeff b | s.e. | Wald | p-value | exp(b) | lower | upper |
Intercept | -0.14963 | 1.820771 | 0.006753 | 0.934505 | 0.861029 | ||
X3A | -0.0146 | 0.281978 | 0.002682 | 0.958695 | 0.985502 | 0.567071 | 1.712685 |
X3B | 0.559262 | 0.498285 | 1.259722 | 0.261704 | 1.749381 | 0.658784 | 4.645424 |
X3C | -0.1649 | 0.503224 | 0.107385 | 0.743141 | 0.847974 | 0.316255 | 2.27367 |
X3D | -0.29475 | 0.296051 | 0.991247 | 0.319438 | 0.744716 | 0.416861 | 1.330423 |
X3E | 0.146831 | 0.392836 | 0.139706 | 0.708574 | 1.158159 | 0.536272 | 2.501215 |
X3F | 0.604464 | 0.446656 | 1.831451 | 0.175956 | 1.830272 | 0.762643 | 4.39248 |
X3G | -0.36108 | 0.42638 | 0.717155 | 0.397079 | 0.696923 | 0.302169 | 1.607387 |
X3H | 0.461482 | 0.338309 | 1.86073 | 0.172541 | 1.586424 | 0.817429 | 3.078848 |
X3I | -0.07459 | 0.300851 | 0.061477 | 0.804177 | 0.92812 | 0.514659 | 1.673743 |
X3J | -0.59604 | 0.401764 | 2.200962 | 0.137925 | 0.550987 | 0.250703 | 1.210944 |
X3K | 0.221605 | 0.339166 | 0.426908 | 0.51351 | 1.248078 | 0.642012 | 2.426275 |
X3L | -0.31477 | 0.27987 | 1.264982 | 0.26071 | 0.729954 | 0.421765 | 1.263342 |
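The Wald, p-value, exp(b), lower and upper columns can all be derived from the coefficient and its standard error. The sketch below reproduces them for two rows, assuming a 95% confidence interval with z = 1.96. The function name `wald_summary` is a hypothetical name introduced here.

```python
import math


def wald_summary(b, se, z=1.96):
    """Wald statistic, its p-value (chi-square, 1 df), odds ratio and 95% CI."""
    wald = (b / se) ** 2
    # For 1 df, P(chi-square > w) equals erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return {
        "wald": wald,
        "p_value": p_value,
        "odds_ratio": math.exp(b),
        "lower": math.exp(b - z * se),
        "upper": math.exp(b + z * se),
    }


# Coefficient and standard error copied from the table rows for X3B and X3J.
x3b = wald_summary(0.559262, 0.498285)
x3j = wald_summary(-0.59604, 0.401764)
for name, row in (("X3B", x3b), ("X3J", x3j)):
    print(name, {key: round(value, 4) for key, value in row.items()})
```

Reading the odds ratios: exp(b) of about 1.75 for X3B means that each one-point increase in the usefulness rating for messaging multiplies the odds of being a high user by about 1.75, holding the other ratings constant, while exp(b) of about 0.55 for X3J means the odds fall by roughly 45% per point. Both confidence intervals contain 1, which is consistent with the non-significant p-values.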
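None of the predictors significantly differentiates high users from low users: every p-value in the table above is greater than 0.05. In relative terms, X3J (photo sharing, p = 0.138), X3H (games, p = 0.173) and X3F (blogging, p = 0.176) have the smallest p-values and are therefore the comparatively better predictors for classifying between the two groups.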