1.(0.5) IF WE ARE GIVEN THE POPULATION STANDARD DEVIATION AND n > 30 DO WE USE THE z-VALUES OR t-VALUES? We would use z-values, because the population standard deviation is known.
NOW SHOW WHAT YOU HAVE LEARNED SO FAR WITH DESCRIPTIVE AND INFERENTIAL STATISTICS. YOU ARE PROMOTING ONE I.T. EMPLOYEE AND HAVE TWO CANDIDATES THAT HAVE EACH TAKEN THE SAME 15 SECURITY EXAMS OVER THE PAST YEAR. YOU HAVE TWO FINALIST CANDIDATES WHO HAVE THE FOLLOWING SCORES. SO, WHICH ONE DO YOU PICK AND WHY? (EACH TEST HAD POSSIBLE SCORES RANGING FROM 0 TO 100)
We would pick candidate B. B has a slightly higher mean (83.13 vs. 82.73 for A) and a much lower variance (26.12 vs. 280.07 for A), which indicates more consistent performance.
A | B |
40 | 99 |
89 | 85 |
90 | 84 |
91 | 79 |
92 | 81 |
89 | 88 |
44 | 80 |
84 | 85 |
85 | 79 |
92 | 82 |
89 | 81 |
90 | 80 |
86 | 79 |
88 | 83 |
92 | 82 |
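The means and variances quoted in the answer can be reproduced with a minimal sketch like the one below. It assumes the scores are exactly those in the score lists above and that the quoted variances are sample variances (n - 1 denominator); the names `scores_a`, `scores_b`, and `summarize` are illustrative only.

```python
from statistics import mean, variance

# Scores copied from the score lists above.
scores_a = [40, 89, 90, 91, 92, 89, 44, 84, 85, 92, 89, 90, 86, 88, 92]
scores_b = [99, 85, 84, 79, 81, 88, 80, 85, 79, 82, 81, 80, 79, 83, 82]


def summarize(scores):
    """Return (mean, sample variance) for a list of scores."""
    # statistics.variance divides by n - 1, i.e. it is the sample variance.
    return mean(scores), variance(scores)


for name, scores in (("A", scores_a), ("B", scores_b)):
    m, v = summarize(scores)
    print(f"Candidate {name}: mean = {m:.2f}, variance = {v:.2f}")
```

The two low scores (40 and 44) account for most of candidate A's variance, which is why the means are close while the spreads differ so much.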
2.(0.5) DO YOU HAVE ANY QUESTION(S) FOR THESE TWO CANDIDATES RELATED TO THEIR SCORES? WHAT ARE THEY? AND, BASED ON THE ANSWERS TO THOSE QUESTIONS, WHAT MIGHT YOU DO REGARDING THESE DATA?
I might ask what was going on in candidate A's school or personal life that led to the two scores in the 40s. There is more to a candidate than quiz grades, but judging by test scores alone, candidate B stands out as the stronger candidate. I would get to know each candidate in person and look at the full picture.
3.(1) DO A SCATTER PLOT (RAW OR RANK-ORDERED: YOUR CHOICE) AND EXPLAIN HOW IT HELPS YOU MAKE A DECISION.
4.(1) DO A FREQUENCY (RELATIVE AND CUMULATIVE) TABLE AND EXPLAIN HOW IT HELPS YOU MAKE A DECISION.
5.(1) DETERMINE Q3 (75TH PERCENTILE) AND EXPLAIN HOW IT HELPS. ALSO, COULD ANY OTHER QUARTILE OR THE MODE OR ANY OTHER STATISTICS BE OF VALUE TO YOU IN MAKING YOUR DECISION? IF SO, CALCULATE THEM AND EXPLAIN HOW THEY HELP.
6.(2) CALCULATE A 95% CONFIDENCE INTERVAL (α = 5% SO α/2 = 2.5%) FOR EACH CANDIDATE AND BASED ON THEIR “OVERLAP”, WHICH CANDIDATE LOOKS BETTER AND WHY? (ARE YOU GOING TO USE z-VALUES OR t-VALUES? WHY?)
7. (a 0.5) IF OUR CLASS WERE GRADED BASED ON A “NORMAL”, BELL-SHAPED DISTRIBUTION, WHAT THEORETICAL PERCENT AND HOW MANY ACTUAL STUDENTS OF THE 24 IN THIS CLASS SHOULD PASS (+ OR – ONE SD)? HOW MANY SHOULD GET A’s AND B’s, D’s AND F’s?
(b 0.5) MAKING THIS MORE REALISTIC, IF THE AVERAGE POINTS WERE 65 WITH AN SD OF 5, AND YOU NEED TO BE IN THE TOP 10% TO PASS, HOW MANY POINTS DO YOU HAVE TO HAVE?
8. (1) THE MARGIN OF ERROR (BOUNDARY) FOR A PROPORTION: EBP = z * √[(p’ * q’) / n]. IF WE KNOW THE EBP, AND THE PROBABILITIES OF EVENTS p’ AND q’ OCCURRING, WE CAN DETERMINE THE NECESSARY SAMPLE SIZE. LET’S SAY THAT THE EBP = 7% AND THAT p’ = 40% AND q’ = 60% (REMEMBER THAT p’ AND q’ MUST ADD UP TO 1.00 OR 100%)
CALCULATE THE NECESSARY SAMPLE SIZE (n) USING THESE NUMBERS. (HINT: CAN’T USE %’s, NEED TO CONVERT THEM TO DECIMAL FRACTIONS) ALSO, WHEN NO ALPHA VALUE IS GIVEN AS HERE, ASSUME α = 5% (BUT, DO WE USE α/2 HERE SINCE IT’S A CI?)
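Questions 7 and 8 reduce to short calculations with the normal distribution. The sketch below is one possible way to set them up, not an official solution. It assumes a two-sided 95% interval for question 8 (so z is taken at 1 - α/2 = 0.975) and rearranges the EBP formula to n = z² * p’ * q’ / EBP². Variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, SD 1

# Question 7(a): theoretical share of scores within one SD of the mean.
share_within_one_sd = standard.cdf(1) - standard.cdf(-1)
class_size = 24  # class size stated in the question
students_within_one_sd = share_within_one_sd * class_size

# Question 7(b): cutoff for the top 10% when the mean is 65 and the SD is 5.
top_10_cutoff = NormalDist(mu=65, sigma=5).inv_cdf(0.90)

# Question 8: solve EBP = z * sqrt(p * q / n) for n.
# Assumption: two-sided 95% interval, so alpha/2 = 2.5% sits in each tail.
ebp, p = 0.07, 0.40  # percentages converted to decimal fractions
q = 1 - p
z = standard.inv_cdf(1 - 0.05 / 2)
n_exact = z**2 * p * q / ebp**2
n_required = ceil(n_exact)  # round up so the margin of error is not exceeded

print(f"Within +/- 1 SD: {share_within_one_sd:.2%} (about {students_within_one_sd:.1f} of {class_size})")
print(f"Top-10% cutoff: {top_10_cutoff:.2f} points")
print(f"Required sample size: {n_required}")
```

Rounding n up rather than to the nearest integer is the usual convention, since a smaller sample would give a margin of error larger than the target.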
1) We would use z-values: the population standard deviation is known, and for n > 30 the t distribution is in any case very close to the normal distribution.
H0: mu1 = mu2 vs. H1: mu1 ≠ mu2 (mu1 = mean score of candidate A, mu2 = mean score of candidate B)
Estimate for difference: -0.40
95% CI for difference: (-9.98, 9.18)
T-Test of difference = 0 (vs ≠): T-Value = -0.09 P-Value = 0.931 DF = 16
Since the p-value (0.931) is greater than 0.05, we fail to reject H0: there is no statistically significant difference between the two candidates' average scores.
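The test output above is consistent with an unpooled (Welch) two-sample t-test, and the sketch below reproduces its main figures under that assumption. The critical value 2.120 is hard-coded from a standard t table for 16 degrees of freedom (the Welch degrees of freedom truncated to an integer); it is not computed by the code. Function and variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

scores_a = [40, 89, 90, 91, 92, 89, 44, 84, 85, 92, 89, 90, 86, 88, 92]
scores_b = [99, 85, 84, 79, 81, 88, 80, 85, 79, 82, 81, 80, 79, 83, 82]


def welch_t(x, y):
    """Return (difference of means, standard error, t statistic, Welch df)."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    diff = mean(x) - mean(y)
    se = sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return diff, se, diff / se, df


diff, se, t_stat, df = welch_t(scores_a, scores_b)

# Assumption: two-sided 95% critical value of t with 16 df, read from a t table.
T_CRIT_16 = 2.120
ci = (diff - T_CRIT_16 * se, diff + T_CRIT_16 * se)

print(f"difference = {diff:.2f}, t = {t_stat:.2f}, df = {df:.2f}")
print(f"95% CI for difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval contains 0, which matches the decision not to reject H0.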
3) Scatter plot
The scatter plot shows no clear relationship between the two candidates' scores.
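A visual reading of a scatter plot can be cross-checked with the Pearson correlation coefficient. The sketch below assumes the two score lists are paired exam by exam in the order given, which the source does not state explicitly. With only 15 points, a single extreme pair such as (40, 99) can dominate the coefficient, so the value is best read alongside the plot. The function name `pearson_r` is illustrative.

```python
from math import sqrt
from statistics import mean

# Assumption: position i in both lists refers to the same exam.
scores_a = [40, 89, 90, 91, 92, 89, 44, 84, 85, 92, 89, 90, 86, 88, 92]
scores_b = [99, 85, 84, 79, 81, 88, 80, 85, 79, 82, 81, 80, 79, 83, 82]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(scores_a, scores_b)
print(f"Pearson r = {r:.2f}")
```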
4)
A | relative frequency | cumulative | freq |
40 | 0.066666667 | 0.066666667 | 1 |
44 | 0.066666667 | 0.133333333 | 1 |
84 | 0.066666667 | 0.2 | 1 |
85 | 0.066666667 | 0.266666667 | 1 |
86 | 0.066666667 | 0.333333333 | 1 |
88 | 0.066666667 | 0.4 | 1 |
89 | 0.2 | 0.6 | 3 |
90 | 0.133333333 | 0.733333333 | 2 |
91 | 0.066666667 | 0.8 | 1 |
92 | 0.2 | 1 | 3 |
B | relative frequency | cumulative | freq |
79 | 0.2 | 0.2 | 3 |
80 | 0.133333 | 0.333333 | 2 |
81 | 0.133333 | 0.466667 | 2 |
82 | 0.133333 | 0.6 | 2 |
83 | 0.066667 | 0.666667 | 1 |
84 | 0.066667 | 0.733333 | 1 |
85 | 0.133333 | 0.866667 | 2 |
88 | 0.066667 | 0.933333 | 1 |
99 | 0.066667 | 1 | 1 |
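Using these cumulative frequencies, we can estimate the percentage of observations at or below (or above) any particular score. For example, 13.3% of candidate A's scores are 44 or lower, whereas none of candidate B's scores fall below 79.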
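The two tables above can be rebuilt from the raw scores with a short sketch like this one. It assumes each distinct score gets its own row, as in the tables, rather than being grouped into class intervals. The function name `frequency_table` is illustrative.

```python
from collections import Counter

scores_a = [40, 89, 90, 91, 92, 89, 44, 84, 85, 92, 89, 90, 86, 88, 92]
scores_b = [99, 85, 84, 79, 81, 88, 80, 85, 79, 82, 81, 80, 79, 83, 82]


def frequency_table(scores):
    """Return rows of (score, frequency, relative freq, cumulative relative freq)."""
    counts = Counter(scores)
    n = len(scores)
    rows, running = [], 0
    for score in sorted(counts):
        running += counts[score]  # cumulative count up to and including this score
        rows.append((score, counts[score], counts[score] / n, running / n))
    return rows


for name, scores in (("A", scores_a), ("B", scores_b)):
    print(f"{name} | relative frequency | cumulative | freq |")
    for score, freq, rel, cum in frequency_table(scores):
        print(f"{score} | {rel:.4f} | {cum:.4f} | {freq} |")
```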
5)
Variable | Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum | IQR |
C1 (A) | 82.73 | 16.74 | 40 | 85 | 89 | 91 | 92 | 6 |
C2 (B) | 83.13 | 5.11 | 79 | 80 | 82 | 85 | 99 | 5 |
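The quartiles in the table can be reproduced with Python's standard library, as sketched below. This assumes the table was produced with the (n + 1)p position rule, which corresponds to `method="exclusive"` in `statistics.quantiles`; other quartile conventions can give slightly different values. The mode is included because the question asks about it. The function name `summary` is illustrative.

```python
from statistics import multimode, quantiles

scores_a = [40, 89, 90, 91, 92, 89, 44, 84, 85, 92, 89, 90, 86, 88, 92]
scores_b = [99, 85, 84, 79, 81, 88, 80, 85, 79, 82, 81, 80, 79, 83, 82]


def summary(scores):
    """Return quartile-based summary statistics for a list of scores."""
    # Assumption: (n + 1)p positions, i.e. the "exclusive" method.
    q1, q2, q3 = quantiles(scores, n=4, method="exclusive")
    return {
        "min": min(scores),
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "max": max(scores),
        "IQR": q3 - q1,
        "modes": multimode(scores),  # every value tied for most frequent
    }


for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, summary(scores))
```

Candidate A has the higher Q1, median, and Q3, so A's typical score is above B's; the mean and variance are pulled in B's favor by A's two low scores.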
6) Confidence interval (t-values are used because the population standard deviation is unknown and n = 15 is small)
Variable | N | Mean | StDev | SE Mean | 95% CI |
C1 (A) | 15 | 82.73 | 16.74 | 4.32 | (73.47, 92.00) |
C2 (B) | 15 | 83.13 | 5.11 | 1.32 | (80.30, 85.96) |
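These intervals follow from mean ± t * s / √n. The sketch below reproduces them under the assumption that the critical value is t = 2.1448, the two-sided 95% value for 14 degrees of freedom read from a t table (hard-coded here, not computed). The function name `t_interval` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

scores_a = [40, 89, 90, 91, 92, 89, 44, 84, 85, 92, 89, 90, 86, 88, 92]
scores_b = [99, 85, 84, 79, 81, 88, 80, 85, 79, 82, 81, 80, 79, 83, 82]

# Assumption: two-sided 95% critical value of t with n - 1 = 14 df, from a t table.
T_CRIT_14 = 2.1448


def t_interval(scores, t_crit):
    """Return (lower, upper) bounds of the t-based confidence interval for the mean."""
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))  # standard error of the mean
    return m - t_crit * se, m + t_crit * se


ci_a = t_interval(scores_a, T_CRIT_14)
ci_b = t_interval(scores_b, T_CRIT_14)
print(f"A: ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"B: ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
```

Candidate B's interval lies entirely inside candidate A's, so the intervals overlap and the means cannot be separated; B's interval is simply much narrower.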