An article claimed that, on average, only 35% of people read books, and that the number of books a person reads is correlated with the reader’s age. A simple random survey was conducted on 34 people. The binomial question was “Do you read books?”. The two quantitative questions were “How many books have you read in the last 2 years?” and “What is your age?”. The results are below: 18 people stated that they do not read (0 books). Their ages are: 23, 24, 27, 28, 28, 28, 30, 32, 32, 36, 38, 38, 42, 45, 48, 50, 51, 52
16 people stated that they read. Listed below are their ages followed by the number of books read:
Age: # of Books:
17 5
30 6
31 10
34 12
35 24
36 10
36 12
45 15
47 14
54 6
54 7
60 20
61 18
62 24
63 6
65 20
a. For your first “binomial” question, perform a hypothesis test using your prediction and your sample data. Choose the correct type of distribution, determine if your test should be 1- or 2-tailed, and pick a reasonable significance level. Also, construct a confidence interval for the true % using your level of significance.
b. For your two quantitative questions, perform a linear regression test on the two variables to (1) find the equation of the regression line and (2) find the correlation coefficient. Graph, either with software or by hand, your data and the regression line and label appropriately.
a.
*** Choosing Hypothesis ***
In this case we want to test whether there is any relationship between a person's age and the number of books read in the last 2 years.
In other words, we want to know whether the two are correlated.
So we test whether the population correlation coefficient ($\rho$) is 0 or not.
Mathematically, we can write the null hypothesis as
$H_0: \rho = 0$
against the alternative hypothesis
$H_1: \rho \neq 0$
*** Type of Test ***
It is a two-tailed test, and we will perform it at the 5% significance level (95% confidence level).
*** Distribution type ***
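The test statistic in this case is defined as
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
where r is the sample correlation coefficient and n is the sample size. Under the null hypothesis, t follows a t distribution with n - 2 degrees of freedom.
We arrange the data, including the people who read 0 books, in the following table and calculate the sample correlation coefficient (using Excel).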
Age | Books |
17 | 5 |
30 | 6 |
31 | 10 |
34 | 12 |
35 | 24 |
36 | 10 |
36 | 12 |
45 | 15 |
47 | 14 |
54 | 6 |
54 | 7 |
60 | 20 |
61 | 18 |
62 | 24 |
63 | 6 |
65 | 20 |
23 | 0 |
24 | 0 |
27 | 0 |
28 | 0 |
28 | 0 |
28 | 0 |
30 | 0 |
32 | 0 |
32 | 0 |
36 | 0 |
38 | 0 |
38 | 0 |
42 | 0 |
45 | 0 |
48 | 0 |
50 | 0 |
51 | 0 |
52 | 0 |
Correlation matrix (Excel output):
 | Age | Books |
Age | 1 | |
Books | 0.474481 | 1 |
So, r = 0.4745
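As a cross-check on the Excel value, the sample correlation can be recomputed directly from the 34 (age, books) pairs. The following is a minimal sketch in plain Python; the helper name `pearson_r` is only illustrative, and the non-readers are entered with 0 books, as in the table above.

```python
import math

# (age, books) pairs for the 16 readers, copied from the survey table
readers = [(17, 5), (30, 6), (31, 10), (34, 12), (35, 24), (36, 10),
           (36, 12), (45, 15), (47, 14), (54, 6), (54, 7), (60, 20),
           (61, 18), (62, 24), (63, 6), (65, 20)]

# Ages of the 18 non-readers; each is paired with 0 books
non_reader_ages = [23, 24, 27, 28, 28, 28, 30, 32, 32,
                   36, 38, 38, 42, 45, 48, 50, 51, 52]

data = readers + [(age, 0) for age in non_reader_ages]


def pearson_r(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(data)
print(f"n = {len(data)}, r = {r:.4f}")
```

The printed value should agree with the Excel correlation to four decimal places.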
The t statistic as given above is calculated as
$t = \dfrac{0.4745\sqrt{34-2}}{\sqrt{1-0.4745^2}} \approx 3.049$
The critical t value for a two-tailed test at the 5% significance level with 32 df is approximately 2.037.
Since t observed (3.049) > t critical (2.037), we reject the null hypothesis and conclude that a relationship exists between the age of the reader and the number of books read in the last 2 years.
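The arithmetic behind the observed t value can be sketched in a few lines of Python. This assumes the usual statistic for testing a zero correlation, t = r·sqrt(n − 2)/sqrt(1 − r²), with n − 2 degrees of freedom. The critical value is hard-coded from a t table (about 2.037 for 32 df, two-tailed, 5%) rather than computed, and the function name `t_statistic` is only illustrative.

```python
import math

r = 0.474481  # sample correlation from the Excel output
n = 34        # number of people surveyed


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_obs = t_statistic(r, n)

# Assumption: two-tailed 5% critical value for 32 df, read from a t table
t_crit = 2.037

reject_null = abs(t_obs) > t_crit
print(f"t observed = {t_obs:.3f}, reject H0: {reject_null}")
```

Because the observed value is well above any tabulated 5% critical value near 2.03 to 2.04, the decision to reject the null hypothesis does not depend on the last digit of the table lookup.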
*** Calculation for Confidence Interval ***
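To calculate the confidence interval, we first apply the Fisher transformation
$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$
which is approximately normally distributed with standard error $1/\sqrt{n-3}$.
The confidence interval with 95% confidence for the transformed value is given by
$z \pm \dfrac{1.96}{\sqrt{n-3}}$
and its endpoints are transformed back to the correlation scale with $r = \tanh(z)$.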
In other words, with r = 0.4745 and n = 34, the 95% confidence interval for the population correlation coefficient runs from approximately 0.162 to 0.700.
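The Fisher-transformation interval can be recomputed with the standard library as a check. This is a sketch under the usual assumptions: z = arctanh(r) is treated as approximately normal with standard error 1/sqrt(n − 3), and 1.96 is used as the two-sided 95% normal critical value. The variable names are illustrative.

```python
import math

r = 0.474481   # sample correlation from the Excel output
n = 34         # sample size
z_crit = 1.96  # two-sided 95% critical value of the standard normal

# Fisher transformation: z = 0.5 * ln((1 + r) / (1 - r)) = arctanh(r)
z = math.atanh(r)

# Approximate standard error of z
se = 1 / math.sqrt(n - 3)

# Interval on the z scale, then back-transformed to the correlation scale
lower = math.tanh(z - z_crit * se)
upper = math.tanh(z + z_crit * se)

print(f"95% CI for the correlation: ({lower:.4f}, {upper:.4f})")
```

If the interval excludes 0, it agrees with the decision to reject the null hypothesis in the two-tailed test.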