Gender | Age | Ethnicity | Marital | Qualification | PostSchool | Hours | Income |
Male | 23 | European | Never | Vocational | Yes | 70 | 884 |
Female | 42 | Other | Married | Vocational | Yes | 27 | 525 |
Female | 22 | European | Never | School | No | 15 | 309 |
Male | 40 | Maori | Previously | Vocational | Yes | 39 | 517 |
Female | 22 | Pacific | Never | School | No | 8 | 86 |
Female | 18 | European | Never | School | No | 17 | 255 |
Male | 24 | European | Never | Degree | Yes | 40 | 860 |
Female | 32 | European | Married | None | No | 10 | 211 |
Male | 35 | European | Married | School | No | 70 | 1131 |
Female | 34 | European | Other | None | No | 25 | 386 |
Female | 45 | European | Married | School | No | 16 | 299 |
Female | 30 | Maori | Never | School | No | 40 | 819 |
Male | 35 | European | Previously | Degree | Yes | 45 | 934 |
Female | 33 | European | Never | Vocational | Yes | 8 | 299 |
Male | 45 | European | Married | Degree | Yes | 50 | 1614 |
Female | 39 | European | Other | Degree | Yes | 55 | 1152 |
Male | 42 | European | Previously | Degree | Yes | 54 | 856 |
Male | 33 | European | Previously | Degree | Yes | 60 | 548 |
Female | 43 | European | Previously | None | No | 25 | 266 |
Please help me complete the following tasks with step-by-step explanations:
Create a Frequency (Pivot) Table of the Qualification and Gender variables. Compare the modal Qualification for each Gender.
Draw a suitable graph of the Ethnicity variable, and comment on what it shows
Draw boxplots of Hours Worked by Qualification. Ensure the ordinal nature of Qualification is reflected in the graph. Use your graph to compare the hours worked for the four groups, i.e. explain what the graph shows.
Calculate the mean and standard deviation of the Hours data, and the 90th percentile. What does the latter number describe about the hours worked?
Calculate the mean and standard deviation of the Income variable for males and females separately. (Hint: consider using the Sort functionality) Draw boxplots of the Income variable by Gender. Do the means and standard deviations agree with the information shown by the boxplots? Explain.
Draw a histogram of the Income variable. Summarise the sample distribution. If the histogram is bimodal, can you explain the source of this?
gender(with qualification) | Count of gender |
female | 11 |
degree | 1 |
none | 3 |
school | 5 |
vocational | 2 |
male | 8 |
degree | 5 |
school | 1 |
vocational | 2 |
Grand Total | 19 |
As the table shows, the modal Qualification for females is School (5 of 11), whereas for males it is Degree (5 of 8).
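The pivot table can be reproduced with a short script. This is a minimal sketch using only Python's standard library; the `rows` list is copied from the Gender and Qualification columns of the question's data, and the order of `levels` (None, School, Vocational, Degree) is an assumed ordinal ranking rather than something stated in the question.

```python
from collections import Counter

# (Gender, Qualification) pairs copied row by row from the data table
rows = [
    ("Male", "Vocational"), ("Female", "Vocational"), ("Female", "School"),
    ("Male", "Vocational"), ("Female", "School"), ("Female", "School"),
    ("Male", "Degree"), ("Female", "None"), ("Male", "School"),
    ("Female", "None"), ("Female", "School"), ("Female", "School"),
    ("Male", "Degree"), ("Female", "Vocational"), ("Male", "Degree"),
    ("Female", "Degree"), ("Male", "Degree"), ("Male", "Degree"),
    ("Female", "None"),
]

counts = Counter(rows)  # one count per (Gender, Qualification) cell
levels = ["None", "School", "Vocational", "Degree"]  # assumed ordinal order

modes = {}
for gender in ("Female", "Male"):
    row = {q: counts[(gender, q)] for q in levels}
    modes[gender] = max(row, key=row.get)  # most frequent qualification
    print(gender, row, "mode:", modes[gender])
```

A missing combination (for example males with no qualification) simply shows as 0, because `Counter` returns 0 for keys it has not seen.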
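Because Ethnicity is a categorical variable, a bar chart of the counts is a suitable graph. It shows that European is by far the most common ethnicity in the sample: 15 of the 19 respondents are European, compared with 2 Maori, 1 Pacific and 1 Other. Europeans therefore make up the large majority of this sample (about 79%).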
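The counts behind such a graph can be tallied as follows. This is a rough sketch that prints a text bar chart instead of drawing one; the `ethnicity` list holds the same values as the Ethnicity column, regrouped for brevity since order does not affect the counts.

```python
from collections import Counter

# Ethnicity column from the data table (15 European, 2 Maori, 1 Pacific, 1 Other)
ethnicity = ["European"] * 15 + ["Maori"] * 2 + ["Pacific", "Other"]

counts = Counter(ethnicity)
total = sum(counts.values())

# Print one bar per category, most frequent first
for group, n in counts.most_common():
    print(f"{group:<9}{'#' * n} {n} ({n / total:.0%})")
```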
With the groups ordered None, School, Vocational, Degree, the boxplots show that the Degree group has the highest median hours worked of the four qualifications, while the School group has the lowest median. The Vocational group has the widest box, so its hours are the most variable. The Degree box is comparatively narrow, which tells us that degree holders work the most hours and do so fairly consistently.
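The comparison can be checked numerically. The sketch below computes the median and interquartile range of Hours for each group, listed in the assumed ordinal order None, School, Vocational, Degree. Quartile conventions differ between tools, so the box edges from `statistics.quantiles` with `method="inclusive"` may not match a spreadsheet's boxplot exactly.

```python
from statistics import median, quantiles

# Hours grouped by Qualification, copied from the data table;
# dict order reflects the assumed ordinal ranking
hours_by_qual = {
    "None": [10, 25, 25],
    "School": [15, 8, 17, 70, 16, 40],
    "Vocational": [70, 27, 39, 8],
    "Degree": [40, 45, 50, 55, 54, 60],
}

summary = {}
for qual, hours in hours_by_qual.items():
    q1, _, q3 = quantiles(hours, n=4, method="inclusive")
    summary[qual] = {"median": median(hours), "iqr": q3 - q1}
    print(qual, summary[qual])
```

With groups this small (3 to 6 observations each), the spreads are only rough indications.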
Mean of Hours = total / 19 = 674 / 19 = 35.47
Standard deviation of Hours (sample, n - 1 divisor) = 20.62
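Both figures can be verified with Python's `statistics` module. Note that `stdev` uses the sample (n - 1) divisor, which is what the value 20.62 corresponds to.

```python
from statistics import mean, stdev

# Hours column copied from the data table, in row order
hours = [70, 27, 15, 39, 8, 17, 40, 10, 70, 25,
         16, 40, 45, 8, 50, 55, 54, 60, 25]

hours_mean = mean(hours)   # 674 / 19
hours_sd = stdev(hours)    # sample standard deviation (n - 1 divisor)

print(round(hours_mean, 2))  # 35.47
print(round(hours_sd, 2))    # 20.62
```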
For the 90th percentile, first sort the Hours data in ascending order. This gives:
8 |
8 |
10 |
15 |
16 |
17 |
25 |
25 |
27 |
39 |
40 |
40 |
45 |
50 |
54 |
55 |
60 |
70 |
70 |
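Position of the 90th percentile = (90/100) × (N + 1) = 0.9 × 20 = 18, so the 90th percentile is the 18th value of the sorted data, which is 70.
The 90th percentile is the Hours value below which about 90% of the data lie. Here, about 90% of the respondents (17 of 19) work fewer than 70 hours, so 70 hours marks roughly the top tenth of the hours worked in this sample.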
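The (N + 1) rank rule used above can be sketched as a small function. The helper name `percentile_n_plus_1` is hypothetical, and the linear interpolation for non-integer ranks is an assumption that is not needed for this data set, where the rank is exactly 18. Other software may use a different percentile convention and return a slightly different value.

```python
def percentile_n_plus_1(values, pct):
    """Percentile via the (N + 1) rank rule (hypothetical helper)."""
    data = sorted(values)
    rank = pct * (len(data) + 1) / 100   # 1-based position in the sorted data
    lower = int(rank)
    frac = rank - lower
    if lower < 1:                        # rank falls before the first value
        return data[0]
    if lower >= len(data):               # rank falls at or after the last value
        return data[-1]
    # Interpolate between neighbours when the rank is not a whole number
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

# Hours column copied from the data table
hours = [70, 27, 15, 39, 8, 17, 40, 10, 70, 25,
         16, 40, 45, 8, 50, 55, 54, 60, 25]

p90 = percentile_n_plus_1(hours, 90)
below = sum(h < p90 for h in hours)      # observations strictly below p90
print(p90, below, len(hours))
```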
Variable | Gender | Mean | StDev |
Income | Female | 418.8 | 309.0 |
Income | Male | 918 | 346 |
Yes, the boxplots of male and female incomes agree with the summary statistics. The male box sits well above the female box, matching the higher male mean (918 vs 418.8). The male whiskers are also longer than the female whiskers, which is consistent with the larger male standard deviation (346 vs 309). In both groups the mean lies above the median (female median 299, male median 872), because a few high incomes pull the mean upward.
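The group statistics can be reproduced without sorting by hand. This minimal sketch splits the Income column by Gender and reports the mean, sample standard deviation and median for each group. The median is included only to relate the figures to the centre line of each boxplot.

```python
from statistics import mean, median, stdev

# Income values copied from the data table, grouped by Gender
income = {
    "Female": [525, 309, 86, 255, 211, 386, 299, 819, 299, 1152, 266],
    "Male": [884, 517, 860, 1131, 934, 1614, 856, 548],
}

stats = {}
for gender, values in income.items():
    stats[gender] = {
        "mean": mean(values),
        "sd": stdev(values),       # sample standard deviation (n - 1 divisor)
        "median": median(values),
    }
    print(gender, {k: round(v, 1) for k, v in stats[gender].items()})
```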
income | Count of income |
86-235 | 2 |
236-385 | 5 |
386-535 | 3 |
536-685 | 1 |
686-835 | 1 |
836-985 | 4 |
986-1135 | 1 |
1136-1285 | 1 |
1586-1735 | 1 |
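The histogram is bimodal, with peaks in the 236-385 and 836-985 ranges. It is also right-skewed, with a single high income in the 1586-1735 range.
The bimodality most likely arises because the sample mixes two groups with different typical incomes, namely females and males. All five incomes in the 236-385 bin belong to females and all four in the 836-985 bin belong to males, in line with the group means of 418.8 and 918. Each group contributes its own local maximum, with a dip in between.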
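One way to look for the source of the two peaks is to split each histogram bin by Gender. The sketch below reuses the bin edges from the frequency table above (width 150, starting at 86). Choosing Gender as the grouping variable is an assumption suggested by the earlier income comparison, not something the histogram itself proves.

```python
from collections import Counter, defaultdict

# (Gender, Income) pairs copied row by row from the data table
people = [
    ("Male", 884), ("Female", 525), ("Female", 309), ("Male", 517),
    ("Female", 86), ("Female", 255), ("Male", 860), ("Female", 211),
    ("Male", 1131), ("Female", 386), ("Female", 299), ("Female", 819),
    ("Male", 934), ("Female", 299), ("Male", 1614), ("Female", 1152),
    ("Male", 856), ("Male", 548), ("Female", 266),
]

START, WIDTH = 86, 150  # bin edges taken from the frequency table

by_bin = defaultdict(Counter)
for gender, income in people:
    low = START + WIDTH * ((income - START) // WIDTH)  # lower edge of the bin
    by_bin[f"{low}-{low + WIDTH - 1}"][gender] += 1

# List bins in ascending order of their lower edge
for label in sorted(by_bin, key=lambda s: int(s.split("-")[0])):
    print(label, dict(by_bin[label]))
```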