In: Statistics and Probability
Introduction:
Find a list of at least five related numbers. Compute statistics about the data, and give your interpretation.
Prompt:
Analyze the data you have gathered.
Solution:
The data were collected on 200 high school students and are scores on various tests, including science, math, reading, and social studies (socst). The variable female is a dichotomous variable coded 1 if the student was female and 0 if male.
The analysis was run in SPSS using the examine and descriptives commands. You will find that the examine command always produces a lot of output. This can be very helpful if you know what you are looking for, but can be overwhelming if you are not used to it. If you need just a few numbers, you may want to use the descriptives command. The statistics each command reports are described below.
c. Minimum – This is the minimum, or smallest, value of the variable.
d. Maximum – This is the maximum, or largest, value of the variable.
e. Mean – This is the arithmetic mean across the observations. It is the most widely used measure of central tendency. It is commonly called the average. The mean is sensitive to extremely large or small values.
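As a rough illustration of the three statistics above, the following Python sketch computes the minimum, maximum, and mean of a short list and shows how a single extreme value shifts the mean. The scores are hypothetical values chosen for illustration, not the 200-student data, and the helper name describe is an assumption of this sketch rather than an SPSS command.

```python
from statistics import mean

# Hypothetical scores for illustration only; not the 200-student data set.
scores = [52, 59, 33, 44, 52, 52, 59, 46, 57, 55]


def describe(values):
    """Return the minimum, maximum, and arithmetic mean of a list of numbers."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


summary = describe(scores)

# Adding one extreme value changes the maximum and pulls the mean upward,
# which is what "sensitive to extremely large or small values" means.
with_outlier = describe(scores + [500])

print(summary)
print(with_outlier)
```

Comparing the two printed summaries shows the mean moving a long way after one unusual observation is added, even though the other ten values are unchanged.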
a. Statistic – These are the descriptive statistics.
b. Std. Error – These are the standard errors for the descriptive statistics. The standard error gives some idea about the variability possible in the statistic.
c. Mean – This is the arithmetic mean across the observations. It is the most widely used measure of central tendency. It is commonly called the average. The mean is sensitive to extremely large or small values.
d. 95% Confidence Interval for Mean Lower Bound – This is the lower (95%) confidence limit for the mean. If we repeatedly drew samples of 200 students’ writing test scores and calculated a 95% confidence interval for each sample, we would expect about 95% of those intervals to contain the true population mean. The width of the interval gives you some idea about the precision of the estimate of the true population mean.
e. 95% Confidence Interval for Mean Upper Bound – This is the upper (95%) confidence limit for the mean.
f. 5% Trimmed Mean – This is the mean that would be obtained if the lower and upper 5% of values of the variable were deleted. If the value of the 5% trimmed mean is very different from the mean, this indicates that there are some outliers. However, you cannot assume that all outliers have been removed from the trimmed mean.
g. Median – This is the median. The median splits the distribution such that half of all values are above this value, and half are below.
h. Variance – The variance is a measure of variability. It is the sum of the squared distances of the data values from the mean (the corrected sum of squares, or corrected SS) divided by N-1. We don’t generally use variance as an index of spread because it is in squared units. Instead, we use standard deviation.
i. Std. Deviation – Standard deviation is the square root of the variance. It measures the spread of a set of observations. The larger the standard deviation is, the more spread out the observations are.
j. Minimum – This is the minimum, or smallest, value of the variable.
k. Maximum – This is the maximum, or largest, value of the variable.
l. Range – The range is a measure of the spread of a variable. It is equal to the difference between the largest and the smallest observations. It is easy to compute and easy to understand. However, it depends only on the two most extreme observations, so it is sensitive to outliers and says nothing about how the values in between are spread.
m. Interquartile Range – The interquartile range is the difference between the upper and the lower quartiles. It measures the spread of a data set. It is robust to extreme observations.
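The statistics in the list above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a reproduction of the SPSS output: the scores are hypothetical, the function name explore is made up for this sketch, the confidence interval uses a normal approximation where statistical packages typically use the t distribution, the trimmed mean simply drops whole observations from each end, and quartile conventions differ between packages, so the interquartile range may not match SPSS exactly.

```python
import math
from statistics import NormalDist, mean, median, quantiles, variance

# Hypothetical scores for illustration only; not the 200-student data set.
scores = [31, 35, 39, 41, 44, 46, 49, 50, 52, 52,
          54, 54, 57, 59, 59, 60, 62, 62, 65, 67]


def explore(values, trim_pct=5, level=0.95):
    """Compute descriptive statistics similar to those annotated above."""
    data = sorted(values)
    n = len(data)
    m = mean(data)

    # Sample variance: corrected sum of squares divided by n - 1.
    var = variance(data)
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)  # standard error of the mean

    # Assumption: normal approximation for the critical value.
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Simplified trimmed mean: drop trim_pct percent of the observations
    # (rounded down to whole observations) from each end.
    k = n * trim_pct // 100
    trimmed = mean(data[k:n - k])

    # Quartiles with Python's default ("exclusive") method.
    q1, _, q3 = quantiles(data, n=4)

    return {
        "mean": m,
        "std_error": se,
        "ci_lower": m - z * se,
        "ci_upper": m + z * se,
        "trimmed_mean": trimmed,
        "median": median(data),
        "variance": var,
        "std_dev": sd,
        "minimum": data[0],
        "maximum": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


stats = explore(scores)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Comparing the mean with the trimmed mean and the median, and the range with the interquartile range, gives a quick check on whether extreme observations are driving the summary.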