Find a random variable in your day-to-day life, call it X(ω), and do the following:
• Describe X as either quantitative, qualitative, discrete, continuous, etc.
• Give the support of X (i.e., its possible range of values).
• Speculate on its distribution. Is it normal, geometric, exponential, etc.? Give specific reasons and justification for this speculation!
• Sample this random variable at least 5 times.
• Use this sample to estimate its parameters.
• Give the newly parameterized distribution explicitly.
A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon.
Suppose I am interested in the statistics test scores of a sample of 100 students at a certain college. The random variable X is a student's test score, which can range from 0% (didn't study at all) to 100% (excellent student). Because scores can take fractional values (for example, 73.5%), I can't list every possible score individually. Instead, I group the scores into intervals and record how many students fall into each one.
Here is a hypothetical table showing how many of the 100 students scored in each interval:
Test Scores | Frequency (number of students) |
---|---|
0 to <20% | 5 |
20% to <40% | 20 |
40% to <60% | 30 |
60% to <80% | 35 |
80% to 100% | 10 |
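The question also asks for parameter estimates. As a minimal sketch, assuming one wanted to fit something like a normal model to these hypothetical scores, the usual grouped-data approach places every student at the midpoint of their interval and computes the mean and sample variance from the frequencies. The function name `grouped_mean_and_variance` is illustrative only, not from any library.

```python
# Frequency table from the example: (lower bound, upper bound) -> number of students.
# Bounds are in percent; the last interval (80 to 100) is closed on the right.
table = {
    (0, 20): 5,
    (20, 40): 20,
    (40, 60): 30,
    (60, 80): 35,
    (80, 100): 10,
}

def grouped_mean_and_variance(freqs):
    """Estimate the mean and sample variance of binned data.

    Assumes every observation sits at the midpoint of its interval,
    which is the standard grouped-data approximation.
    """
    n = sum(freqs.values())
    mids = {(lo, hi): (lo + hi) / 2 for (lo, hi) in freqs}
    mean = sum(mids[b] * f for b, f in freqs.items()) / n
    # Sample variance divides by n - 1 rather than n.
    var = sum(f * (mids[b] - mean) ** 2 for b, f in freqs.items()) / (n - 1)
    return n, mean, var

n, mean, var = grouped_mean_and_variance(table)
print(f"n = {n}, mean ≈ {mean:.1f}%, sd ≈ {var ** 0.5:.1f}%")
```

For this table the sketch gives a mean of 55% and a standard deviation of about 21 percentage points, so a fitted normal would be roughly N(55, 21²). The midpoint assumption ignores how scores are spread within each interval, and a normal model also puts a little probability outside 0% to 100%, so these are rough estimates at best.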
Because a test score can take any value in an interval (not just whole numbers), X is a continuous random variable. Grouping its values into intervals is simply a convenient way to summarize it.
Since there are 100 students in all, I can also give the relative frequency, the proportion of students who scored in each interval. We compute it by dividing each frequency by the total (here, 100) and then either leave the result as a decimal or convert it to a percentage. As in the coin example, each value of the random variable (here, each score interval) then has a probability attached to it, and together these probabilities form a probability distribution for the test scores.
Test Scores | Relative Frequency |
---|---|
0 to <20% | 5% |
20% to <40% | 20% |
40% to <60% | 30% |
60% to <80% | 35% |
80% to 100% | 10% |
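The relative-frequency calculation described above can be sketched in a few lines. This is only an illustration; it uses `fractions.Fraction` so that the check that the proportions sum to exactly 1 is not disturbed by floating-point rounding.

```python
from fractions import Fraction

# Frequencies (number of students) per score interval, from the first table.
freqs = {
    "0 to <20%": 5,
    "20% to <40%": 20,
    "40% to <60%": 30,
    "60% to <80%": 35,
    "80% to 100%": 10,
}

total = sum(freqs.values())  # 100 students in all

# Relative frequency = frequency / total, kept as an exact fraction.
rel = {interval: Fraction(f, total) for interval, f in freqs.items()}

for interval, p in rel.items():
    print(f"{interval:>12}: {float(p):.2f} ({float(p):.0%})")

# A valid probability distribution over these intervals sums to exactly 1.
assert sum(rel.values()) == 1
```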
You may have noticed that the relative frequencies in the table above add up to 1 (100%), as the probabilities in any probability distribution must. For a continuous random variable, the analogous fact is about area: if we draw a histogram in which each bar's height is its relative frequency divided by the interval width (a density scale), the total area under the histogram is also 1.
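A minimal sketch of why the density scale matters: each bar's area is height × width, so dividing the relative frequency by the width makes every bar's area equal to its relative frequency, and the areas then sum to 1. If the raw relative frequencies were used as heights instead, the total area here would be 20, because every interval is 20 points wide. The helper names below are hypothetical.

```python
# Relative frequencies from the table, keyed by (lower, upper) bound in percent.
rel_freq = {
    (0, 20): 0.05,
    (20, 40): 0.20,
    (40, 60): 0.30,
    (60, 80): 0.35,
    (80, 100): 0.10,
}

def density_heights(rel_freq):
    """Bar height on a density scale: relative frequency divided by bar width."""
    return {(lo, hi): p / (hi - lo) for (lo, hi), p in rel_freq.items()}

def histogram_area(heights):
    """Total area under the histogram: sum of height * width over all bars."""
    return sum(h * (hi - lo) for (lo, hi), h in heights.items())

heights = density_heights(rel_freq)
area = histogram_area(heights)
print(f"total area = {area:.6f}")  # 1 up to floating-point rounding
```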