Part 1: Combinations and Permutations: Winning the Lottery
To win the Powerball jackpot, you need to choose the correct five
numbers from the integers 1 - 69 as well as the correct
Powerball, which is one number chosen from the integers 1 - 26. The
order in which you pick the numbers does not matter: you just need
the correct five numbers, in any order, and the correct
Powerball.
Because there is only one correct set of five numbers and one
correct Powerball, the probability of winning the jackpot is
calculated as:

$$P = \frac{1}{\text{\# of ways of choosing the numbers}}$$
To calculate the “# of ways of choosing the numbers” we use
combinations.
The expression for combinations is nCk, where n is the number of
items available to be chosen from and k is the number of items
chosen.
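For the portion of Powerball where 5 numbers are chosen from 1 –
69, n = 69 and k = 5. In general, ${}_{n}C_{k} = \frac{n!}{k!\,(n-k)!}$,
so the number of ways to choose five numbers from the integers
1 - 69 is calculated as:

$${}_{69}C_{5} = \frac{69!}{5!\,(69-5)!}$$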
The symbol ! is called "factorial." The factorial of a natural
number is the product of that number and all natural numbers below
it.
For instance, 4! = 4 · 3 · 2 · 1 = 24.
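As an illustration of the definition, a minimal Python sketch that builds n! as a running product. The function name `factorial` is our own; it also uses the standard convention 0! = 1, which the text above does not need but which keeps the function defined for every non-negative integer.

```python
def factorial(n: int) -> int:
    """Return n! = n * (n - 1) * ... * 2 * 1, with 0! = 1 by convention."""
    if n < 0:
        raise ValueError("factorial is only defined here for n >= 0")
    result = 1
    # Multiply together every natural number from 2 up to n.
    for i in range(2, n + 1):
        result *= i
    return result


print(factorial(4))  # 24
```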
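So ${}_{69}C_{5}$ can be simplified as:

$${}_{69}C_{5} = \frac{69!}{5!\,(69-5)!} = \frac{69 \cdot 68 \cdot 67 \cdot 66 \cdot 65 \cdot 64!}{5!\,64!} = \frac{69 \cdot 68 \cdot 67 \cdot 66 \cdot 65}{5!} = \frac{69 \cdot 68 \cdot 67 \cdot 66 \cdot 65}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 11{,}238{,}513$$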
This means there are 11,238,513 different ways to choose 5 numbers
from the integers 1 – 69 when the order of the numbers does not
matter. If a lottery game were based only on choosing 5 numbers
from 1 – 69, the probability of winning would be
P = 1/11,238,513.
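The simplification above (cancelling 64! so that only the top five factors remain) can be checked with a short Python sketch. The helper name `combinations` is hypothetical; Python's standard library already provides `math.comb`, which is used here only as a cross-check.

```python
import math


def combinations(n: int, k: int) -> int:
    """Ways to choose k items from n when order does not matter.

    Uses the cancelled form of n! / (k! (n - k)!): keep only the top k
    factors n * (n - 1) * ... * (n - k + 1) and divide by k!.
    """
    numerator = 1
    for i in range(n, n - k, -1):  # 69, 68, 67, 66, 65 for n=69, k=5
        numerator *= i
    return numerator // math.factorial(k)  # division is always exact


ways = combinations(69, 5)
print(f"{ways:,}")         # 11,238,513
print(f"P = 1/{ways:,}")   # P = 1/11,238,513
assert ways == math.comb(69, 5)
```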
Since this is the Powerball, we also need to choose one number
from the integers 1 - 26. There are clearly 26 ways to choose one
number out of 26 numbers, but we can also use combinations if we
want to. That calculation would be:
$${}_{26}C_{1} = \frac{26!}{1!\,(26-1)!} = \frac{26 \cdot 25!}{1!\,25!} = 26$$
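Since choosing the five numbers is independent of choosing the
Powerball, the total number of ways of choosing the numbers is the
product of the two counts:

$${}_{69}C_{5} \cdot {}_{26}C_{1} = 11{,}238{,}513 \cdot 26 = 292{,}201{,}338$$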
Therefore, the probability of winning the Powerball jackpot is
$P = \frac{1}{292{,}201{,}338}$.
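A minimal sketch of the whole Powerball calculation, assuming Python 3.8+ for `math.comb`. Because the two choices are independent, their counts multiply, and `fractions.Fraction` keeps the probability as an exact fraction instead of a rounded decimal.

```python
from fractions import Fraction
from math import comb

white_ball_ways = comb(69, 5)  # choose 5 of the integers 1-69, order irrelevant
powerball_ways = comb(26, 1)   # choose 1 of the integers 1-26
total_ways = white_ball_ways * powerball_ways  # independent choices multiply

p_jackpot = Fraction(1, total_ways)  # exactly one winning combination

print(f"{total_ways:,}")  # 292,201,338
print(p_jackpot)          # 1/292201338
```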
Part 1 Exercises: Use the above example as a guide to answer the
following questions. You must explain how the probabilities are
calculated to earn full credit. Express your answers as fractions
and use commas in large numbers.
1. Use the link to find information about the Mega Millions game:
http://www.flalottery.com/ Once on the website, click on "Mega
Millions" and then on "How to Play and How to Win" to get the
information needed to calculate the probability of winning the Mega
Millions jackpot. Find the probability of winning the Mega Millions
jackpot in the same manner as the Powerball example.
2. If a lottery game were based on choosing 5 numbers from the
integers 1 – 52, what is the probability of choosing the numbers
for the winning jackpot?
3. If a lottery game were based on choosing 6 numbers from the
integers 1 – 49, with another number chosen from the integers 1 –
22, what is the probability of choosing the numbers for the winning
jackpot?
Part 2: Descriptive Statistics
Example: A call center monitored the length of time needed to
resolve a customer’s service issue. For 8 customers, the time in
minutes to resolve each issue is listed in the following data set.
18, 24, 45, 21, 22, 31, 16, 21
The manager of the call center wants to know the mean, median,
mode, and standard deviation of the data set.
The mean is calculated with the formula:

$$\bar{x} = \frac{\Sigma x}{n}$$

where $\bar{x}$ is the sample mean, x is a data value, and n is the
sample size (the number of data values).
There are 8 data values, so n=8.
The mean is:

$$\bar{x} = \frac{18+24+45+21+22+31+16+21}{8} = \frac{198}{8} = 24.75$$

Because the data values are integers, the mean is rounded to one
more decimal place than the data, giving a mean of $\bar{x} = 24.8$.
If the mean is used in further calculations (such as the standard
deviation, shown later in this document), it should not be rounded
during those calculations. Rounding is done only after the
calculations are complete (the standard deviation will be rounded).
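The mean calculation and the rounding rule can be sketched in Python as follows. One caveat: Python's built-in `round()` breaks exact ties toward the even digit, so this sketch assumes the usual "round half up" rule taught here and applies it with the standard `decimal` module. The full-precision `mean` is kept separately for later calculations.

```python
from decimal import Decimal, ROUND_HALF_UP

data = [18, 24, 45, 21, 22, 31, 16, 21]

n = len(data)
mean = sum(data) / n  # full precision: keep this value for later steps

# Report the mean to one more decimal place than the integer data,
# rounding a trailing 5 upward.
mean_reported = Decimal(str(mean)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(n, sum(data), mean)  # 8 198 24.75
print(mean_reported)       # 24.8
```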
The median of a data set, denoted as $\tilde{x}$, is the "middle"
value of the data set. The first step in finding the median is
sorting the data set in ascending order (from smallest to largest),
as shown below.
16, 18, 21, 21, 22, 24, 31, 45
If there is an odd number of data values, $\tilde{x}$ is the actual
middle value of the data set. Since this data set consists of an
even number of data values, the median is the average of the two
middle values. There are 8 data values (n = 8) and n/2 = 4. Counting
in 4 values from the left gives a data value of 21. Counting in 4
values from the right gives a data value of 22.
The median is:

$$\tilde{x} = \frac{21+22}{2} = 21.5$$
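The odd/even rule for the median can be written as a small Python function. The name `median` is our own (Python's `statistics.median` follows the same rule); with zero-based indexing, the two middle values of an even-sized list sit at positions n/2 - 1 and n/2.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)  # step 1: sort in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the actual middle value
    # even n: average the (n/2)-th and (n/2 + 1)-th values (1-based counting)
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [18, 24, 45, 21, 22, 31, 16, 21]
print(sorted(data))  # [16, 18, 21, 21, 22, 24, 31, 45]
print(median(data))  # 21.5
```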
The mode of a data set is the value that occurs most often. For
this data set the mode is 21 because it occurs twice. It is
possible for a data set to have more than one mode. If the data
value 16 occurred twice in the data set, the two modes would be 21
and 16. It is also possible for a data set to not have a mode. If
every data value occurs only one time, the data set would not have
a mode.
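The three cases above (one mode, several modes, no mode) can be captured in a short sketch using `collections.Counter`. The function name `modes` is hypothetical, and returning an empty list to mean "no mode" is our own convention.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs only once: the data set has no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([18, 24, 45, 21, 22, 31, 16, 21]))      # [21]
print(modes([18, 24, 45, 21, 22, 31, 16, 21, 16]))  # [16, 21]
print(modes([1, 2, 3]))                             # []
```

Python's `statistics.multimode` returns every value when none repeats, rather than an empty list, which is why this sketch handles that case explicitly.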
The standard deviation of a data set (a sample), denoted as s,
describes how much the data set "varies" or how it is "dispersed"
around the mean. The formula for standard deviation is:

$$s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n-1}}$$
To calculate standard deviation, it is sometimes convenient to use
a table, as shown below, to keep track of the values of $(x - \bar{x})^2$.
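The first step is to find the difference between each data value
and the mean, which was found above, $\bar{x} = 24.75$. This is
shown in the second column of the table.

| x (data value) | $(x - \bar{x})$      | $(x - \bar{x})^2$ |
|----------------|----------------------|-------------------|
| 18             | 18 - 24.75 = -6.75   |                   |
| 24             | 24 - 24.75 = -0.75   |                   |
| 45             | 45 - 24.75 = 20.25   |                   |
| 21             | 21 - 24.75 = -3.75   |                   |
| 22             | 22 - 24.75 = -2.75   |                   |
| 31             | 31 - 24.75 = 6.25    |                   |
| 16             | 16 - 24.75 = -8.75   |                   |
| 21             | 21 - 24.75 = -3.75   |                   |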
The next step is to square the difference between the data value
and the mean. This can be done immediately after finding the
difference when using a calculator by just pressing the squaring
button. This is shown in the third column of the table. Note that
these values are not rounded.
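| x (data value) | $(x - \bar{x})$      | $(x - \bar{x})^2$       |
|----------------|----------------------|-------------------------|
| 18             | 18 - 24.75 = -6.75   | (-6.75)² = 45.5625      |
| 24             | 24 - 24.75 = -0.75   | (-0.75)² = 0.5625       |
| 45             | 45 - 24.75 = 20.25   | (20.25)² = 410.0625     |
| 21             | 21 - 24.75 = -3.75   | (-3.75)² = 14.0625      |
| 22             | 22 - 24.75 = -2.75   | (-2.75)² = 7.5625       |
| 31             | 31 - 24.75 = 6.25    | (6.25)² = 39.0625       |
| 16             | 16 - 24.75 = -8.75   | (-8.75)² = 76.5625      |
| 21             | 21 - 24.75 = -3.75   | (-3.75)² = 14.0625      |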
The next step is to add the squared values, giving the amount that
will be in the numerator of the formula. Adding the values in the
third column gives $\Sigma (x - \bar{x})^2 = 607.5$.
Because n = 8, the calculation is now:

$$s = \sqrt{\frac{607.5}{8-1}} = \sqrt{\frac{607.5}{7}} = \sqrt{86.7857\ldots} = 9.3158\ldots \approx 9.3$$
Note that no rounding is done until the final result, giving a
standard deviation of s=9.3.
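The table-based procedure above translates almost line for line into Python. This sketch mirrors the columns (deviations, then squared deviations), divides by n - 1 because the data are a sample, and rounds only at the end; `statistics.stdev`, which also uses n - 1, serves as a cross-check.

```python
import math
import statistics

data = [18, 24, 45, 21, 22, 31, 16, 21]
n = len(data)
mean = sum(data) / n  # 24.75, deliberately not rounded

# Second and third columns of the table.
deviations = [x - mean for x in data]    # x - x̄
squared = [d ** 2 for d in deviations]   # (x - x̄)²

sum_sq = sum(squared)          # numerator of the formula
variance = sum_sq / (n - 1)    # sample: divide by n - 1
s = math.sqrt(variance)

print(sum_sq)       # 607.5
print(round(s, 1))  # 9.3
assert math.isclose(s, statistics.stdev(data))
```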
Part 2 Exercises: For each of the data sets provided below,
calculate the mean, median, mode, and standard deviation. Round the
answers to one decimal place and show all work.
1. Data Set 1: 26, 31, 10, 37, 38
   a. mean = ?
   b. median = ?
   c. mode = ?
   d. standard deviation = ?
2. Data Set 2: 48, 42, 23, 26, 50, 39, 55, 62, 50, 42
   a. mean = ?
   b. median = ?
   c. mode = ?
   d. standard deviation = ?
Part 3: Creating Frequency Distributions and Histograms
Example: It is often quite useful to represent data graphically.
One popular way to represent data is the histogram. The following
set of 16 values can be organized into a frequency distribution
(table) and then the frequency distribution is used to create a
histogram.
18, 16, 14, 12, 13, 19, 17, 14, 10, 9, 11, 14, 13, 19, 10, 16
The sorted data set is: 9, 10, 10, 11, 12, 13, 13, 14, 14, 14, 16,
16, 17, 18, 19, 19
To create a histogram from this set of data we start by
constructing a frequency distribution or table. A frequency
distribution shows how data is split up into categories or classes
by listing
the classes along with the number of data values in each class. The
first step in creating a frequency distribution is deciding how
many classes we wish to use. For this data set let’s use four
classes. Once that decision has been made there is a step-by-step
process we can follow.
Step 1: To calculate the class width, find the following
value:

$$\text{width} = \frac{\text{maximum value} - \text{minimum value}}{\text{number of classes}}$$
If this value is a decimal, round up to the nearest integer
(illustrated in this example). If this value is an integer, you may
have to add one in order to have enough classes to accommodate the
data.
For the data set listed above, the width is:
$$\text{width} = \frac{19-9}{4} = 2.5$$
This value is rounded up to give a width of 3.
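Step 1 can be sketched as a small Python function for whole-number data. The name `class_width` is hypothetical, and the sketch assumes the first lower class limit will be the minimum value (as in Step 2). Under that assumption, an integer quotient always needs the extra one: with width w and k classes, the last upper limit would be min + k·w - 1 = max - 1, leaving the maximum out.

```python
import math


def class_width(min_value: int, max_value: int, num_classes: int) -> int:
    """Class width for integer data, assuming classes start at the minimum."""
    span = max_value - min_value
    if span % num_classes == 0:
        # Integer quotient: add one so the last class reaches the maximum.
        return span // num_classes + 1
    # Otherwise round the quotient up to the next integer.
    return math.ceil(span / num_classes)


print(class_width(9, 19, 4))  # 3  (2.5 rounded up)
```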
Step 2: Choose a value for the first lower class limit. Typically,
this is the minimum value but it could also be a conveniently
chosen value.
For the data set listed above, the first lower class limit will be
9.
Step 3: Use the first lower class limit and class width to list the
other lower class limits. Add the class width to each lower class
limit to determine the next lower class limit.
First lower class limit = 9
Second lower class limit = 9 + 3 = 12
Third lower class limit = 12 + 3 = 15
Fourth lower class limit = 15 + 3 = 18
This creates the start of a frequency distribution table, as shown
below:

| Class | Frequency |
|-------|-----------|
| 9 -   |           |
| 12 -  |           |
| 15 -  |           |
| 18 -  |           |

Step 4: Determine the upper class limits. The upper class limit in
the FIRST class will be the value just below the lower class limit
in the SECOND class. Then add the width to get the remaining upper
class limits.

First upper class limit = 11, because 11 is the value right before 12
Second upper class limit = 11 + 3 = 14
Third upper class limit = 14 + 3 = 17
Fourth upper class limit = 17 + 3 = 20
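Steps 3 and 4 can be sketched together in Python, assuming whole-number data so that "the value just below" the next lower limit is that limit minus 1. The function name `class_limits` is our own.

```python
def class_limits(first_lower: int, width: int, num_classes: int):
    """Return (lower, upper) class limit pairs for integer data.

    Each lower limit is the previous one plus the width; each upper limit
    is the value just below the next lower limit, i.e. lower + width - 1.
    """
    classes = []
    for i in range(num_classes):
        lower = first_lower + i * width
        upper = lower + width - 1
        classes.append((lower, upper))
    return classes


for lower, upper in class_limits(9, 3, 4):
    print(f"{lower} - {upper}")
# 9 - 11
# 12 - 14
# 15 - 17
# 18 - 20
```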
Now the classes in the table are complete, as shown below.

| Class   | Frequency |
|---------|-----------|
| 9 - 11  |           |
| 12 - 14 |           |
| 15 - 17 |           |
| 18 - 20 |           |

Step 5: Use the sorted list to determine how many values belong in
each class and enter the value into the Frequency column. Be sure
to check that all values are included by adding the frequencies to
get the sample size.
9, 10, 10, 11, 12, 13, 13, 14, 14, 14, 16, 16, 17, 18, 19, 19
The first class has a frequency of 4, because 9, 10, 10, 11 are in
the first class. The second class has a frequency of 6, because 12,
13, 13, 14, 14, 14 are in the second class. The third class has a
frequency of 3, because 16, 16, 17 are in the third class. The
fourth class has a frequency of 3, because 18, 19, 19 are in the
fourth class.

| Class   | Frequency |
|---------|-----------|
| 9 - 11  | 4         |
| 12 - 14 | 6         |
| 15 - 17 | 3         |
| 18 - 20 | 3         |

For the frequency distribution above, the sum of the frequencies is
equal to the sample size: 4 + 6 + 3 + 3 = 16
Step 6: Follow the directions from the link on how to "Create a
Histogram From a Frequency Table" (included below) to create a
histogram similar to the one shown here.

[Figure: Histogram. Horizontal axis: Data Values, with bars for the
classes 9 - 11, 12 - 14, 15 - 17, and 18 - 20. Vertical axis:
Frequency, from 0 to 7.]
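The counting in Step 5, together with the check that the frequencies add up to the sample size, can be sketched in Python. The class limits are typed in from the table above, and limits are treated as inclusive. The row of asterisks is only a rough text stand-in for a histogram bar, not a replacement for the Excel chart.

```python
data = [18, 16, 14, 12, 13, 19, 17, 14, 10, 9, 11, 14, 13, 19, 10, 16]
classes = [(9, 11), (12, 14), (15, 17), (18, 20)]

# Count the values that fall within each class (both limits inclusive).
frequencies = [sum(1 for x in data if lower <= x <= upper) for lower, upper in classes]

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower} - {upper} | {'*' * f} ({f})")
# 9 - 11 | **** (4)
# 12 - 14 | ****** (6)
# 15 - 17 | *** (3)
# 18 - 20 | *** (3)

# Every value should land in exactly one class.
assert sum(frequencies) == len(data)  # 4 + 6 + 3 + 3 = 16
```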
The following links are for videos demonstrating how to use
Excel.
Using Excel for Data and Tables
Create a Histogram from a Frequency Table
Part 3 Exercises:
Use the following data set to complete the problems.
15, 26, 31, 10, 37, 38, 35, 30, 24, 21, 26, 27, 24, 32, 32, 19, 34,
32, 24, 39, 30
1. List the data values in order from lowest to highest.
2. Find n, the number of data values.
3. Find the width for a frequency distribution with 6 classes of
equal width.
4. Create a frequency distribution table.
5. Show that the frequencies in the table add up to the sample size.
6. Create a histogram.