Statistics can be taught using any of these cultural themes. I would pick movies and show how they can be used to illustrate statistical studies.
Suppose a movie dataset contains weekend and daily per-theater box office receipts, as well as total gross US receipts, for a set of, say, 49 movies. Dates are provided for all time series values. The diverse list of movies was selected, not at random, but to spark student interest and to provide a range of box office values. The values provide a rich dataset for applications such as simple graphical analysis, a variety of time series and causal forecasting models, curve fitting, and rate-of-change analysis.
I have chosen a dataset that is already available online. A comparison chart of the movies can be drawn from it.
Three files contain the raw data. The accompanying documentation files should contain brief descriptions of the datasets. The total receipts file (movietotal.dat.txt) has four variables: the movie's number in the alphabetical list, its title, its characteristic (type), and the gross US receipts (in $ millions). There are two time series files (moviedaily.dat.txt and movieweekend.dat.txt), one showing daily per-theater box office receipts in dollars, and the other showing weekend per-theater box office receipts, for these movies.
The daily and weekend time series files have five variables. The first variable is the movie's number in the alphabetical list, the second is the movie title, the third is an index for the observation number, the fourth is the per-theater box office receipt amount in dollars, and the fifth is the date (mm/dd/yyyy). For weekend data, the date is the Friday of the Friday, Saturday, and Sunday that make up the weekend total. If daily data is missing for a title, the third, fourth, and fifth variables are coded as NA. Movie titles are arranged alphabetically. The day of the week is not provided in the daily file; if your students take this data into Excel, they can use the WEEKDAY function to determine the day of the week.
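For students working outside Excel, the five-variable layout described above could be read along the following lines. This is only a sketch: the tab delimiter, the absence of a header row, the sample rows, and the names parse_rows and DAY_NAMES are all assumptions for illustration, not details taken from the actual files.

```python
import csv
import io
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical rows in the five-variable layout; titles and amounts are
# invented, and the real file may use a different delimiter or a header.
SAMPLE = (
    "1\tExample Movie A\t1\t5000\t07/04/2008\n"
    "1\tExample Movie A\t2\t3000\t07/05/2008\n"
    "2\tExample Movie B\tNA\tNA\tNA\n"
)

def parse_rows(text):
    """Parse tab-delimited rows into dicts; NA fields become None."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for num, title, idx, receipts, date_text in reader:
        row = {"movie": int(num), "title": title, "obs": None,
               "receipts": None, "date": None, "weekday": None}
        if date_text != "NA":
            day = datetime.strptime(date_text, "%m/%d/%Y").date()
            row.update(obs=int(idx), receipts=float(receipts), date=day,
                       weekday=DAY_NAMES[day.weekday()])  # Monday is 0
        rows.append(row)
    return rows

rows = parse_rows(SAMPLE)
for row in rows:
    print(row["title"], row["date"], row["weekday"], row["receipts"])
```

The weekday lookup plays the same role as the Excel WEEKDAY function: it derives the day of the week from the date column alone.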
Some movies opened to a limited audience, and on those occasions we waited to record values until the movie was in general release. For some titles, the site does not report receipts every day and/or weekend near the end of the movie's run. It is a good exercise for students to look for missing entries in the time series and determine what to do about those instances. Alternatively, instructors might decide to clean the data in advance.
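One way students might look for missing entries is to compare the dates that are present against the dates that would be expected at a regular spacing. The sketch below assumes the dates for one movie have already been parsed; the function name missing_dates and the example dates are made up for illustration.

```python
from datetime import date, timedelta

def missing_dates(observed, step_days=1):
    """Return the dates expected between the first and last observation,
    at a spacing of step_days, that do not appear in observed."""
    seen = set(observed)
    current, last = min(seen), max(seen)
    gaps = []
    while current <= last:
        if current not in seen:
            gaps.append(current)
        current += timedelta(days=step_days)
    return gaps

# Invented daily dates with one day skipped
daily = [date(2008, 7, 4), date(2008, 7, 5), date(2008, 7, 7), date(2008, 7, 8)]
daily_gaps = missing_dates(daily)

# Invented weekend (Friday) dates with one weekend skipped
fridays = [date(2008, 7, 4), date(2008, 7, 18)]
weekend_gaps = missing_dates(fridays, step_days=7)

print(daily_gaps)
print(weekend_gaps)
```

What to do with a gap once it is found (drop it, interpolate, or truncate the series) is left as the judgment call the exercise is meant to provoke.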
For the dataset, refer to this link: https://www.tandfonline.com/doi/full/10.1080/10691898.2009.11889512
Exercise 1: Data Retrieval and Graphing
Students will locate data for a specific movie, bring the data to the software package, format it, and create a time series plot. We use this in the first days of the introductory business statistics class; it would also be suitable for an information literacy class.
Exercise 2: Descriptive Statistics & Analysis
Students will compute descriptive statistics for several different types of movies using software, and examine these statistics to draw conclusions about the movie types. We use this exercise in the early part of the introductory business statistics class. It could also be used to illustrate the difficulty of using descriptive statistics to draw conclusions about time series data.
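As a rough illustration of this exercise in Python rather than a statistics package, the sketch below groups gross receipts by movie type and computes a few summary statistics. The type labels and dollar figures are invented placeholders, not values from the dataset, and summarize is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Invented (type, gross receipts in $ millions) pairs for illustration only
records = [
    ("Action", 100.0), ("Action", 200.0), ("Action", 300.0),
    ("Comedy", 40.0), ("Comedy", 60.0), ("Comedy", 80.0),
]

def summarize(records):
    """Group gross receipts by movie type and compute summary statistics."""
    groups = defaultdict(list)
    for kind, gross in records:
        groups[kind].append(gross)
    return {
        kind: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "stdev": stdev(values),  # sample standard deviation
        }
        for kind, values in groups.items()
    }

summary = summarize(records)
for kind, stats in summary.items():
    print(kind, stats)
```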
Exercise 3: Examination of Time Series Data
Students will create time series plots using daily and weekend movie box office data. Using visual analysis and software tools, they will prepare a discussion of the features of the plots. We use this exercise at the beginning of the forecasting unit to help students recognize trend and seasonality in time series data.
Exercise 4: Nonlinear Trend Forecasting
Using software, students will fit several nonlinear trend equations to the weekend per theater box office receipts and determine their suitability as forecasting models. We have used this exercise to illustrate nonlinear regression, trend fitting, and concepts of rate of change. It also provides the basis for a discussion of overfitting models when we ask students to consider whether their models are reasonable and appropriate.
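One nonlinear trend students might try is an exponential decay, y = a * exp(b * t), fitted by ordinary least squares on ln(y). The sketch below is one possible approach, not the method prescribed by the exercise; the weekend figures are invented, and fit_exponential is a hypothetical name. Note that fitting on the log scale minimizes relative rather than absolute error, which is itself a useful discussion point.

```python
import math

def fit_exponential(values):
    """Fit y = a * exp(b * t) for t = 1..n by least squares on ln(y).
    All values must be positive."""
    n = len(values)
    ts = range(1, n + 1)
    logs = [math.log(v) for v in values]
    t_bar = sum(ts) / n
    log_bar = sum(logs) / n
    slope_num = sum((t - t_bar) * (l - log_bar) for t, l in zip(ts, logs))
    slope_den = sum((t - t_bar) ** 2 for t in ts)
    b = slope_num / slope_den
    a = math.exp(log_bar - b * t_bar)
    return a, b

# Invented weekend per-theater receipts that halve every week
weekend = [8000.0, 4000.0, 2000.0, 1000.0]
a, b = fit_exponential(weekend)

forecast = a * math.exp(b * 5)   # fitted value for week 5
rate = b * forecast              # dy/dt at week 5, in dollars per week

print(a, b, forecast, rate)
```

Because the derivative of a * exp(b * t) is b times the fitted value, the same two coefficients give both the forecast and its rate of change.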
Exercise 5: Time Series Project
This project duplicates the activities of previous exercises, combining them into one project, and adds a calculus-based activity for rate of change. We have had good results using this exercise as an out-of-class group project in the second required statistics course.
Exercise 6: Seasonal Forecasting
Students will examine the seasonal patterns in the daily per theater box office receipts. Using software tools available, they will create seasonal forecasting models and evaluate them. We have used this exercise in both the second required business statistics class, where we generally rely on seasonal decomposition, and in the specialized forecasting class, where we ask students to develop and compare results from several more advanced seasonal forecasting procedures.
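A very simple starting point for the day-of-week pattern is a seasonal index: the mean for each weekday divided by the overall mean. This is a simplification that ignores the downward trend over a movie's run, so it is not a full seasonal decomposition; the numbers and the name weekday_indices are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def weekday_indices(series):
    """series is a list of (weekday, receipts) pairs.
    Returns each weekday's mean divided by the overall mean."""
    overall = mean(value for _, value in series)
    by_day = defaultdict(list)
    for day, value in series:
        by_day[day].append(value)
    return {day: mean(values) / overall for day, values in by_day.items()}

# Two invented weeks of daily per-theater receipts
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
week1 = [100, 100, 100, 100, 300, 400, 300]
week2 = [50, 50, 50, 50, 150, 200, 150]
series = list(zip(days, week1)) + list(zip(days, week2))

indices = weekday_indices(series)
for day in days:
    print(day, round(indices[day], 2))
```

An index above 1 marks a day that runs above the average level, and an index below 1 marks a day that runs below it.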
Exercise 7: Comparing Several Movies
This is a more advanced exercise and could be used in our second course or a business strategy class. Students will play the role of a movie industry analyst who must predict box office revenue for a new movie. In order to find similar movies to use for comparison, they will need to determine which factors are appropriate. Data from the comparison group will be used to develop a model for the new release. We recommend this as a group exercise for upper level students.