I'm using 2005 NFL stats to build a multiple linear regression model with winning percentage as the dependent variable. My question is: which variables are most significant in determining an NFL team's capacity to win? Passing yards, rushing game, defense, and field goals are some of my independent variables, and I'm also considering adding the defensive stats to the regression. How do I complete the introduction and model subtopics for my presentation?
I. Introduction
A. Topic: Select a topic of research in an area of applied science in which you are interested. Make sure to get instructor approval before proceeding. Use research literature to provide an introductory summary of the topic that establishes your interest and expertise within the area of study.
B. Research Question: Formulate an analytical, researchable question based on your topic. The scope of your research question should be reasonable given the time constraints of the course.
C. Information: Gather and summarize applicable data, research, or other information that you intend to use in the creation and analysis of your model. For example, you might first consider principles or data sets to inform the creation and analysis of your model.
D. Assessment: Assess the appropriateness of the information you gathered. How will the information help you create an effective mathematical model to address your research question? Are there any underlying assumptions or limitations in the information you gathered?
II. Model
A. Selection: Select a model type to create. Defend your selection, comparing it to other types of models. In other words, why did you choose your model type? Does the model type you chose have any limitations? What comparative tests can you perform to support your model choice? Use research to support your model type selection.
B. Creation: Create the model using the information you gathered. Remember to be cognizant of the research question and to use appropriate mathematical tools, technology, theoretical underpinnings, and/or data, if applicable.
C. Process: Explain the process you used to build your model. Include your reasoning for specific choices and decisions you made while building your model. Use research to support your choices.
D. Tools: What tools and techniques did you use to create your model? Why are these tools and techniques appropriate for the information you gathered, and the model type you selected? Be sure to provide support for your choices.
E. Analysis: Analyze the results of your model to determine whether the model fits the research question and information you gathered. How well do the model results fit the question and information? Consider including computational tests or graphical displays to support your argument.
F. Limitations: What limitations does your model have? Use algorithmic, tabular, and graphical displays to articulate the limitations of your model.
G. Approach: Explain the approach that you took to answer your research question. Be sure to fully explain the process and steps you used to achieve your results.
H. Applicability: Articulate the purpose for your model. How applicable is the model to the research question? How does the model help you answer your research question? How well does it align to the research
You can solve this problem with Microsoft Excel and its Data Analysis add-in (the Analysis ToolPak).
Data Collection
Before making any predictions, you must first collect your data. For NFL data, Pro Football Reference offers an abundance of statistics.
Determining the Relevant Variables
Too much irrelevant data can be a problem. Importing the Team Offense and Defense tables from Pro Football Reference into Microsoft Excel gives you 25 columns of statistics, so you need to narrow the data down to only the variables that you deem most relevant to a team's ability to win.
What are the most important variables in determining an NFL team's ability to win? The answer depends on your own judgment. Personally, I feel that these variables are the best determinants of an NFL team's ability to win: Sc% OFF, TO% OFF, YdsPen OFF, Sc% DEF, TO% DEF, YdsPen DEF, and Margin.
Using only these variables, you can now transform your data into a less intimidating table. By carefully studying the variables, you may conclude that the Margin variable is determined by the other six variables. Therefore, the dependent variable of my data set is Margin and the independent, a.k.a. explanatory, variables are Sc% OFF, TO% OFF, YdsPen OFF, Sc% DEF, TO% DEF, YdsPen DEF. This information will be important for the next step.
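If you prefer to do the column filtering outside Excel, the step above can be sketched in Python. This is only a rough illustration: the two rows, the team names, and the extra PassYds and RushYds columns are made-up placeholders rather than real 2005 statistics, and the layout of an actual Pro Football Reference export may differ.

```python
import csv
import io

# Hypothetical extract: team names and numbers are placeholders, not real stats.
RAW = """Team,Sc% OFF,TO% OFF,YdsPen OFF,Sc% DEF,TO% DEF,YdsPen DEF,Margin,PassYds,RushYds
Team A,40.0,10.0,800,30.0,14.0,900,90,3600,1900
Team B,30.0,15.0,950,38.0,9.0,820,-75,3100,1600
"""

DEPENDENT = "Margin"
EXPLANATORY = ["Sc% OFF", "TO% OFF", "YdsPen OFF",
               "Sc% DEF", "TO% DEF", "YdsPen DEF"]


def select_columns(csv_text):
    """Keep the team name, the six explanatory variables, and Margin."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {"Team": record["Team"]}
        for name in EXPLANATORY + [DEPENDENT]:
            row[name] = float(record[name])  # convert text cells to numbers
        rows.append(row)
    return rows


table = select_columns(RAW)
for row in table:
    print(row)
```

Every column that is not listed in EXPLANATORY or DEPENDENT is dropped, which mirrors deleting the unused columns by hand in the spreadsheet.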
How to Make Use of the Data
With all of this data, you need to establish a relationship between your variables that will serve as a formula for computing your ratings. Linear regression models the relationship between a dependent variable and one or more independent variables of a data set. Since we previously determined that this model contains one dependent variable that is explained by several independent variables, multiple linear regression is the method that we will use.
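Written out, a regression of Margin on the six explanatory variables has the following general form. The symbols are notation introduced here for illustration: i indexes the teams, the beta terms are the coefficients to be estimated (beta_0 is the intercept), and epsilon_i is the part of Margin that the variables do not explain.

```latex
\[
\text{Margin}_i = \beta_0
  + \beta_1\,\text{ScOFF}_i + \beta_2\,\text{TOOFF}_i + \beta_3\,\text{YdsPenOFF}_i
  + \beta_4\,\text{ScDEF}_i + \beta_5\,\text{TODEF}_i + \beta_6\,\text{YdsPenDEF}_i
  + \varepsilon_i
\]
```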
In Microsoft Excel, you can run a linear regression by going to the Data tab, clicking Data Analysis, and scrolling down to Regression. The Input Y Range (dependent variable) in my model is the Margin column. The Input X Range (independent variables) covers the columns containing Sc% OFF, TO% OFF, YdsPen OFF, Sc% DEF, TO% DEF, and YdsPen DEF. Once your linear regression is set up, simply press OK to see your results.
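For readers who want to see what such a fit computes, here is a minimal Python sketch of ordinary least squares using the normal equations. It is an illustration under stated assumptions, not a description of how Excel works internally: the helper names are hypothetical, and the toy data has only two explanatory variables with made-up values chosen so that y = 2 + 3*x1 - x2 exactly.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: variables may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    design = [[1.0] + list(map(float, row)) for row in x_rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Made-up toy data (NOT NFL statistics), built so that y = 2 + 3*x1 - 1*x2.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [3, 7, 7, 11, 11]

beta, r_squared = fit_ols(X, y)
print("coefficients:", [round(b, 6) for b in beta])
print("R squared:", round(r_squared, 6))
```

With the real table, each row of X would hold a team's six explanatory values and y would hold the Margin column. The normal equations are the simplest approach to show, but they can be numerically fragile when columns are nearly collinear, so a statistics package is the safer choice for real work.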
The results of this linear regression were good. The regression output reported an R Squared value of 0.9077, which means that about 91% of the variation in Margin is explained by the six explanatory variables and, as expected, indicates a strong relationship between the X and Y Ranges. Using the coefficients in the regression summary, this is the formula that I will use to determine each team's rating:
Rating = A*ScOFF - B*TOOFF - C*YdsPenOFF - D*ScDEF + E*TODEF + F*YdsPenDEF
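Here is how you can read the formula: A through F stand for the magnitudes of the coefficients in the regression summary. Terms with a plus sign (ScOFF, TODEF, YdsPenDEF) raise a team's rating as the statistic grows, and terms with a minus sign (TOOFF, YdsPenOFF, ScDEF) lower it.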
Ranking Each Team
With a formula in place, all you have to do now is calculate the rating of each team by inputting the variable data into the rating formula. The VLOOKUP() and SUM() functions in Microsoft Excel make this an easy task. Now, you can rank the teams based on their rating. The higher the rating, the better the rank.
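The same rating-and-ranking step can be sketched in Python instead of VLOOKUP() and SUM(). Everything numeric here is an assumption for illustration: the weights only follow the sign pattern of the rating formula and are not the coefficients from the regression summary, and the three teams and their statistics are made up.

```python
# Hypothetical signed weights following the sign pattern of the rating formula.
# The magnitudes are placeholders, NOT the coefficients from the regression.
SIGNED_WEIGHTS = {
    "ScOFF": 2.0, "TOOFF": -1.0, "YdsPenOFF": -0.5,
    "ScDEF": -2.0, "TODEF": 1.0, "YdsPenDEF": 0.5,
}

# Made-up team statistics for illustration only.
TEAMS = {
    "Team A": {"ScOFF": 40, "TOOFF": 10, "YdsPenOFF": 8,
               "ScDEF": 30, "TODEF": 14, "YdsPenDEF": 9},
    "Team B": {"ScOFF": 30, "TOOFF": 15, "YdsPenOFF": 9,
               "ScDEF": 38, "TODEF": 9, "YdsPenDEF": 8},
    "Team C": {"ScOFF": 35, "TOOFF": 12, "YdsPenOFF": 7,
               "ScDEF": 33, "TODEF": 12, "YdsPenDEF": 7},
}


def rating(stats):
    """Weighted sum of a team's statistics using the signed weights."""
    return sum(weight * stats[name] for name, weight in SIGNED_WEIGHTS.items())


def rank_teams(teams):
    """Return (rank, team, rating) tuples, highest rating first."""
    scored = sorted(((rating(s), name) for name, s in teams.items()), reverse=True)
    return [(rank, name, score) for rank, (score, name) in enumerate(scored, start=1)]


ranking = rank_teams(TEAMS)
for rank, name, score in ranking:
    print(rank, name, score)
```

Because every team is scored with the same weights, adding a constant such as a regression intercept to each rating would not change the order of the ranking.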
Predictions & Results
You can now predict results using the final ranking table you have obtained, for example by picking the higher-rated team in a given matchup.
Conclusion
There is no perfect method for predicting the outcome of a football game, or any sports event for that matter. There are simply too many unpredictable variables to account for. However, by choosing your variables wisely, you can still make a quality prediction.