(WEATHER PREDICTION PROBLEM)
Your task is to predict the probability of precipitation on a given
future day. To help you, you have weather statistics from the last
five years. Suppose you want to predict if there will be any
precipitation on May 19. Should you base your prediction
on:
(a) the relative frequency of precipitation on May 19 for the last five years,
(b) the relative frequency of precipitation on all days in May during these years, or
(c) the relative frequency of precipitation on all days during these years?
Assume that the probability is simply estimated as the relative frequency of precipitation for all days you choose to include, and nothing else. Motivate your answer, and discuss the difficulties involved in choosing the model. Would the situation be any different if you had 100 years of weather statistics? Is choosing the model something that necessarily requires human judgement?
We should base our prediction on the relative frequency of precipitation on all days in May during these years. Five years of May data gives 155 days (5 × 31), a sample large enough to estimate the probability for any given day in May with reasonably low variability, provided we assume that precipitation is about equally likely on every day of the month. Using only the May 19 data from the last five years gives just five observations, which is far too small a sample and makes the estimate highly variable. Using all days of the year gives the largest sample, but it mixes in seasons whose weather may differ from May's, so it would not give the right information for the specific month of May.
If we had 100 years of data, we would still use all the May days from those 100 years rather than just the May 19 data, for the same reason: 100 observations is still a fairly small sample. In choosing the model here, we also have to make a critical assumption: that the climate has remained unchanged over the last 100 years, despite changes in human population, industrial growth, pollution and so on. So yes, choosing a model does require human judgement.
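To put rough numbers on the sample-size argument, here is a minimal sketch in Python. It is an illustration under stated assumptions, not part of the original problem: it treats days as independent, ignores leap days, and uses a made-up precipitation probability of 0.3 (`P_ASSUMED`). The function names are hypothetical. It computes the standard error sqrt(p(1 - p)/n) of a relative frequency for each of the three choices, with 5 and with 100 years of data.

```python
import math

# Hypothetical precipitation probability, chosen only for illustration.
P_ASSUMED = 0.3


def standard_error(p: float, n: int) -> float:
    """Standard error of a relative frequency based on n independent days."""
    return math.sqrt(p * (1.0 - p) / n)


def sample_sizes(years: int) -> dict:
    """Approximate number of days each option uses (leap days ignored)."""
    return {
        "May 19 only": years,
        "all days in May": 31 * years,
        "all days": 365 * years,
    }


def summarize(years: int, p: float = P_ASSUMED) -> dict:
    """Standard error of the estimate for each option, given years of data."""
    return {name: standard_error(p, n) for name, n in sample_sizes(years).items()}


if __name__ == "__main__":
    for years in (5, 100):
        print(f"{years} years of data:")
        for name, se in summarize(years).items():
            print(f"  {name:16s} standard error ~ {se:.3f}")
```

With the assumed p = 0.3, five May 19 observations give a standard error of about 0.20, 155 May days give about 0.04, and even 100 May 19 observations still give about 0.05. Real weather is correlated from day to day, so the true uncertainty is likely somewhat larger than these figures. The calculation also only measures variability: it says nothing about the bias introduced by pooling days whose climate differs from May 19, which is where the judgement comes in.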