Question

In: Statistics and Probability

How would bias affect the development of accurate predictive models? How would you minimize the impact of bias?

Solutions

Expert Solution

Bias
Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, with a much simpler model. So, if the true relationship is complex and you try to use linear regression, it will undoubtedly result in some bias in the estimation of f(X). No matter how many observations you have, it is impossible to produce an accurate prediction with a restrictive, simple algorithm when the true relationship is highly complex.
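As a minimal sketch of the point above, the snippet below fits a straight line to data whose true relationship is quadratic and measures the error against the true function for growing sample sizes. The function f(x) = x^2, the noise level, the sample sizes, and the helper names are all assumptions chosen for illustration. Under this setup, no straight line can achieve a mean squared error below roughly 0.09 against the true curve, so adding observations does not help.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def true_f(x):
    # Assumed "true" relationship for this illustration only.
    return x * x


def line_error(n, seed=0):
    """Fit a line to n noisy samples and return its MSE against true_f on a grid."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, 0.1) for x in xs]
    intercept, slope = fit_line(xs, ys)
    grid = [i / 100 for i in range(-100, 101)]
    return sum((intercept + slope * x - true_f(x)) ** 2 for x in grid) / len(grid)


# The error does not shrink toward zero as n grows: that floor is the bias.
errors = {n: line_error(n) for n in (50, 500, 5000)}
for n, err in errors.items():
    print(f"n={n:5d}  MSE vs. true function: {err:.4f}")
```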
Bias and variance are the two components of imprecision in predictive models, and in general there is a trade-off between them, so reducing one normally tends to increase the other. Bias in predictive models is a measure of model rigidity and inflexibility, and means that your model is not capturing all the signal it could from the data. High bias is also known as under-fitting. Variance, on the other hand, is a measure of model inconsistency: high-variance models tend to perform very well on some data points and very badly on others. This is also known as over-fitting, and means that your model is too flexible for the amount of training data you have and ends up picking up noise in addition to the signal, learning random patterns that happen by chance and do not generalize beyond your training data.
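For squared-error loss, the trade-off described above is usually written as the following decomposition. The notation is an assumption added here: y = f(x) + ε with zero-mean noise ε of variance σ², and \hat{f} is the model fitted on a random training set, with expectations taken over training sets and noise. The third term is noise that no model can remove.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```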
The simplest way to determine if your model is suffering more from bias or from variance is the following rule of thumb:

If your model performs really well on the training set but much worse on the hold-out set, then it is suffering from high variance. On the other hand, if your model performs poorly on both the training and hold-out sets, it is suffering from high bias.
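The rule of thumb can be tried out on synthetic data. The sketch below is only an illustration under assumed settings (a sine-shaped signal, Gaussian noise, 200 points per set); the two models, a constant predictor and a 1-nearest-neighbour lookup, are stand-ins picked to sit at the rigid and flexible extremes.

```python
import math
import random

rng = random.Random(1)


def make_data(n):
    """Draw n noisy samples from an assumed signal sin(3x) on [-2, 2]."""
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [math.sin(3 * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(200)
test_x, test_y = make_data(200)

mean_y = sum(train_y) / len(train_y)


def rigid(x):
    # Too simple: ignores x and always predicts the training mean.
    return mean_y


def flexible(x):
    # Too flexible: returns the label of the closest training point,
    # so it memorises the training noise.
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


results = {}
for name, model in (("rigid", rigid), ("flexible", flexible)):
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:8s} train MSE={results[name][0]:.3f}  hold-out MSE={results[name][1]:.3f}")
```

Reading the output with the rule above: the rigid model is poor on both sets (high bias), while the flexible one is perfect on the training set and clearly worse on the hold-out set (high variance).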
Depending on the performance of your current model and whether it is suffering more from high bias or high variance, you can resort to one or more of these seven techniques to bring your model where you want it to be:
1) Add More Data! Of course! This is almost always a good idea if you can afford it. It drives variance down (without a trade-off in bias) and allows you to use more flexible models.
2) Add More Features! This is almost always a good idea too, again if you can afford it. Adding new features increases model flexibility and decreases bias (at the expense of variance). The only time when it’s not a good idea to add new features is when your data set is small in terms of data points and you can’t invest in #1 above.
3) Do Feature Selection. Well, … only do it if you have a lot of features and not enough data points. Feature selection is almost the inverse of #2 above, and pulls your model in the opposite direction (decreasing variance at the expense of some bias), but the trade-off can be good if you do the feature selection methodically and only remove noisy and uninformative features. If you have enough data, most models can automatically handle noisy and uninformative features and you don’t need to do explicit feature selection. In this day and age of “Big Data” the need for explicit feature selection rarely arises. It is also worth noting that proper feature selection is non-trivial and computationally intensive.
4) Use Regularization. This is the neater version of #3 and amounts to implicit feature selection. The specifics are beyond the scope of this post, but regularization tells your algorithm to try to use as few features as possible, or to not trust any single feature too much. Regularization relies on smart implementations of training algorithms and is usually much preferred over explicit feature selection.
5) Use Bagging. Bagging is short for Bootstrap Aggregation. It uses several versions of the same model, trained on slightly different samples of the training data, to reduce variance without any noticeable effect on bias. Bagging can be computationally intensive, especially in terms of memory.
6) Use Boosting. Boosting is a slightly more complicated concept and relies on training several models successively, each trying to learn from the errors of the models preceding it. Boosting decreases bias and hardly affects variance (unless you are very sloppy). Again, the price is computation time and memory size.
7) Use a different class of models! Of course, you don’t have to do all of the above if there is another type of model that is more suitable to your data set out of the box. Changing the model class (e.g. from a linear model to a neural network) moves you to a different point in the bias-variance trade-off. Some algorithms are just better suited to some data sets than others. Identifying the right type of model can be really tricky, though!
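To make one of the techniques above concrete, here is a rough sketch of bagging (#5) applied to a deliberately high-variance model. Everything in it is an assumption for illustration: the synthetic sine-shaped data, the 1-nearest-neighbour base model, the choice of 20 bootstrap samples, and the helper names. It is not a tuned implementation.

```python
import math
import random

rng = random.Random(7)


def make_data(n):
    """Draw n noisy samples from an assumed signal sin(3x) on [-2, 2]."""
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [math.sin(3 * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def nearest_neighbor_model(xs, ys):
    """High-variance base model: predict the label of the closest training point."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict


def bagged_model(xs, ys, n_models=20):
    """Train the base model on bootstrap resamples and average the predictions."""
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(xs) indices with replacement.
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(
            nearest_neighbor_model([xs[i] for i in idx], [ys[i] for i in idx])
        )

    def predict(x):
        return sum(m(x) for m in models) / len(models)
    return predict


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(200)
test_x, test_y = make_data(500)

single_mse = mse(nearest_neighbor_model(train_x, train_y), test_x, test_y)
bagged_mse = mse(bagged_model(train_x, train_y), test_x, test_y)
print(f"single model hold-out MSE: {single_mse:.3f}")
print(f"bagged model hold-out MSE: {bagged_mse:.3f}")
```

Averaging over resamples smooths out the noise that any single nearest-neighbour model memorises, which is why the bagged hold-out error should come out lower here. The cost is the one noted in #5: every resampled model has to be stored and evaluated.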

