Question

In: Computer Science

Advanced Database:

  1. What is overfitting?
  2. What is underfitting?
  3. What is decision tree pruning?
  4. What is cross-validation?
  5. What is the role of the activation function?
  6. Provide some examples of activation functions.

Every answer should be a minimum of 4 to 5 lines.

Solutions

Expert Solution

Underfitting:
A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data (it's like trying to fit into undersized pants!). Underfitting hurts the accuracy of our machine learning model: its occurrence simply means that the model or algorithm does not fit the data well enough. It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data. In such cases the model is too simple and rigid to describe the data, and it will probably make a lot of wrong predictions. Underfitting can be reduced by using a more complex or flexible model, adding more informative features, and training on more data.
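As a rough illustration (not part of the original answer), the sketch below, assuming Python with NumPy and scikit-learn available, fits a plain linear regression to deliberately quadratic data; the low score even on the training data is the signature of underfitting.

```python
# Minimal underfitting sketch (assumes numpy and scikit-learn are installed).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)   # quadratic trend plus noise

linear = LinearRegression().fit(X, y)
# A straight line cannot follow a parabola, so even the training R^2 is poor.
print("training R^2:", round(linear.score(X, y), 3))
```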


Overfitting:
A statistical model is said to be overfitted when it is trained so closely on the training data that it starts learning from the noise and inaccurate entries in the data set (it's like squeezing into oversized pants!). The model then fails to categorize new data correctly, because it has memorised too much detail and noise instead of the general pattern. Overfitting is most common with non-parametric and non-linear methods, because these kinds of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. Ways to avoid overfitting include using a linear algorithm when the data is linear, or constraining the model with parameters such as the maximum depth when using decision trees.
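For contrast, a hedged sketch of overfitting, again assuming scikit-learn: an unconstrained decision tree memorises noisy training data almost perfectly yet scores much worse on held-out data.

```python
# Minimal overfitting sketch (assumes numpy and scikit-learn are installed).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # sine trend plus noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
deep_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)  # no depth limit

print("train R^2:", round(deep_tree.score(X_train, y_train), 3))  # near 1.0 (memorised)
print("test  R^2:", round(deep_tree.score(X_test, y_test), 3))    # noticeably lower
```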

Decision tree pruning
Pruning is a technique in machine learning and search algorithms that reduces the size of a decision tree by removing sections of the tree that provide little power for classifying instances. Pruning reduces the complexity of the final classifier and hence improves predictive accuracy by reducing overfitting. It can be done while the tree is being grown (pre-pruning, e.g. limiting the maximum depth) or after the full tree has been built (post-pruning, e.g. cost-complexity pruning).
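To make the idea concrete, here is a tentative scikit-learn sketch of post-pruning via cost-complexity pruning (the ccp_alpha parameter); the breast-cancer dataset is only an illustrative choice, and the exact alpha value would normally be tuned rather than fixed at 0.01.

```python
# Minimal pruning sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

# The pruned tree is much smaller and often generalises at least as well.
print("unpruned: leaves =", unpruned.get_n_leaves(), " test acc =", unpruned.score(X_test, y_test))
print("pruned:   leaves =", pruned.get_n_leaves(), " test acc =", pruned.score(X_test, y_test))
```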

Cross-validation

Cross-validation (CV), sometimes called rotation estimation or out-of-sample testing, is a technique for testing the effectiveness of a machine learning model; it is also a resampling procedure used to evaluate a model when the available data is limited. To perform CV we set aside a sample/portion of the data that is not used to train the model, and later use this held-out sample for testing/validation. There are many variants (such as k-fold and leave-one-out), but all of them are model-validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.
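A small sketch of k-fold cross-validation, assuming scikit-learn; load_iris and logistic regression are only placeholder choices. Each of the 5 folds is held out once while the model trains on the other four, and the five scores are averaged.

```python
# Minimal 5-fold cross-validation sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # 5 train/validate rounds
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```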

Role of the activation function

The purpose of the activation function is to introduce non-linearity into the output of a neuron; without it, a network of stacked layers would behave like a single linear model. The activation function decides whether a neuron should be activated or not: the neuron computes the weighted sum of its inputs, adds a bias, and the activation function transforms this value into the neuron's output.

Most popular activation functions:

  1. Sigmoid or logistic
  2. Tanh (hyperbolic tangent)
  3. ReLU (rectified linear unit)
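
As a quick illustration, the NumPy sketch below implements the three functions listed above so the non-linearity each one applies is explicit; the input vector is just made-up example data.

```python
# Minimal activation-function sketch (assumes numpy is installed).
import numpy as np

def sigmoid(x):
    # squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any real input into (-1, 1), zero-centred
    return np.tanh(x)

def relu(x):
    # keeps positive inputs, zeroes out negative ones
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # example weighted sums plus bias
print("sigmoid:", sigmoid(z))
print("tanh:   ", tanh(z))
print("relu:   ", relu(z))
```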

