It is important to understand the ideas behind the
various techniques, in order to know how and when to use them. One
has to understand the simpler methods first, in order to grasp the
more sophisticated ones. It is important to accurately assess the
performance of a method, to know how well or how badly it is
working. Additionally, this is an exciting research area, having
important applications in science, industry, and finance.
Ultimately, statistical learning is a fundamental ingredient in the
training of a modern data scientist. Examples of Statistical
Learning problems include:
- Identify the risk factors for prostate cancer.
- Predict whether someone will have a heart attack on the basis of demographic, diet and clinical measurements.
- Customize an email spam detection system.
- Identify the numbers in a handwritten zip code.
- Classify a tissue sample into one of several cancer classes.
- Establish the relationship between salary and demographic variables in population survey data.
- Machine learning arose as a subfield of Artificial Intelligence.
- Statistical learning arose as a subfield of Statistics.
- Machine learning has a greater emphasis on large-scale applications and prediction accuracy.
- Statistical learning emphasizes models and their interpretability, and precision and uncertainty.
- But the distinction has become more and more blurred, and there is a great deal of “cross-fertilization.”
- Dimension reduction reduces the problem of estimating p + 1 coefficients to the simpler problem of estimating M + 1 coefficients, where M < p. This is attained by computing M different linear combinations, or projections, of the variables. These M projections are then used as predictors to fit a linear regression model by least squares. Two approaches for this task are principal component regression and partial least squares; a sketch of both follows below.
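
To make the dimension-reduction idea concrete, here is a minimal sketch of principal component regression and partial least squares using scikit-learn. The data, the value M = 3, and the variable names are illustrative assumptions, not taken from the source.

```python
# Principal component regression (PCR): compute M linear combinations
# (projections) of the p predictors, then fit least squares on those
# M projections. Data and M = 3 are hypothetical.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p, M = 200, 10, 3                          # n observations, p predictors, M < p

X = rng.normal(size=(n, p))                    # predictors
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=0.5, size=n)   # noisy response

pcr = Pipeline([
    ("scale", StandardScaler()),   # standardize predictors before projecting
    ("pca", PCA(n_components=M)),  # M projections chosen from X alone
    ("ols", LinearRegression()),   # least squares on the M projections
])
pcr.fit(X, y)
print("PCR training R^2:", pcr.score(X, y))

# Partial least squares chooses the M projections using y as well as X.
pls = PLSRegression(n_components=M).fit(X, y)
print("PLS training R^2:", pls.score(X, y))
```

The key design difference: PCR picks its projections to capture variance in X only, while PLS also uses the response when constructing them, so the same M can behave quite differently for the two methods.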