Describe and illustrate (with screenshots within any chosen data analytics/app) the following concepts:
1. Central Tendency,
2. Variability,
3. Normal Distribution,
4. Standardization
1. Central Tendency
Central tendency is a descriptive summary of a dataset through a single value that reflects the center of the data distribution. Along with the variability (dispersion) of a dataset, central tendency is a branch of descriptive statistics.
Central tendency is one of the most fundamental concepts in statistics. Although it does not provide information about the individual values in a dataset, it delivers a comprehensive summary of the whole dataset.
Measures of Central Tendency
Generally, the central tendency of a dataset can be described using the following measures:
• Mean (average): the sum of all values divided by the number of values.
• Median: the middle value in a dataset ordered from smallest to largest.
• Mode: the most frequently occurring value in the dataset.
Even though the measures above are the most commonly used to define central tendency, there are some other measures, including, but not limited to, geometric mean, harmonic mean, midrange, and geometric median.
The selection of a central tendency measure depends on the properties of a dataset. For instance, the mode is the only central tendency measure for categorical data, while the median works best with ordinal data.
Although the mean is regarded as the best measure of central tendency for quantitative data, that is not always the case. For example, the mean may not work well with quantitative datasets that contain extremely large or extremely small values. The extreme values may distort the mean. Thus, you may consider other measures.
The measures of central tendency can be found using a formula or definition. They can also be identified from a frequency distribution graph. Note that for datasets that follow a normal distribution, the mean, median, and mode are located at the same point on the graph.
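For a concrete illustration, here is a minimal sketch using Python's built-in statistics module on a small, made-up dataset; the extreme value of 250 is included only to show how it pulls the mean away from the median and mode, as noted above.

```python
# A minimal sketch (Python standard library only) computing the three common
# measures of central tendency for a small, hypothetical set of values.
import statistics

amounts = [12, 15, 15, 18, 21, 24, 250]  # made-up data; 250 is an extreme value

print("Mean:  ", statistics.mean(amounts))    # pulled upward by the extreme value
print("Median:", statistics.median(amounts))  # robust to the extreme value
print("Mode:  ", statistics.mode(amounts))    # most frequently occurring value
```

Running this shows a mean of roughly 50.7 against a median of 18 and a mode of 15, which is exactly the kind of distortion that makes the median or mode preferable for skewed data.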
2. Variability
Variability is the extent to which data points in a statistical distribution or data set diverge (vary) from the average value, as well as the extent to which these data points differ from each other. In financial terms, this is most often applied to the variability of investment returns. Understanding the variability of investment returns is just as important to professional investors as understanding the value of the returns themselves. Investors equate a high variability of returns to a higher degree of risk when investing.
KEY TAKEAWAYS
• Variability refers to the divergence of data from its mean value, and is commonly used in the statistical and financial sectors.
• Variability in finance is most commonly applied to the variability of returns, wherein investors prefer investments that offer higher returns with less variability.
• Variability is used to standardize the returns obtained on an investment and provides a point of comparison for additional analysis.
Understanding Variability
Professional investors perceive the risk of an asset class to be directly proportional to the variability of its returns. As a result, investors demand a greater return from assets with higher variability of returns, such as stocks or commodities, than what they might expect from assets with lower variability of returns, such as Treasury bills.
This difference in expectation is also known as the risk premium. The risk premium refers to the amount required to motivate investors to place their money in higher-risk assets. If an asset displays a greater variability of returns but does not show a greater rate of return, investors will not be as likely to invest money in that asset.
In statistics, variability refers to the differences exhibited by data points within a data set, relative to one another or to the mean. It can be expressed through the range, variance, or standard deviation of a data set. The field of finance uses these concepts as they apply to price data and the returns that changes in price imply.
The range refers to the difference between the largest and smallest values of the variable being examined. In statistical analysis, the range is represented by a single number. In financial data, the range most commonly refers to the highest and lowest price for a given day or another time period. The standard deviation represents the spread between price points within that time period, and the variance is the square of the standard deviation, based on the data points in that same period.
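As a rough illustration, the sketch below computes the range, variance, and standard deviation of a short, hypothetical series of daily returns; the numbers are invented purely for this example.

```python
# A rough sketch of the three common variability measures, applied to a
# hypothetical series of daily returns (values are made up for illustration).
import statistics

daily_returns = [0.012, -0.004, 0.007, -0.015, 0.009, 0.003, -0.002]

value_range = max(daily_returns) - min(daily_returns)  # largest minus smallest value
std_dev = statistics.stdev(daily_returns)              # sample standard deviation
variance = statistics.variance(daily_returns)          # square of the standard deviation

print(f"Range:              {value_range:.4f}")
print(f"Standard deviation: {std_dev:.4f}")
print(f"Variance:           {variance:.6f}")
```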
3. Normal Distribution
The normal distribution is the most important probability distribution in statistics because it fits many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution. It is also known as the Gaussian distribution and the bell curve.
The normal distribution is a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely.
This section describes how to use the normal distribution, its parameters, and how to calculate Z-scores to standardize data and find probabilities.
Example of Normally Distributed Data: Heights
Height data are normally distributed. The distribution in this example fits real data that I collected from 14-year-old girls during a study.
The distribution of heights follows the typical pattern for all normal distributions. Most girls are close to the average (1.512 meters). Small differences between an individual's height and the mean occur more frequently than substantial deviations from the mean. The standard deviation is 0.0741 m, which indicates the typical distance that individual girls tend to fall from the mean height.
The distribution is symmetric. The number of girls shorter than average equals the number of girls taller than average. In both tails of the distribution, extremely short girls occur as infrequently as extremely tall girls.
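Because the original screenshot is not reproduced here, the sketch below simulates height data with the quoted mean (1.512 m) and standard deviation (0.0741 m) and plots a histogram; the sample is artificial and only mimics the shape of the data described above.

```python
# A minimal sketch that simulates normally distributed heights using the mean
# and standard deviation quoted in the text, then plots a histogram.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
heights = rng.normal(loc=1.512, scale=0.0741, size=500)  # simulated heights in metres

plt.hist(heights, bins=25, edgecolor="black")
plt.title("Simulated heights (normal distribution)")
plt.xlabel("Height (m)")
plt.ylabel("Frequency")
plt.show()

print("Sample mean:              ", round(heights.mean(), 3))
print("Sample standard deviation:", round(heights.std(ddof=1), 4))
```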
Parameters of the Normal Distribution
As with any probability distribution, the parameters of the normal distribution define its shape and probabilities entirely. The normal distribution has two parameters: the mean and the standard deviation. The normal distribution does not have just one form. Instead, the shape changes based on the parameter values, as illustrated below.
Mean
The mean is the central tendency of the distribution. It defines the location of the peak for normal distributions. Most values cluster around the mean. On a graph, changing the mean shifts the entire curve left or right on the X-axis.
Standard deviation
The standard deviation is a measure of variability. It defines the width of the normal distribution. The standard deviation determines how far away from the mean the values tend to fall. It represents the typical distance between the observations and the average.
On a graph, changing the standard deviation either tightens or spreads out the width of the distribution along the X-axis. Larger standard deviations produce distributions that are more spread out.
When you have narrow distributions, the probabilities are higher that values won’t fall far from the mean. As you increase the spread of the distribution, the likelihood that observations will be further away from the mean also increases.
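To see both effects at once, here is a minimal plotting sketch (assuming NumPy, SciPy, and matplotlib are available) that overlays normal curves with different parameter values: changing the mean shifts the peak, while a larger standard deviation spreads the curve out.

```python
# A minimal sketch overlaying normal density curves with different parameters
# to show how the mean moves the peak and the standard deviation sets the width.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-6, 10, 500)
for mu, sigma in [(0, 1), (3, 1), (3, 2)]:
    plt.plot(x, norm.pdf(x, loc=mu, scale=sigma), label=f"mean={mu}, sd={sigma}")

plt.title("Normal distributions with different parameter values")
plt.xlabel("X")
plt.ylabel("Probability density")
plt.legend()
plt.show()
```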
Population parameters versus sample estimates
The mean and standard deviation are parameter values that apply to entire populations. For the normal distribution, statisticians signify the parameters by using the Greek symbol μ (mu) for the population mean and σ (sigma) for the population standard deviation.
Unfortunately, population parameters are usually unknown because it’s generally impossible to measure an entire population. However, you can use random samples to calculate estimates of these parameters. Statisticians represent sample estimates of these parameters using x̅ for the sample mean and s for the sample standard deviation.
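The short sketch below illustrates the distinction: the population parameters μ and σ are chosen for the example, and x̅ and s are then estimated from one random sample drawn from that population.

```python
# A small sketch contrasting assumed population parameters (mu, sigma) with the
# sample estimates (x-bar, s) computed from one random sample.
import numpy as np

mu, sigma = 100, 15                                  # population parameters (illustrative)
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=mu, scale=sigma, size=50)    # one random sample of 50 observations

x_bar = sample.mean()                                # sample mean estimates mu
s = sample.std(ddof=1)                               # sample standard deviation estimates sigma

print(f"Population: mu = {mu}, sigma = {sigma}")
print(f"Sample:     x-bar = {x_bar:.2f}, s = {s:.2f}")
```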
Common Properties for All Forms of the Normal Distribution
Despite the different shapes, all forms of the normal distribution share the following characteristic properties:
• They are symmetric and bell-shaped, with the mean, median, and mode all equal and located at the center of the distribution.
• Half of the values fall below the mean and half fall above it.
• Roughly 68% of values fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations (the Empirical Rule).
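The Empirical Rule percentages can be checked directly; the quick sketch below (assuming SciPy is available) computes the area within one, two, and three standard deviations of the mean for a standard normal distribution.

```python
# A quick sketch verifying the Empirical Rule on the standard normal distribution.
from scipy.stats import norm

for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)   # area between -k and +k standard deviations
    print(f"Within {k} standard deviation(s) of the mean: {prob:.4f}")
# Prints roughly 0.6827, 0.9545, and 0.9973, i.e. about 68%, 95%, and 99.7%.
```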
4. Standardization
The word standardization may sound a little unusual at first, but understanding it in the context of statistics is straightforward. It has to do with distributions; in fact, every distribution can be standardized. Say the mean and the variance of a variable are μ (mu) and σ² (sigma squared), respectively. Standardization is the process of transforming that variable into one with a mean of 0 and a standard deviation of 1.
The standardization formula subtracts the mean from every value and divides the result by the standard deviation: z = (x − μ) / σ.
What’s a Standard Normal Distribution?
Logically, a normal distribution can also be standardized. The result is called a standard normal distribution.
You may be wondering how the standardization works here. All we need to do is subtract the mean, μ, from every value and then divide by the standard deviation, σ.
We use the letter Z to denote it. As already mentioned, its mean is 0 and its standard deviation is 1.
The standardized variable is called a z-score. It is equal to the original variable, minus its mean, divided by its standard deviation.
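As a small worked example, the sketch below standardizes a made-up set of values using that formula; by construction, the resulting z-scores have a mean of 0 and a standard deviation of 1.

```python
# A minimal sketch standardizing a hypothetical dataset: each z-score is the
# original value minus the mean, divided by the standard deviation.
import numpy as np

values = np.array([4.0, 7.0, 9.0, 10.0, 15.0])
z_scores = (values - values.mean()) / values.std(ddof=0)

print("Z-scores:", np.round(z_scores, 3))
print("Mean of z-scores:", round(z_scores.mean(), 3))       # 0 by construction
print("SD of z-scores:  ", round(z_scores.std(ddof=0), 3))  # 1 by construction
```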