In a half page, explain the difference between categorical and quantitative data. What types of graphs make sense and what types of graphs do not make sense for categorical data? For quantitative data? Explain why. What types of calculations make sense and what types of calculations do not make sense for categorical data? For quantitative data? Explain why.
Categorical variables take category or label values and place each observation in one and only one category, which means the categories are mutually exclusive. For example, if smoking status is a categorical variable, it can be divided into categories such as smoker and non-smoker, and each individual is placed in exactly one of the two categories.
Bar graphs are widely used to plot categorical variables. Since the distribution of a categorical variable describes the count or percentage of observations within each category, the height of each bar can represent that count or percentage. Pie charts can also be used to show the percentage of individuals in each category and are helpful for comparing shares of a whole. Histograms, boxplots, stem-and-leaf displays, and scatter plots do not make sense for categorical variables, since these graphs place values on a numerical scale; labels have no numerical position, so the result would be meaningless.
For categorical variables, calculations such as the observed and expected frequencies for each category, as well as the chi-square statistic, make sense, since they can be used to test whether there is an association between categorical variables. Calculations such as the mean or standard deviation do not make sense, because labels cannot be added or averaged.
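As a rough illustration of the calculations that suit categorical data, the sketch below counts observed frequencies, derives expected frequencies under independence, and sums them into a chi-square statistic. The survey data, the second variable (exercise habit), and all variable names are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical survey: (smoking status, exercise habit) for each individual.
observations = (
    [("smoker", "active")] * 10
    + [("smoker", "inactive")] * 20
    + [("non-smoker", "active")] * 30
    + [("non-smoker", "inactive")] * 20
)

n = len(observations)

# Observed frequencies: counting is the basic operation on labels.
observed = Counter(observations)
row_totals = Counter(status for status, _ in observations)
col_totals = Counter(habit for _, habit in observations)

# Percentages per category also make sense for categorical data.
smoker_percent = 100 * row_totals["smoker"] / n

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / n under independence.
chi_square = 0.0
for status in row_totals:
    for habit in col_totals:
        expected = row_totals[status] * col_totals[habit] / n
        chi_square += (observed[(status, habit)] - expected) ** 2 / expected

print(f"smokers: {smoker_percent:.1f}%")
print(f"chi-square statistic: {chi_square:.4f}")
```

The statistic would then be compared against a chi-square distribution with the appropriate degrees of freedom (1 for a 2x2 table) to judge whether the association is significant.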
Quantitative variables take numerical values and represent some kind of measurement. Histograms, boxplots, and scatter plots each require quantitative data, because each of these plots positions values along a numerical axis. Pie charts and bar graphs generally should not be used for raw quantitative values: they represent the percentage or frequency of observations in distinct groups or categories, so feeding them ungrouped numerical values would produce a meaningless graph.
For quantitative data, all sorts of summary values such as the mean, standard deviation, variance, kurtosis, skewness, mode, and median make sense, because measures of central tendency and dispersion (apart from the mode) rely on arithmetic or ordering, which is defined only for numerical data.
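A minimal sketch of these summary calculations, using Python's standard `statistics` module on a made-up list of measurements, is shown below. The data values and label list are hypothetical; the last part shows that asking for the mean of category labels fails, while the mode still works.

```python
import statistics

# Hypothetical quantitative measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)
median_value = statistics.median(values)
mode_value = statistics.mode(values)
variance_value = statistics.pvariance(values)  # population variance
std_dev_value = statistics.pstdev(values)      # population standard deviation

print(mean_value, median_value, mode_value, variance_value, std_dev_value)

# Hypothetical categorical labels: arithmetic summaries are undefined.
labels = ["smoker", "non-smoker", "smoker"]

try:
    statistics.mean(labels)
    mean_of_labels_failed = False
except TypeError:
    # Labels cannot be added or divided, so no mean exists.
    mean_of_labels_failed = True

# The mode only needs counting, so it works for labels too.
most_common_label = statistics.mode(labels)
print(mean_of_labels_failed, most_common_label)
```

The population versions of variance and standard deviation are used here only to keep the arithmetic simple; `statistics.variance` and `statistics.stdev` give the sample versions.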