In: Statistics and Probability
1A. Explain how histograms are used to examine data. Describe the various shapes of histogram and their meaning.
1B. When would it be appropriate to use a histogram as opposed to a scattergram?
1C. What is the purpose of a Data Dashboard? If you were the manager of a restaurant, for example, what 5 key metrics would you want to display on a Data Dashboard? (note: You should think about what important metrics a company, particularly a restaurant would want to measure. Answers will vary.)
1D. Describe "Simpson's Paradox". Search the Internet and provide an example of this statistical phenomenon.
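Histogram-A histogram is a graph containing a set of adjacent rectangles, one for each class interval. The width of each rectangle represents the size of the class interval and its area is proportional to the frequency in that interval, so the total area of the histogram is proportional to the total frequency. When all class intervals have the same width, the height of each rectangle is therefore proportional to the frequency. A histogram is used to examine how the values of a data set are distributed: where they cluster, how widely they spread, and whether the distribution is symmetric or skewed.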
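As a rough illustration of how the rectangles of a histogram are obtained, the sketch below counts how many values fall in each class interval and prints a text histogram. The function name `histogram_counts`, the exam scores, and the choice of five classes of width 10 are assumptions made up for this example, not part of the original answer.

```python
def histogram_counts(values, low, width, n_bins):
    """Count the values in each class interval [low + i*width, low + (i+1)*width)."""
    counts = [0] * n_bins
    upper = low + n_bins * width
    for v in values:
        i = int((v - low) // width)
        if v == upper:
            i = n_bins - 1  # place the largest possible value in the last class
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# Hypothetical exam scores, invented for illustration only.
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85, 91, 100]
counts = histogram_counts(scores, low=50, width=10, n_bins=5)

for i, count in enumerate(counts):
    start = 50 + i * 10
    print(f"{start:3d}-{start + 10:3d} | {'#' * count} ({count})")

# With equal-width classes, the total area is width * total frequency.
total_area = sum(10 * c for c in counts)
```

Because every class here has the same width, the length of each bar is proportional to the frequency, and the total area equals the class width times the number of scores.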
[Figure A: Data skewed right. Figure B: Data skewed left. Figure C: Symmetric data.]
Data sets can have many different shapes, but for most statistical analysis these three are sufficient.
If most of the data are on the left, with a few larger values showing up on the right side of the histogram, the data are skewed to the right. Figure A shows an example of data that are skewed to the right.
If most of the data are on the right, with a few smaller values showing up on the left side of the histogram, the data are skewed to the left. Figure B shows an example of data that are skewed to the left.
If the data are symmetric, they have about the same shape on either side of the middle. In other words, if you fold the histogram in half, it looks about the same on both sides. Figure C shows an example of data that are symmetric.
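One way to check the shape numerically, rather than by eye, is the moment coefficient of skewness: it is positive for data skewed to the right, negative for data skewed to the left, and close to zero for symmetric data. The sketch below is only an illustration; the cut-off `tol=0.5` and the three small data sets are assumptions chosen for this example.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def describe_shape(values, tol=0.5):
    """Label a data set by the sign of its skewness (tol is an arbitrary cut-off)."""
    g = sample_skewness(values)
    if g > tol:
        return "skewed right"
    if g < -tol:
        return "skewed left"
    return "approximately symmetric"


# Hypothetical data sets, invented for illustration only.
right_tail = [1, 1, 2, 2, 2, 3, 3, 4, 9, 15]   # a few large values on the right
left_tail = [16 - x for x in right_tail]        # mirror image: a few small values on the left
balanced = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for name, data in [("right_tail", right_tail), ("left_tail", left_tail), ("balanced", balanced)]:
    print(name, "->", describe_shape(data))
```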
A scatter graph (scattergram) is a graphical display that shows the relationship between two variables by plotting a dot for each pair of scores. It indicates the strength and direction of the correlation between the two variables.
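The strength and direction that a scatter graph shows visually are commonly summarized by the Pearson correlation coefficient, which lies between -1 and +1. The sketch below computes it from paired scores; the function name `pearson_r` and the hours/marks data are assumptions invented for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired scores: hours studied and exam mark for six students.
hours = [1, 2, 3, 4, 5, 6]
marks = [52, 55, 61, 70, 72, 80]

r = pearson_r(hours, marks)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction} correlation)")
```

A value of r near +1 or -1 corresponds to dots lying close to a straight line, while a value near 0 corresponds to a cloud with no clear linear trend.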
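B-The major difference is that a histogram is used only to plot the frequency of score occurrences in a continuous data set that has been divided into classes, called bins. It shows how the values of a single variable are distributed, and histograms of two populations or groups can be compared side by side. A scattergram, on the other hand, is a graphical display that shows the correlation or relation between two variables by plotting dots to represent each pair of scores. A histogram is therefore appropriate when we want to examine the distribution of one variable, and a scattergram when we want to examine the relationship between two variables.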
C-A data dashboard is an information management tool that visually tracks, analyzes, and displays key performance indicators (KPIs), metrics, and key data points to monitor the health of a business, department, or specific process.
A data dashboard is an efficient way to track multiple data sources because it provides a central location for businesses to monitor and analyze performance. Real-time monitoring reduces the hours of analysis and the long lines of communication that previously challenged businesses.
Material in Kitchen | Stock position |
Vegetable | + Increased |
Oil | - Decreased |
Meat | = Equal |
Chicken | + Increased |
Salt | + Increased |
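A stock-position column like the one in the table above could be derived automatically by comparing the current stock level of each item with the previous one. The sketch below is only an illustration: the function name `stock_position` and all quantities are hypothetical, chosen so that the result is an increased, decreased, or equal position for each item.

```python
def stock_position(previous, current):
    """Return a dashboard label comparing current stock with the previous level."""
    if current > previous:
        return "+ Increased"
    if current < previous:
        return "- Decreased"
    return "= Equal"


# Hypothetical (previous, current) stock levels in kg, invented for illustration only.
stock = {
    "Vegetable": (20, 25),
    "Oil": (12, 9),
    "Meat": (15, 15),
    "Chicken": (10, 14),
    "Salt": (5, 6),
}

dashboard = {item: stock_position(prev, cur) for item, (prev, cur) in stock.items()}

for item, position in dashboard.items():
    print(f"{item:<10} | {position}")
```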
D-Simpson's Paradox: Suppose we observe several groups and establish a relationship or correlation within each of these groups. Simpson's paradox says that when we combine all of the groups and look at the data in aggregate form, the relationship that we noticed before may reverse itself. This is most often due to lurking variables that have not been considered, but sometimes it is due to the numerical values of the data.
Example: In a certain hospital, there are two surgeons. Surgeon A operates on 100 patients, and 95 survive. Surgeon B operates on 80 patients, and 72 survive. We are considering having surgery performed in this hospital, and surviving the operation is important to us, so we want to choose the better of the two surgeons.
We use the data to calculate the percentage of surgeon A's patients who survived their operations and compare it with the survival rate of surgeon B's patients: 95/100 = 95% for surgeon A and 72/80 = 90% for surgeon B. On these aggregate figures surgeon A appears to be the better choice, but Simpson's paradox warns that this conclusion could reverse once a lurking variable, such as the difficulty of each surgeon's cases, is taken into account.
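The comparison of survival rates can be sketched in a few lines. The overall figures (95 of 100 and 72 of 80) come from the example above; the split into routine and high-risk operations is NOT given in the answer and is a hypothetical breakdown invented here, chosen only so that its totals match the example and so that it shows how the ranking could reverse within each group.

```python
def survival_rate(survived, operated):
    """Fraction of patients who survived their operations."""
    return survived / operated


# Overall figures taken from the example.
rate_a = survival_rate(95, 100)
rate_b = survival_rate(72, 80)
print(f"Overall   A: {rate_a:.1%}  B: {rate_b:.1%}")

# HYPOTHETICAL breakdown as (survived, operated) per type of operation.
# These numbers are invented for illustration; only their totals match the example.
breakdown = {
    "A": {"routine": (87, 90), "high risk": (8, 10)},
    "B": {"routine": (20, 20), "high risk": (52, 60)},
}

group_rates = {
    surgeon: {kind: survival_rate(s, n) for kind, (s, n) in groups.items()}
    for surgeon, groups in breakdown.items()
}

for kind in ("routine", "high risk"):
    a, b = group_rates["A"][kind], group_rates["B"][kind]
    print(f"{kind:<9} A: {a:.1%}  B: {b:.1%}")
```

Under these assumed numbers surgeon A has the higher overall survival rate, yet surgeon B has the higher rate for both routine and high-risk operations, because most of surgeon B's patients are high-risk cases. The type of operation plays the role of the lurking variable.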