Q1: Define the following terms:
a. correlation coefficient
b. scatter plot
c. bivariate relationship
Q2: Provide an example where the outlier is more important to the research than the other observations.
Q3: Identify when to use Spearman’s rho.
Q1
answer:
a. correlation coefficient:
A correlation coefficient is a statistical measure of how well changes in the value of one variable predict changes in the value of another. In positively correlated variables, the values increase or decrease in tandem. In negatively correlated variables, the value of one increases as the value of the other decreases.
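As a rough illustration of the definition above, here is a minimal sketch that computes the Pearson correlation coefficient from scratch. The function name pearson_r and the height, weight, and TV-hours figures are made up for this example; they do not come from the question.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squared deviations
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data, invented for illustration only
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 88]
hours_tv = [5, 4, 3, 2, 1]

print(pearson_r(heights_cm, weights_kg))  # positive: the values rise together
print(pearson_r(heights_cm, hours_tv))    # negative: one rises as the other falls
```

A result near +1 or -1 indicates a strong linear relationship, and a result near 0 indicates little or no linear relationship.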
b. scatter plot:
Scatter plots are similar to line graphs in that they use horizontal and vertical axes to plot data points. However, they have a distinct purpose: scatter plots show how much one variable is affected by another. The relationship between the two variables is called their correlation.
c. bivariate relationship:
Data in statistics is sometimes classified by how many variables are in a particular study. For example, "height" might be one variable and "weight" might be another. Depending on the number of variables being looked at, the data may be univariate (one variable) or bivariate (two variables). A bivariate relationship is the relationship between the two variables in bivariate data, such as how weight changes with height.
Q2
answer:
Q3
answer:
Solution (Q2):
Two versions of the answer are given below. Both are correct, so choose whichever you find more convincing.
When a point deviates far from where most of the data lie (the common zone), we call it an outlier.
In other words, an outlier is a deviation from the normal pattern of the data.
Consider an example where the data are the economic growth rates of countries.
We have data for 25 countries and draw a boxplot of them.
Any point plotted beyond the whiskers of the boxplot is called an outlier. Since the data are economic growth, a point that deviates from the normal pattern marks a country with significantly high or low economic growth. In this way the outlier can be an important part of analysing the data.
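The boxplot check described above can be sketched in a few lines. This assumes the common 1.5 × IQR whisker rule, which the answer does not state explicitly. The growth figures are invented for illustration, and iqr_outliers is a hypothetical helper name.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying beyond the boxplot whiskers (k * IQR rule)."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical annual growth rates (%) for 25 countries, invented for this example
growth = [
    1.8, 2.1, 2.4, 2.0, 2.7, 3.1, 2.9, 2.2, 1.9, 2.5,
    3.0, 2.6, 2.3, 2.8, 3.2, 2.1, 2.4, 2.9, 1.7, 2.6,
    3.3, 2.2, 2.5, 2.0, 11.5,
]

# The flagged country is the one worth investigating further
print(iqr_outliers(growth))
```

In a study of unusual growth, the flagged value is the observation of interest rather than noise to be discarded.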
(OR)
In statistics, an outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses. Outliers can occur by chance in any distribution, but they often indicate either measurement error or that the population has a heavy-tailed distribution. In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution. A frequent cause of outliers is a mixture of two distributions, which may be two distinct sub-populations, or may indicate 'correct trial' versus 'measurement error'; this is modeled by a mixture model.
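To make "statistics that are robust to outliers" concrete, the sketch below compares how the mean and the median react when one extreme value is added. The numbers are made up for illustration.

```python
import statistics

# Hypothetical measurements, invented for this example
clean = [2.0, 2.1, 2.2, 2.3, 2.4]
with_outlier = clean + [11.5]

# How far each summary statistic moves when the extreme value is added
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)

print("mean shift:  ", mean_shift)    # the mean is pulled strongly toward the outlier
print("median shift:", median_shift)  # the median barely moves
```

The median is the more robust of the two here because it depends on the order of the values, not on how extreme the largest one is.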