According to the text, Statistics is the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data. Write a paper at least 3 pages long detailing how specific concepts covered in this class fit into the definition and how they can apply to your area of study.
The concepts that need to be covered are the five-number summary, probability distributions, and hypothesis testing. For each concept be sure to explain the details of the concept and its importance, how it fits into the definition of statistics, and give a detailed example of how that concept can be used in your field of study. You can, but don’t have to, research your field of study to find examples. If you do, be sure to cite your sources.
Write the paper with a strong opening paragraph that makes the reader interested in reading the rest of the paper. It should have a strong closing paragraph that ties the paper together. In other words, don’t just answer questions, write a paper that informs and happens to cover the questions along the way. If you use your text or other sources for information in your paper be sure to cite those sources and include a works cited page at the end.
Statistics:
Statistics is the branch of science concerned with planning studies, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
"You can convince people only by numbers, not by words"
Suppose you have a piece of information, for example that people in the USA sleep 8 hours a night on average. If you simply state this, people will not listen to you. If you back the statement with numbers and analysis, people will listen. That is how powerful a tool statistics is.
Statistical methods fall into two groups: descriptive statistics, which summarizes the data in hand, and inferential statistics, which uses a sample to draw conclusions about a population. In both, the quantity we are looking for is either a statistic (computed from a sample) or a parameter (describing the population).
Collect the data:
Taking the same example statement, the first step in analyzing it is to collect the data. We should gather data from different sources, recording fields such as name, age, gender, zip code, etc.
Organize the data:
After gathering, the data may not be in a structured format and some values may be missing, so we should organize the data into a clean, structured format.
Analyze:
Analyzing the data is the most important step. There are many techniques for analyzing data and drawing conclusions from it.
We should be able to answer questions such as the ones below:
(i) What are the types of the variables: numerical or categorical?
(ii) If the gathered data contains missing values, should they be imputed with the mean, median, or mode?
(iii) Before imputing, what is the central tendency of the data (mean, median, and mode)?
(iv) How is the data distributed? Are there any outliers in the data? If so, how will you deal with them?
(v) Which transformation will you apply to reduce the effect of outliers, for example a log or square root transformation?
(vi) What does the standard deviation say about the data, and can the empirical rule be justified for it?
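Question (ii) can be illustrated with a minimal sketch. The ages below are hypothetical, and filling with the median is only one reasonable choice, not a rule.

```python
import statistics

# Hypothetical ages; None marks a missing value.
ages = [23, 25, None, 31, 25, None, 40]

# Keep only the values that were actually observed.
observed = [a for a in ages if a is not None]

# Candidate fill values named in question (ii).
fill_mean = statistics.mean(observed)
fill_median = statistics.median(observed)
fill_mode = statistics.mode(observed)

# Impute with the median, which is less sensitive to extreme values.
imputed = [a if a is not None else fill_median for a in ages]

print("mean:", fill_mean, "median:", fill_median, "mode:", fill_mode)
print("imputed:", imputed)
```

Comparing the three candidate values before choosing one is a quick way to see whether the data is skewed.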
Example 1:
Suppose a house broker wants to show the following house prices:
h1 = 1 cr, h2 = 2 cr, h3 = 3 cr, h4 = 4 cr, h5 = 50 cr
If you include the h5 price and calculate the mean house price: (1+2+3+4+50)/5 = 60/5 = 12 cr
If you remove h5 from the data and calculate the mean again: (1+2+3+4)/4 = 10/4 = 2.5 cr
See the magic: a single value moves the mean from 2.5 cr to 12 cr. This happens because the h5 price is so high compared with the other prices. It acts as an outlier, so the data is skewed. For data like this we do not use the mean; we use the median (3 cr here), which is barely affected by the outlier.
We should draw many insights like this from the data.
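The house price example can be checked with a short sketch, which also computes the five-number summary (minimum, Q1, median, Q3, maximum) for the same prices. The "inclusive" quartile method is an assumption here; textbooks use several slightly different quartile conventions.

```python
import statistics

# House prices in crore (cr), taken from the example.
prices = [1, 2, 3, 4, 50]
without_h5 = prices[:-1]

mean_all = statistics.mean(prices)          # 60 / 5
mean_without = statistics.mean(without_h5)  # 10 / 4
median_all = statistics.median(prices)
median_without = statistics.median(without_h5)

# Five-number summary: minimum, Q1, median, Q3, maximum.
# method="inclusive" is one quartile convention among several (assumption).
q1, q2, q3 = statistics.quantiles(prices, n=4, method="inclusive")
five_number = (min(prices), q1, q2, q3, max(prices))

print("mean with h5:", mean_all, "without h5:", mean_without)
print("median with h5:", median_all, "without h5:", median_without)
print("five-number summary:", five_number)
```

The mean jumps from 2.5 to 12 when h5 is included, while the median only moves from 2.5 to 3. The large gap between Q3 and the maximum in the five-number summary points to the same outlier.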
Example 2:
Why do we use the standard deviation instead of the variance?
There are many measures of dispersion, such as the range, mean deviation, absolute deviation, variance, and standard deviation (SD).
Each has some disadvantage. Let us see:
a) Range:
Suppose the data is 1, 2, 3, 4, 5, 6, ..., 100
range = (100-1) = 99 -----> the drawback is that it does not consider the middle values
b) Mean deviation:
1, 2, 3, 4, 5 ----------> mean = 3
(1-3)+(2-3)+(3-3)+(4-3)+(5-3) = 0; adding up all the deviations from the mean always gives zero
c) Absolute deviation:
To avoid this zero we take the modulus (absolute value) of each deviation.
But |x| is not differentiable at zero, so it is less convenient for mathematical calculations.
d) Variance:
Next we go to the variance, which squares each deviation. It avoids the drawbacks of both the mean deviation and the absolute deviation, but because of the squared term it also squares the units. To avoid this drawback we take the square root, which gives the standard deviation.
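The four measures can be compared on the data 1, 2, 3, 4, 5 with a small sketch. It uses the population formulas (dividing by n), which is an assumption; a sample calculation would divide by n - 1 instead.

```python
import statistics

data = [1, 2, 3, 4, 5]
mean = statistics.mean(data)

# a) Range: uses only the two extreme values.
value_range = max(data) - min(data)

# b) Sum of deviations from the mean: always zero.
sum_of_deviations = sum(x - mean for x in data)

# c) Mean absolute deviation: the modulus removes the cancellation.
mean_abs_deviation = sum(abs(x - mean) for x in data) / len(data)

# d) Variance is in squared units; the standard deviation
#    takes the square root to return to the original units.
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(value_range, sum_of_deviations, mean_abs_deviation, variance, std_dev)
```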
In this way statistics is a very beautiful subject for analyzing data and drawing conclusions from it.
Apart from this, statistical analysis depends on distributions and hypothesis testing.
Distributions:
Example:
In statistical experiments involving chance, outcomes occur randomly. As an example of such an experiment, a teacher randomly selects three students from a large batch of students to test whether they can pass the exam.
Each selected student is rated as good or poor. The students are numbered from 1 to 3, a poor student is designated with a P, and a good student is designated with a G. The expression P1 G2 P3 denotes one particular outcome in which the first and third students are poor and the second student is good. Experiments like this, in which outcomes are counted, produce discrete distributions. The main discrete distributions are the binomial distribution, the Poisson distribution, and the hypergeometric distribution.
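The three-student experiment can be sketched as a binomial distribution. The probability that a randomly selected student is good (0.7) is a made-up value for illustration, and the binomial model assumes the batch is large enough that the three selections are effectively independent.

```python
from itertools import product
from math import comb

# All outcomes for three students, each rated good (G) or poor (P).
outcomes = ["".join(o) for o in product("GP", repeat=3)]

# Hypothetical probability that a randomly selected student is good.
p_good = 0.7


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Distribution of the number of good students among the three.
distribution = {k: binomial_pmf(k, 3, p_good) for k in range(4)}

print(outcomes)
for k, prob in distribution.items():
    print(f"P({k} good students) = {prob:.3f}")
```

The outcome written P1 G2 P3 in the text appears here as the string "PGP", one of the eight possible outcomes.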
Besides discrete distributions, we also have continuous distributions.
Continuous Distributions:
Continuous distributions are constructed from continuous random variables, which can take on a value at every point over a given interval. They are usually generated from experiments in which things are measured rather than counted.
Continuous distributions -----> experiments that are "measured"
Discrete distributions ----------> experiments that are "counted"
In continuous distributions, probabilities are calculated as areas under the curve, and the total area under the curve = 1.
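The area idea can be checked numerically. The sketch below uses the standard normal distribution as the example curve (an assumption, since any continuous distribution would do) and a simple midpoint rule to approximate the area.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_under_pdf(dist, lower, upper, steps=10_000):
    """Approximate the area under dist.pdf on [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    heights = (dist.pdf(lower + (i + 0.5) * width) for i in range(steps))
    return sum(heights) * width


# Practically all of the curve lies between -8 and 8 standard deviations.
total_area = area_under_pdf(standard_normal, -8, 8)

# Area within one standard deviation of the mean (empirical rule: about 68%).
within_one_sd = area_under_pdf(standard_normal, -1, 1)

print(round(total_area, 4), round(within_one_sd, 4))
```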
The many continuous distributions in statistics include the uniform distribution, the normal distribution, the exponential distribution, the t distribution, the chi-square distribution, and the F distribution.
Sampling and Sampling distributions:
Here we estimate population parameters such as the mean and standard deviation by using sample statistics. We draw many different samples, and each sample gives its own value of the statistic. The probability distribution of this statistic is called a sampling distribution.
We estimate the population mean by using the z and t statistics. Here we also learn about the point estimate and the interval estimate, which is also called a confidence interval.
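A sampling distribution can be simulated with a short sketch. The population below is hypothetical (10,000 sleep times generated around 8 hours), and the sample size and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 nightly sleep times in hours.
population = [random.gauss(8, 1.5) for _ in range(10_000)]

# Draw many samples and record the mean of each one.
sample_size = 25
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(2_000)
]

pop_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)

# The spread of the sample means should be close to sigma / sqrt(n).
spread_of_means = statistics.pstdev(sample_means)
expected_spread = statistics.pstdev(population) / sample_size ** 0.5

print(round(pop_mean, 2), round(mean_of_means, 2))
print(round(spread_of_means, 2), round(expected_spread, 2))
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of the individual values. That is why a sample mean is a useful estimate of the population mean.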
Explanation:
A point estimate is a statistic taken from a sample that is used to estimate a population parameter. A point estimate is only as good as the sample it comes from. Because the sample statistic varies from sample to sample, we use an interval estimate.
CI = point estimate ± 2 × (standard error), which gives an approximate 95% confidence interval
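As a minimal sketch of the interval estimate, the code below builds an approximate 95% confidence interval for a mean. The sample values are hypothetical, and the multiplier 2 is the rule-of-thumb value (close to z = 1.96); for a sample this small, a t multiplier would usually be preferred.

```python
import statistics

# Hypothetical sample of nightly sleep times in hours.
sample = [7.5, 8.0, 6.5, 9.0, 7.0, 8.5, 7.5, 8.0, 6.0, 7.0]

# Point estimate of the population mean.
point_estimate = statistics.mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Interval estimate: point estimate plus or minus 2 standard errors.
margin = 2 * standard_error
ci_lower = point_estimate - margin
ci_upper = point_estimate + margin

print(f"point estimate = {point_estimate:.2f}")
print(f"approximate 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```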
Hypothesis Testing:
A hypothesis test asks whether or not a claim is valid. We state hypotheses about the population, not about the sample. We then test how probable the observed result is under those assumptions: if that probability is very small, the assumptions are probably wrong. This is called the RARE EVENT RULE.
Example:
Statistics never proves that a claim is right. In a trial we cannot say the person is innocent; we can only say NOT GUILTY.
That is why there are two possible decisions about the null hypothesis (Ho):
a) Reject the null hypothesis: I have enough evidence to conclude that Ho is wrong.
b) Fail to reject the null hypothesis: I do not have enough evidence to conclude that Ho is wrong.
The evidence comes from sample data. Failing to reject does not mean accepting: the person may have committed the crime, but there is not enough evidence to claim it. Not guilty does not necessarily mean innocent.
Alternative hypothesis (Ha): the competing claim that the evidence supports when Ho is rejected.
We have to make the decision by using evidence, and for that we need to know the significance level.
Significance level: alpha = 0.10 corresponds to a 90% confidence level, alpha = 0.05 to 95%, and alpha = 0.01 to 99%.
Critical Values:
Critical values separate the rejection region from the fail-to-reject region. For alpha = 0.05 in a right-tailed test, the critical value is z = 1.645. Rejection region: if our test statistic z falls in this region, we reject the null hypothesis. The critical value and the test statistic must be compared in the same units, such as z-scores, probabilities, or mean intervals.
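The critical value for a right-tailed z test can be computed rather than looked up in a table. This is a small sketch; the two test statistics at the end are hypothetical values used only to show the comparison.

```python
from statistics import NormalDist

alpha = 0.05

# Right-tailed test: the critical value leaves area alpha in the right tail.
z_critical = NormalDist().inv_cdf(1 - alpha)


def in_rejection_region(z_statistic, critical_value):
    """True if the test statistic falls beyond the right-tail critical value."""
    return z_statistic > critical_value


print(round(z_critical, 3))  # 1.645

# Hypothetical test statistics.
print(in_rejection_region(2.1, z_critical))  # True: reject Ho
print(in_rejection_region(1.2, z_critical))  # False: fail to reject Ho
```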
P-value Method:
The p-value is the probability associated with the test statistic (here, with z). In the traditional method:
(i) According to the significance level, we find the critical value.
(ii) Then, from the sample, we find the test statistic.
(iii) Next, we compare the critical value and the test statistic. In the p-value method, however, we work only with the test statistic: we find the tail area corresponding to the test statistic, and this area is the p-value.
(iv) We compare this p-value with alpha.
(v) If p <= alpha, we reject Ho.
(vi) So instead of comparing the critical value and the test statistic, we compare the areas (probabilities) associated with them:
alpha ---> area associated with the critical value, p ----> area associated with the test statistic
(vii) Reject Ho if p <= alpha; fail to reject Ho if p > alpha.
When the p-value is larger than alpha, the test statistic lies in the non-rejection region.
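The p-value method for a right-tailed z test can be sketched as follows. The test statistics 2.1 and 1.2 are hypothetical, and alpha = 0.05 is assumed as the significance level.

```python
from statistics import NormalDist


def right_tail_p_value(z_statistic):
    """Area under the standard normal curve to the right of z_statistic."""
    return 1 - NormalDist().cdf(z_statistic)


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject Ho when p <= alpha."""
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"


# Hypothetical test statistic.
z_statistic = 2.1
p_value = right_tail_p_value(z_statistic)

print(round(p_value, 4), decide(p_value))

# A smaller test statistic gives a larger p-value.
print(decide(right_tail_p_value(1.2)))
```

For z = 2.1 the p-value is about 0.018, which is below 0.05, so Ho is rejected. For z = 1.2 the p-value is about 0.115, so we fail to reject Ho. Both decisions agree with comparing the test statistic with the critical value 1.645.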
In this way, many statistical techniques are used to get good insights from data.