In: Statistics and Probability
1)
a. What kind of attribute is Degree of illness?
b. How are categorical and measurement data typically summarized?
c. When can we apply the central limit theorem?
d. If I know the proportion of people in the world who are susceptible to a certain disease, how would I predict whether a random individual will be susceptible to that disease?
e. How are statistics and probability related to each other in data science tasks?
f. 80% of the students in the classroom are undergraduates and 50% of the students weigh less than 160 lbs. What is the probability that a student is both a graduate student and weighs less than 160 lbs?
g. Tom and Harry are going to take a driver's test at the nearest DMV office. Tom estimates that his chance of passing the test is 70% and Harry estimates his as 80%. Tom and Harry take their tests independently. What is the probability that at most one of the two friends will pass the test?
h. What is the instance-based representation of learning?
i. What is the difference between classification rules and association rules?
j. What representation is useful for hierarchical clustering?
k. How should missing values in the data be handled?
1. Ordinal data: a type of categorical data in which order matters.
Ex: Degree of illness
None, mild, moderate, severe, going, gone
2. Measurement data arise when the objects being studied are “measured” on some quantitative trait.
Measurement data are classified as discrete or continuous.
Measurement data are usually summarized using "averages" (or "means").
• Average number of siblings Fall 1998 Stat 250 students have is 1.9.
• Average weight of male Fall 1998 Stat 250 students is 173 pounds.
• Average weight of female Fall 1998 Stat 250 students is 138 pounds.
Categorical data are usually summarized using percentages (or proportions).
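As a rough illustration of the two kinds of summary asked about in part b, here is a minimal Python sketch. The weights and illness labels are made-up values, and the function names `summarize_measurement` and `summarize_categorical` are hypothetical helpers, not anything from the course material.

```python
from collections import Counter
from statistics import mean

# Hypothetical measurement data: weights in pounds (made-up values).
weights = [150, 172, 138, 181, 165]

# Hypothetical ordinal categorical data: degree of illness (made-up values).
illness = ["none", "mild", "mild", "severe", "none", "moderate", "none", "mild"]


def summarize_measurement(values):
    """Summarize measurement data by its average (mean)."""
    return mean(values)


def summarize_categorical(labels):
    """Summarize categorical data by the proportion in each category."""
    counts = Counter(labels)
    total = len(labels)
    return {category: count / total for category, count in counts.items()}


print(summarize_measurement(weights))   # average weight
print(summarize_categorical(illness))   # share of each illness level
```

A mean only makes sense for the measurement column; for the illness labels, the proportions (or percentages) per category carry the summary.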
3. What the Central Limit Theorem says is that if you build a histogram from the means of samples of size, say, 30 or larger, the resulting histogram will take on the shape of a normal distribution, even when the underlying population is not normal. Once you have confirmed this, you may use the tests built around the normal distribution, being careful with your conclusions because the results relate to the means of the large samples you used, not to individual members of the population. In most cases it is better to work out the underlying distribution you are working with (make a histogram) and choose your tests accordingly.
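The behaviour described above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 (chosen because it is clearly skewed), the sample size is 30, and the helper names `sample_means` and `draw_exponential` are made up for the illustration.

```python
import random
from statistics import mean, stdev


def draw_exponential(rng):
    # Assumed population: exponential with mean 1 and standard deviation 1 (skewed).
    return rng.expovariate(1.0)


def sample_means(draw, sample_size, num_samples, rng):
    """Return the mean of each of num_samples samples of size sample_size."""
    return [
        sum(draw(rng) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed so reruns give the same numbers
means = sample_means(draw_exponential, sample_size=30, num_samples=2000, rng=rng)

# Theory: the sample means center on 1 with spread about 1 / sqrt(30), roughly 0.18.
print(round(mean(means), 3), round(stdev(means), 3))

# Crude text histogram of the sample means; it should look roughly bell-shaped.
for i in range(12):
    low = 0.4 + 0.1 * i
    count = sum(low <= m < low + 0.1 for m in means)
    print(f"{low:.1f} {'#' * (count // 10)}")
```

Individual draws from this population are strongly right-skewed, yet the histogram of the 30-draw means comes out roughly symmetric, which is the point of the theorem.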
5. Data science usually uses statistical inference to predict or analyze trends from data, whereas statistical inference relies on the probability distributions of the data. Therefore knowing probability and its applications is necessary to work effectively on data science problems.
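For the two numerical parts of the question (f and g), the arithmetic can be sketched as below. Part f only has a single answer if we assume that being a graduate student is independent of weight, which the question does not state; part g uses the independence the question does state. The variable names are made up for the illustration.

```python
# Part f: assumes student level and weight are independent (NOT stated in the question).
p_undergrad = 0.80
p_under_160 = 0.50
p_grad = 1 - p_undergrad                    # complement: 20% are graduate students
p_grad_and_under_160 = p_grad * p_under_160

# Part g: the question states that the two tests are independent.
p_tom = 0.70
p_harry = 0.80
p_both_pass = p_tom * p_harry               # both pass
p_at_most_one = 1 - p_both_pass             # complement of "both pass"

print(round(p_grad_and_under_160, 2))  # 0.1
print(round(p_at_most_one, 2))         # 0.44
```

"At most one passes" is the complement of "both pass", so there is no need to add up the three separate cases (neither, only Tom, only Harry).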