Question

In: Statistics and Probability

In this exercise, we will look at descriptive statistics and how to explore and summarize data sets. For this, we use the Heart Disease dataset from the UCI data repository. This dataset consists of 4 small datasets of people with heart disease admitted to 4 hospitals.

For now, we only work with the file. This data consists of 271 instances with 7 attributes, described below:

Age: age in years

sex: 1 = male; 0 = female

cp: chest pain type

Value 1: typical angina

Value 2: atypical angina

Value 3: non-anginal pain

Value 4: asymptomatic

Trestbps: resting blood pressure

Chol: cholesterol level

Thalach: maximum heart rate achieved

heart_problem: 1 = has heart problem; 0 = no heart problem

Instructions: Use Microsoft Excel to do your work. Please submit your work as ONE MS Excel file and create one tab for each question. Show your work as rigorously as possible. Name the file lastname_firstname_hw1.excel.

Using the attached data, answer the following questions:

1. How many patients have heart disease? (0.5)

2. What is the average Cholesterol level of people with heart disease and without heart disease? What is the standard deviation? (1)

3. What are the median and average age of people with:

a. cholesterol higher than 240.0? (0.5)

b. cholesterol higher than 240.0 with heart disease? (0.5)

c. cholesterol higher than 240.0 without heart disease? (0.5)

4. Create a histogram of resting blood pressure. (1)

5. Create boxplots based on the sex of the patients for the following attributes:

a. cholesterol level (1.5)

b. maximum heart rate achieved (1.5)

6. For each box plot, answer the following questions:

a. What is the H-Spread (Q3-Q1) of cholesterol level for males and females? (0.5)

b. What are the Lower Hinge and Upper Hinge values for maximum heart rate for males and females? (0.5)

7. To find out whether two attributes are related and their values change together, we can use a scatter plot. Follow the instructions below and answer the questions:

a. Create two scatter plots of age and resting blood pressure for people with heart disease and without heart disease. Is there any visual correlation? (1+1)

b. Calculate the average resting blood pressure of each age (HINT : Use Groupby for age) for people with heart disease. (1)

c. Calculate the average resting blood pressure of each age (HINT : Use Groupby for age) for people without heart disease. (1)

d. Now create two scatter plots using the previous results. Do you observe a correlation now? Do people without heart disease have higher blood pressure as they age than people with heart disease? (2)

8. Compare the resting blood pressure of people with heart disease and without. (1)
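For readers working outside Excel, the "Groupby for age" hint in question 7 can be sketched in Python as below. This is only an illustration: the rows are made-up values rather than the homework data, and the helper name `mean_bp_by_age` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (age, trestbps, heart_problem) rows, for illustration only.
rows = [
    (40, 120, 1), (40, 130, 1), (55, 140, 1),
    (40, 110, 0), (55, 150, 0), (55, 130, 0),
]

def mean_bp_by_age(rows, flag):
    """Average resting blood pressure per age for one heart_problem value."""
    groups = defaultdict(list)
    for age, trestbps, heart_problem in rows:
        if heart_problem == flag:
            groups[age].append(trestbps)
    # One (age, average) pair per distinct age, sorted by age.
    return {age: mean(values) for age, values in sorted(groups.items())}

with_disease = mean_bp_by_age(rows, 1)
without_disease = mean_bp_by_age(rows, 0)
print(with_disease)
print(without_disease)
```

Each resulting dictionary holds one point per age, which is the input for the scatter plots in part d.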

LINK TO Data set

https://docs.google.com/document/d/1KYER8cMeWPcOlMJpegWNIDAF4maIAthKTM3Hrpr8rxk/edit?usp=sharing

Solutions

Expert Solution

1

Count of patients having heart disease: 101

2

Average cholesterol level of people having heart disease: 269.1881188

Average cholesterol level of people not having heart disease: 239.9529412

Standard deviation of cholesterol level of people: 67.65771142
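As a cross-check outside Excel, the per-group count, average and standard deviation can be sketched in Python. The rows below are made-up illustrative values, not the homework data, and the helper name `summarize` is hypothetical. `statistics.stdev` is the sample standard deviation (the counterpart of Excel's STDEV.S); the solution above does not state which formula it used.

```python
from statistics import mean, stdev

# Hypothetical (chol, heart_problem) rows, for illustration only.
rows = [(250, 1), (300, 1), (275, 1), (200, 0), (220, 0), (240, 0)]

def summarize(rows, flag):
    """Return (count, mean, sample standard deviation) of chol for one group."""
    values = [chol for chol, heart_problem in rows if heart_problem == flag]
    return len(values), mean(values), stdev(values)

with_disease = summarize(rows, 1)
without_disease = summarize(rows, 0)
print("with heart disease:", with_disease)
print("without heart disease:", without_disease)
```

Swapping `stdev` for `statistics.pstdev` gives the population standard deviation (Excel's STDEV.P) instead.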

3

Group                                     Median age   Average age
Cholesterol > 240                         49           48.35251799
Cholesterol > 240 having heart disease    50           49.41935484
Cholesterol > 240 with no heart disease   48           47.49350649

For solution 3, we create subsets for the three categories so that we can calculate the median and average for each. We create these subsets by filtering on the cholesterol dummy variable (>240) and the heart_problem column. First we create the cholesterol dummy by adding one variable for each row, “=IF(<chol_variable> > 240,1,0)”, which gives the following dummy (shown for a few rows).

Cholesterol (>240): 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1

Ages in each subset:

Chol > 240 (139 ages): the 77 ages with no heart disease followed by the 62 ages of heart patients, both listed below.

Chol > 240 heart patients (62 ages): 31, 33, 35, 36, 38, 40, 41, 43, 46, 48, 48, 48, 49, 50, 50, 51, 57, 59, 60, 65, 32, 39, 40, 43, 48, 48, 48, 53, 54, 54, 55, 57, 58, 44, 44, 46, 47, 49, 52, 52, 52, 52, 53, 53, 55, 55, 55, 56, 56, 59, 65, 41, 43, 44, 47, 50, 52, 52, 54, 56, 58, 65

Chol > 240 no heart disease (77 ages): 29, 32, 33, 35, 35, 36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 40, 41, 41, 41, 41, 41, 41, 42, 42, 43, 43, 44, 45, 46, 46, 47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 50, 52, 52, 52, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 55, 56, 57, 57, 57, 58, 59, 59, 60, 61, 61, 62
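The medians and averages for solution 3 can be cross-checked outside Excel with a short Python sketch. The two lists are copied from the ages listed above; the variable names are illustrative, and the combined list is assumed to be simply the two subsets together.

```python
from statistics import mean, median

# Ages of patients with chol > 240 and heart disease (copied from above).
heart = [
    31, 33, 35, 36, 38, 40, 41, 43, 46, 48, 48, 48, 49, 50, 50, 51, 57, 59,
    60, 65, 32, 39, 40, 43, 48, 48, 48, 53, 54, 54, 55, 57, 58, 44, 44, 46,
    47, 49, 52, 52, 52, 52, 53, 53, 55, 55, 55, 56, 56, 59, 65, 41, 43, 44,
    47, 50, 52, 52, 54, 56, 58, 65,
]

# Ages of patients with chol > 240 and no heart disease (copied from above).
no_heart = [
    29, 32, 33, 35, 35, 36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40,
    40, 41, 41, 41, 41, 41, 41, 42, 42, 43, 43, 44, 45, 46, 46, 47, 47, 47,
    47, 48, 48, 48, 48, 49, 49, 50, 52, 52, 52, 53, 53, 53, 53, 53, 54, 54,
    54, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 55, 56, 57, 57, 57, 58, 59,
    59, 60, 61, 61, 62,
]

# Assumption: the "Chol > 240" column is the union of the two subsets.
all_high_chol = no_heart + heart

for label, ages in [
    ("chol > 240", all_high_chol),
    ("chol > 240, heart disease", heart),
    ("chol > 240, no heart disease", no_heart),
]:
    print(label, len(ages), median(ages), round(mean(ages), 4))
```

Run against these lists, the sketch should reproduce the medians (49, 50, 48) and averages reported in the table for solution 3.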

4

First we create bins for the histogram. Here, MIN = 98 and MAX = 190, so the bins created are:

110, 125, 140, 155, 170, 185

Then click on the Data tab -> Data Analysis (Analysis ToolPak) -> Histogram -> Input Range: select the column trestbps -> Bin Range: select the above values -> tick Chart Output -> OK

Bin       Frequency
95-110    11
110-125   30
125-140   26
140-155   9
155-170   4
170-185   1
More      1

The bin labels were edited manually to make them readable (otherwise they were just 110, 125, …, 185, More).
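The binning rule behind Excel's Histogram tool can be sketched in Python: a value is counted in the first bin whose upper bound is greater than or equal to it, and anything above the last bound goes to "More". The bin bounds are the ones used above; the blood pressure readings are made-up illustrative values, not the homework data, and the helper name `histogram` is hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the bins, taken from the solution above.
bins = [110, 125, 140, 155, 170, 185]

# Hypothetical resting blood pressure readings, for illustration only.
trestbps = [98, 110, 112, 125, 130, 138, 140, 150, 160, 190]

def histogram(values, bins):
    """Count values per bin; the extra last slot is the 'More' bucket."""
    counts = [0] * (len(bins) + 1)
    for value in values:
        # bisect_left returns the index of the first bound >= value,
        # or len(bins) when the value exceeds every bound.
        counts[bisect_left(bins, value)] += 1
    return counts

counts = histogram(trestbps, bins)
labels = [f"<= {bound}" for bound in bins] + ["More"]
for label, count in zip(labels, counts):
    print(label, count)
```

Because each bound is inclusive, a reading of exactly 110 lands in the first bin rather than the second, which matters when relabeling the bins as ranges such as 95-110 and 110-125.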


