Question

Using PROC FORMAT and PROC FREQ for the following data:

(a) Define an appropriate format for the gender variable.

(b) Produce a 2 X 2 table with gender as the rows and lenses as the columns.

(c) Calculate the relative risk and provide a one sentence written interpretation explicitly stating which groups are being compared and defining the outcome.

(d) Perform a chi-squared test of association between gender and needing contact lenses. What are the results of the test (i.e., do you reject the null hypothesis)?

Obs id carrot gender latitude lenses
1 1 0 1 33 1
2 2 0 2 46 1
3 3 1 1 32 1
4 4 0 2 26 0
5 5 1 1 25 1
6 6 1 2 48 0
7 7 0 1 39 1
8 8 0 2 24 0
9 9 0 1 35 0
10 10 0 1 42 1
11 11 1 1 35 0
12 12 0 2 44 0
13 13 1 1 35 1
14 14 1 1 25 0
15 15 1 1 24 0
16 16 1 1 38 0
17 17 1 1 28 1
18 18 0 1 43 0
19 19 1 1 44 0
20 20 0 1 46 1
21 21 0 1 37 1
22 22 0 2 33 0
23 23 0 2 42 1
24 24 1 2 31 1
25 25 0 2 46 1
26 26 0 1 32 1
27 27 0 2 30 0
28 28 0 2 27 1
29 29 1 1 45 0
30 30 1 1 39 0
31 31 0 2 47 1
32 32 1 1 39 0
33 33 1 1 48 1
34 34 0 1 47 0
35 35 0 1 32 0
36 36 0 1 31 0
37 37 1 2 26 1
38 38 0 2 28 1
39 39 0 1 25 1
40 40 1 2 25 0
41 41 1 1 31 0
42 42 1 2 47 1
43 43 1 1 32 1
44 44 1 2 24 1
45 45 1 2 37 0
46 46 1 2 26 0
47 47 0 2 41 1
48 48 0 2 43 1
49 49 0 1 45 1
50 50 0 1 27 1
51 51 1 1 31 0
52 52 0 2 40 0
53 53 0 2 37 0
54 54 1 2 48 0
55 55 0 2 26 0
56 56 0 2 33 1
57 57 0 1 48 1
58 58 1 2 24 1
59 59 0 1 32 1
60 60 1 1 40 1
61 61 0 2 45 0
62 62 1 1 40 0
63 63 0 1 36 1
64 64 0 2 42 0
65 65 1 2 44 0
66 66 0 1 44 1
67 67 1 2 47 0
68 68 1 2 27 1
69 69 1 1 33 1
70 70 0 1 29 1
71 71 0 1 42 0
72 72 1 1 40 0
73 73 0 2 44 1
74 74 1 2 41 0
75 75 1 2 26 1
76 76 1 2 27 0
77 77 0 2 29 1
78 78 0 1 33 1
79 79 1 2 31 1
80 80 1 2 33 0
81 81 1 1 43 1
82 82 1 2 33 1
83 83 0 2 43 1
84 84 0 1 39 1
85 85 1 2 47 0
86 86 1 1 46 1
87 87 1 2 27 0
88 88 1 2 38 0
89 89 1 1 34 0
90 90 1 1 40 0
91 91 1 1 27 1
92 92 0 1 29 1
93 93 1 1 43 1
94 94 0 1 40 0
95 95 1 1 31 0
96 96 1 2 38 0
97 97 0 2 30 1
98 98 1 2 26 0
99 99 0 1 43 1
100 100 0 2 33 1

Solutions

Expert Solution

(a) Gender is a nominal variable with two categories, male and female.

In this solution, 1 represents male and 2 represents female, so an appropriate format labels the value 1 as Male and the value 2 as Female.
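
For part (a), this coding can be written as a SAS format with PROC FORMAT. The following is a minimal sketch, assuming the coding used in this answer (1 = male, 2 = female); the format names genderfmt and lensfmt are hypothetical, and the lenses format is an optional extra for labelling the table columns.

```sas
/* Sketch only: the format names are illustrative, not given in the question. */
proc format;
  value genderfmt            /* numeric format for the gender variable */
    1 = 'Male'
    2 = 'Female';
  value lensfmt              /* optional: labels for the lenses variable */
    1 = 'Needs lenses'
    0 = 'No lenses';
run;
```

Defining a format does not change any data. It takes effect only once it is attached to a variable, for example with a FORMAT statement such as `format gender genderfmt.;` inside a DATA step or a procedure.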

(b) Gender has two categories, and lenses also has two categories.

Gender: 1 - Male

2 - Female

Lenses: 1 - Needs lenses

0 - Does not need lenses

The 2x2 contingency table, with gender as the rows and lenses as the columns, is:

Gender   Needs lenses   No lenses   Total
Male     30             22          52
Female   23             25          48
Total    53             47          100
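
The same table can be produced with PROC FREQ. The sketch below rests on some assumptions: instead of reading all 100 observations, it enters the four cell counts from the table above into a hypothetical dataset named lens_counts and uses a WEIGHT statement. With the raw data, the WEIGHT statement would be dropped and the procedure pointed at the full dataset. The format names are also hypothetical.

```sas
/* Hypothetical summary dataset built from the four cell counts. */
data lens_counts;
  input gender lenses count;
  datalines;
1 1 30
1 0 22
2 1 23
2 0 25
;
run;

proc format;
  value genderfmt 1 = 'Male' 2 = 'Female';
  value lensfmt   1 = 'Needs lenses' 0 = 'No lenses';
run;

/* ORDER=DATA keeps values in the order they first appear in the data,
   so Male is row 1 and 'Needs lenses' is column 1. */
proc freq data=lens_counts order=data;
  weight count;                                   /* each record stands for COUNT subjects */
  tables gender*lenses / norow nocol nopercent;   /* show cell counts only */
  format gender genderfmt. lenses lensfmt.;
run;
```

Listing gender first in the TABLES statement puts it on the rows, as part (b) requests.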

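(c) Relative risk

The formula for relative risk is:

$$\mathrm{RR} = \frac{A / (A + B)}{C / (C + D)}$$

From the contingency table, A = 30, B = 22, C = 23 and D = 25.

Plugging the values into the formula:

$$\mathrm{RR} = \frac{30 / 52}{23 / 48} = \frac{0.5769}{0.4792} \approx 1.20$$

Interpretation: Males have about 1.2 times the risk of needing contact lenses compared with females.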
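
In SAS, the relative risk can be requested with the RELRISK option of the TABLES statement in PROC FREQ. This is a hedged sketch, not the original author's code: it assumes a hypothetical summary dataset lens_counts holding the four cell counts A, B, C and D, and it relies on ORDER=DATA so that males form row 1 and "needs lenses" (lenses = 1) forms column 1.

```sas
/* Hypothetical summary dataset: one record per cell of the 2x2 table. */
data lens_counts;
  input gender lenses count;
  datalines;
1 1 30
1 0 22
2 1 23
2 0 25
;
run;

/* RELRISK reports the odds ratio and the relative risks (row 1 vs row 2)
   for column 1 and for column 2, with confidence limits. */
proc freq data=lens_counts order=data;
  weight count;
  tables gender*lenses / relrisk;
run;
```

With this ordering, the column 1 relative risk compares males with females on the outcome "needs lenses", so it should agree with the hand calculation (30/52)/(23/48), about 1.20. Under the default ordering, lenses = 0 would become column 1, and the same figure would appear as the column 2 relative risk instead.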

(d) Chi-square test of association

The null and alternative hypotheses are:

H0: There is no association between gender and needing contact lenses.

H1: There is an association between gender and needing contact lenses.

Chi-square test statistic:

The formula for the chi-square test statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O denotes the observed frequencies given in the contingency table and E denotes the expected frequencies.

The expected frequency of each cell is (row total × column total) / grand total. The row totals are 52 (male) and 48 (female), the column totals are 53 (needs lenses) and 47 (does not need lenses), and the grand total is 100.

The expected value for the first cell (observed count 30) is:

$$E_{11} = \frac{52 \times 53}{100} = 27.56$$

The expected value for the first row and second column is:

$$E_{12} = \frac{52 \times 47}{100} = 24.44$$

The expected value for the second row and first column is:

$$E_{21} = \frac{48 \times 53}{100} = 25.44$$

The expected value for the second row and second column is:

$$E_{22} = \frac{48 \times 47}{100} = 22.56$$

The table for calculating the test statistic is:

O    E       (O-E)^2   (O-E)^2/E
30   27.56   5.9536    0.216023
22   24.44   5.9536    0.243601
23   25.44   5.9536    0.234025
25   22.56   5.9536    0.263901

The test statistic is the sum of the last column:

$$\chi^2 = 0.216023 + 0.243601 + 0.234025 + 0.263901 = 0.95755 \approx 0.96$$

Chi-square critical value:

To find the critical value we need the level of significance (alpha) and the degrees of freedom.

Alpha = 0.05

Degrees of freedom = (number of rows - 1) × (number of columns - 1) = (2 - 1) × (2 - 1) = 1

From the chi-square table, the critical value for an upper-tail area of 0.05 with 1 degree of freedom is 3.841.

Decision rule:

If the test statistic is less than the critical value, we fail to reject the null hypothesis; otherwise we reject the null hypothesis.

Here the test statistic (0.96) is less than the critical value (3.841), so we fail to reject the null hypothesis.

Conclusion: At the 5% significance level, there is insufficient evidence of an association between gender and needing contact lenses.
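
The hand calculation can be cross-checked in SAS with the CHISQ option of PROC FREQ. The sketch below is an illustration under stated assumptions, not the original author's program: it uses a hypothetical summary dataset lens_counts containing the four observed cell counts, and it adds the EXPECTED and CELLCHI2 options so that the expected frequencies and each cell's contribution to the statistic are printed alongside the counts.

```sas
/* Hypothetical summary dataset: observed counts for the 2x2 table. */
data lens_counts;
  input gender lenses count;
  datalines;
1 1 30
1 0 22
2 1 23
2 0 25
;
run;

/* CHISQ     : chi-square test of association
   EXPECTED  : expected cell frequencies under independence
   CELLCHI2  : each cell's (O-E)^2/E contribution */
proc freq data=lens_counts order=data;
  weight count;
  tables gender*lenses / chisq expected cellchi2 norow nocol nopercent;
run;
```

SAS reports a p-value rather than a critical value. For a chi-square statistic of about 0.96 with 1 degree of freedom the p-value is roughly 0.33, which is above 0.05 and leads to the same decision: fail to reject the null hypothesis.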

