A) A study is conducted to assess the relationship between the use of marijuana during pregnancy and adverse delivery outcomes, defined as major congenital malformations. The following variables are used in the analysis.
Delivery outcome: major congenital malformation versus other delivery.
Risk factors:
1. Marijuana usage during pregnancy: yes or no
2. Race: White or non-white
3. SES categorized as: low, middle, or high
4. Maternal age
5. Any previous stillbirth: yes or no
a) Write down a model to evaluate this relationship including terms in the model for the confounding factors and interactions between marijuana usage and each of the other factors. Be sure to state the coding scheme you are using to represent the variables in the model.
b) Write down the odds ratio corresponding to the model in part (a) for the odds of malformation given marijuana usage relative to the odds of malformation given no usage.
A -
This problem can be modeled as binary classification, where the outcome is major congenital malformation (coded 1) or other delivery (coded 0).
The independent variables that we can consider for this model are -
1. Marijuana usage during pregnancy: yes or no
2. Race: White or non-white
3. SES categorized as: low, middle, or high
4. Maternal age
5. Any previous stillbirth: yes or no
6. Interaction of Marijuana usage and Race
7. Interaction of Marijuana usage and SES
8. Interaction of Marijuana usage and Maternal age
9. Interaction of Marijuana usage and Any previous stillbirth
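The nine terms listed above can be turned into numeric columns. The following is only a minimal sketch: the function name `encode`, the column names, the choice of low SES and age under 20 as reference levels, and the age cut points are all assumptions made for illustration.

```python
def encode(marijuana, nonwhite, ses, age, prev_stillbirth):
    """Return 0/1 main-effect columns plus their products with marijuana usage."""
    main = {
        "race_nonwhite": int(nonwhite),
        "ses_middle": int(ses == "middle"),   # low SES is the assumed reference level
        "ses_high": int(ses == "high"),
        "age_20_24": int(20 <= age < 25),     # age under 20 is the assumed reference level
        "age_25_29": int(25 <= age < 30),
        "age_30_plus": int(age >= 30),
        "prev_stillbirth": int(prev_stillbirth),
    }
    m = int(marijuana)
    row = {"marijuana": m, **main}
    # Each interaction term is the product of the marijuana indicator and a main effect.
    row.update({f"marijuana_x_{name}": m * value for name, value in main.items()})
    return row


# Hypothetical record, for illustration only.
example = encode(marijuana=True, nonwhite=False, ses="high", age=27, prev_stillbirth=False)
```

For a marijuana user, each interaction column simply repeats the matching main-effect column; for a non-user, every interaction column is 0.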
Confounding variables -
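Socio-economic status is likely to be negatively correlated with marijuana usage during pregnancy (assuming that women of low socio-economic status are less informed about the risks and therefore use it more) and with major congenital malformation (assuming genetic factors).
In that case, socio-economic status is associated with both the exposure and the outcome, so it is a confounding variable. The question treats race, maternal age, and previous stillbirth as confounding factors in the same way.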
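While modelling, we use one-hot (0/1 indicator) encoding for all the categorical independent variables, so each categorical variable is represented by one or more binary variables. If the model has an intercept, one level of each variable is dropped as the reference (for example, low SES), so a variable with k levels needs k - 1 indicators.
We will also categorize maternal age into non-overlapping groups, such as under 20, 20-24, 25-29, and 30 or older.
Our final model will be a function of: Marijuana usage (0 for no, 1 for yes), Race (0 for White, 1 for non-white), SES (middle), SES (high), Maternal age (20-24), Maternal age (25-29), Maternal age (30 or older), Previous stillbirth (0 for no, 1 for yes), and the product of Marijuana usage with each of these.
That is, a function of binary variables only, with a binary dependent variable.
A well-suited choice is logistic regression, which models the log odds of malformation as a linear function of these variables and is fitted by minimizing the cross-entropy loss.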
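One way to write such a model out explicitly is as a logistic regression. The notation below is an assumption introduced for illustration, not part of the original answer: Y = 1 for major congenital malformation, M = 1 for marijuana usage, R = 1 for non-white, S_2 and S_3 indicate middle and high SES (low as reference), A_2, A_3, A_4 indicate maternal age 20-24, 25-29, and 30 or older (under 20 as reference), and P = 1 for a previous stillbirth.

```latex
\[
\log\frac{\Pr(Y=1)}{1-\Pr(Y=1)}
  = \beta_0 + \beta_1 M + \beta_2 R + \beta_3 S_2 + \beta_4 S_3
  + \beta_5 A_2 + \beta_6 A_3 + \beta_7 A_4 + \beta_8 P
  + M\,\bigl(\gamma_1 R + \gamma_2 S_2 + \gamma_3 S_3
  + \gamma_4 A_2 + \gamma_5 A_3 + \gamma_6 A_4 + \gamma_7 P\bigr)
\]
```

In this sketch the beta terms are the main effects that adjust for the confounding factors, and the gamma terms are the interactions that let the effect of marijuana usage differ across levels of each factor.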
B -
Exposure | Major congenital malformation | Other delivery |
Marijuana usage - yes | A | B |
Marijuana usage - no | C | D |
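Odds of malformation given usage = A/B, and odds of malformation given no usage = C/D.
Odds ratio for malformation given usage relative to no usage = (A/B) / (C/D)
= (A * D) / (B * C)
Note that [A/(A+B)] / [C/(C+D)] is the risk ratio, not the odds ratio.
The table gives the crude (unadjusted) odds ratio. Under the model in part (a), the odds ratio is exp(coefficient of marijuana usage + sum over the other factors of interaction coefficient * factor value), so it depends on race, SES, maternal age, and previous stillbirth.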
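As a quick numerical check on the 2x2 table, the sketch below computes the odds ratio (A/B) / (C/D) alongside the risk ratio [A/(A+B)] / [C/(C+D)], two quantities that are easy to confuse. The function names and the counts are hypothetical and chosen only for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a/b) / (c/d) = (a*d) / (b*c)."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table: [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: a, b = users with / without malformation,
# c, d = non-users with / without malformation.
a, b, c, d = 10, 90, 5, 95
crude_or = odds_ratio(a, b, c, d)   # (10 * 95) / (90 * 5)
crude_rr = risk_ratio(a, b, c, d)   # (10 / 100) / (5 / 100)
```

The two measures differ whenever the outcome is not rare, which is why the cross-product form A*D / (B*C) is the one to use for an odds ratio.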