Question

In: Economics

Suppose you are commissioned by the CDC to investigate the role of some key socio-economic factors that may be impacting the death rate due to COVID-19. What will be a Regression Model that you can express. Please write that model and explain why the variables you chose may be of any interest. How would you perform any Hypothetical test to validate your arguments?

Solutions

Expert Solution

The model was executed as a three-step strategy. Firstly, in order to visualise a base mortality risk assessment (or pre-COVID mortality risk scenario) a multi-criteria Analytic Hierarchy Process (AHP) [31] was used to compute weights (relative importance) for the nine static indicators. The pair-wise comparison in AHP is a common technique to assess the significance of each indicator [32] with a tolerable degree of inconsistency in each pairwise comparison [33]. The first and second author independently evaluated the relative importance of the factors and the discrepancies were accordingly resolved. The relative importance of weights was handpicked in accordance with the analysis of research literature, which has established the impact of various indicators on COVID-19 mortality [26]. The computed weights are summarised in a table (see Table 1). The baseline scenario represented the health risk in general terms without focusing on the COVID-19 pandemic. It showed the strength of each nation based on their economy, health infrastructure, and demography. Secondly, a multivariate linear regression model was conducted where the dependent variable was a normalised COVID-19 mortality for a country as of 13 May 2020. The independent variables were the nine static socio-economic factors described earlier. Thirdly, the regression model was repeated as mentioned in the second step but this time with the on-top addition of the six dynamic factors associated with COVID-19, giving a total of 15 independent variables. The third scenario that included COVID-19 related data alongside stringency data and static variables provided a reflection of the current pandemic state of the world.

Table 1

Base scenario weights of static factors.

Variable | Weight
Average Population Density | 0.027
Population | 0.039
Health Expenditure | 0.058
GDP | 0.09
DALY | 0.157
Nurses | 0.157
Physicians | 0.157
Hospital Beds | 0.157
A65abp | 0.157
Consistency Ratio | < 0.01
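
As a rough illustration of how AHP turns pairwise comparisons into weights and a consistency ratio, here is a minimal sketch. The 3x3 comparison matrix is hypothetical (the authors' nine-factor matrix is not given in the text), the function names are illustrative, and the random index of 0.58 is the commonly tabulated value for a 3x3 matrix.

```python
# Minimal AHP sketch: derive weights from a pairwise comparison matrix.
# The matrix below is HYPOTHETICAL (three factors), not the authors' nine-factor matrix.

def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration, normalised to sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    return w

def consistency_ratio(matrix, weights, random_index):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    # lambda_max is estimated as the mean of (A w)_i / w_i
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / random_index

# Hypothetical judgments: factor A is 2x as important as B and 4x as important as C.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
RI_3 = 0.58  # commonly tabulated random index for n = 3 (assumption)

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights, RI_3)
print([round(w, 3) for w in weights])
```

Because the hypothetical judgments are perfectly consistent, the consistency ratio is essentially zero; real expert judgments would normally give a small positive value, which is usually accepted when it stays below 0.1.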

For the regression models, the regression predictors were then assessed for relative importance via assigning of weights using the relaimpo package [34]. Lastly, the weights obtained from the modelling were aggregated with their ranks in the form of a weighted sum (see Equation (1)):

$$\mathrm{Risk}_i = \sum_{j=1}^{n} w_j \, a_{ij} \tag{1}$$

where w_j is the weight of factor j, a_ij is the rank value of factor j for country i, n is the number of factors, and Risk_i is the resulting risk score of country i.
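
The weighted sum in Equation (1) can be sketched as follows. The weights are the base-scenario values from Table 1; the rank values for the example country are hypothetical, and `risk_score` is an illustrative name rather than anything from the authors' supplementary R code.

```python
# Weighted-sum risk score, following Equation (1).
# Weights are the base-scenario values from Table 1.
WEIGHTS = {
    "Average Population Density": 0.027,
    "Population": 0.039,
    "Health Expenditure": 0.058,
    "GDP": 0.09,
    "DALY": 0.157,
    "Nurses": 0.157,
    "Physicians": 0.157,
    "Hospital Beds": 0.157,
    "A65abp": 0.157,
}

def risk_score(ranks, weights=WEIGHTS):
    """Risk_i = sum over factors j of w_j * a_ij."""
    return sum(weights[factor] * ranks[factor] for factor in weights)

# HYPOTHETICAL rank values for one country: rank 10 on every factor
# except the elderly ratio, where it ranks 50.
country_ranks = {factor: 10 for factor in WEIGHTS}
country_ranks["A65abp"] = 50

score = risk_score(country_ranks)
print(round(score, 2))
```

Countries can then be ordered by this score to obtain a risk ranking.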

The second stage of the analysis was a linear regression model using the nine static variables as independent factors and normalised COVID-19 mortality on 13 May 2020 as the dependent variable. The results are summarised in a table (see Table 2). The model's R2 was 0.69, meaning that it explained 69% of the variance in the dataset. The ratio of the elderly in the population (A65abp) emerged as a significant predictor. The GDP of countries and the number of hospital beds were close to significance. Consequently, these predictors were also assigned higher relative weights by the relaimpo package in R: 19% for A65abp and 22% for GDP. As a check for multicollinearity, the variance inflation factor for all predictor variables was lower than nine.
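
Since the question asks for the model to be written out, the static-factor regression described above can be expressed as follows. This is a sketch under the usual ordinary-least-squares assumptions; the symbols y_i, x_ij, beta_j and N (the number of countries) are notation introduced here, not taken from the source.

```latex
% Static-factor model for country i; x_{i1}, ..., x_{i9} are the nine static factors
y_i = \beta_0 + \sum_{j=1}^{9} \beta_j x_{ij} + \varepsilon_i

% Hypothesis test for a single factor j (two-sided t-test)
H_0 : \beta_j = 0 \quad \text{vs.} \quad H_1 : \beta_j \neq 0, \qquad
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)} \sim t_{N-10} \ \text{under } H_0

% Variance inflation factor; R_j^2 comes from regressing x_j on the other predictors
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A factor is judged significant when the p-value of t_j falls below the chosen level (for example 0.05), which is what the asterisks in Table 2 report.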

Table 2

Regression results for risk of mortality. For p-values, “***” represents p < 0.001, “**” represents p < 0.01, and “*” represents p < 0.05.

Regression Model | R2 | Significant p-Values | Top Weights
Static factors | 0.69 | A65abp *** | A65abp (0.19), GDP (0.22)
Static and dynamic factors | 0.88 | A65abp ***, nurses *, susceptible *, active ***, mortality growth ** | active (0.20), susceptibles (0.15), mortality growth (0.11), A65abp (0.10)

For the third step in the analysis, the regression modelling was repeated with the addition of six dynamic variables associated with COVID-19, giving a total of 15 independent variables. This model was able to explain up to 88% of the variance in the data. The dynamic variables tended to dominate the static socio-economic factors, with three dynamic factors having significant predictive power. The ratio of the elderly was again a significant predictor of COVID-19 mortality risk, as was the number of nurses. The government-enforced stringency level, however, did not emerge as a significant predictor in this model. As a check for multicollinearity, the variance inflation factor for all predictor variables (except GDP) was lower than 7. Table 3 summarises the top 10 countries sorted by actual mortality rate, together with their predicted risk rankings (both pre-COVID and on 13 May 2020), the latter obtained from the model comprising both static and dynamic indicators. The table shows that, at least for this subset of 10 countries, the COVID-19 mortality risk level is largely where it was anticipated to be, consistent with their baseline risk assessment.

Table 3

Top 10 countries ranked on actual mortality rate and their predicted risk assessment.

Country Name | Mortality Rate (Actual) | Pre-COVID-19 Mortality Risk Rank (Predicted) | COVID-19 Mortality Risk Rank as at 13 May 2020 (Predicted)
San Marino | 1213.6 | 41 | 3
Belgium | 774.2 | 7 | 8
Andorra | 636.3 | 46 | 60
Spain | 580.1 | 35 | 41
Italy | 514.1 | 14 | 17
United Kingdom | 499.1 | 25 | 16
France | 403.5 | 11 | 13
Sweden | 339.8 | 9 | 11
Netherlands | 322.8 | 10 | 12
Ireland | 308.4 | 27 | 33

A spatial map illustrates the mortality risk of COVID-19 as predicted by the third step of the analysis (see Figure 2). A second spatial map shows the change from baseline in COVID-19 mortality risk, as projected by the linear regression model that combined static and dynamic factors (see Figure 3; essentially the difference between the baseline risk and the risk shown in Figure 2). The map indicates that most countries were at the expected level of risk or lower on 13 May 2020 compared with what was originally predicted in the base scenario (most countries are coloured in shades of yellow, orange, or green, which denote a reduction or equivalence in risk relative to what was expected). All materials related to the modelling, such as R code, output, and base data, are provided in a supplementary file.

Conclusions

In this paper, a mortality risk-based evaluation of COVID-19 on a global scale using data as at 13 May 2020 is presented. Using a multi-weighted approach, a range of unique scenarios using a mixture of static and dynamic variables were incorporated. The main finding was that the ratio of the elderly in a population clearly emerged as a significant mortality risk predictor for COVID-19; however, this must be considered in light of the residency makeup of individual countries. In addition, a combination of static socio-economic factors and dynamic factors associated with COVID-19 growth and spread had higher predictive capability than the static factors alone. The current stringency of government-imposed restrictions was not observed to have an impact. In general, as at 13 May 2020, from a spatial perspective the current mortality risk projections of COVID-19 may be considered lower than or as expected for most countries around the world.

The earliest Covid-19 patients were recorded in the data set on January 22, 2020. We use observations from January 22, 2020 to June 29, 2020. The data set consists of 160 instances and five attributes: the date of recording, confirmed cases, recovered cases, deaths, and the increase rate of Covid-19 patients. The following estimates are made from the data set to explore and extract useful information.

Correlation coefficients

The correlation coefficient is a statistical measure of the strength of the linear relationship between the relative movements of two variables. Its values range from -1 to +1; a computed value greater than +1 or less than -1 indicates an error in the correlation measurement. A correlation of -1 indicates a perfect negative correlation, a correlation of +1 indicates a perfect positive correlation, and a value of 0.0 indicates no linear relationship between the two variables [24].

Correlation statistics can be used to describe the relationship between different attributes of the disease. A correlation coefficient can be calculated to determine the level of correlation between the attributes (confirmed cases, recovered cases, deaths, and increase rate) under the current pandemic situation, as shown in Table 1 and Figure 3. We found that the correlation between confirmed cases and recovered cases of Covid-19 is strongly positive (0.986).

Table 1: Correlation Coefficients of attributes

Attribute | Confirmed | Recovered | Deaths | Increase rate
Confirmed | 1.000000 | 0.986051 | 0.988177 | -0.378478
Recovered | 0.986051 | 1.000000 | 0.950569 | -0.337027
Deaths | 0.988177 | 0.950569 | 1.000000 | -0.401742
Increase rate | -0.378478 | -0.337027 | -0.401742 | 1.000000
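
For readers who want to see how each entry of such a correlation matrix is obtained, here is a minimal sketch of the Pearson correlation coefficient. The short series below are hypothetical and are not the data set used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# HYPOTHETICAL cumulative series: cases and deaths grow, the increase rate falls.
confirmed = [100, 250, 480, 900, 1500]
deaths = [2, 6, 11, 20, 35]
increase_rate = [0.9, 0.7, 0.5, 0.4, 0.3]

r_confirmed_deaths = pearson(confirmed, deaths)
r_confirmed_rate = pearson(confirmed, increase_rate)
print(round(r_confirmed_deaths, 3), round(r_confirmed_rate, 3))
```

As in the table above, two cumulative series that grow together give a coefficient close to +1, while a growing series paired with a falling rate gives a negative coefficient.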

ARIMA Model Results

In the ARIMA model, we must choose the parameters p, d, and q [28]. Rather than selecting them by inspecting plots, we use auto_arima to find appropriate parameters. The auto_arima function works by conducting differencing tests (Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller, or Phillips–Perron) to determine the order of differencing, d, and then fitting models within the defined start_p, max_p, start_q, and max_q ranges [25]. If the seasonal option is enabled, auto_arima also seeks to identify the optimal P and Q hyperparameters after conducting the Canova-Hansen test to determine the optimal order of seasonal differencing, D. Figure 4 shows the parameters obtained by auto_arima.

Figure 5 shows the residual plots from the auto_arima model. The output of the auto_arima model is interpreted as follows:

Standardized residual: The residual errors fluctuate around a mean of zero and have a uniform variance.

Histogram and density plot: The density plot shows a distribution centred symmetrically on a mean of zero.

QQ-plot: In the QQ plot, all blue dots (the ordered distribution of residuals) lie on the red line; any deviation from the line would indicate a skewed distribution. The residuals follow the N(0, 1) reference line and can therefore be considered normally distributed.

Correlogram: The correlogram, or ACF plot, shows that the residual errors are not autocorrelated. Any autocorrelation would imply that some pattern in the residual errors is not explained by the model.

The optimal values of p, d, and q obtained by the auto_arima model are 1, 2, and 2, respectively. Now, using the best parameters obtained (1, 2, 2) to create an ARIMA model, the results are shown in figure 6.
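
For reference, an ARIMA(1, 2, 2) model can be written out as follows. This is the standard textbook form with any constant term omitted; B denotes the backshift operator, and the symbols phi_1, theta_1, theta_2 and epsilon_t are generic notation rather than the fitted values from Figure 6.

```latex
% ARIMA(1,2,2): one autoregressive term, second-order differencing, two moving-average terms
(1 - \phi_1 B)\,(1 - B)^2\, y_t = (1 + \theta_1 B + \theta_2 B^2)\,\varepsilon_t

% Equivalently, with the twice-differenced series w_t = y_t - 2 y_{t-1} + y_{t-2}:
w_t = \phi_1 w_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
```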

Figure 6 above shows the summary of the ARIMA model. In this figure, we focus on the coefficient table. The coef column shows the weight of each term and how each term affects the time series. The P>|z| column indicates the significance of each weight. Here, the p-value of each weight is less than or close to 0.05, so it is reasonable to keep every term in our model.

These observations suggest that our model produces a good fit, which can help us understand the time series and forecast future values. Although we have a reasonable fit, some parameters of the ARIMA model could still be adjusted to improve it. Having obtained a model for the time series, we can now use it to produce forecasts [26]. We first compare the predicted values with the actual values of the time series, which helps us understand the accuracy of the predictions. The forecasts and associated confidence intervals can then be used to understand the time series further and to anticipate what to expect. Our forecasts show that the time series is expected to maintain a consistent growth rate.

As we forecast further into the future, it is natural to become less confident in our values. This is reflected in the confidence intervals generated by our model, which grow wider the further ahead we forecast. We then forecast death cases for the test data set with a 95% confidence interval. Figure 7 below shows the prediction results.

In the figure below, the actual deaths in the training data set are shown by the blue line and the predicted deaths by the red line. The red prediction line has dropped, which means that the number of deaths is expected to decline in the future, as more and more people recover quickly and people maintain social distancing during the pandemic.

Using the residuals, we compute summary metrics that condense them into a single value reflecting the model's predictive ability.

To judge the prediction results, we apply commonly used accuracy measures; the results are shown in Table 2.

Table 2: Measures of accuracy

Measure of Accuracy | Value
Mean Absolute Error (MAE) | 0.12044588473307338
Mean Squared Error (MSE) | 0.023012953284359018
Root Mean Squared Error (RMSE) | 0.15170020858376898
Mean Absolute Percentage Error (MAPE) | 0.009196691386663233

The MAE of our model is 0.1204, which is quite small, given that the death-case values in our data start at 0.01.

The MSE, 0.0230, is smaller than the MAE: because the errors are typically well below 1 in magnitude, squaring shrinks them, so the MSE comes out roughly five times smaller than the MAE.

The RMSE of 0.1517 is analogous to a standard deviation and measures how widely the residuals are spread.

A MAPE of about 0.92% implies that the model is about 99.08% accurate in predicting the test set observations.
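
The four accuracy measures can be computed as in the following minimal sketch. The actual and predicted values are hypothetical; the final lines only check that the RMSE reported in Table 2 is the square root of the reported MSE.

```python
import math

def accuracy_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (as a fraction, not a percentage)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides each error by the actual value, so actual values must be non-zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# HYPOTHETICAL test-set values, every prediction off by exactly 0.5.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
metrics = accuracy_metrics(actual, predicted)
print(metrics)

# Consistency check on the values reported in Table 2: RMSE = sqrt(MSE).
reported_mse = 0.023012953284359018
reported_rmse = 0.15170020858376898
rmse_gap = abs(math.sqrt(reported_mse) - reported_rmse)
```

When the errors are smaller than 1, as here, the MSE is smaller than the MAE, which matches the pattern seen in Table 2.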

Regression Model Results

To find out which factor has the most significant influence on the forecasted output and how the various factors relate to each other, we consider the input features "confirmed cases", "recovered cases", and "increase rate". Based on these features, we predict the deaths of Covid-19 patients. The data set is split 80%:20% into training and testing sets, respectively.
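
An 80%:20% split of 160 instances can be sketched as below. The source does not say how the split was made; the shuffling, the seed, and the function name are assumptions made for illustration (the out-of-order instance numbers in Table 4 are at least consistent with a shuffled split).

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and split them into (train, test)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed value is an arbitrary assumption
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

# 160 instances, as in the data set described above.
train_idx, test_idx = train_test_split_indices(160)
print(len(train_idx), len(test_idx))
```

For time-ordered data, a chronological split (training on the earliest 80% of days) is a common alternative, since it avoids training on observations that come after the ones being predicted.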

In multiple linear regression, the fitted model selects the best coefficients for all attributes [27]. The coefficients of the regression model are shown in Table 3 below.

Table 3: coefficients of regression model

Attribute | Coefficient
Confirmed | 0.103305
Recovered | -0.100568
Increase rate | 69.616876

From Table 3, it is clear that, with the other attributes held constant, a 1-unit increase in “recovered case” corresponds to a decrease of about 0.1006 units in “death case”, and vice versa. Similarly, a 1-unit increase in “confirmed case” or in “increase rate” corresponds to an increase of about 0.1033 units or 69.6169 units in “death case”, respectively.
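
To make the meaning of such coefficients concrete, here is a minimal ordinary-least-squares sketch in plain Python. The data are hypothetical and generated from known coefficients (the source does not report the model's intercept or its raw data), and the helper names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Predicted value: intercept plus the coefficient-weighted features."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# HYPOTHETICAL rows: (confirmed, recovered, increase rate), in arbitrary units.
rows = [(1.0, 0.2, 0.9), (2.5, 0.9, 0.7), (4.8, 2.0, 0.5),
        (9.0, 5.0, 0.4), (15.0, 11.0, 0.3), (21.0, 15.0, 0.35)]
# Targets generated exactly from assumed coefficients, so the fit should recover them.
y = [0.5 + 0.10 * c - 0.10 * r + 2.0 * g for c, r, g in rows]

coefs = fit_ols(rows, y)
# Raising "recovered" by 1 unit, other features fixed, changes the prediction by coefs[2].
effect = predict(coefs, (5.0, 3.0, 0.5)) - predict(coefs, (5.0, 2.0, 0.5))
print([round(c, 4) for c in coefs], round(effect, 4))
```

The difference between the two predictions equals the "recovered" coefficient, which is the sense in which a coefficient is the change in the outcome per unit change in one attribute with the others held constant.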

Now we predict the test data to check the difference between the actual value and the predicted value in Table 4 below.

Table 4: Difference between the actual value and predicted value

Instance Number | Actual Value | Predicted Value
110 | 286697 | 221975.301362
112 | 297539 | 286646.565236
143 | 430047 | 423127.482077
7 | 133 | -6528.684075
44 | 3459 | -2713.950271
101 | 244129 | 236968.993751
122 | 342565 | 329894.990367
66 | 31990 | 47224.597929
85 | 148157 | 160515.287829
86 | 157022 | 167041.159151
133 | 386298 | 376198.729391
92 | 193926 | 198189.689192
26 | 1868 | -1385.556916
146 | 443685 | 438945.896459
119 | 328483 | 318945.015040
62 | 19026 | 25233.066196
51 | 5411 | 808.770349
97 | 221109 | 221511.564448
128 | 365380 | 355638.073651
90 | 180475 | 187102.115303

The actual and predicted values are plotted and compared in Figure 8.

As Table 4 and Figure 8 show for the multiple regression model, the predicted number of deaths is initially higher than the actual number, but further along in the data, from 2 May 2020 onward, the predicted number is lower than the actual number.

Overall, this study shows that the reduction in deaths worldwide is a good sign for human society.

