Question

Suppose you are commissioned by the CDC to investigate the role of some key socio-economic factors that may be affecting the death rate due to COVID-19. What regression model could you specify? Please write that model and explain why the variables you chose may be of interest. How would you perform a hypothesis test to validate your arguments?

Expert Solution

The model was executed as a three-step strategy. First, to visualise a baseline mortality risk assessment (a pre-COVID mortality risk scenario), the multi-criteria Analytic Hierarchy Process (AHP) [31] was used to compute weights (relative importance) for the nine static indicators. Pairwise comparison in AHP is a common technique for assessing the significance of each indicator [32] while tolerating a limited degree of inconsistency in each pairwise comparison [33]. The first and second authors independently evaluated the relative importance of the factors and then resolved their discrepancies. The relative weights were chosen in line with the research literature, which has established the impact of various indicators on COVID-19 mortality [26]. The computed weights are summarised in Table 1. The baseline scenario represented health risk in general terms, without focusing on the COVID-19 pandemic, and reflected the strength of each nation in terms of its economy, health infrastructure, and demography. Second, a multiple linear regression model was fitted in which the dependent variable was the normalised COVID-19 mortality of each country as of 13 May 2020 and the independent variables were the nine static socio-economic factors described earlier. Third, the regression was repeated with the six dynamic factors associated with COVID-19 added, giving a total of 15 independent variables. This third scenario combined COVID-19-related data and government stringency data with the static variables, and so reflected the state of the pandemic at that time.
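Written out explicitly, the second-step model can be sketched as the equation below. It has one coefficient per static factor in Table 1, and $i$ indexes countries. The variable names are shorthand introduced here, not the authors' notation.

The hypothesis tests shown are the standard ones for such a model:

- a per-coefficient $t$-test, which is what significance stars such as those in Table 2 usually summarise;
- an overall $F$-test.

The source does not state which test options were used.

```latex
\begin{aligned}
\text{Mortality}_i &= \beta_0 + \beta_1\,\text{PopDensity}_i + \beta_2\,\text{Population}_i + \beta_3\,\text{HealthExp}_i + \beta_4\,\text{GDP}_i + \beta_5\,\text{DALY}_i \\
&\quad + \beta_6\,\text{Nurses}_i + \beta_7\,\text{Physicians}_i + \beta_8\,\text{HospitalBeds}_i + \beta_9\,\text{A65abp}_i + \varepsilon_i \\[4pt]
H_0 &: \beta_j = 0 \quad \text{vs.} \quad H_1 : \beta_j \neq 0, \qquad t_j = \frac{\hat\beta_j}{\operatorname{SE}(\hat\beta_j)} \sim t_{\,n-10} \ \text{under } H_0 \\[4pt]
F &= \frac{R^2 / 9}{(1 - R^2)/(n - 10)} \sim F_{9,\,n-10} \ \text{under } H_0 : \beta_1 = \dots = \beta_9 = 0
\end{aligned}
```

Here $n$ is the number of countries. Suppose $H_0$ is rejected for $\beta_9$ at the 5% level. That would support the argument that the elderly share of the population is associated with COVID-19 mortality after controlling for the other eight factors.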

Table 1

Base scenario weights of static factors.

Variable	Weight
Average Population Density	0.027
Population	0.039
Health Expenditure	0.058
GDP	0.09
DALY	0.157
Nurses	0.157
Physicians	0.157
Hospital Beds	0.157
A65abp	0.157
Consistency Ratio < 0.01
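The AHP step behind Table 1 turns a reciprocal pairwise-comparison matrix into weights and checks its consistency ratio. The authors' nine-factor matrix is not published, so the sketch below uses a hypothetical three-factor matrix.

Two assumptions apply:

- The weights use the row geometric-mean method, a common approximation of the principal eigenvector. It may differ slightly from the exact eigenvector method described in [31].
- The consistency ratio uses Saaty's random-index values.

```python
import math

# Saaty's random consistency index (RI) for matrices of size n = 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method.

    matrix[i][j] says how much more important factor i is than factor j,
    with matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Saaty's CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical comparison (NOT the authors' matrix): factor A is 2x as
# important as B and 4x as important as C; B is 2x as important as C.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
w = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, w)
print([round(x, 3) for x in w], abs(round(cr, 3)))
```

This matrix is perfectly consistent, so its consistency ratio is zero. Real expert judgements usually give a small positive ratio, which is then compared with a tolerance such as the one reported under Table 1.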

For the regression models, the predictors were then assigned relative-importance weights using the relaimpo package [34]. Finally, the weights obtained from the modelling were aggregated with the factor ranks in the form of a weighted sum (see Equation (1)):

$$\text{Risk}_i = \sum_{j=1}^{n} w_j \, a_{ij} \qquad (1)$$

where $w_j$ is the weight of factor $j$, $a_{ij}$ is the rank value of factor $j$ for country $i$, and $n$ is the number of factors.
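Equation (1) can be sketched as follows. The weights, rank values, and country names are hypothetical. The code also assumes that a larger score means higher risk, which the source does not state explicitly.

```python
def risk_scores(weights, ranks):
    """Equation (1): Risk_i = sum over j of w_j * a_ij, for every country i."""
    scores = {}
    for country, factor_ranks in ranks.items():
        if len(factor_ranks) != len(weights):
            raise ValueError(f"{country}: expected {len(weights)} factor ranks")
        scores[country] = sum(w * a for w, a in zip(weights, factor_ranks))
    return scores


def order_by_risk(scores):
    """Countries sorted from highest to lowest score (assumes higher = riskier)."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights for three factors and hypothetical rank values a_ij.
weights = [0.5, 0.3, 0.2]
ranks = {
    "Country A": [1, 2, 3],
    "Country B": [3, 1, 2],
    "Country C": [2, 3, 1],
}
scores = risk_scores(weights, ranks)
print(order_by_risk(scores))  # ['Country B', 'Country C', 'Country A']
```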

The second stage of the analysis was a linear regression model with the nine static variables as independent variables and the normalised COVID-19 mortality on 13 May 2020 as the dependent variable. The results are summarised in Table 2. The model's R² was 0.69, meaning it explained 69% of the variance in the dataset. The ratio of the elderly in the population (A65abp) emerged as a significant predictor, while countries' GDP and number of hospital beds approached significance. Correspondingly, the relaimpo package in R also assigned these predictors higher relative weights: 19% for A65abp and 22% for GDP. As a check for multicollinearity, the variance inflation factor (VIF) for all predictor variables was lower than nine.
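relaimpo offers several relative-importance metrics, and the source does not say which one produced the 19% and 22% figures. One widely used metric that relaimpo implements is LMG. It averages each predictor's increase in R² over every order in which the predictors can enter the model, so the shares add up to the full-model R².

The Python sketch below is an independent re-implementation of that idea on hypothetical data, not relaimpo itself. Its brute-force loop over all orderings is only practical for a handful of predictors.

```python
from itertools import permutations
from math import factorial


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, columns, subset):
    """R^2 of an OLS fit (with intercept) of y on the predictors listed in `subset`."""
    if not subset:
        return 0.0
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xc = []
    for k in subset:
        mean = sum(columns[k]) / n
        xc.append([v - mean for v in columns[k]])
    # Centering absorbs the intercept; solve the normal equations (Xc'Xc) b = Xc'yc.
    xtx = [[sum(p * q for p, q in zip(xi, xj)) for xj in xc] for xi in xc]
    xty = [sum(p * q for p, q in zip(xi, yc)) for xi in xc]
    b = _solve(xtx, xty)
    fitted = [sum(bk * xk[i] for bk, xk in zip(b, xc)) for i in range(n)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(yc, fitted))
    sst = sum(v * v for v in yc)
    return 1.0 - sse / sst


def lmg(y, columns):
    """LMG shares: each predictor's R^2 gain, averaged over all entry orders."""
    cache = {}

    def r2(subset):
        key = tuple(sorted(subset))
        if key not in cache:
            cache[key] = r_squared(y, columns, key)
        return cache[key]

    p = len(columns)
    shares = [0.0] * p
    for order in permutations(range(p)):
        entered = []
        for k in order:
            before = r2(entered)
            entered.append(k)
            shares[k] += r2(entered) - before
    return [s / factorial(p) for s in shares]


# Hypothetical data: y depends on x0 and x2 only; x1 is a distractor.
x0 = [1, 2, 3, 4, 5, 6]
x1 = [3, 1, 2, 6, 4, 5]
x2 = [1, 0, 1, 0, 1, 0]
y = [2 * a + c for a, c in zip(x0, x2)]
cols = [x0, x1, x2]
shares = lmg(y, cols)
print([round(s, 3) for s in shares], "sum =", round(sum(shares), 3))
```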

Table 2

Regression results for risk of mortality, where for p-values “***” represents p < 0.001, “**” represents p < 0.01, and “*” represents p < 0.05.

Regression Model	R2	Significant p-Values	Top Weights
Static factors	0.69	A65abp ***	A65abp (0.19), GDP (0.22)
Static and dynamic factors	0.88	A65abp **, nurses , susceptible , active , mortality growth	active (0.20), susceptibles (0.15), mortality growth (0.11), A65abp (0.10)

For the third step of the analysis, the regression modelling was repeated with six dynamic variables associated with COVID-19 added, giving a total of 15 independent variables. This model explained 88% of the variance in the data. The dynamic variables dominated the static socio-economic factors, with three dynamic factors having significant predictive power. The ratio of the elderly was again a significant predictor of COVID-19 mortality risk, as was the number of nurses. The government-enforced stringency level, however, did not emerge as a significant predictor in this model. As a check for multicollinearity, the variance inflation factor for all predictor variables (except GDP) was lower than 7. Table 3 summarises the top 10 countries ranked by actual mortality rate. It also gives their predicted risk rankings both pre-COVID-19 and on 13 May 2020; the latter come from the model comprising both static and dynamic indicators. For most of these 10 countries, the predicted COVID-19 mortality risk rank on 13 May 2020 is close to the rank anticipated by the baseline risk assessment.
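The VIF of predictor j is 1 / (1 − R_j²), where R_j² comes from regressing that predictor on all the other predictors. Values far above common cut-offs such as 5 or 10 flag collinearity. A minimal standard-library sketch on hypothetical data follows.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _r_squared(target, others):
    """R^2 of an OLS fit (with intercept) of `target` on the `others` columns."""
    n = len(target)
    t_mean = sum(target) / n
    tc = [v - t_mean for v in target]
    xc = []
    for col in others:
        mean = sum(col) / n
        xc.append([v - mean for v in col])
    # Normal equations on centered data: (Xc'Xc) b = Xc'tc.
    xtx = [[sum(p * q for p, q in zip(xi, xj)) for xj in xc] for xi in xc]
    xty = [sum(p * q for p, q in zip(xi, tc)) for xi in xc]
    b = _solve(xtx, xty)
    fitted = [sum(bk * xk[i] for bk, xk in zip(b, xc)) for i in range(n)]
    sse = sum((t - f) ** 2 for t, f in zip(tc, fitted))
    sst = sum(t * t for t in tc)
    return 1.0 - sse / sst


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on all the others."""
    vifs = []
    for j, col in enumerate(columns):
        others = [c for k, c in enumerate(columns) if k != j]
        r2 = _r_squared(col, others)
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs


# Hypothetical predictors.
x1 = [1, 2, 3, 4]
x2 = [1, -1, -1, 1]          # uncorrelated with x1
x3 = [1.1, 1.9, 3.2, 3.9]    # nearly a copy of x1
print(variance_inflation_factors([x1, x2]))      # both equal to 1
print(variance_inflation_factors([x1, x2, x3]))  # x1 and x3 far above 10
```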

Table 3

Top 10 countries ranked on actual mortality rate and their predicted risk assessment.

Country Name	Mortality Rate (Actual)	Pre-COVID-19 Mortality Risk Rank (Predicted)	COVID-19 Mortality Risk Rank as at 13 May 2020 (Predicted)
San Marino	1213.6	41	3
Belgium	774.2	7	8
Andorra	636.3	46	60
Spain	580.1	35	41
Italy	514.1	14	17
United Kingdom	499.1	25	16
France	403.5	11	13
Sweden	339.8	9	11
Netherlands	322.8	10	12
Ireland	308.4	27	33

A spatial map illustrates the COVID-19 mortality risk predicted by the third step of the analysis (see Figure 2). A second spatial map shows the change from the baseline in COVID-19 mortality risk, as projected by the linear regression model that combined static and dynamic factors (see Figure 3). It is essentially the difference between the baseline risk and the risk shown in Figure 2. The map indicates that on 13 May 2020 most countries were at the expected level of risk or lower, compared with what was originally predicted in the base scenario. Most countries are coloured in shades of yellow, orange, or green, which denote a reduction in risk or a risk equal to that expected. All materials related to the modelling, such as the R code, output, and base data, are provided as a supplementary file.

Conclusions

In this paper, a mortality risk-based evaluation of COVID-19 on a global scale, using data as of 13 May 2020, is presented. Using a multi-weighted approach, three scenarios combining static and dynamic variables were modelled. The main finding was that the ratio of the elderly in a population clearly emerged as a significant predictor of COVID-19 mortality risk; however, this must be considered in light of the residency makeup of individual countries. In addition, combining static socio-economic factors with dynamic factors associated with COVID-19 growth and spread gave higher predictive capability than the static factors alone (R² of 0.88 versus 0.69). The stringency of government-imposed restrictions was not observed to have a significant impact. In general, as of 13 May 2020, the mortality risk projections of COVID-19 may, from a spatial perspective, be considered lower than or as expected for most countries around the world.

The earliest COVID-19 patients were recorded in the data set on January 22, 2020. We used records from January 22, 2020 to June 29, 2020, giving 160 instances and five attributes. These attributes contain the date of recording, confirmed cases, recovered cases, deaths, and growth rates for COVID-19 patients. The following analyses were carried out on the data set to explore it and extract useful information.

Correlation coefficients

The correlation coefficient is a statistical measure of the strength of the linear relationship between the movements of two variables. Its values range from -1 to +1; a computed value greater than +1 or less than -1 indicates a calculation error. A value of -1 indicates a perfect negative linear relationship, and +1 a perfect positive linear relationship. A value of 0 indicates no linear relationship between the two variables, although a nonlinear relationship may still exist [24].
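The definition above can be sketched directly in Python. The third example shows why a coefficient of 0 means no linear relationship rather than no relationship at all: y = x² is fully determined by x, yet r = 0 on a symmetric range.

The data are hypothetical. On Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0: perfect positive linear relationship
print(pearson_r(x, [-3 * v for v in x]))     # -1.0: perfect negative linear relationship
print(pearson_r(x, [v * v for v in x]))      # 0.0: no linear relationship, yet y = x**2
```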

Correlation statistics can be used to describe the relationships between different attributes of the disease. Pairwise correlation coefficients were calculated between confirmed cases, recovered cases, deaths, and the increase rate, as shown in Table 1 and Figure 3. Confirmed and recovered COVID-19 cases are highly positively correlated (r = 0.986).

Table 1: Correlation Coefficients of attributes

	Confirmed	Recovered	Deaths	Increase rate
Confirmed	1.000000	0.986051	0.988177	-0.378478
Recovered	0.986051	1.000000	0.950569	-0.337027
Deaths	0.988177	0.950569	1.000000	-0.401742
Increase rate	-0.378478	-0.337027	-0.401742	1.000000

ARIMA Model Results

An ARIMA model requires choosing three parameters [28]:

- p, the autoregressive order;
- d, the degree of differencing;
- q, the moving-average order.

Instead of choosing them by inspecting plots, we use auto_arima to find appropriate values. auto_arima first conducts differencing tests (Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller, or Phillips–Perron) to determine the order of differencing, d. It then fits models within the ranges defined by start_p, max_p, start_q, and max_q [25]. If the seasonal option is enabled, auto_arima also conducts the Canova-Hansen test to determine the optimal order of seasonal differencing, D, and then seeks the optimal P and Q hyperparameters. Figure 4 shows the parameters obtained by auto_arima.

Figure 5 shows the residual diagnostic plots of the auto_arima model.

The diagnostic output of the auto_arima model is interpreted as follows:

Standardized residual: The residual errors fluctuate around a mean of zero and have roughly uniform variance.

Histogram and density plot: The density plot shows that the residuals are distributed roughly symmetrically around a mean of zero.

QQ-plot: The blue dots (the ordered residuals) lie on the red line, which represents the standard normal distribution N(0, 1). Any significant deviation from the line would indicate a skewed distribution. The residuals are therefore considered to be approximately normally distributed.

Correlogram: The correlogram (ACF plot) shows that the residual errors are not autocorrelated. Any autocorrelation would imply that the residual errors contain a pattern that the model does not explain.

The optimal values of p, d, and q obtained by auto_arima are 1, 2, and 2, respectively. Using these parameters, we then fit an ARIMA(1, 2, 2) model; the results are shown in Figure 6.
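The middle parameter d = 2 means the ARMA part of ARIMA(1, 2, 2) is fitted to the twice-differenced series. Forecasts are then integrated back to the original scale.

The sketch below runs on a hypothetical quadratic series, not the authors' data. It shows that second-order differencing removes a quadratic trend and that the inverse operation recovers the original values.

```python
def difference(series, order=1):
    """Apply `order` rounds of first differencing: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(diffs, first_value):
    """Invert one round of differencing, given the first value of the undifferenced series."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# Hypothetical series with a quadratic trend (t ** 2), loosely like a cumulative count.
series = [t * t for t in range(8)]
d1 = difference(series, 1)
d2 = difference(series, 2)
print(d2)  # [2, 2, 2, 2, 2, 2]: the quadratic trend is gone after d = 2

# Rebuild the original series from d2 plus one starting value per differencing round.
rebuilt = undifference(undifference(d2, d1[0]), series[0])
print(rebuilt == series)  # True
```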

Figure 6 shows the summary of the fitted ARIMA model; here we focus on its coefficient table. The coef column shows the weight of each term and how it affects the time series. The P>|z| column gives the p-value for each weight, which indicates whether that weight is statistically significant. Here, the p-value of each weight is below or close to 0.05, so it is reasonable to keep every term in the model.
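The P>|z| column in a statsmodels-style ARIMA summary typically comes from a two-sided z-test of each coefficient against zero, with z = coef / std err. A minimal sketch of that calculation with Python's standard library follows. The coefficient names and values are hypothetical because the summary in Figure 6 is not reproduced here.

```python
from statistics import NormalDist


def coefficient_z_test(coef, std_err):
    """Two-sided z-test of H0: coefficient = 0; returns (z, p-value)."""
    z = coef / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical coefficient-table rows: (term name, coef, std err).
for name, coef, se in [("ar.L1", 0.45, 0.20), ("ma.L1", -0.30, 0.25)]:
    z, p = coefficient_z_test(coef, se)
    verdict = "keep" if p < 0.05 else "questionable"
    print(f"{name}: z = {z:.2f}, p = {p:.3f} -> {verdict}")
```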

These diagnostics suggest that the model fits well, which can help us understand the time series and forecast future values. Although the fit is reasonable, the ARIMA model's settings could sometimes be adjusted to improve its performance. With a model of the time series in hand, we can now use it to produce forecasts [26]. We first compare the predicted values with the actual values of the time series, which helps us understand the accuracy of the forecasts. The forecasts and their associated confidence intervals can then be used to understand the time series further and to anticipate what to expect. Our data show that the time series maintains a fairly consistent growth rate.

As we forecast further into the future, it is natural to be less confident in the predicted values. This is reflected in the confidence intervals generated by the model, which grow wider as the forecast horizon increases. We then predict death cases for the test set with 95% confidence intervals. Figure 7 shows the prediction results.
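This widening can be illustrated with the simplest case, a random walk, or ARIMA(0, 1, 0) without drift. Its h-step 95% interval is the last value ± 1.96σ√h.

This is a simplification chosen for clarity, not the authors' ARIMA(1, 2, 2) model. With second-order differencing, intervals typically grow faster than √h. The values below are hypothetical.

```python
import math
from statistics import NormalDist


def random_walk_interval(last_value, sigma, horizon, level=0.95):
    """Forecast interval for a random walk (ARIMA(0, 1, 0) without drift).

    The h-step forecast is the last observed value, and its error variance is
    h * sigma**2, so the interval half-width grows with sqrt(h).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma * math.sqrt(horizon)
    return last_value - half_width, last_value + half_width


for h in (1, 4, 16):
    low, high = random_walk_interval(last_value=100.0, sigma=2.0, horizon=h)
    print(f"h = {h:2d}: [{low:.1f}, {high:.1f}]  width {high - low:.1f}")
```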

In Figure 7, actual deaths in the training set are shown by the blue line and predicted deaths by the red line. The predicted deaths on the red line decline. We interpret this as meaning that deaths will become fewer and fewer in the future, as more people recover quickly and people maintain social distancing during the pandemic.

Using the residuals, we computed summary metrics that condense them into single values reflecting the model's predictive ability.

To evaluate the prediction results, we apply commonly used accuracy measures; the results are shown in Table 2.

Table 2: Accuracy measures of the ARIMA model

Measures of Accuracy	Value
Mean Absolute Error (MAE)	0.12044588473307338
Mean Squared Error (MSE)	0.023012953284359018
Root Mean Squared Error (RMSE)	0.15170020858376898
Mean Absolute Percentage Error (MAPE)	0.009196691386663233

The MAE of our model is 0.1204, which is quite small considering that the death-case values in our data start at 0.01.

The MSE of 0.0230 is smaller than the MAE, by a factor of about five. This is expected when the individual errors are smaller than 1 in magnitude, because squaring such errors shrinks them.

The RMSE of 0.1517 is the square root of the MSE. It is analogous to a standard deviation and measures the spread of the residual distribution.

A MAPE of about 0.92% implies that the model is, loosely speaking, about 99.08% accurate in predicting the test set observations.
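The four measures in Table 2 can be sketched as follows. MAPE is returned as a fraction, the reading of Table 2 used above, so 0.0092 corresponds to 0.92%. The values are hypothetical and chosen so that every error is below 1 in size, which makes the MSE smaller than the MAE.

```python
import math


def accuracy_measures(actual, predicted):
    """MAE, MSE, RMSE and MAPE (MAPE as a fraction, not a percentage)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical values: every error is below 1 in size, so MSE < MAE.
m = accuracy_measures(actual=[2.0, 4.0, 5.0], predicted=[2.5, 3.5, 5.0])
print(m)
print(f"MAPE = {100 * m['MAPE']:.2f}%")  # MAPE = 12.50%
```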

Regression Model Results

To find out which factors have the most significant influence on the forecasted output and how the various factors relate to each other, we consider the input features "confirmed cases", "recovered cases", and "increase rate". Based on these features, we predict the deaths of COVID-19 patients. The data set was split 80%:20% into training and testing sets, respectively.
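An 80:20 split can be sketched as a seeded shuffle of the row indices. The function name and seed are hypothetical, and the source does not say how its split was drawn. The instance numbers in Table 4 do appear in no particular order, which is consistent with a shuffled split.

```python
import random


def split_train_test(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_train_test(160)  # 160 daily records, as in the data set
print(len(train_idx), len(test_idx))  # 128 32
```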

In multiple linear regression, the model selects the best-fitting coefficients (in the least-squares sense) for all attributes [27]. The coefficients of the regression model are shown in Table 3.

Table 3: Coefficients of the regression model

Attributes	Coefficient
Confirmed	0.103305
Recovered	-0.100568
Increase rate	69.616876

From Table 3, holding the other attributes constant, a 1-unit increase in "recovered cases" is associated with a decrease of 0.1006 units in "death cases", and vice versa. Similarly, 1-unit increases in "confirmed cases" and "increase rate" are associated with increases in "death cases" of 0.1033 units and 69.6169 units, respectively.
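Because the model is linear, each coefficient is the change in predicted deaths for a one-unit change in that input while the other inputs are held fixed. A quick check with the Table 3 coefficients follows. The input values are hypothetical, as is the zero intercept, since the intercept is not reported.

```python
# Coefficients from Table 3. The intercept is not reported, so INTERCEPT is a
# hypothetical placeholder; it cancels out in the difference computed below.
COEFFICIENTS = {"confirmed": 0.103305, "recovered": -0.100568, "increase_rate": 69.616876}
INTERCEPT = 0.0


def predict_deaths(row):
    """Linear prediction: intercept + sum of coefficient * feature value."""
    return INTERCEPT + sum(coef * row[name] for name, coef in COEFFICIENTS.items())


# Hypothetical feature values for a single day.
day = {"confirmed": 100_000, "recovered": 40_000, "increase_rate": 0.05}
one_more_recovery = dict(day, recovered=day["recovered"] + 1)

effect = predict_deaths(one_more_recovery) - predict_deaths(day)
print(round(effect, 6))  # -0.100568: the "recovered" coefficient, other inputs held fixed
```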

We then predict the test data and compare the actual and predicted values in Table 4.

Table 4: Difference between the actual value and predicted value

Instance Number	Actual Value	Predicted Value
110	286697	221975.301362
112	297539	286646.565236
143	430047	423127.482077
7	133	-6528.684075
44	3459	-2713.950271
101	244129	236968.993751
122	342565	329894.990367
66	31990	47224.597929
85	148157	160515.287829
86	157022	167041.159151
133	386298	376198.729391
92	193926	198189.689192
26	1868	-1385.556916
146	443685	438945.896459
119	328483	318945.015040
62	19026	25233.066196
51	5411	808.770349
97	221109	221511.564448
128	365380	355638.073651
90	180475	187102.115303

Figure 8 plots the actual values against the predicted values for comparison.

As Table 4 and Figure 8 show, the multiple regression model predicts more deaths than actually occurred for much of the earlier period. From 2 May 2020 onwards, however, the predicted number of deaths falls below the actual number.

Overall, this study suggests that the predicted reduction in deaths worldwide is a good sign for human society.

Rahul Sunny answered 3 years ago
