In: Statistics and Probability
Consider the claim “Using regression analysis to study economic growth is flawed because many of the truly important underlying determinants, such as culture and institutions, are very hard to measure.” Discuss this statement paying particular attention to simple cross-section data and panel data models.
Claim:
"Using regression analysis to study economic growth is flawed because many of the truly important underlying determinants, such as culture and institutions, are very hard to measure"
Discussion:
Introduction:
Someone once said that "It is a mark of a primitive society to view regression as progress". It is indeed flawed to assume that regression alone can fully explain the real-world problems related to economic growth in this era of globalization. The term Economic Growth refers to the increase in the capacity of the economy to produce goods and services from one period of time to another. On the other hand, Regression Analysis is a statistical method that aims to determine the strength and character of the relationship between a dependent variable and a set of independent variables, also known as explanatory variables. The regression analysis model takes the mathematical form:
Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + ε
where Y is the dependent variable, X₁, X₂, …, Xₖ are the independent (explanatory) variables, β₀ is the intercept, β₁, …, βₖ are the regression coefficients measuring the effect of each explanatory variable on Y, and ε is the error term capturing everything the included variables do not explain.
Now, for a valid regression model, there are some assumptions that need to be satisfied by any dataset to which we try to fit the model. The corresponding assumptions are: the relationship between the dependent variable and the parameters is linear; the observations (more precisely, the error terms) are independent of one another; the error term has zero mean and constant variance (homoscedasticity); the explanatory variables are not perfectly collinear with one another; and, for exact small-sample inference, the errors are approximately normally distributed.
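As a minimal illustration (a sketch only, using made-up numbers and hypothetical variable names such as invest and school rather than any real dataset), such a model can be fitted by ordinary least squares in Python with statsmodels:
```python
import numpy as np
import statsmodels.api as sm

# Synthetic cross-section: 50 hypothetical countries, each observed once
rng = np.random.default_rng(0)
n = 50
invest = rng.uniform(10, 40, n)   # hypothetical investment share of GDP
school = rng.uniform(2, 12, n)    # hypothetical average years of schooling
growth = 0.5 + 0.08 * invest + 0.15 * school + rng.normal(0, 1, n)

# growth = b0 + b1*invest + b2*school + error, estimated by ordinary least squares
X = sm.add_constant(np.column_stack([invest, school]))
model = sm.OLS(growth, X).fit()
print(model.params)   # estimated b0, b1, b2
```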
Cross-sectional Data: Cross-sectional data usually refers to observations of individuals or groups, often a subset of the population of interest, taken at a single point in time. This is like a snapshot of the data necessary for a study with a given objective. Ex: the number of customers in a coffee shop on a particular day, the marks scored in different subjects by a student in an exam, the blood glucose levels of all persons having diabetes, etc.
Panel Data/ Longitudinal Data: Panel data refers to multi-dimensional data involving measurements over time. Panel data contain observations of multiple phenomena obtained over multiple time periods for the same firms or individuals. Ex: the smoking status of the same group of individuals recorded in each of the past five years, etc.
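Panel data matter for the claim under discussion because repeated observations on the same countries allow unobserved, time-invariant factors such as culture or institutions to be differenced away. The following sketch (purely simulated data with invented variable names) compares a pooled cross-section regression, which is biased when an unobserved country effect is omitted, with a within (fixed-effects) estimator that removes it:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
countries, years = 20, 10
inst = rng.normal(0, 2, countries)   # unobserved, time-invariant "institutions" effect

rows = []
for c in range(countries):
    for t in range(years):
        invest = rng.uniform(10, 40) + 2.0 * inst[c]          # investment correlated with institutions
        growth = 1.0 + 0.08 * invest + inst[c] + rng.normal(0, 1)
        rows.append((c, t, invest, growth))
df = pd.DataFrame(rows, columns=["country", "year", "invest", "growth"])

# Pooled cross-section OLS: biased, because "institutions" is omitted
pooled = sm.OLS(df["growth"], sm.add_constant(df["invest"])).fit()

# Within (fixed-effects) estimator: demeaning within each country sweeps out
# the unobserved country effect, which is exactly what panel data make possible
demeaned = df[["invest", "growth"]] - df.groupby("country")[["invest", "growth"]].transform("mean")
within = sm.OLS(demeaned["growth"], demeaned[["invest"]]).fit()

print("pooled slope :", pooled.params.iloc[1])   # drifts away from the true 0.08
print("within slope :", within.params.iloc[0])   # close to the true 0.08
```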
Drawbacks:
Now, in economics, correlations are common. But identifying whether the correlation between two or more variables represents a causal relationship is rarely so easy in real-world datasets. Countries that trade more with the rest of the world also have higher income levels, but does this mean that trade always raises income levels? To help answer these types of questions, economists use regression analysis. Regressions, as we have seen earlier, are used to quantify the relationship between one variable and the other variables that are thought to explain it; regressions can also identify how close and well determined the relationship is. Despite their benefits, regressions are prone to pitfalls and are often misused. Consider the following leading difficulties:
Linear Regression Is Limited to Linear Relationships: If the relationship between the dependent variable and the independent variables is not linear in nature, our model can be highly flawed; a quick check is to add a nonlinear term, such as a squared term, and see whether the fit improves, as sketched below.
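A quick sketch of this point on purely synthetic data: a straight line is fitted to a relationship that is actually curved, and the fit is then compared with a specification that adds a squared term.
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.9 * x - 0.08 * x**2 + rng.normal(0, 0.5, 200)   # the true relationship is curved

linear = sm.OLS(y, sm.add_constant(x)).fit()
curved = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

print("R-squared, straight line only :", round(linear.rsquared, 3))
print("R-squared, with squared term  :", round(curved.rsquared, 3))
```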
Linear Regression Only Looks at the Mean of the Dependent Variable: Linear regression looks at a relationship between the mean of the dependent variable and the independent variables. Just as the mean is not a complete description of a single variable, linear regression is not a complete description of relationships among variables. You can deal with this problem by using quantile regression.
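For instance, statsmodels provides quantile regression alongside ordinary least squares; the sketch below (simulated data in which the spread of the outcome grows with the regressor) compares the mean slope with the slopes at the 10th and 90th percentiles:
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 500)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.1 * x, 500)   # spread of y grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
q10 = sm.QuantReg(y, X).fit(q=0.1)
q90 = sm.QuantReg(y, X).fit(q=0.9)

print("mean (OLS) slope      :", ols.params[1])
print("10th percentile slope :", q10.params[1])
print("90th percentile slope :", q90.params[1])
```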
Linear Regression Is Sensitive to Outliers: Outliers are observations that lie surprisingly far from the rest of the data. Outliers can be univariate (based on one variable) or multivariate. They can mislead the entire analysis.
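A small simulated demonstration: a single extreme observation noticeably shifts the ordinary least squares slope, while a robust alternative (Huber's M-estimator, available in statsmodels as RLM) is far less affected.
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 40)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3, 40)

# Add a single extreme, high-leverage observation
x_out = np.append(x, 12.0)
y_out = np.append(y, 30.0)

clean = sm.OLS(y, sm.add_constant(x)).fit()
contaminated = sm.OLS(y_out, sm.add_constant(x_out)).fit()
robust = sm.RLM(y_out, sm.add_constant(x_out), M=sm.robust.norms.HuberT()).fit()

print("slope without the outlier:", clean.params[1])
print("slope with the outlier   :", contaminated.params[1])
print("Huber robust slope       :", robust.params[1])
```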
Data Must Be Independent: Linear regression assumes that the data are independent. That means that the scores of one subject (such as a person) have nothing to do with those of another. This is often, but not always, sensible. Two common cases where it does not make sense are clustering in space and time.
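For example, when observations are clustered, say several observations sharing a common country-level shock, standard errors computed under the independence assumption can be too small; cluster-robust standard errors are one common remedy. A sketch on simulated data:
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_clusters, per_cluster = 30, 10
groups = np.repeat(np.arange(n_clusters), per_cluster)

# Both the regressor and the error share a component within each cluster
x = rng.normal(0, 1, n_clusters)[groups] + 0.5 * rng.normal(0, 1, n_clusters * per_cluster)
y = 1.0 + 0.3 * x + rng.normal(0, 1, n_clusters)[groups] + 0.5 * rng.normal(0, 1, n_clusters * per_cluster)

X = sm.add_constant(x)
naive = sm.OLS(y, X).fit()   # treats all observations as independent
clustered = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})

print("naive standard error         :", naive.bse[1])
print("cluster-robust standard error:", clustered.bse[1])   # typically noticeably larger here
```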
Omitted variables: It is necessary to have a good theoretical model to suggest variables that explain the dependent variable. In the case of a simple two-variable regression, one has to think of the other factors that might explain the dependent variable. Consider, for example, a regression of earnings on years of education: even when a measure of aptitude such as IQ is included, the correlation between education and earnings may reflect yet some other factor that is not included. That is, the individuals in the sample may still be different in some "unobserved" way that explains their subsequent earnings, possibly through their education choices. Individuals from wealthy families usually have better access to education, but family wealth may also create more connections in the labor market, leading to higher earnings. Thus, parental wealth may be another variable that should be included. Hence, in many cases we end up leaving out variables that are in fact very important for the model.
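The resulting bias can be shown with a tiny simulation (all numbers invented): when family wealth drives both education and earnings, a regression that omits wealth overstates the return to education.
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5000
wealth = rng.normal(0, 1, n)                      # often unobserved in practice
education = 0.8 * wealth + rng.normal(0, 1, n)    # wealthier families buy more education
earnings = 1.0 * education + 2.0 * wealth + rng.normal(0, 1, n)

short = sm.OLS(earnings, sm.add_constant(education)).fit()                           # wealth omitted
long = sm.OLS(earnings, sm.add_constant(np.column_stack([education, wealth]))).fit()

print("education coefficient, wealth omitted :", short.params[1])   # biased upward
print("education coefficient, wealth included:", long.params[1])    # close to the true 1.0
```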
Reverse causality: Many theoretical models predict bidirectional causality, that is, a dependent variable can cause changes in one or more explanatory variables. For example, higher earnings may enable people to invest more in their own education, which, in turn, raises their earnings. This complicates the way regressions should be estimated, calling for special estimation techniques in particular cases.
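A minimal simulation of this feedback, with made-up coefficients: when the explanatory variable also responds to the outcome, the ordinary least squares slope no longer recovers the true causal effect.
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5000
b, c = 0.5, 0.4              # true effect of education on earnings, and feedback of earnings on education
u = rng.normal(0, 1, n)      # earnings shock
v = rng.normal(0, 1, n)      # education shock

# Solve the simultaneous system: earnings = b*education + u and education = c*earnings + v
education = (c * u + v) / (1 - b * c)
earnings = b * education + u

ols = sm.OLS(earnings, sm.add_constant(education)).fit()
print("true effect :", b)
print("OLS estimate:", ols.params[1])   # biased, because education is correlated with the earnings shock u
```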
Mismeasurement: Factors might be measured incorrectly for a variety of reasons. For example, aptitude is difficult to measure, and there are well-known problems with IQ tests. As a result, the regression using IQ might not properly control for aptitude, leading to inaccurate or biased correlations between education and earnings. This happens in a lot of cases.
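The classic consequence is attenuation bias, which a short, deliberately simplified simulation illustrates (synthetic data, with an invented noisy IQ-style proxy standing in for true aptitude): the coefficient on the mismeasured variable is pulled toward zero.
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 5000
aptitude = rng.normal(0, 1, n)              # the true but hard-to-measure factor
iq_score = aptitude + rng.normal(0, 1, n)   # noisy proxy, e.g. an imperfect IQ test
earnings = 1.0 * aptitude + rng.normal(0, 1, n)

true_reg = sm.OLS(earnings, sm.add_constant(aptitude)).fit()
proxy_reg = sm.OLS(earnings, sm.add_constant(iq_score)).fit()

print("coefficient on true aptitude:", true_reg.params[1])    # about 1.0
print("coefficient on noisy proxy  :", proxy_reg.params[1])   # attenuated toward zero, about 0.5
```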
Too limited a focus: A regression coefficient provides information only about how small changes, not large changes, in one variable relate to changes in another. It is sometimes useful to show how a small change in education is likely to affect earnings, but it will not allow the researcher to generalize to the effect of large changes. If everyone became college-educated at the same time, a newly minted college graduate would be unlikely to earn a great deal more, because the total supply of college graduates would have increased dramatically.
Conclusion:
Coming back to where we started: "Using regression analysis to study economic growth is flawed because many of the truly important underlying determinants, such as culture and institutions, are very hard to measure", I would say that many of the determinants, or explanatory variables, in our growth regressions stem from cultural backgrounds and from institutions in various sectors, which are very hard to measure, and omitting or mismeasuring them ultimately gives us flawed results. This problem is most severe in simple cross-section regressions, where such factors can only be proxied; panel data models can at least control for unobserved determinants that do not change over time, although they cannot fully solve the measurement problem either.