A team of visiting polio eradication workers was informed during its orientation session that population-wide studies done in the host country showed that the risk of polio in that country's villages was strongly epidemiologically associated with each village’s economic/human development circumstances, which ranged greatly from village to village. In some villages, residents lived in hand-constructed huts with no running water, no latrines or sewage disposal areas, and no electricity. In other places, residents lived in wooden or adobe homes which, though modest by Western standards, had all of the above services in place and whose streetside craft shops and food markets did a brisk business, catering both to locals and visitors.
Knowing this information, the team went into several villages and attempted to assign a “human development rating” to each family. This was based on that family’s income situation, access to running water, access to elementary school for their children, and the condition of the home. To their surprise, they found that families in all the villages had no difference in polio risk based on the family’s human development rating.
This is an example of the ecological fallacy.
The ecological fallacy is an error that can arise in ecological studies when a researcher draws a conclusion about individuals from aggregate data for a group, or vice versa.
In this example, the original association was established by comparing villages, so the village is the unit of study. The team, however, collected and rated data on individual families rather than on groups of families representing each village.
Thus the data were collected at the family level, but the original inference was made at the village level. Even though the conclusion that economic/human development circumstances are associated with polio risk may hold for villages as a whole, the same conclusion need not hold when individual families are considered.
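The following is a small illustrative sketch of how a village-level association can coexist with no family-level association. All of the numbers are made up for illustration (three hypothetical villages A, B, and C with four families each); they are not data from the studies described in the question, and the helper name `pearson` is an assumption of this sketch.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data: village -> list of
# (family development rating 1-5, polio case in family: 1 = yes, 0 = no).
families = {
    "A": [(1, 1), (2, 1), (2, 0), (3, 1)],
    "B": [(2, 1), (3, 0), (3, 0), (4, 1)],
    "C": [(3, 0), (4, 1), (4, 0), (5, 0)],
}

# Village level: one (mean rating, polio risk) pair per village.
village_rating = [mean([r for r, _ in rows]) for rows in families.values()]
village_risk = [mean([c for _, c in rows]) for rows in families.values()]
village_r = pearson(village_rating, village_risk)

# Family level: correlation between rating and polio, computed inside each village.
within_r = {
    name: pearson([r for r, _ in rows], [c for _, c in rows])
    for name, rows in families.items()
}

print("between villages:", village_r)
print("within villages: ", within_r)
```

In this toy data the village-level correlation is strongly negative (more developed villages have lower risk), while the correlation inside every village is zero, which mirrors what the team observed.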
How to avoid this fallacy?
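To avoid this error, the data should be analyzed at the same level at which the conclusion is drawn. Here, instead of treating each family's data separately, the data from each village should be aggregated and considered as a whole. The results then apply to villages, the level at which the original association was observed, and not to individual families.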
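As a rough sketch of that aggregation step, family records could be rolled up into one row per village before any comparison is made. The record layout `(village, rating, polio_case)`, the function name `aggregate_by_village`, and the sample values are assumptions made for illustration only.

```python
from collections import defaultdict

def aggregate_by_village(records):
    """Roll family-level records up to one summary row per village.

    Each record is (village, family rating, polio case 0/1).
    """
    groups = defaultdict(list)
    for village, rating, polio_case in records:
        groups[village].append((rating, polio_case))

    summary = {}
    for village, rows in groups.items():
        n = len(rows)
        summary[village] = {
            "families": n,
            "mean_rating": sum(r for r, _ in rows) / n,
            "polio_risk": sum(c for _, c in rows) / n,
        }
    return summary

# Hypothetical family-level records from two villages.
records = [
    ("A", 1, 1), ("A", 2, 1), ("A", 2, 0), ("A", 3, 1),
    ("C", 3, 0), ("C", 4, 1), ("C", 4, 0), ("C", 5, 0),
]

for village, row in aggregate_by_village(records).items():
    print(village, row)
```

Any comparison of development and polio risk would then be made between the village rows, and the conclusion stated about villages only.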