28 February 2010

Age distribution in household survey data

Indicators in the field of education statistics, such as those defined in the education glossary of the UNESCO Institute for Statistics, are typically calculated for specific age groups. For example, the youth literacy rate is for the population age 15 to 24 years, the adult literacy rate for the population age 15 and over, and the net attendance rates for primary and secondary education are for the population of primary and secondary school age, respectively. The net intake rate is an example of an indicator that is calculated for a single year of age, the official start age of primary school.

For a correct calculation of education indicators it is necessary to have precise age data. In the case of data collected with population censuses or household surveys this means that the ages recorded for each household member should be without error. However, census or survey data sometimes exhibit the phenomenon of age heaping, usually on ages ending in 0 and 5. Such heaping or digit preference occurs when survey respondents don't know their own age or the ages of other household members, or when ages are intentionally misreported.

The presence of age heaping can be tested with indices of age preference such as Whipple's index. Heaping can also be detected through visual inspection of the age distribution in household survey data. Figures 1 and 2 summarize the age distribution in survey data from Brazil, India, Indonesia and Nigeria. The data from Brazil were collected with a Pesquisa Nacional por Amostra de Domicílios or National Household Sample Survey in 2006. The data for the other three countries are from Demographic and Health Surveys conducted between 2005 and 2008.
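As a rough illustration of the index mentioned above, Whipple's index can be computed from a list of reported ages. The sketch below uses the standard definition: the number of persons age 23 to 62 who report an age ending in 0 or 5, divided by one fifth of all persons age 23 to 62, multiplied by 100. The function name `whipple_index` is made up for this example, and the calculation is unweighted, which is an assumption; an analysis of real survey data would normally apply the sample weights.

```python
from collections import Counter


def whipple_index(ages):
    """Whipple's index for a list of reported ages in completed years.

    100 means no preference for ages ending in 0 or 5,
    500 means every reported age ends in 0 or 5.
    Unweighted sketch: survey weights are ignored (assumption).
    """
    # Only the age range 23 to 62 enters the index.
    counts = Counter(a for a in ages if 23 <= a <= 62)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations in the age range 23 to 62")
    # Ages 25, 30, ..., 60 are the ones ending in 0 or 5 within that range.
    heaped = sum(counts[a] for a in range(25, 61, 5))
    return 100.0 * 5 * heaped / total
```

With one person at each age from 23 to 62 the function returns 100, and if every person reports age 30 it returns 500, the two ends of the scale.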

Figure 1 shows the share of single years of age in the total survey sample. A preference for ages ending in 0 and 5 is strikingly obvious in the data from India and Nigeria. In the data from Indonesia, age heaping is also present, but to a lesser extent than for India and Nigeria. Lastly, the graph for Brazil is relatively smooth, indicating a near absence of age heaping.

Figure 1: Age distribution in survey data by single-year age group
Data source: Brazil PNAD 2006, India DHS 2005-06, Indonesia DHS 2007, Nigeria DHS 2008.

In Figure 2, single ages are combined in five-year age groups, from 0-4 years and 5-9 years to 90-94 years and 95 years and over. Compared to Figure 1, the distribution lines are much smoother, including for India and Nigeria. We can conclude that age heaping is problematic for education indicators that are calculated for single years, for example all children of primary school entrance age, but less so for indicators that are calculated for a larger age group, for example all children of primary or secondary school age or all persons over 15 years of age.
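The grouping used for Figure 2 can be sketched in a few lines. The example below assigns each reported age to a five-year group, with an open group for 95 years and over, and returns the share of each group in the sample. The function name `five_year_shares` and its `top` parameter are made up for this example, and the shares are unweighted, which is an assumption rather than a description of how the figures were produced.

```python
from collections import Counter


def five_year_shares(ages, top=95):
    """Share of each five-year age group in a list of reported ages.

    Groups are identified by their lower bound (0 for 0-4, 5 for 5-9, ...).
    Ages of `top` and above are combined in one open group.
    Unweighted sketch: survey weights are ignored (assumption).
    """
    # Cap the age at `top`, then round down to the nearest multiple of 5.
    counts = Counter(min(a, top) // 5 * 5 for a in ages)
    n = len(ages)
    return {start: counts[start] / n for start in sorted(counts)}
```

Because ages 28 to 32 no longer sit in a single bin around 30, a heap on age 30 is spread over the groups 25-29 and 30-34 only to the extent that the true ages fall there, which is why the grouped distribution looks smoother.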

Figure 2: Age distribution in survey data by five-year age group
Data source: Brazil PNAD 2006, India DHS 2005-06, Indonesia DHS 2007, Nigeria DHS 2008.

Friedrich Huebler, 28 February 2010 (edited 30 September 2010), Creative Commons License
Permanent URL: http://huebler.blogspot.com/2010/02/age.html


Victorious! said...

The problem I have with the education statistics is that the age cut-offs are based on UN definitions of adult, youth, or child. These definitions differ from the cut-offs that governments set for the same categories. For example, in Sierra Leone, the government states that youth is between ages 15 and 35. In Kenya, a youth can be as old as 40. Oftentimes the UN, for example, does not recognize these different age cut-offs.

Do you have any advice on how to use international education statistics to make research and analysis of the data easier?

Friedrich Huebler said...

International agencies have to apply the same standards to all countries to make sure that indicators and data are internationally comparable. For example, statistics on the youth literacy rate always refer to the population between 15 and 24 years, regardless of how "youth" is defined nationally.

Victorious! said...

Of course, there needs to be consistency in the data. The point I'm making is that these agencies' research studies are often limited in their analyses, but this is not surprising. For me, it reinforces the need for Africans (politicians, members of civil society, diasporians etc.) to filter the conclusions from these studies before making them the basis of a policy or project.

Thank you for your blog. I'll be checking it periodically. It's a good resource for me.