Administrative data introduction

This page explains what administrative data is, sets out its advantages and disadvantages, describes how other countries make use of this type of data, and briefly summarises how administrative data can be linked to other sources.

What is administrative data?
Administrative data refers to information collected primarily for administrative (not research) purposes.  This type of data is collected by government departments and other organisations for the purposes of registration, transaction and record keeping, usually during the delivery of a service.

In the UK, government departments are the main (although not exclusive) purveyors of large administrative databases, including welfare, tax, health and educational record systems. These datasets have for many years been used to produce official statistics to inform policy-making. The potential for this data to be accessed for social science research is increasingly recognised, although it has not yet been fully exploited. Two areas of research – education and health – have seen fairly extensive use of administrative data[1], but most other administrative datasets have not been widely used for research purposes.

The ADLS website provides a wealth of information about many of these data sources and how researchers can access them.

Advantages and disadvantages of using administrative data in research
Administrative datasets are typically very large, covering samples of individuals and time periods not normally financially or logistically achievable through survey methods.  Alongside cost savings, the scope of administrative data is often cited as its main advantage for research purposes.  Other advantages include relieving the burden on survey respondents and providing data on individuals who would not normally respond to surveys. 

The criticisms levelled at these resources relate to the lack of control the researcher has during the data collection stage and how this affects what can be done with the data.  More general concern has also been voiced about the lack of well established theory and methods to guide the use of administrative data in social science research[2].

The table below summarises some of the general advantages and disadvantages of using administrative data.  As with any method of data collection, when deciding whether and when to use administrative data it is necessary to weigh up the pros and cons in relation to the specific research situation.

| Advantages of administrative data | Disadvantages of administrative data |
| --- | --- |
| Already collected for operational purposes and therefore no additional costs of collection (though costs of extraction and cleaning). | Information collected is restricted to data required for administrative purposes – limited to users of services and administrative definitions. |
| Collection process not intrusive to target population. | Lack of researcher control over content. |
| Regularly (sometimes continuously) updated. | Proxy indicators sometimes have to be used. |
| Can provide historical information and allow consistent time-series to be built up. | May lack contextual/background information. |
| Collected in a consistent way (if part of national system). | Changes to administrative procedures could change definitions and make comparison over time problematic. |
| Subject to rigorous quality checks. | Missing or erroneous data. |
| Near 100% coverage of population of interest. | Quality issues with variables less important to the administrator, e.g. address details may well not be updated. |
| Reliable at the small area level. | Metadata issues (may be lacking or of poor quality). |
| Counterfactuals and controls can be selected post hoc. | Data protection issues. |
| Captures individuals who may not respond to surveys. | Access for researchers is dependent on support of data providers. |
| Potential for datasets to be linked to produce powerful research resources (see below). | Underdeveloped theory and methods. |

Source: Smith, G., Noble, M., Anttilla, C., Gill, L., Zaidi, A., Wright, G., Dibben, C. and Barnes, H. (2004) The Value of Linked Administrative Records for Longitudinal Analysis, Report to the ESRC National Longitudinal Strategy Committee.

Experiences from other countries – Statistics Finland
While awareness of the potential uses of administrative data has grown in the UK, several of the Nordic countries (notably Denmark, Sweden and Finland) are more advanced in their use of these resources. In Finland, for example, the use of administrative data to produce social and economic statistics is well established in law as well as in practice. Statistics Finland, the country’s central statistics office, collects almost all (96 per cent[3]) of its data from administrative sources. Since 1990 Finland has also relied entirely on administrative data to produce the population and housing census. The Finnish Statistics Act (2004)[4] is based on the central tenet that, wherever possible, pre-existing administrative data should be used before the general public is asked to provide information.

In addition to relieving the financial and operational burden of census data collection, Statistics Finland has identified three broad uses for administrative data that are central to their work[5].

1. Direct use of administrative data to produce national economic and social statistics, for example crime rates, election statistics and employment statistics[6].

2. Linking different complementary administrative datasets.  Data linkage is facilitated through concerted collaboration efforts between data holding authorities, and a well established unified system of personal identity codes used across different datasets.

3. Combining survey and administrative data.  Conducting surveys is still an important part of Statistics Finland’s work and administrative data is used to provide sampling frames, improve the quality of survey data once it is collected, check for errors, impute missing data and supplement the data, allowing the survey to concentrate on information not available elsewhere.   
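As a rough illustration of the third use, the sketch below fills gaps in survey responses from an administrative register through a shared identity code. The field names (`person_id`, `employment_status`, `hours_worked`), the records and the `supplement` helper are all hypothetical, chosen only for illustration; this is not Statistics Finland's actual procedure.

```python
# Hypothetical survey responses keyed by a personal identity code.
# None marks an item the respondent did not answer.
survey = [
    {"person_id": "A1", "employment_status": "employed", "hours_worked": 37},
    {"person_id": "B2", "employment_status": None, "hours_worked": 20},
    {"person_id": "C3", "employment_status": None, "hours_worked": None},
]

# Hypothetical administrative register that holds employment status only.
register = {
    "A1": {"employment_status": "employed"},
    "B2": {"employment_status": "employed"},
}


def supplement(survey_rows, register_rows, fields):
    """Return copies of the survey rows with missing fields filled from the register."""
    completed = []
    for row in survey_rows:
        filled = dict(row)  # copy, so the original survey data is left untouched
        admin = register_rows.get(row["person_id"], {})
        for field in fields:
            if filled.get(field) is None and admin.get(field) is not None:
                filled[field] = admin[field]
                # Record where the value came from so analysts can tell
                # reported values from register-derived ones.
                filled[field + "_source"] = "register"
        completed.append(filled)
    return completed


completed = supplement(survey, register, ["employment_status"])
```

Here the second respondent's missing employment status is taken from the register, while the third respondent, who has no register entry, keeps the gap. Flagging the source of each filled value is one possible design choice; it keeps the distinction between self-reported and administrative values visible in later analysis.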

Below are links to the central statistics organisations in three of the Nordic countries and the Netherlands.

1. Statistics Denmark
2. Statistics Finland
3. Statistics Sweden
4. Statistics Netherlands

Data linkage
There are various ways in which extracts of administrative data can be linked with other data sources to create more comprehensive and effective datasets for analysis. The most obvious is the linkage of different years of data within a data source. This is frequently done with datasets such as the National Pupil Database (a database containing information on pupils at maintained schools, including their examination results), where the same individuals appear in annual cuts of the data source for many years, so that a longitudinal record can be created for each individual. Administrative data can also be linked with a variety of other data sources, for example:

1. Linking individual level administrative data with other individual level administrative data via a unique identifier or fuzzy matching methods (matching on personal details such as name, date of birth and address). Administrative databases tend to be largely department or function specific and there has been little linkage between different datasets.

2. Linking individual level administrative data with cross-sectional or longitudinal survey data usually via fuzzy matching methods. There are a number of examples of this type of linkage (either planned or successfully accomplished), including many of the large longitudinal studies in the UK.

3. Linking individual level administrative data with contextual information on, for example, the neighbourhood (e.g. using the Index of Multiple Deprivation or the Income Deprivation Affecting Children Index) or organisation relevant to the individual (e.g. school or university attended).
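The difference between linking on a unique identifier and fuzzy matching can be sketched as follows. This is a minimal illustration under stated assumptions, not a production linkage method: the two datasets, their field names, the `link` and `name_similarity` helpers and the 0.8 similarity threshold are all hypothetical, and name similarity is scored with `difflib.SequenceMatcher` from the Python standard library rather than a purpose-built record linkage model.

```python
from difflib import SequenceMatcher

# Hypothetical extracts from two administrative sources.
# Some records in the first source have no identifier recorded.
benefits = [
    {"id": "QQ123456A", "name": "Jane Smith", "dob": "1980-04-12"},
    {"id": None, "name": "Jon Macdonald", "dob": "1975-09-30"},
    {"id": None, "name": "Priya Patel", "dob": "1990-01-05"},
]
tax = [
    {"id": "QQ123456A", "name": "J. Smith", "dob": "1980-04-12"},
    {"id": "ZZ999999B", "name": "John McDonald", "dob": "1975-09-30"},
    {"id": "YY555555C", "name": "Peter Patel", "dob": "1962-07-19"},
]


def name_similarity(a, b):
    """Similarity between two names, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(left, right, threshold=0.8):
    """Link records on identifier where possible, otherwise by fuzzy matching."""
    by_id = {rec["id"]: rec for rec in right if rec["id"]}
    links = []
    for rec in left:
        # Step 1: exact linkage on the unique identifier.
        if rec["id"] and rec["id"] in by_id:
            links.append((rec["name"], by_id[rec["id"]]["name"], "exact"))
            continue
        # Step 2: fuzzy fallback. Require the same date of birth, then
        # accept the most similar name if it clears the threshold.
        candidates = [r for r in right if r["dob"] == rec["dob"]]
        best = max(
            candidates,
            key=lambda r: name_similarity(rec["name"], r["name"]),
            default=None,
        )
        if best is not None and name_similarity(rec["name"], best["name"]) >= threshold:
            links.append((rec["name"], best["name"], "fuzzy"))
    return links


links = link(benefits, tax)
```

In this toy data the first record links exactly on its identifier even though the names differ, the second links through the fuzzy step, and the third is left unlinked because no candidate shares its date of birth. The choice of threshold trades false matches against missed matches and would need to be tuned and validated for any real pair of datasets.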

For further information on data linkage techniques see Gill (2001)[7] and also the ESRC’s NCRM Research Node A.D.M.I.N (Administrative Data – Methods, Inference and Networks), which runs courses on data linkage[8].
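The simplest case described at the start of this section, linking annual cuts of a single source into a longitudinal record, might look like the sketch below. The pupil records, field names and `build_longitudinal` helper are hypothetical and do not reflect the actual structure of the National Pupil Database; the sketch assumes each pupil carries the same identifier in every year.

```python
from collections import defaultdict

# Hypothetical annual cuts of one data source: year -> records for that year.
annual_cuts = {
    2006: [
        {"pupil_id": "P1", "school": "S10", "score": 47},
        {"pupil_id": "P2", "school": "S22", "score": 58},
    ],
    2005: [
        {"pupil_id": "P1", "school": "S10", "score": 41},
        {"pupil_id": "P2", "school": "S10", "score": 55},
    ],
    2007: [
        {"pupil_id": "P1", "school": "S10", "score": 52},
    ],
}


def build_longitudinal(cuts):
    """Group yearly records by pupil, ordered by year."""
    histories = defaultdict(list)
    for year in sorted(cuts):  # process the cuts in chronological order
        for rec in cuts[year]:
            histories[rec["pupil_id"]].append(
                {"year": year, "school": rec["school"], "score": rec["score"]}
            )
    return dict(histories)


histories = build_longitudinal(annual_cuts)
```

Each pupil's history then shows changes over time, such as a move between schools, and makes gaps visible where a pupil is absent from a later cut.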

[1] ADLS has produced publications on the uses and potential of administrative data for both education and health research, which can be viewed by clicking here

[2] For a full discussion of theory and methods relating to the use of administrative data see Wallgren, A. and Wallgren, B. (2007) ‘Register-based Statistics: Administrative Data for Statistical Purposes’, John Wiley & Sons.

[3] Statistics Finland, ‘Use of Registers and Administrative Data Sources for Statistical Purposes’, 2004:7.

[4] Finnish Statistics Act (2004).

[5] For a fuller discussion of these uses see: Statistics Finland, ‘Use of Registers and Administrative Data Sources for Statistical Purposes’, 2004:16-19 – weblink as above in [4].

[6] These are similar to the type of statistics published in the UK by the Office for National Statistics (ONS).

[7] Gill, L., (2001) ‘Methods for Automatic Record Matching and Linking and Their Use in National Statistics’, National Statistics Methodology Series, No. 25, London: Office for National Statistics.

[8] See
