National Pupil Database (NPD)
Full title of dataset: National Pupil Database.
Period covered: From 2002.
Frequency of release: Annual.
Data updated: Throughout the year.
Format output: SPSS, CSV or text file (other options may be available on request).
Size: The NPD holds a number of different datasets, each holding hundreds of thousands or millions of records.
Data quality: Generally of a high standard. See the data quality information below for more details.
Example variables: Postcode, ethnicity, attainment, free meals eligibility, special needs.
Weighted or unweighted: Unweighted.
Documentation: All documentation relating to the NPD is available to download from the Department for Education’s website here. Alternatively, the ADLS has grouped this documentation into a single PDF pack, or you can download the individual documents in the ‘Access’ section towards the bottom of this page.
The ADLS provide regular updates on the release of the NPD data. Follow us on Twitter to receive this information.
Contact Details: For applications and general enquiries, please contact the NPD Data Warehouse team at NPD.firstname.lastname@example.org or telephone 01325 735432.
The National Pupil Database (NPD) is one of the richest education datasets in the world, holding a wide range of information about students who attend schools and colleges in England. The NPD combines the examination results of pupils with information on pupil and school characteristics and is an amalgamation of a number of different datasets, including Key Stage attainment data and School Census data (formerly known as PLASC), which are linked using a unique identifier for each pupil.
The NPD provides detailed information about children’s education at different stages (pre-school, primary, secondary and further education). The data held include detailed information about pupils’ test and exam results, prior attainment and progression at different Key Stages for all schools in the state sector in England. Attainment data are also held for pupils and students in non-maintained special schools, sixth-form and FE colleges and (where available) independent schools.
The NPD also includes further information about pupils in the state sector and non-maintained special schools such as gender, ethnicity, first language, eligibility for free school meals, information about special educational needs (SEN) and detailed information about pupil absence and exclusions.
The Key Stage data contain the attainment scores of children taking Key Stage assessments in each year. Each Key Stage (1 to 5) is held in a separate file, and each file contains around 700,000 records per year (the exact number depends on the size of the cohort).
Key Stage tests are taken at five different age points, typically at 7, 11, 14, 16 and 18. Key Stage 1 and Key Stage 3 are now examined through teacher assessment, whilst Key Stage 2 is assessed through national tests. Key Stage 4 relates to attainment in GCSE and equivalent qualifications and covers all candidates who were at the end of Key Stage 4 in the current year and who sat their exams at institutions in England. Key Stage 5 covers all 16-18 year old candidates who attempted a GCE/GCE Applied A Level or GCE Applied Double Award Level or Level 3 qualification equivalent in size to at least one A level in the current year and who sat their exams at institutions in England.
The School Census dataset contains approximately eight million records per year and includes variables on the pupil’s home postcode, gender, age, ethnicity, special educational needs (SEN), free school meals eligibility, and schooling history. It covers pupils in state-funded primary, secondary, nursery and special schools and in pupil referral units. Schools that are entirely privately funded are not included.
Prior to 2007, the School Census dataset was known as the Pupil Level Annual School Census (PLASC). The name changed only because a decision was made to collect the data three times a year rather than annually. The Key Stage attainment datasets are updated yearly.
Whilst the Key Stage attainment datasets and the School Census dataset can be used independently, they are more valuable when merged, because this combines information on pupil characteristics with attainment scores. As a result, the NPD has been widely used for research purposes. The linked dataset can include historic information from previous versions of the School Census/PLASC and from earlier attainment exams.
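At its core, merging attainment records with School Census characteristics is a join on the pupil identifier. The following is a minimal sketch, assuming each record is a dictionary sharing an identifier field; `pupil_id`, `ks2_score`, `fsm` and `sen` are hypothetical names (the actual variable names are listed in the NPD data tables). Counting unmatched records is a useful check, since pupils in privately funded schools have no School Census record.

```python
def merge_on_pupil_id(attainment_rows, census_rows, key="pupil_id"):
    """Left-join School Census characteristics onto attainment records.

    Every attainment record is kept. Returns the merged records and the
    number of attainment records with no matching census record.
    """
    # Index census records by pupil ID; if an ID appears twice, the last wins.
    census_by_id = {row[key]: row for row in census_rows}
    merged, unmatched = [], 0
    for row in attainment_rows:
        characteristics = census_by_id.get(row[key])
        if characteristics is None:
            unmatched += 1
            merged.append(dict(row))  # characteristics left missing
        else:
            # Attainment fields take precedence if a field name clashes.
            merged.append({**characteristics, **row})
    return merged, unmatched


# Hypothetical example records
ks2 = [{"pupil_id": "A1", "ks2_score": "104"}, {"pupil_id": "B2", "ks2_score": "98"}]
census = [{"pupil_id": "A1", "fsm": "1", "sen": "N"}]
merged, unmatched = merge_on_pupil_id(ks2, census)
print(unmatched)  # 1
```

A left join is used so that attainment records are never silently dropped. The unmatched count then shows how much of the cohort lacks characteristics data.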
PLUG (the PLASC/NPD User Group) is run by the University of Bristol and is a source of information, training and documentation about the NPD. Further information is available here.
There is also a dedicated NPD wiki page, available here, which provides useful information and a place to share your thoughts and questions.
The National Pupil Database covers all pupils in state-funded (or partially state-funded) schools in England. Similar systems operate across the rest of the UK.
The PLASC data was first collected in 2002 and was updated annually until 2007. In 2007, the PLASC was renamed the School Census. The same data are now collected three times a year in January, May and October.
The first year for which Key Stage attainment data were collected varies by assessment. For example, Key Stage 2 data were first collected in 1996 and Key Stage 5 data in 2002.
Variables are updated at different points in the year.
As the NPD is a combination of many datasets, each dataset is updated at different points throughout the year. For example, the School Census is updated three times a year (available in June, August and January) and the Key Stage attainment datasets are updated annually.
For further information on NPD dataset releases download our administrative data timetable here. You can also follow the ADLS on Twitter to receive information about when new NPD releases are available.
Examples of usage
The National Pupil Database forms a significant part of the evidence base for the education sector and supports a number of key priorities around accountability and school improvement. For example:
- It supports schools’ operational decisions by enabling schools and inspectors to interrogate test and examination results, identify strengths and weaknesses, and focus on the areas most in need of improvement, including lesson planning and support for individual pupils.
- It is the source for a wide range of analysis and statistics published in statistical first releases (SFRs) on the Department’s website. For example, each year the Department publishes information about National Curriculum assessment and GCSE attainment by key pupil characteristics such as ethnic group, special educational needs status and free school meals eligibility in England.
- The Department uses the data to target funding accurately at local authorities and schools, including the Pupil Premium and Revenue Support Grant. The data and statistical analyses are also used to inform, influence and improve education policy and to monitor the performance of the education service as a whole.
- The data held in the NPD is also used in a wide range of research including being linked, with consent, to a variety of external datasets.
Research and Statistics Gateway
The NPD is also the source for analyses in Statistical First Releases (SFRs) published in the Department for Education Research and Statistics Gateway. For example, each year the Department for Education publishes an SFR on National Curriculum assessment and GCSE attainment by key pupil characteristics such as ethnic group, special educational needs status and free school meals take-up in England.
The inclusion of postcode in the School Census enables various residence-based analyses, e.g. pupil attainment and absence by area of residence, analyses of cross-border movement, distances travelled to school, and links to other datasets including IDACI (Income Deprivation Affecting Children Index) and ACORN.
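Linking by postcode usually requires two steps: standardise the postcode format, then use a lookup to map each postcode to a small-area code. The following is a minimal sketch in which both lookup tables are hypothetical inputs. Area-level indices such as IDACI are published for small areas rather than individual postcodes, so a real analysis would use an official postcode-to-area directory as the first lookup.

```python
def normalise_postcode(raw):
    """Standardise a UK postcode: upper case, with a single space before
    the final three characters (the inward code)."""
    compact = "".join(raw.split()).upper()
    if len(compact) < 5:
        return None  # too short to be a full postcode
    return f"{compact[:-3]} {compact[-3:]}"


def attach_area_index(pupils, postcode_to_area, area_index):
    """Add an area code and an area-level index value to each pupil record.

    postcode_to_area and area_index are hypothetical dictionaries standing in
    for a postcode-to-small-area directory and an area-level index.
    Unmatched postcodes get None for both fields. Records are updated in place.
    """
    for pupil in pupils:
        postcode = normalise_postcode(pupil.get("postcode") or "")
        area = postcode_to_area.get(postcode)
        pupil["area_code"] = area
        pupil["area_index"] = area_index.get(area)
    return pupils
```

For example, `normalise_postcode(" bs8 1tn ")` returns `"BS8 1TN"`, so differently typed versions of the same postcode match the same lookup entry.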
The NPD has allowed the calculation of Contextualised Value Added (CVA) measures, which take account not only of prior attainment but also of a number of other pupil and school characteristics associated with performance differences that are outside schools’ control, such as gender, special educational needs, movement between schools and family circumstances. KS1-2, KS2-3 and KS3-4 CVA measures were published for the first time in the 2007 Achievement and Attainment Tables (AATs).
Data in the NPD are also being matched to existing surveys such as the Longitudinal Study of Young People in England (LSYPE), and the NPD has been used as a sampling frame for surveys carried out as part of the research and evaluation of education policies. Linking data in this way has reduced the reporting burden placed on schools.
Some examples of work carried out using the School Census/PLASC can also be found on the PLUG website.
Publications using these data are available from our Publication Hub here.
The National Pupil Database can be used for comparison of attainment and pupil characteristics at school and local authority level as well as for detailed analysis at pupil level.
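Aggregating pupil-level records to school or local authority level is a grouping operation. The following sketch computes, for each group, the number of pupils with a result and their mean score. The field names `school`, `la` and `score` are hypothetical, and pupils without a score are left out of both figures.

```python
from collections import defaultdict


def aggregate(pupils, level, measure):
    """Return {group: (pupils_with_result, mean_result)} for the given level,
    e.g. level="school" or level="la" (hypothetical field names)."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [count, running sum]
    for p in pupils:
        value = p.get(measure)
        if value is None:
            continue  # skip pupils with no result for this measure
        totals[p[level]][0] += 1
        totals[p[level]][1] += value
    return {group: (n, total / n) for group, (n, total) in totals.items()}
```

The same pupil records can be passed with `level="school"` or `level="la"` to compare the two levels of aggregation.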
To understand how NPD files from different years can be linked together, please refer to the document ‘Structure Cohorts’ available from the PLUG website, which also provides examples of the different file formats.
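When you link files across years, the typical assessment ages (7, 11, 14, 16 and 18) tell you which year's file should contain a pupil's later Key Stage record. The following is a minimal sketch. It assumes every file shares a pupil identifier column (here the hypothetical `pupil_id`) and that pupils move up one year group each year. Pupils who repeat or skip a year will not appear in the expected file.

```python
# Typical age at each Key Stage assessment, as described above.
TYPICAL_AGE = {1: 7, 2: 11, 3: 14, 4: 16, 5: 18}


def expected_year(stage_from, year_from, stage_to):
    """Year in which a pupil assessed at stage_from in year_from would
    typically be assessed at stage_to (no repeated or skipped years)."""
    return year_from + TYPICAL_AGE[stage_to] - TYPICAL_AGE[stage_from]


def link_cohort(earlier, later, key="pupil_id"):
    """Pair each pupil's earlier and later records; pupils missing from
    the later file are dropped."""
    later_by_id = {row[key]: row for row in later}
    return [(row, later_by_id[row[key]]) for row in earlier if row[key] in later_by_id]
```

For example, `expected_year(2, 2010, 4)` returns 2015, so a 2010 Key Stage 2 file would be paired with the 2015 Key Stage 4 file.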
The Education (Individual Pupil Information) (Prescribed Persons) (England) Regulations 1999 allow pupil information to be passed to certain third parties only in limited circumstances. This means that NPD data can only be passed to persons conducting research into the educational achievements of pupils and who therefore require individual pupil information for that purpose. If you are unsure whether your research proposal meets these criteria, please contact the ADLS for further advice.
Before completing your application, you should fully familiarise yourself with the dataset, timelines and available variables by studying the user guides, available from the Department for Education website here or in our Department for Education PDF pack here. The guides also contain information on using and interpreting the attainment variables.
The National Pupil Database is large and can be complex to use, particularly when working with longitudinal data. Because of the size of the datasets, we recommend reducing each data file to the key variables and cases of interest before carrying out any analyses.
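To follow this advice without loading a whole file into memory, you can stream the file row by row and keep only the columns and records you need. The following is a minimal sketch using Python's standard `csv` module. It assumes the extract is a CSV file with a header row. The column names and filter in the usage example are hypothetical.

```python
import csv


def reduce_file(src_path, dst_path, keep_columns, row_filter):
    """Stream a large CSV, writing only the chosen columns for rows where
    row_filter(row) is true. Returns the number of rows written."""
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)  # reads one row at a time
        writer = csv.DictWriter(dst, fieldnames=keep_columns)
        writer.writeheader()
        for row in reader:
            if row_filter(row):
                writer.writerow({col: row[col] for col in keep_columns})
                kept += 1
    return kept
```

For example, `reduce_file("census.csv", "census_small.csv", ["pupil_id", "fsm", "sen"], lambda row: row["la"] == "201")` would keep three hypothetical columns for pupils in one local authority. Memory use stays roughly constant however many records the source file holds.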
The Key Stage attainment datasets are released in several versions:
- Version 1 at Key Stage 1/3 is regarded as ‘provisional data’ and is the data used to populate the Statistical First Release.
- Version 2 at Key Stage 1/3 is regarded as ‘final data’.
- Unamended data at Key Stage 2/4/5 is regarded as ‘raw data’.
- Amended data at Key Stage 2/4/5 is data for which schools/LAs have had an opportunity to inform the DfE of irregularities and errors found within the unamended data.
- Final data at Key Stage 2/4/5 may include some further fine-tuning following the previous two stages.
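The latest available version is generally the one to use, so a script that assembles files may need to pick the most advanced release for each Key Stage. The following sketch encodes the version order listed above. The version labels are hypothetical and will not match the Department's actual file names.

```python
# Release order per Key Stage, earliest first (hypothetical labels).
VERSION_ORDER = {
    1: ["version 1", "version 2"],
    3: ["version 1", "version 2"],
    2: ["unamended", "amended", "final"],
    4: ["unamended", "amended", "final"],
    5: ["unamended", "amended", "final"],
}


def latest_version(key_stage, available):
    """Return the most advanced release among those available, or None if
    none of the available labels is recognised for this Key Stage."""
    order = VERSION_ORDER[key_stage]
    candidates = [v for v in available if v in order]
    if not candidates:
        return None
    return max(candidates, key=order.index)
```

For example, `latest_version(4, ["unamended", "amended"])` returns `"amended"`.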
The Secretary of State has specific powers to share pupil data from the NPD with named bodies and third parties who need access to the data solely to undertake research into the educational achievements of pupils (and not for any other purpose), under strict terms and conditions. If you are unsure whether your research proposal meets these criteria, please contact the ADLS for further advice.
Applications are handled by the Department for Education’s specialist Data Warehouse team, which deals with enquiries and applications for NPD data. To ensure that data sharing is proportionate and that different users can access the information they need, the Department has developed a number of standard extracts, which it hopes will meet many people’s needs and speed up access to the data. The extracts are organised into four tiers of access, each with its own governance arrangements, as follows:
Tier 1: Individual pupil level data – identifiable and / or identifiable and highly sensitive
Individual pupil level extracts that include identifying and highly sensitive information about pupils and their characteristics including items described as ‘sensitive personal data’ within the Data Protection Act 1998.
Tier 2: Individual pupil level data – identifiable and sensitive
Individual pupil level extracts that include sensitive information about pupils and their characteristics including items described as ‘sensitive personal data’ within the Data Protection Act 1998 which have been recoded to become less sensitive.
Tier 3: Aggregate School level data – identifiable and sensitive
Aggregated extracts of school level data from the Department’s school level database which could include items described as ‘sensitive personal data’ within the Data Protection Act 1998 and could include small numbers and single counts.
Tier 4: Individual pupil level data – identifiable
Individual pupil level extracts that do not contain information about pupils and their characteristics which is considered to be identifying or described as sensitive personal data within the Data Protection Act 1998.
Researchers can still request bespoke extracts of the data if needed and will likely be required to submit a business application with the main application.
To help decide which extract is needed, researchers should consider whether they require a dataset containing pupil level or aggregated data, whether sensitive data is required and whether any identifying and highly sensitive data items are needed. To help make this decision the Department for Education has produced an NPD data table which details the tier each NPD variable is associated with. This can be downloaded from their website here and is also available in our DfE PDF pack here.
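The three questions above can be written as a small decision function. This is a simplified, hypothetical reading of the tier descriptions and not the Department's own decision rule. The tier of each variable is set out in the NPD data table, and the Department decides what access is granted.

```python
def suggest_tier(pupil_level, needs_sensitive, needs_highly_sensitive):
    """Suggest an access tier from three yes/no questions (simplified reading
    of the tier descriptions; not an official rule)."""
    if not pupil_level:
        return 3  # aggregate school level data
    if needs_highly_sensitive:
        return 1  # identifying and highly sensitive pupil-level items
    if needs_sensitive:
        return 2  # sensitive items recoded to be less sensitive
    return 4  # pupil-level data without sensitive items
```

The order of the checks matters: the most restrictive requirement is tested first, so a request for both sensitive and highly sensitive items maps to Tier 1.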
Researchers should then fully familiarise themselves with all the NPD user documentation before completing an application. This includes the guide and protocol, the individual declaration and the data security questionnaires. Once satisfied, researchers should complete the application pack, providing as much information as possible, including the nature of the research and details of all colleagues involved in analysing the data. This will help the approvals panel to deal with the request efficiently.
Researchers are required to demonstrate that they will comply with all relevant requirements of the Data Protection Act 1998. In particular, they must show that they:
- Have appropriate security arrangements in place to process the data;
- Intend to use the data only for the specified purpose;
- Will keep the data only for the specified length of time; and
- Will not further disclose the data.
For all sensitive variables, a business case (included in the application pack) will need to be completed to justify their use.
Once completed, the application pack, together with the individual declaration and data security documents, should be sent to NPD.REQUESTS@education.gsi.gov.uk.
Each request for access is assessed by the Department for Education’s Data and Statistics Division (DSD) and requests for access to identifiable and highly sensitive data will be escalated to the Data Management Advisory Panel (DMAP).
The application process is free and is expected to be completed within six to eight weeks. Data for successful applications will be sent as SPSS, CSV or text files. If another format is required, this can be specified in the application.
Data files will be sent electronically as a password protected Zip file so you will need to have a program to unzip files, for example WinZip.
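If you prefer to script the extraction, Python's standard-library `zipfile` module can open password-protected archives, with one important limitation: it supports only the older ZipCrypto encryption, not AES. Whether NPD files use ZipCrypto is an assumption you should check. AES-encrypted archives need a tool such as WinZip or 7-Zip. The paths in this sketch are hypothetical.

```python
import zipfile


def extract_npd_zip(zip_path, out_dir, password):
    """Extract a password-protected zip archive into out_dir and return the
    names of its members.

    zipfile can decrypt only legacy ZipCrypto encryption; AES-encrypted
    members raise an error and need an external tool instead.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # pwd must be bytes; extractall creates out_dir if it does not exist.
        archive.extractall(path=out_dir, pwd=password.encode("utf-8"))
        return archive.namelist()
```

Avoid hard-coding the password in scripts that are shared or stored with the data.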
The NPD is released in several versions throughout the year. For example, most of the Key Stage attainment datasets have unamended, amended and final versions. It is always best to use the latest version of a dataset where possible. There is usually little difference between the amended and final versions, so the choice should have little effect on research outcomes, but it is best to check with the Data Warehouse team first. By following the ADLS on Twitter you can keep up to date with these releases.
The full list of documents available to download from the Department for Education website is as follows:
NPD data sources by year – contains information on the datasets in the NPD and the year they are available from. (PDF)
NPD data tables – contains detailed lists of variables, codings and tier level of access as described above. (Excel)
NPD user guide and protocol – provides the terms and conditions of use, guidelines and explanations about the NPD. (PDF)
NPD application pack – the form needed to make applications for NPD data. (Word – fillable form)
Individual declaration – agreement form to be completed before access to the data is allowed. Confirms that you understand all the information contained in the user guide. (Word – fillable form)
NPD Data Security self-assessment questionnaire - assesses ability to handle and store NPD data.
NPD Data Security 3rd Party self-assessment questionnaire - assesses ability of additional 3rd parties to handle and store NPD data.
The ADLS produce guidance to help researchers apply for and use administrative data responsibly, including legal and data security guidance. Further information is available here.
The data recorded in the NPD has a high level of completeness and accuracy. Derived variables created by the Department for Education are included in the dataset so that researchers can reproduce the methodology used to calculate published results on educational attainment.
However, pupil data may sometimes be missing, for example if a child did not attend school on the day of an assessment, had previously been educated outside the state system, or had moved to England from abroad.
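Before analysis, it is worth measuring how much data is missing for each variable. The following is a minimal sketch that assumes records are held as dictionaries and that missing values appear as empty strings or `None`. Real NPD files may use other missing-value codes, which are documented in the NPD data tables and can be passed in via `missing_codes`.

```python
def missing_summary(rows, columns, missing_codes=("", None)):
    """Proportion of records with a missing value in each column.

    A column absent from a record counts as missing. Returns 0.0 for
    every column when there are no rows.
    """
    n = len(rows)
    return {
        col: sum(1 for r in rows if r.get(col) in missing_codes) / n if n else 0.0
        for col in columns
    }
```

Comparing these proportions across years or pupil groups can show whether missing data are concentrated among, for example, pupils who joined the state system late.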
Although the NPD is updated annually it is released in several versions as the individual datasets contained within it are updated at different times. This is because attainment results may be challenged by schools or individuals, and errors may subsequently be found in the data.
When a new version of the NPD becomes available for the first time, it is regarded as ‘provisional’, meaning it could be revised at a later date. The number of revisions is generally very small and in most cases would have a negligible impact on research findings, although it is best to contact the Data Warehouse team for clarification.
Each pupil’s home postcode is collected in the School Census/PLASC, so it is possible to carry out analyses at small-area level. However, postcode is classified as a restricted-access variable, and you will have to provide a strong business case to access it.
The complexity of the NPD makes it easy to duplicate requests. Double-check the data that you require.
Read the NPD data user tables carefully to determine which data is sensitive and what you actually require access to. Also ensure that you really need the data that you are applying for, as a single file of PLASC/Census data typically contains over seven million records.
If you are applying for sensitive data you must fully justify your reasons for it, otherwise your application is unlikely to be successful.
The NPD is already a linked dataset. It consists of a number of different datasets, including the School Census/PLASC datasets and the Key Stage attainment datasets, which are linked using a unique identifier for each pupil.
The NPD datasets can also be linked longitudinally to trace pupils over time and can be linked with other surveys of young people. For example, data linkage work has already been undertaken, or is planned, to link the NPD with the Longitudinal Study of Young People in England, the Millennium Cohort Study, and the Avon Longitudinal Study of Parents & Children.
For more information on this see Hansen, K. and Vignoles, A. (2007), The use of large scale data-sets in educational research. London: TLRP or visit the British Educational Research Association which has a section on the use of large scale datasets in educational research. Further linkages to NPD data can also be seen within the Effective Provision of Pre-School Education (EPPE) project.
Research users / comments
‘The NPD has been described as wide-ranging but at various points shallow. It lacks a number of variables common in school effectiveness research and parental occupational class, despite their potential relevance to educational improvement. That said, there is not even a near competitor, the NPD (to me) virtually cries out to be used in mixed quant/qual research, and with hardly any exception (what are now) DfE colleagues have been excellent to work with. ’
Posted by Greater London Authority (19/05/11)
Please contact us with any comments regarding this dataset.