Disclosure risk guidance
If you use sensitive or personal data, you must protect the confidentiality and identity of individuals; otherwise both you and your organisation could face sanctions or prosecution. The main considerations are:
1. How disclosive is the data?
Assessing disclosure risk is a complex process and is itself an active research area. As a rule of thumb, however, the more detail the data has and the higher the proportion of the population of interest it captures, the higher the risk.
2. How sensitive is the data?
Sensitivity is perhaps a simpler concept to understand than disclosure risk, although it is difficult to measure. Data which concerns vulnerable groups, or which contains information on the income, health or financial circumstances of individuals or households, is an example of data that might be considered of higher sensitivity. There are legal definitions of sensitive data contained within but it is widely recognised that these are insufficient at present.
In most cases administrative data will contain sensitive data and present disclosure risk. Therefore, in order to obtain access to administrative data a researcher will have to understand (and implement) data security practices and take appropriate steps to ensure that any output from their research is not itself disclosive.
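The rule of thumb under consideration 1 can be illustrated with a small sketch. It counts how many records become unique as more identifying variables are added to a release. The records and variable names are hypothetical, invented purely for illustration, and the measure shown is sample uniqueness only; it is not a full disclosure risk assessment, and a sample unique is more likely to be a population unique the larger the share of the population the data covers.

```python
from collections import Counter

# Hypothetical toy records (assumption, not real data):
# (age_band, sex, region, occupation)
RECORDS = [
    ("30-39", "F", "North", "Teacher"),
    ("30-39", "F", "North", "Nurse"),
    ("30-39", "M", "North", "Teacher"),
    ("30-39", "M", "South", "Teacher"),
    ("40-49", "F", "South", "Nurse"),
    ("40-49", "F", "South", "Nurse"),
    ("40-49", "M", "South", "Farmer"),
    ("40-49", "M", "North", "Teacher"),
]


def unique_fraction(records, n_keys):
    """Fraction of records that are unique on the first n_keys variables."""
    keys = [r[:n_keys] for r in records]
    counts = Counter(keys)
    uniques = sum(1 for k in keys if counts[k] == 1)
    return uniques / len(records)


# Release progressively more detail and watch uniqueness rise.
fractions = [unique_fraction(RECORDS, n) for n in range(1, 5)]
for n, frac in zip(range(1, 5), fractions):
    print(f"{n} key variable(s): {frac:.2f} of records unique")
```

In this toy example no record is unique on age band and sex alone, but half are unique once region is added and three quarters once occupation is added, which is the pattern the rule of thumb describes.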
In practice there are two settings in which administrative data can be researched securely:
- At a secure setting at the data holding organisation.
- At a secure setting at the researcher’s institution.
Most researchers would prefer the second of these and will need to convince the data holding organisation in their application that they have an appropriate safe setting in place.
It is important to emphasise the critical nature of complying with safe setting policies and procedures. Even a single instance of non-compliance could be very damaging to the UK research community and could also lead to stiff penalties being applied to your organisation under the Statistics and Registration Service Act 2007. Therefore, you should only pursue access to administrative data after giving careful consideration to your current data security procedures, whether your data security practices need to be changed to accommodate access to an administrative dataset, and the impact that this will inevitably have on your day-to-day research practice.
A case study of a disclosure risk audit of an administrative data dissemination policy
The following real case study, described by our Disclosure Control Expert, outlines a typical process involved in the review of disclosure control practice by a data holding organisation.
“I was asked to provide advice to an administrative data holding organisation. They wished to increase the research value of their public data release by adding additional variables, and also wanted to review their current disclosure risk practice.
In order to assess whether such additional data might pose a disclosure risk I carried out a disclosure risk audit. This is in effect a surface-level scan of an intended or existing data release policy. Please note that it is not a full disclosure risk analysis, which involves complex statistical and computational models. However, for simple data dissemination decisions and for providing a review of existing data dissemination policies it is a very useful tool. The audit methodology I used consisted of the following five components:
(i) I first consulted onsite with the data release team. This is a vital component of the service and is designed to establish clearly the nature of the data and the release programme, and that of any potentially linkable data.
(ii) The next step was to run some Data Environment Analysis. This is an accurate measurement of the availability of data in the data environment of a given dissemination process, which helps to ensure that any disclosure control techniques are as efficient as possible in terms of maximising the data released. By establishing the parameters of data that could be linked to the released data, it was then possible for me to generate properly grounded scenarios of attack for disclosure risk assessments. I then used up-to-date information on the availability of individual variables to drive the construction of scenario frames. This allows account to be taken of the availability of individual data and possibly disclosive data from other sources.
(iii) I then carried out Attack Scenario Analysis. These statistical disclosure attack scenarios were developed by Elliot and Dale in 1999. An intruder classification scheme takes account of the data intruder’s perspectives. The focus on attack scenarios allows conceptualisation of the goals and motives of the potential intruder, as opposed to more data-focused approaches which examine the risks purely in terms of the structure of the proposed release. The scenario-based approach aids the understanding of the overall likelihood of a disclosure attempt by examining the social, psychological and political factors that might motivate an attack. As a result, a more systematic understanding of the type of attack can be gained, and therefore the probability of disclosure given such an attack can be more accurately estimated.
(iv) The next step was to do Metadata Analysis. This involved an examination of the metadata for the data release, in particular variable definitions, sampling, geographical detail, table definitions and the possibility of customisation, together with some exploratory analysis of the data.
(v) Finally, I carried out exploratory disclosure risk analyses. It is usually necessary to run a small exemplar disclosure risk analysis such as Subtraction Attribution Probability (SAP) or SUDA (Special Uniques Detection Algorithm). These give an insight into the risk profile of the data whilst stopping short of a full risk analysis.
After carrying out the audit using the above methods, and considering just the data release in question, I concluded that the overall disclosure risk from adding the additional variable to the data release appeared to be low. There were several caveats to this, concerning consistency of policy and practice and the continuing need to monitor both the data environment and patterns of user requests for data. However, overall I was able to advise that the addition of the variable posed little additional risk and, further, that there was some scope for relaxing the existing disclosure control strategy, perhaps in response to patterns of demand from users.”
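To give a flavour of the exploratory analyses mentioned in step (v), the sketch below searches for minimal sample uniques: the smallest combinations of variables on which a given record is the only one of its kind in the data. This is the basic idea that the SUDA approach builds on, but the code is a simplified brute-force illustration and not the published algorithm or its scoring; the records and the function name minimal_sample_uniques are hypothetical.

```python
from itertools import combinations

# Hypothetical toy records (assumption, not real data):
# columns are 0=age_band, 1=sex, 2=region, 3=occupation
RECORDS = [
    ("30-39", "F", "North", "Teacher"),
    ("30-39", "F", "North", "Nurse"),
    ("30-39", "M", "North", "Teacher"),
    ("40-49", "F", "South", "Nurse"),
    ("40-49", "F", "South", "Nurse"),
    ("40-49", "M", "South", "Farmer"),
]


def minimal_sample_uniques(records, target_index):
    """Return the minimal sets of column indices on which the target record is unique."""
    target = records[target_index]
    n_cols = len(target)
    found = []
    # Try small combinations first so that any unique set found is minimal.
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # A superset of an already-found unique set cannot be minimal.
            if any(set(prev) <= set(cols) for prev in found):
                continue
            matches = sum(
                1 for r in records if all(r[c] == target[c] for c in cols)
            )
            if matches == 1:
                found.append(cols)
    return found


for i in range(len(RECORDS)):
    print(i, minimal_sample_uniques(RECORDS, i))
```

A record that is unique on a single variable, or on several different small combinations, is generally of more concern than one that is unique only on the full set of variables. A record with no unique combination at all, such as either of the two identical records above, returns an empty list.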
Please note that expert advice on disclosure risk and a disclosure risk analysis service are available from the ADLS.