Research Data Management: Preparing your data

Introduction

No matter how well you have managed your data over the course of a research project, there will likely be additional preparation you must do before sharing your data.

You must ensure that there are no legal restrictions on the sharing of your data.

  • Check with your research funder.
  • Make sure your Informed Consent Form allows for the sharing of data.
  • Contact the Office of Sponsored Programs for clarification.

You must take additional care with sensitive data. Sensitive data is data that includes information that could identify a being; this could be a person or, in the case of environmental research, an endangered species. Much human subjects research data is necessarily sensitive.

Direct identifiers, such as names or Social Security numbers, should not be shared. However, indirect identifiers, such as zip codes together with income levels, can also be used to identify individuals, and thus must be treated carefully. See the Resources below for guidance in dealing with sensitive data.
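To make the risk from indirect identifiers concrete, the following is a minimal sketch that counts how many records share each combination of indirect-identifier values and flags the combinations that occur fewer than k times. The field names (zip_code, income_band), the sample records, and the threshold k are all hypothetical, and a check like this is only a first screening step, not a complete disclosure review.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=2):
    """Return value combinations shared by fewer than k records.

    records: list of dicts, one per participant (hypothetical layout).
    quasi_identifiers: names of the indirect-identifier fields to combine.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Illustrative data only; not drawn from any real study.
records = [
    {"zip_code": "02139", "income_band": "50-75k"},
    {"zip_code": "02139", "income_band": "50-75k"},
    {"zip_code": "02139", "income_band": "200k+"},
    {"zip_code": "94305", "income_band": "50-75k"},
    {"zip_code": "94305", "income_band": "50-75k"},
]

# Zip code alone is shared by several records, but zip code together
# with income band singles out one record.
print(risky_combinations(records, ["zip_code"]))
print(risky_combinations(records, ["zip_code", "income_band"]))
```

Any combination this returns describes a record that could stand out in the shared file, and is a candidate for the generalizing or aggregating techniques covered in the Resources.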

Keep in mind that the National Institutes of Health, which funds massive amounts of research, including research involving sensitive data, maintains: “all data should be considered for data sharing.”

You must ensure that your data is coherent and contextualized. This includes clear file naming, avoiding proprietary file formats, and using standardized metadata schemas.

Definitions

Sensitive data = Data which includes information that could identify a being; this could be a person or, in the case of environmental research, an endangered species.

Anonymization = The process of changing identifiers, whether by removing, substituting, distorting, generalizing or aggregating.
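As an illustration of three of these techniques (removing, substituting, and generalizing), here is a minimal sketch. The field names (name, participant_id, zip_code, age), the pseudonym format, and the choice of three-digit zip prefixes and ten-year age bands are assumptions made for the example, not recommendations from the guides cited here.

```python
def anonymize(records):
    """Return anonymized copies of records (hypothetical field layout)."""
    pseudonyms = {}  # original id -> substitute code
    result = []
    for record in records:
        cleaned = dict(record)

        # Removing: drop the direct identifier entirely.
        cleaned.pop("name", None)

        # Substituting: swap the original id for an arbitrary code.
        original_id = cleaned.pop("participant_id")
        code = pseudonyms.setdefault(original_id, f"P{len(pseudonyms) + 1:03d}")
        cleaned["participant_id"] = code

        # Generalizing: keep only the first three digits of the zip code.
        cleaned["zip_code"] = cleaned["zip_code"][:3] + "**"

        # Generalizing: replace exact age with a ten-year band.
        low = (cleaned["age"] // 10) * 10
        cleaned["age"] = f"{low}-{low + 9}"

        result.append(cleaned)
    return result


# Illustrative data only.
sample = [
    {"name": "Ann Example", "participant_id": "A-17", "zip_code": "02139", "age": 34},
    {"name": "Bo Example", "participant_id": "B-02", "zip_code": "94305", "age": 61},
]
for row in anonymize(sample):
    print(row)
```

The mapping from original ids to substitute codes can itself re-identify participants, so it should be stored separately under restricted access, or discarded, rather than shared with the data.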

Direct identifiers = Variables which point explicitly to particular individuals or units. Any variable that functions as an explicit name can be a direct identifier, such as a license number, phone number, or email address.

Indirect identifiers = Variables which, alone or in combination, make unique cases visible, such as zip code together with income level.

Restricted data = Data which are made available only under strict, secure conditions; generally confidential or sensitive data.

Metadata = “Documentation or information about a data set. It may be embedded in the data itself, or exist separately from the data. Metadata may describe the ownership, purpose, methods, organization, and conditions for use of data, technical information about the data, and other information. Many metadata standards exist across a broad range of disciplines and applications.” (Glossary of data management terms, Cornell).

Thanks also to the UK Data Service and the ICPSR’s Guide to Social Science Data Preparation and Archiving.

Resources

  • The ICPSR’s Guide to Social Science Data Preparation and Archiving, which includes the chapter “Preparing Data for Sharing.”
  • Sharing sensitive data, Stanford University Libraries.
  • Guide to anonymizing data from the UK Data Service.
  • Managing and Sharing Data, a short guide by the UK Data Archive.