Research Data Management: Routes for sharing data

Introduction

You can share your data in many ways.

  1. Placing in a disciplinary data repository.
  2. Placing in an institutional repository, a digital collection of the intellectual output of a research institution.
  3. Submitting data to a journal as a supplemental file.
  4. Submitting data to a data journal.
  5. Housing on a personal or project website, or on GitHub.

Disciplinary data repositories

A data repository supports the preservation, discovery, use, reuse, and manipulation of data objects. A data repository often provides added value to data through quality assurance and metadata enhancement. Many data repositories are discipline-specific. The goal is to have a one-stop place for datasets concerning certain subjects, such as astrophysics or archeology.

ICPSR, at the University of Michigan, has one of the largest and oldest data repositories, which focuses on social sciences data.

The Registry of Research Data Repositories allows you to search for a suitable home for your data.

You can also browse this list of disciplinary data repositories.

Scholar@Simmons

The University Archives maintains Simmons University Digital Archives, a repository which collects and makes available materials that document and reflect the history, development, and operations of Simmons University, including the scholarly output of Simmons's faculty and students. Contact archives@simmons.edu for more information on submitting your research.

Journal Supplemental Files

Many journals encourage researchers to submit supplemental data files to promote research transparency. As you consider where to publish, look for journals that offer this option. Additionally, consider open access journals to further increase the reach of your research. Learn more about Open Access.

How to share your dataset on GitHub

Assuming your data has been fully prepared for sharing, one option is to share your dataset via GitHub. GitHub is a web-based hosting service built on the Git version control system and generally used for software development. At first glance, it may not seem a useful venue for open datasets. However, many of the same features that make GitHub attractive for collaborative software projects apply to data sharing as well.

First, GitHub is free for open source projects. Second, and more importantly, it provides version control, which lets you easily add to your datasets and track changes to them. GitHub also allows other users to make copies (forks) of your data repository. These copies can be modified independently of the original, but remain linked to it by their shared history.

The drawbacks to sharing via GitHub are:

  • A lack of discoverability. When fellow researchers are looking for datasets, how will they be able to find yours easily? Discoverability is one of the strongest points in favor of sharing data via a disciplinary data repository.
  • Long-term preservation. Who will monitor and take care of the dataset over the long term?
  • Limited data size. GitHub recommends keeping repositories under 1 GB, and individual files must be less than 100 MB.

However, sharing via GitHub may make sense if:

  • You do not have a fitting disciplinary or institutional repository
  • Your data is within GitHub's size limits
  • You want low-cost, low-effort sharing
  • You don't have restricted data
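
One way to find out whether a dataset falls within the size limits mentioned above is to total up the files in its folder before uploading. The following is a minimal sketch in Python, not an official GitHub tool. The function name `check_dataset_size` and the threshold constants are assumptions made for illustration, and the 1 GB and 100 MB figures are simply the ones quoted in this guide, so check GitHub's current documentation before relying on them.

```python
from pathlib import Path

# Thresholds taken from this guide (assumed values; GitHub may change them).
MAX_REPO_BYTES = 1 * 1000**3    # 1 GB for the whole repository
MAX_FILE_BYTES = 100 * 1000**2  # 100 MB for any single file


def check_dataset_size(root, max_repo=MAX_REPO_BYTES, max_file=MAX_FILE_BYTES):
    """Return (total_bytes, oversized_files, fits) for the files under root."""
    root = Path(root)
    total = 0
    oversized = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and Git's own metadata folder.
        if not path.is_file() or ".git" in relative.parts:
            continue
        size = path.stat().st_size
        total += size
        # Record any file that reaches the per-file threshold.
        if size >= max_file:
            oversized.append((str(relative), size))
    # The dataset fits only if the total is under the repository
    # threshold and no single file is too large.
    fits = total < max_repo and not oversized
    return total, oversized, fits
```

For example, calling `check_dataset_size("my_dataset")` on a hypothetical folder named `my_dataset` returns the total size in bytes, a list of any files that are too large, and a true/false value indicating whether the folder stays within both thresholds.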

For an in-depth guide to sharing your data via GitHub, see here.