EPSRC: Data publishing

Data publishing

Section 4 of 7

Image: Pathway [cropped], Unsplash, Pixabay, Public Domain

If you acknowledge EPSRC funding in a paper, you are also expected to make the data supporting that paper available and accessible to others. For papers published after 1st May 2015, the data, or a description of it, should be available online as soon as the article is published. Published research data should also remain accessible for a minimum of 10 years from the end of any embargo, or from the date the data was last used, whichever is later.
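The 10-year rule can be worked out as simple date arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes "10 years" means the same calendar day ten years later.

```python
from datetime import date
from typing import Optional

MIN_RETENTION_YEARS = 10  # minimum period stated in the guidance above


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for 29 Feb when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def earliest_retention_end(embargo_end: Optional[date],
                           last_use: Optional[date]) -> date:
    """Ten years from the later of the embargo end and the date of last use."""
    known = [d for d in (embargo_end, last_use) if d is not None]
    if not known:
        raise ValueError("need at least one of embargo_end or last_use")
    return add_years(max(known), MIN_RETENTION_YEARS)


print(earliest_retention_end(date(2016, 6, 1), date(2018, 3, 15)))
```

Here the last use (15 March 2018) is later than the embargo end (1 June 2016), so the data should stay accessible until at least 15 March 2028.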

Data repositories

For digital data, the easiest way to meet this expectation is often by depositing it with a data repository. A data repository will store your data securely for the long term, will assign a DOI or other persistent identifier to your data, and will also enter a description of your data onto a publicly-accessible catalogue to ensure that it is findable. A repository may also offer additional preservation support such as format shifting to ensure that your data remains usable in the future.

There are different types of data repository. Institutional repositories, such as Bristol’s data repository, data.bris, accept research data only from researchers at that institution. General-purpose repositories such as figshare or Zenodo are broader in scope and will usually accept data from any researcher, but may apply stricter limits on deposit size. Disciplinary repositories generally accept any data relating to a particular subject. Using a disciplinary repository means your data will be stored alongside similar datasets, and may increase the visibility of your work. You can explore the repositories available in your discipline at http://www.re3data.org/browse/by-subject/.

The University’s data repository will preserve your data for at least 20 years; if you use a different repository, check that it meets the 10-year requirement. You can use any repository as long as it meets this requirement.

Other methods

Depending on the type of data you use, you may also be able to preserve your data by including it in the text of your paper, or as supplementary information published alongside your paper. These methods are also acceptable to EPSRC, but be aware that there are often restrictions on the size and file formats that can be shared in these ways. This may mean that you are unable to share your data in the way that makes it most usable to others.

Using project or personal websites, or personal storage media, to preserve data is not recommended, as these are less likely to meet the 10-year longevity requirement.

Data that does not support a paper

EPSRC also has expectations around data that is not used to support a paper, but is collected as part of an EPSRC-funded project. Not all data has to be retained, but some of it may be likely to have future use. The Research Data Service has produced a brief data evaluation guide to help you decide what data to keep. If you have data that is worth keeping on this basis, you need to publish either the data or a description of it within a certain time frame:

  • For data generated on or after 1st May 2015, publish the data or a description of it within 12 months of generation.
  • For data with no clear date of generation, publish the data or a description of it within 12 months of the end of the grant.
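The two time frames above can be expressed as a small date calculation. This is a minimal sketch with hypothetical names, assuming "12 months" means the same calendar day one year later. The text does not give a time frame for data generated before 1st May 2015, so the function returns None in that case rather than guessing.

```python
from datetime import date
from typing import Optional

POLICY_START = date(2015, 5, 1)  # date given in the first bullet above


def add_one_year(d: date) -> date:
    """Return the same calendar day one year later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def publication_deadline(generated: Optional[date],
                         grant_end: Optional[date]) -> Optional[date]:
    """Latest date to publish the data, or a description of it.

    Pass generated=None when the data has no clear date of generation.
    """
    if generated is None:
        if grant_end is None:
            raise ValueError("grant_end is needed when there is no generation date")
        return add_one_year(grant_end)
    if generated >= POLICY_START:
        return add_one_year(generated)
    return None  # not covered by the two rules listed above


print(publication_deadline(date(2016, 3, 10), None))
print(publication_deadline(None, date(2019, 12, 31)))
```

The first call covers data with a known generation date; the second covers data with no clear generation date on a grant that ends on 31 December 2019.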

If you choose to publish just a description of your data, also known as ‘metadata for discovery’, you should make sure that this description includes the following details:

  • What research data exists
  • Why it was generated (e.g. details of the associated research project)
  • When it was generated
  • How it was generated
  • How it can be accessed
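If you keep descriptions like this in a script or spreadsheet export, the five details can be treated as a simple checklist. The sketch below is one hypothetical way to do that; the class and field names are illustrative and are not part of any EPSRC or repository schema.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DiscoveryMetadata:
    """One free-text field for each detail listed above (names are illustrative)."""
    what: str           # what research data exists
    why: str            # why it was generated, e.g. the associated project
    when: str           # when it was generated
    how_generated: str  # how it was generated
    how_accessed: str   # how it can be accessed


def missing_details(record: DiscoveryMetadata) -> List[str]:
    """Return the names of any fields left empty or containing only whitespace."""
    return [f.name for f in fields(record)
            if not getattr(record, f.name).strip()]


draft = DiscoveryMetadata(
    what="Stress test measurements for composite samples",
    why="Collected for an EPSRC-funded project",
    when="2016 onwards",
    how_generated="Laboratory stress testing",
    how_accessed="",
)
print(missing_details(draft))
```

In this draft the access details have not been filled in, so the check reports `how_accessed` as missing.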

Note that you are not expected to make your data openly available if there are sound reasons why this is not appropriate. Acceptable reasons for restricting access will be addressed later in this tutorial.

1. You have gathered a large dataset relating to stress testing composite samples for a project that began in January 2016. The dataset has been continually added to from this date as new experiments have been performed, so there is no single date of generation. Some of the data has already been published as it supports papers, but you think that the entire dataset might be useful. When should you make the dataset available?