Bootcamp: What counts as research data?

Image: Twitter Data by Medu La
Image (below, left): Point Cloud Data by Daniel Vasquez

Research data is information that is involved directly in funded or unfunded research activities. Research data is often arranged or formatted in such a way as to make it suitable for communication, interpretation and processing. Put more simply, research data is all of the information that you use as an integral part of your research.
Digital research data can be regarded as that created in a digital form (born digital) or converted to a digital form (digitised). Research data does not include incidental or administrative data generated in the course of personal activities, desktop or mailbox backups, or data produced by non-research activities such as University administration or teaching.
Research data can be defined by the purpose for which it is used. For instance, the same information might be research data for one researcher but not for another, depending on whether that information is being used as an integral part of a research activity.

For example, a photographic image of an old municipal building in an historical archive is an archived image in an image bank. When used by a researcher to study the history of a city, the photographic image becomes research data, for that researcher. CCTV footage may be archived by a security firm. However, when used by a researcher to study human behaviour or 21st-century surveillance methods, the video footage becomes research data, for that researcher. Data can also be created by researchers for one purpose and used by other researchers at another time for a completely different purpose. Not all research data which you use will be under your control. The need to cite the data of others is extremely common. The term 'your' research data usually refers to that data which you have either created or significantly altered through the course of your research.

1. Which of the following would not generally be considered to be research data?

A. Files generated by an electronic laboratory notebook (ELN)
B. A research progress report, submitted to the funder of the research project
C. A digitised audio recording of an interview
D. An electronic database of results
Answer: B is an example of administrative data

Classification of research data

Research data may be created by an individual researcher, created collaboratively or contributed by someone else during the course of a research project (for example, a public contribution to an online survey). The data may have been created 'from scratch' by research efforts or it may be existing data which has been transformed, adjusted or reinterpreted. The following classification of data was originally compiled by the Research Information Network and highlights the wide range of types that can exist:
Observational: data captured in real time that is usually unique and irreplaceable. For example, remote sensing data, survey data, field recordings, sample data
Experimental: data captured from lab equipment that is often reproducible. For example, gene sequences, chromatograms, magnetic field data
Models or simulation: data generated from test models where model and metadata may be more important than output data from the model. For example, climate models, economic models
Derived or compiled: resulting from processing or combining 'raw' data. For example, text and data mining, compiled databases, 3D models
Reference or canonical: a static or organic conglomeration or collection of datasets, probably published and curated. For example, gene sequence databanks, collection of letters or archive of historical images

Open question: Under which of these classifications would you assign the data you create?

Examples of research data

Research data is often thought of in fairly narrow terms, such as the results of experiments or a database of statistics. These are relevant and important kinds of research data, though the term can also be used in a far broader sense to cover structured and unstructured material in a wide variety of formats (text, numerical, multimedia, software, etc.). This may include any of the following objects:

Documents (text, MS Word), spreadsheets

Scanned laboratory notebooks, field notebooks, diaries

Online questionnaires, transcripts, surveys or codebooks

Digital audiotapes, videotapes and other digital recording media

Scanned photographs or films

Transcribed test responses

Database contents (video, audio, text, images)

Digital models, algorithms, scripts

Contents of an application (input, output, logfiles for analysis software, simulations)

Documented methodologies and workflows

Records of standard operating procedures and protocols

Static and 'live' research data
Often, a research dataset can also be classified as either 'static' (finalised data, which is no longer in the process of change) or 'live' (still in development or still undergoing some process of change). The difference between the two becomes particularly important should you wish to publish, and later cite, a dataset. Citation in the established sense can only be achieved when data is no longer undergoing development. This does not rule out the possibility that one published (static) dataset will become out of date and be superseded by another, which may be only slightly different.
One way to reconcile the need to publish a 'locked' dataset while continuing to develop the same data is to create periodic 'snapshots' of the data. If this is done at regular intervals, the result is a 'timelapse' effect, which can help illustrate how research has evolved.

For example, a blog may be used to collect and collate public reaction to a new social phenomenon. The nature of the data involved (personal accounts, images, video and audio recordings) is necessarily 'live' and evolving. In this case the publication of periodic snapshots of the blog would not only make scholarly citation possible, but could also help to demonstrate how public reaction developed over time.
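
As a rough illustration of the snapshot idea, the sketch below assumes the 'live' data sits in an ordinary folder of files. It copies that folder into a new, labelled snapshot folder and records a checksum for each file, so that a later reader can check that the cited snapshot has not changed. The function names, the folder layout and the MANIFEST.json file are hypothetical choices made for this example; a data repository may well handle versioning for you in a different way.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def take_snapshot(live_dir: Path, snapshots_dir: Path,
                  label: Optional[str] = None) -> Path:
    """Copy live_dir into a new labelled folder and write a checksum manifest.

    The label defaults to the current UTC date and time (an assumption made
    for this sketch; any unique, citable label would do).
    """
    label = label or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = snapshots_dir / f"snapshot-{label}"
    # copytree raises FileExistsError if the target folder already exists,
    # so an existing snapshot is never silently overwritten.
    shutil.copytree(live_dir, target)
    # Record one checksum per copied file, keyed by its relative path.
    manifest = {
        p.relative_to(target).as_posix(): sha256_of(p)
        for p in sorted(target.rglob("*"))
        if p.is_file()
    }
    (target / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))
    return target
```

Calling take_snapshot(Path("blog_export"), Path("snapshots")) at regular intervals (both folder names are placeholders) would build up the series of dated, 'locked' copies described above, while the live folder remains free to change.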

4 thoughts on “Bootcamp: What counts as research data?”

Zoe Parsons says:

2023/02/06 at 13:00

My data set is in digital form and integral to my project. It is observational as I will be utilising digital video/audiotapes. I believe it is static and therefore locked as it is no longer undergoing development.
zhuoren no sun says:

2023/02/17 at 06:58

Thanks, no comments
Ana T. Castro-Castellon says:

2023/07/17 at 10:22

No comments
Susan Harrow says:

2023/10/11 at 12:30

My data is created by me ‘from scratch’ and is live until such time as it enters publication process.
