Journal of Postgraduate Medicine, Education and Research


Volume 53, Issue 3 (July-September, 2019)

BIOSTATISTIC SERIES

Statistics Corner: Data Cleaning-I

Citation Information: Statistics Corner: Data Cleaning-I. J Postgrad Med Edu Res 2019;53(3):130-132.

DOI: 10.5005/jp-journals-10028-1330

License: CC BY-NC 4.0

Published Online: 01-12-2018

Copyright Statement: Copyright © 2019; The Author(s).


Abstract

Reality check: Let us assume that an investigator collected various demographic, clinical, psychiatric, and radiological characteristics of the study participants. The investigator took adequate precautions to enter the data in a structured format into a spreadsheet. However, before proceeding, the investigator wanted to ensure that the data were ready for analysis. In this context, the investigator reviewed the literature and came across the term “data cleaning.” Colleagues advised him to approach a statistician for cleaning and analysis of the data. The investigator was in a dilemma about whether to share the data with a statistician before or after cleaning. A further review of the literature provided some answers regarding the role and responsibilities of the investigator in data cleaning. However, the investigator still had the following questions about data cleaning:

• Is data cleaning a part of good clinical practice (GCP)?
• Is it the responsibility of a statistician to clean and code the data?
• Does data cleaning begin after data entry?
• How should missing values be dealt with at the data entry stage?



© Jaypee Brothers Medical Publishers (P) LTD.