Maintaining Data Integrity in a cluster randomised trial (The PRINCE Study)
Bernard McCarthy, Declan Devane, Collette Kirwan, Dympna Casey, Kathy Murphy, Lorraine Mee, Adeline Cooney
School of Nursing & Midwifery, National University of Ireland Galway

Aim of the Presentation
This presentation highlights the process undertaken to minimise errors in the PRINCE study database prior to analysis.

Why this Topic?
• Multiple stages are involved in a research study before the collected information is ready for analysis.
• It is well reported that errors are introduced into the information at different points along the process (Goldberg, Niemierko & Turchin 2008; Day, Fayers & Harvey 1998; King & Lashley 2000).
• Yet the integrity of any study is dependent on the quality of the data available.

Background
A cluster randomised controlled trial evaluating the effectiveness of a structured pulmonary rehabilitation education programme for improving the health status of people with chronic obstructive pulmonary disease (COPD).

PRINCE Trial – CONSORT Flow Diagram

PRINCE Trial Quantitative Outcome Measurements
Primary: Chronic Respiratory Questionnaire (CRQ)
• Four domains: Dyspnoea, Fatigue, Emotional Function, Mastery
• Combined domains: Physical, Psychological
Secondary: Incremental Shuttle Walk Test; Muscle Test; Self-Efficacy for Managing Chronic Disease
Economic analysis: EQ-5D; Utilisation of healthcare services

"The higher the quality of the data entry, the greater the 'reliability' resulting in more 'convincing inference' from the study." (Day, Fayers & Harvey 1998)

Where do the errors arise?
Errors in databases usually arise from one of three sources:
• errors originating at the original data collection
• incorrect interpretation of what was entered on the original documentation
• transcription errors when entering the data into the database.
(Goldberg, Niemierko & Turchin 2008)

Original Data Errors
• Little can be done after the event to overcome the first source of error, as very few research teams can afford to undertake double transcription of the original data collection. "Getting it right first time" is what is critical here.
• Streamlining of data collection tools, adequate training of the data collectors, and verification of answers with the participants can improve the quality of initial data collection.

Inputting Errors
• Post-interview computer entry of research data for analysis is a tedious duty and is well known for being "an error prone task" (Polit & Beck 2010).
• It can be overcome by direct computer entry at the initial interview, followed by verification of the inputted answers with the patient.

Data Verification for Errors
• Several methods of data verification exist that are useful for identifying transcription errors, especially typographical errors and, to a lesser extent, interpretation errors.
• No matter which method is utilised, it is impossible to identify all errors.
• Most of the literature on this topic is based on the level of error rates for different forms of data entry.

The Process of Validating
• The first port of call is visual scanning of the electronic data for obvious errors (Polit & Beck 2004, Day, Fayers & Harvey 1998), often referred to as "exploratory data analysis".
• The first stage of this process was to identify and check outliers.
• The "range for outliers" was mainly based on clinical judgement, or on values falling outside three standard deviations of the mean (Day, Fayers & Harvey 1998); a simple check of this kind is sketched below.
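For illustration, a minimal sketch of a three-standard-deviation outlier screen, assuming the trial data have been exported to a pandas DataFrame; the file and column names (prince_data.csv, participant_id, crq_dyspnoea) are hypothetical and stand in for the real study fields:

```python
import pandas as pd

def flag_outliers(df: pd.DataFrame, column: str, n_sd: float = 3.0) -> pd.DataFrame:
    """Return rows whose value lies more than n_sd standard deviations from the column mean."""
    mean, sd = df[column].mean(), df[column].std()
    mask = (df[column] - mean).abs() > n_sd * sd
    return df.loc[mask, ["participant_id", column]]

# Hypothetical usage: screen a CRQ domain score and list flagged records
# for manual comparison against the original case record forms.
data = pd.read_csv("prince_data.csv")
print(flag_outliers(data, "crq_dyspnoea"))
```

Values flagged in this way would still be reviewed against clinical judgement before any correction, in line with the criterion above.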
Wild Codes & Consistency Checking
• Stage 2 was identifying "wild codes": codes which appear in the data set that are not possible options for selection (Polit & Beck 2004).
• Stage 3, "consistency checking", was undertaken on certain components of the data and focused on the internal consistency of the data (Polit & Beck 2004, Goldberg, Niemierko & Turchin 2008).
– The data set was checked for errors of compatibility between answers; for example, a patient indicating that they use an inhaler but having no medications listed under medications.

Key Points
• Returning to the original case record form, or to the patient notes available, for verification was essential.
• Visual verification of data is deemed critical to data quality, no matter what alternative form of sophisticated validation is undertaken (Day, Fayers & Harvey 1998).

Clustering of Data Errors
• Goldberg, Niemierko & Turchin (2008), in their analysis of research databases, identified that data entry errors were often grouped around the location of fields in the data entry forms.
• The presence of one data error on the demographic information screen greatly increased the probability of another data error in another field on the same screen.
• This justifies increased vigilance in the areas surrounding where errors are found.

Double Data Entry or Not?
• Double data entry is the primary data verification method utilised in clinical trials to ensure quality (Kleinman 2001, Day, Fayers & Harvey 1998).
• An increasing debate has arisen in the literature over the need for complete double data entry.
• The controversy centres on the level and type of errors identified against the additional cost and effort involved. Gibson et al. (1994) and several other studies highlighted that the typical gain in data quality cannot justify the cost of complete double data entry.

Alternative to Double Data Entry
• Kleinman (2001) presents the adaptive double data entry system (ADDER), which helps decide which forms from an individual data inputter need to be double entered.
• The decision is based on the estimated probability that a form contains errors, calculated from the rate of errors identified in the previous group of double-entered forms; a rough sketch of this style of decision rule follows below.
• ADDER offers increased data quality at minimal cost for those who believe complete double data entry is not necessary.
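For illustration only, and not the published ADDER algorithm: a minimal sketch of an adaptive decision rule of this general kind, where the error rate observed in the previous double-entered batch sets the fraction of the next batch selected for double entry. The scaling factor, minimum fraction, and form identifiers are hypothetical choices.

```python
import random

def double_entry_fraction(errors_found: int, forms_checked: int,
                          min_fraction: float = 0.1, max_fraction: float = 1.0) -> float:
    """Estimate the error rate from the previous double-entered batch and map it
    to the fraction of the next batch selected for double entry."""
    if forms_checked == 0:
        return max_fraction                      # no history yet: double enter everything
    error_rate = errors_found / forms_checked
    # Hypothetical mapping: the checking fraction grows with the observed error rate.
    return min(max_fraction, max(min_fraction, error_rate * 10))

def select_for_double_entry(form_ids, fraction: float):
    """Randomly select the given fraction of forms for double entry."""
    k = max(1, round(len(form_ids) * fraction))
    return sorted(random.sample(list(form_ids), k))

# Example: 2 erroneous forms were found among the previous 40 double-entered forms,
# so 50% of the next batch of 40 forms is selected for double entry.
fraction = double_entry_fraction(errors_found=2, forms_checked=40)
print(select_for_double_entry(range(101, 141), fraction))
```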
Alternative Continued
• Targeted auditing of specific documents was raised by Rostami, Nahm and Pieper (2009).
• This involves auditing only "critical data", informed by the data inputter's record of errors.
• They argue that reducing the "critical variable error rate" is more important than reducing the "overall error rate" when examining data entry. Why waste resources auditing data that is "not critical to the final analysis" when this would decrease the emphasis on the essential items?
• The Institute of Medicine cautions, however, that errors or spurious data found on examination of any part of a dataset will call into question the entire dataset.

The Approach Taken for PRINCE
• King and Lashley (2000) presented a validation approach which avoids double data entry.
• It uses single data entry in conjunction with visual record verification of selected records, identified using a statistically developed continuous sampling plan (CSP).
• The rate of visual verification is determined by two factors: the proportion of errors identified and the anticipated average outgoing quality.
• From this information the CSP was developed.

CSP Plan
• In the CSP developed, i indicates the number of consecutive records that must be clear of errors before sampling of a fraction f of the records can commence.
• Once an error is found in the records, 100% checking recommences until i consecutive clear records have been found.
• This CSP approach is reported to reduce the time associated with double data entry, to enable calculation of the gain in data record quality, and to demonstrate a large improvement in data quality over single data entry alone.

Continuous Sampling Plan PRINCE
• A CSP-1 plan gives the number of successive records with no data entry errors (i) that must be inspected before checking of a random sample fraction (f) of records begins. Whenever an error is found, the error is corrected and the successive record checking using i is repeated.
• An incoming data field error rate of 0.4% was calculated from a visual inspection of 2,490 completed fields (9 field errors; none on the primary outcome).
• To maintain an average outgoing quality (AOQ) of 0.4%, a CSP-1 plan of i = 15 and f = 0.2 (20%) was implemented; an illustrative sketch of this checking loop appears after the final slide.

An important principle of data entry quality was raised by Day, Fayers & Harvey (1998), who stated that "building quality controls into the system is more productive than just adding on checks at the end".
• This is further supported by Rostami, Nahm and Pieper (2009), who concluded that much higher data quality can be achieved by undertaking sequenced small audits across the duration of data entry, rather than a large-scale audit once all the data have been entered.
• The PRINCE team support this, since repeated errors in inputting on the same component of the form were highlighted early and corrective action was put in place to eliminate recurring errors.
• The continuous sampling plan presented by King & Lashley (2000) integrates very well with this concept, allowing small audits and continuous corrective action to ensure data quality.

Questions & Discussion
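For illustration, a minimal sketch of the CSP-1 checking loop described above, with i = 15 and f = 0.2 as implemented in PRINCE. The has_entry_error function is a placeholder for the manual comparison of an entered record against its case record form, and the record collection is assumed to be iterated in entry order.

```python
import random

def csp1_check(records, has_entry_error, i: int = 15, f: float = 0.2):
    """Apply a CSP-1 plan to records in entry order: 100% checking until i
    consecutive clean records are seen, then check a random fraction f of
    records; any error found reverts the plan to 100% checking."""
    consecutive_clean = 0
    checked = errors = 0
    for record in records:
        in_sampling_phase = consecutive_clean >= i
        if in_sampling_phase and random.random() >= f:
            continue                    # record passes through unchecked
        checked += 1
        if has_entry_error(record):     # visual comparison against the case record form
            errors += 1                 # correct the record, then...
            consecutive_clean = 0       # ...revert to 100% checking
        else:
            consecutive_clean += 1
    return checked, errors

# Hypothetical usage, where has_entry_error wraps the manual source-document check:
# checked, errors = csp1_check(entered_records, has_entry_error, i=15, f=0.2)
```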