Data Preparation and Description

Data Preparation: Introduction
• Once the data begin to flow, a
researcher’s attention turns to data
analysis.
• Data preparation includes editing, coding,
and data entry;
– It is the activity that ensures the accuracy of
the data and their conversion from raw form to
reduced and classified forms that are more
appropriate for analysis.
Data Preparation: Introduction
• Preparing a descriptive statistical
summary is another preliminary step
leading to an understanding of the
collected data;
– It is during this step that data entry errors may
be revealed and corrected.
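As a minimal sketch of how a descriptive summary can expose a data entry error, the Python below summarizes a hypothetical list of ages and lists values outside an assumed valid range. The function names `describe` and `out_of_range`, the sample ages, and the 18 to 110 range are illustrative assumptions, not part of the original material.

```python
from statistics import mean, median

def describe(values):
    """Return a small descriptive summary of a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

def out_of_range(values, low, high):
    """Return (position, value) pairs that fall outside the valid range."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]

# Hypothetical ages from a survey of adults; 340 stands in for a keying error.
ages = [23, 45, 340, 31, 58, 19]

summary = describe(ages)
suspect = out_of_range(ages, low=18, high=110)  # assumed valid age range

print(summary)
print(suspect)
```

An implausible maximum in the summary is the cue to go back to the source instrument and correct the entry.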
Data Preparation: Editing
• The customary first step in analysis is to
edit the raw data.
• Editing detects errors and omissions,
corrects them when possible, and certifies
that maximum data quality standards are
achieved.
Data Preparation: Editing
• The editor’s purpose is to guarantee that
data are:
– Accurate;
– Consistent with the intent of the question and
other information in the survey;
– Uniformly entered;
– Complete; and
– Arranged to simplify coding and tabulation.
Data Preparation: Editing
• In the following question asked of adults aged
18 or older, one respondent checked two
categories, indicating that he was a retired
officer and currently serving on active duty.
– Please indicate your current military status:
• Active duty
• Reserve
• Retired
• National Guard
• Separated
• Never served in the army
Data Preparation: Editing
• The editor’s responsibility is to decide
which of the responses is both
– consistent with the intent of the question or
other information in the survey, and
– most accurate for this individual participant.
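A check like the one below could bring such cases to the editor's attention. It is only a sketch: it assumes the military-status item is meant to be single-choice, and the respondent IDs, the `responses` layout, and the function name `flag_for_editing` are hypothetical. The code only flags cases; deciding which response to keep remains the editor's judgment.

```python
# Answer categories for the military-status question.
VALID_STATUSES = {
    "Active duty",
    "Reserve",
    "Retired",
    "National Guard",
    "Separated",
    "Never served in the army",
}

def flag_for_editing(responses):
    """Return IDs whose answer is not exactly one valid category."""
    flagged = []
    for respondent_id, checked in responses.items():
        if len(checked) != 1 or not set(checked) <= VALID_STATUSES:
            flagged.append(respondent_id)
    return flagged

# Hypothetical raw responses: respondent ID -> categories checked.
responses = {
    "R001": ["Reserve"],
    "R002": ["Active duty", "Retired"],  # two categories checked
    "R003": [],                          # item skipped
}

flagged = flag_for_editing(responses)
print(flagged)
```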
Data Preparation: Editing
Two types of editing are field editing and central
editing.
• Field Editing: In large projects, field editing
review is the responsibility of the field
supervisor;
– When entry gaps are present from interviews, a
callback should be made rather than guessing what
the respondent “probably would have said”.
– Self-interviewing has no place in quality research.
– Validating the field research is the control function of
the supervisor.
• It means he or she will reinterview some percentage of the
respondents to make sure they have participated.
• Many research firms will recontact about 10 percent of the
respondents in this process of data validation.
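Drawing the respondents to recontact can be sketched as a simple random sample. The 10 percent figure comes from the text above; the function name `validation_sample`, the ID format, and the fixed seed are assumptions made for illustration, and a firm may well use a different selection rule.

```python
import random

def validation_sample(respondent_ids, fraction=0.10, seed=None):
    """Pick a simple random sample of respondents to recontact."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    target = max(1, round(len(respondent_ids) * fraction))
    k = min(len(respondent_ids), target)
    return rng.sample(respondent_ids, k)

# Hypothetical respondent IDs R001 to R200.
ids = [f"R{n:03d}" for n in range(1, 201)]

callbacks = validation_sample(ids, fraction=0.10, seed=42)
print(len(callbacks))
```

Sampling at random, rather than letting interviewers nominate cases, keeps the validation independent of the people being checked.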
Data Preparation: Editing
• Central Editing: For a small study, the use of a
single editor produces maximum consistency. In
large studies, editing tasks should be allocated
so that each editor deals with one entire section.
– When replies are inappropriate or missing, the editor can sometimes
detect the proper answer by reviewing the other information in the data
set.
• It may be better to contact the respondent for correct information, if time and
budget allow.
• Another alternative is for the editor to strike out the answer if it is
inappropriate. Here an editing entry of “no answer” is called for.
– Another problem that editing can detect concerns faking an interview
that never took place.
• This “armchair interviewing” is difficult to spot, but the editor is in the best
position to do so.
• One approach is to check responses to open-ended questions. These are
most difficult to fake. Distinctive response patterns in other questions will
often emerge if data falsification is occurring. To uncover this, the editor
must analyze the set of instruments used by each interviewer.
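One way to analyze the instruments interviewer by interviewer is to compare simple indicators across interviewers. The sketch below is an assumption about how that might be done, not a method given in the source: it reports, for each interviewer, how often the most common closed-question answer pattern repeats and how often the open-ended answer is blank. The record layout and the name `interviewer_profile` are hypothetical, and high values are only a prompt for closer review, not proof of falsification.

```python
from collections import Counter, defaultdict

def interviewer_profile(records):
    """Summarize answer patterns separately for each interviewer."""
    by_interviewer = defaultdict(list)
    for rec in records:
        by_interviewer[rec["interviewer"]].append(rec)

    profile = {}
    for name, recs in by_interviewer.items():
        # Count how often each closed-question answer pattern occurs.
        patterns = Counter(tuple(r["closed"]) for r in recs)
        top_count = patterns.most_common(1)[0][1]
        blank_open = sum(1 for r in recs if not r["open"].strip())
        profile[name] = {
            "n": len(recs),
            "top_pattern_share": top_count / len(recs),
            "blank_open_share": blank_open / len(recs),
        }
    return profile

# Hypothetical completed instruments.
records = [
    {"interviewer": "A", "closed": [1, 3, 2], "open": "Prefers the weekend edition."},
    {"interviewer": "A", "closed": [2, 1, 4], "open": "Reads it on the train."},
    {"interviewer": "B", "closed": [3, 3, 3], "open": ""},
    {"interviewer": "B", "closed": [3, 3, 3], "open": ""},
]

profile = interviewer_profile(records)
print(profile)
```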
Data Preparation: Coding
• Coding involves assigning numbers or other
symbols to answers so that the responses can
be grouped into a limited number of categories.
• In coding, categories are the partitions of a data
set of a given variable. For example, if the
variable is gender, the partitions are male and
female.
• Categorization is the process of using rules to
partition a body of data.
• Both closed and free-response questions must
be coded.
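Categorization by rule can be sketched as follows. The age bands, the numeric codes, and the "no answer" code of 9 are hypothetical choices for illustration; the point is that the rules assign every response to exactly one category.

```python
# Hypothetical coding rule: (code, lowest age, highest age) per category.
AGE_CATEGORIES = [
    (1, 18, 29),
    (2, 30, 44),
    (3, 45, 64),
    (4, 65, 120),
]
NO_ANSWER = 9  # assumed code for missing or out-of-range answers

def code_age(age):
    """Assign the numeric category code for one reported age."""
    if age is None:
        return NO_ANSWER
    for code, low, high in AGE_CATEGORIES:
        if low <= age <= high:
            return code
    return NO_ANSWER

ages = [19, 44, 45, None, 70]
codes = [code_age(a) for a in ages]
print(codes)
```

The coded values no longer distinguish a 19-year-old from a 29-year-old, which is the loss of detail that categorization trades for a manageable number of groups.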
Data Preparation: Coding
• The categorization of data sacrifices some data
detail but is necessary for efficient analysis.
• Most software programs work more efficiently in
the numeric mode;
– Instead of entering the word male or female in
response to a question that asks for the identification
of one’s gender, we would use numeric codes, e.g., 0
for male and 1 for female.
• Numeric coding simplifies the researcher’s task
in converting a nominal variable, like gender, to
a “dummy variable”.
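The 0/1 coding described above can be sketched in a few lines. The codebook follows the example in the text (0 for male, 1 for female); the function name `encode` and the sample answers are assumptions.

```python
# Codebook from the example above: 0 for male, 1 for female.
GENDER_CODES = {"male": 0, "female": 1}

def encode(answers, codebook):
    """Convert text answers to numeric codes, ignoring case and stray spaces."""
    return [codebook[a.strip().lower()] for a in answers]

# Hypothetical keyed-in answers with inconsistent capitalization.
answers = ["Male", "female", "Female ", "male"]

female = encode(answers, GENDER_CODES)    # dummy variable: 1 = female
share_female = sum(female) / len(female)  # mean of a dummy is a proportion
print(female, share_female)
```

Because the variable takes only the values 0 and 1, its mean is the proportion of cases coded 1, which is part of what makes dummy variables convenient in analysis.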
Data Preparation: Missing Data
• In survey studies, missing data typically occur
when participants accidentally skip, refuse to
answer, or do not know the answer to an item on
the questionnaire.
• In longitudinal studies, missing data may result
from participants dropping out of the study, or
being absent for one or more data collection
periods.
• Missing data also occur due to researcher error,
corrupted data files, and changes in the
research or instrument design after data were
collected from some participants, such as when
variables are dropped or added.
Data Preparation: Missing Data
• The strategy for handling missing data consists
of a two-step process:
– the researcher first explores the pattern of missing
data to determine the mechanism for missingness
(the probability that a value is missing rather than
observed), and
– then selects a missing-data technique. The three
basic types of techniques which can be used to
salvage data sets with missing values are:
• Listwise deletion
• Pairwise deletion
• Replacement of missing values with estimated scores
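The three techniques can be contrasted on a small example. The sketch below uses a hypothetical four-case data set with `None` marking missing values, and it uses mean substitution as one simple form of replacement with an estimated score; the source does not say which estimation method is intended, so that choice is an assumption.

```python
from statistics import mean

# Hypothetical data set: one dict per participant; None marks a missing value.
rows = [
    {"age": 25, "income": 40, "score": 7},
    {"age": 31, "income": None, "score": 6},
    {"age": None, "income": 52, "score": 9},
    {"age": 47, "income": 61, "score": 8},
]

def listwise(rows):
    """Listwise deletion: keep only cases complete on every variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, x, y):
    """Pairwise deletion: keep cases complete on the two variables in use."""
    return [(r[x], r[y]) for r in rows
            if r[x] is not None and r[y] is not None]

def mean_replace(rows, var):
    """Replace missing values of one variable with its observed mean."""
    observed = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: observed if r[var] is None else r[var]} for r in rows]

complete = listwise(rows)                   # cases usable in any analysis
age_score = pairwise(rows, "age", "score")  # cases usable for age vs. score
filled = mean_replace(rows, "income")       # income gaps filled with the mean

print(len(complete), len(age_score), filled[1]["income"])
```

Listwise deletion keeps the fewest cases, pairwise deletion keeps more for each individual analysis, and replacement keeps every case at the cost of introducing estimated values.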
Data Preparation: Data Entry
• Data entry converts information gathered by
secondary or primary methods to a medium for
reviewing and manipulation.
• Keyboarding remains a mainstay for researchers
who need to create a data file immediately and
store it in a minimal space on a variety of media.
• However, researchers have profited from more
efficient ways of speeding up the research
process, especially from bar coding and optical
character and mark recognition.
Data Preparation: Data Entry
• Keyboarding: A full screen editor, where an
entire data file can be edited or browsed, is a
viable means of data entry for statistical
packages like SPSS or SAS.
– SPSS offers several data entry products, including
Data Entry Builder which enables the development of
forms and surveys, and Data Entry Station which
gives centralized entry staff, such as telephone
interviewers or online participants, access to the survey.
– Both SAS and SPSS offer software that effortlessly
accesses data from databases, spreadsheets, data
warehouses, or data marts.
Data Preparation: Data Entry
• Bar-code technology is used to simplify
the interviewer’s role as a data recorder.
When an interviewer passes a bar-code wand
over the appropriate codes, the data are
recorded in a small, lightweight unit for
translation later.
• Researchers studying magazine
readership can scan bar codes to denote a
magazine cover that is recognized by an
interview participant.
Data Preparation: Data Entry
• Optical Character Recognition (OCR):
– Users of a PC image scanner are familiar with OCR
programs which transfer printed text into computer
files in order to edit and use it without retyping.
• Optical scanning of instruments is efficient for
researchers.
– Optical scanners process the mark-sensed
questionnaires and store the answers in a file.
– This method has been adopted by researchers for
data entry and preprocessing due to its faster speed,
cost savings on data entry, convenience in charting
and reporting data, and improved accuracy.
– It reduces the number of times data are handled,
thereby reducing the number of errors that are
introduced.