Fostering Interdepartmental Knowledge Communication


Data preparation for use in SEM
Ned Kock
Data in table format
Some groups of columns correspond to a latent variable.
Each column corresponds to a manifest variable.
Each row often contains the answers from one subject under a particular condition, and is also known as a “case”.
Missing values
• A missing value is an empty cell in a data table.
• Missing values are a fact of life in many areas of
research, including behavioral research.
• In terms of behavioral research, missing values
may be present when:
– Respondents do not answer one or more questions in a
questionnaire.
– A researcher empties a data cell when a respondent
answers a question with non-usable data; e.g., by
responding with a “0” (zero) when asked for his or her
age.
Examples of missing values
Datasets with missing values are a common occurrence in behavioral research, as well as other types of research.
Percentage of missing data
A simple Excel formula can be used to calculate the percentage of missing data for a manifest variable.
How much is too much?
A recent Monte Carlo simulation suggests that as much as 30% missing data may be acceptable. More than that can lead to problems.
Supporting source: Kock, N. (2014). Single missing data imputation in PLS-SEM. Laredo, TX: ScriptWarp Systems.
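The percentage calculation and the 30% guideline can be sketched in a few lines of Python. This is only an illustration: the function names, the sample table, and the use of None to stand for an empty cell are assumptions, and the slide's own Excel formula is not reproduced in this transcript (in Excel, something along the lines of =COUNTBLANK(range)/ROWS(range) would compute the same ratio).

```python
from typing import Dict, List, Optional

# One column of a data table; None stands for an empty (missing) cell.
Column = List[Optional[float]]


def percent_missing(column: Column) -> float:
    """Return the percentage of missing cells in one manifest variable."""
    if not column:
        return 0.0
    missing = sum(1 for value in column if value is None)
    return 100.0 * missing / len(column)


def columns_over_threshold(table: Dict[str, Column],
                           threshold: float = 30.0) -> List[str]:
    """List the columns whose missing percentage exceeds the threshold."""
    return [name for name, column in table.items()
            if percent_missing(column) > threshold]


# Hypothetical data table with two manifest variables and five cases.
table = {
    "age": [25, None, 31, 40, None],         # 2 of 5 cells missing
    "income": [3.0, 4.5, None, 5.0, 4.0],    # 1 of 5 cells missing
}

for name, column in table.items():
    print(f"{name}: {percent_missing(column):.1f}% missing")
print("Over 30%:", columns_over_threshold(table))
```

Here "age" would be flagged, since 40% of its cells are empty, while "income" at 20% stays under the guideline.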
Dealing with missing values
• A first step is to make an effort to ensure that no more
than 30% of the data is missing in each column of a data
table.
• The above can be accomplished by employing data
collection techniques that minimize missing data; e.g.,
targeted questionnaires and interviews.
• Then the remaining missing cells can be filled using one
of the several imputation methods, such as:
– Arithmetic Mean Imputation
– Multiple Regression Imputation
– Hierarchical Regression Imputation
– Stochastic Multiple Regression Imputation
– Stochastic Hierarchical Regression Imputation
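As an illustration of the simplest method in the list, the following sketch fills each missing cell of a column with the arithmetic mean of the observed values in that column. It is a generic sketch, not WarpPLS's implementation; the function name and the use of None for an empty cell are assumptions.

```python
from statistics import mean
from typing import List, Optional


def impute_arithmetic_mean(column: List[Optional[float]]) -> List[float]:
    """Replace each missing cell (None) with the mean of the observed cells."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if value is None else value for value in column]


# Hypothetical column with two missing answers; the observed mean is 3.0.
scores = [4.0, None, 2.0, 3.0, None]
print(impute_arithmetic_mean(scores))  # [4.0, 3.0, 2.0, 3.0, 3.0]
```

The regression-based methods in the list would instead predict each missing cell from other variables, which this sketch does not attempt.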
Missing data imputation with WarpPLS
Main menu > Settings > View or change missing data imputation settings:
Using deletion, listwise or pairwise, to deal with missing data:
Researchers have traditionally used deletion methods, often listwise and pairwise deletion, to deal with missing data. A report by the American Psychological Association Task Force on Statistical Inference stated that these techniques are “among the worst methods available for practical applications”.
Supporting source: Kock, N. (2014). Single missing data imputation in PLS-SEM. Laredo, TX: ScriptWarp Systems.
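To make the two deletion methods concrete, the sketch below shows how many cases survive under each. Listwise deletion drops every row that has any missing cell; pairwise deletion keeps, for each pair of variables, the rows where both are present. The function names and sample rows are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One case (row) of the data table; None stands for a missing cell.
Row = List[Optional[float]]


def listwise_delete(rows: List[Row]) -> List[Row]:
    """Keep only the rows that have no missing cell at all."""
    return [row for row in rows if all(value is not None for value in row)]


def pairwise_cases(rows: List[Row], i: int, j: int) -> List[Tuple[float, float]]:
    """Keep, for columns i and j, the rows where both cells are present."""
    return [(row[i], row[j]) for row in rows
            if row[i] is not None and row[j] is not None]


# Hypothetical table: four cases, three manifest variables.
rows = [
    [1.0, 2.0, 3.0],
    [2.0, None, 1.0],
    [None, 4.0, 5.0],
    [3.0, 1.0, None],
]

print(len(listwise_delete(rows)))        # 1 complete case remains
print(len(pairwise_cases(rows, 0, 1)))   # 2 cases usable for columns 0 and 1
```

Even in this tiny example, listwise deletion discards three of the four cases, which hints at why the sample size can shrink quickly under deletion.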
Missing data imputation performance
Main menu > Settings > View or change missing data imputation settings:
Results from a Monte Carlo simulation:
Multiple Regression Imputation yielded the least biased mean path coefficient estimates, followed by Arithmetic Mean Imputation. With respect to mean loading estimates, Arithmetic Mean Imputation yielded the least biased results, followed by Stochastic Hierarchical Regression Imputation and Hierarchical Regression Imputation.
Supporting source: Kock, N. (2014). Single missing data imputation in PLS-SEM. Laredo, TX: ScriptWarp Systems.
Replacing missing values with SPSS
Creating source data file for WarpPLS
• Source data files contain the data used in a
WarpPLS analysis.
• They are often referred to as “raw data files”.
• Source data files should be prepared as follows:
– They should be .xls or .xlsx files (Excel), or plain text
files with the names of the variables first, followed by
each data case in the same order as the variables listed
(missing data points do not have to be imputed a priori).
– If text files, variable names and numeric data should be
separated from each other by tabs.
– If text files, the suffix of the data file should be
designated as .txt.
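The text-file layout described above (variable names first, then one tab-separated line per case) can be produced with Python's standard csv module. This is a minimal sketch: the function names and sample data are hypothetical, and writing a missing value as an empty cell is an assumption consistent with the earlier definition of a missing value.

```python
import csv
import io
from pathlib import Path
from typing import List, Optional, Sequence


def to_tab_delimited(variable_names: Sequence[str],
                     cases: Sequence[Sequence[Optional[float]]]) -> str:
    """Build tab-delimited text: a header line, then one line per case."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(variable_names)
    for case in cases:
        # A missing value (None) is written as an empty cell.
        writer.writerow(["" if value is None else value for value in case])
    return buffer.getvalue()


def save_source_file(path: str, text: str) -> None:
    """Write the text to disk; the caller supplies a name ending in .txt."""
    Path(path).write_text(text, encoding="ascii")


# Hypothetical variables and cases; the second case has a missing age.
names: List[str] = ["age", "income"]
cases = [[25, 3.5], [None, 4.0]]
print(to_tab_delimited(names, cases))
```

Calling save_source_file("source_data.txt", to_tab_delimited(names, cases)) would then store the result under a .txt name; the file name here is only an example.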
Using Excel to create a .txt file
Important tips
• One file format that usually works well for a .txt file, and
that is widely available, is the ASCII tab-delimited
format.
• If you are using Excel to create a .txt file, save the Excel-formatted
file first, and create the .txt file with a different
name.
• With Excel, have only one worksheet with the raw data.
• You can also create .txt tab-delimited files using SPSS, in
which case it is important to instruct SPSS to write the
variable names into the .txt file.
– The above is done by default when you use Excel.
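Before importing a .txt file, it can help to check that it follows the tips above. The sketch below looks for two common problems: a first line that seems to hold data instead of variable names, and rows whose number of tab-separated fields differs from the header. The function names and the checks themselves are assumptions for illustration, not something WarpPLS is known to perform.

```python
from typing import List


def _is_number(text: str) -> bool:
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_tab_delimited(text: str) -> List[str]:
    """Return a list of problems found in tab-delimited text (empty if none)."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    problems: List[str] = []
    header = lines[0].split("\t")
    if all(_is_number(name) for name in header):
        problems.append("first line looks numeric; variable names may be missing")
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} fields, found {len(fields)}"
            )
    return problems


# Hypothetical file content exported without variable names.
print(check_tab_delimited("25\t3.5\n30\t4.0\n"))
```

An empty cell between two tabs still counts as a field, so rows with missing values pass the field-count check.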
Reading raw data file in WarpPLS
File import wizard
Viewing and accepting data
Acknowledgements
Adapted text, illustrations, and ideas from the
following sources were used in the preparation of the
preceding set of slides:
1. Kock, N. (2015). WarpPLS 5.0 User Manual. Laredo, TX: ScriptWarp Systems.
2. Kline, R.B. (1998), Principles and Practice of Structural Equation Modeling, The Guilford Press, New York, NY.
3. MS Excel, SPSS, and WarpPLS software applications.
4. Rencher, A.C. (1998), Multivariate Statistical Inference and Applications, John Wiley & Sons, New York, NY.
5. SPSS’ web site: www.spss.com.
6. WarpPLS software.