Restructuring longitudinal data


A guide to the unknown…


A dataset is longitudinal if it tracks the same type
of information on the same subjects at multiple
points in time or space. For example, a
longitudinal dataset could record specific
students and their standardized test scores in six
successive years.
One type of longitudinal data is known as
“panel data”: a (usually small) number of
observations over time on a (usually large)
number of cross-sectional units such as
individuals, households, firms, or governments.




Longitudinal data are a subset of hierarchical data:
observations that are correlated because they are
tied to the same unit. In educational studies, for
example, we observe student i in school u, and
observations from the same school are presumably
related.
In such data we observe y_{j,u}, where u indicates a
unit and j indicates the j’th observation drawn from
that unit. There is no relationship between y_{j,u}
and y_{j,u’} even though they share the first
subscript, because j only orders observations
within a unit.
In true longitudinal data, the first subscript is t,
which represents comparable time across units.

One approach to working with longitudinal
data sets is to restructure the data set, either
going from one observation per subject to
several or vice versa. For example, you may
have several diagnosis codes in a single
observation (visit) and want to compute
frequencies of each possible diagnosis code. To
do this, you will find it more convenient to
have one observation for each diagnosis code,
resulting in possibly several observations per
subject.
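The diagnosis-code case can be sketched in a few lines of Python.
This is only an illustration under assumed names: the columns dx1
to dx3, the subject labels, and the codes are hypothetical and do
not come from any dataset in these slides.

```python
from collections import Counter

# Hypothetical wide data: one row per visit, up to three diagnosis codes.
visits = [
    {"subject": "A", "dx1": "J45", "dx2": "E11", "dx3": None},
    {"subject": "B", "dx1": "E11", "dx2": None, "dx3": None},
    {"subject": "C", "dx1": "I10", "dx2": "E11", "dx3": "J45"},
]

def wide_to_long(rows, id_col, value_cols):
    """Return one record per non-missing value in value_cols."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            if row[col] is not None:
                long_rows.append(
                    {id_col: row[id_col], "source": col, "code": row[col]}
                )
    return long_rows

# One observation per diagnosis code, possibly several per subject.
long_visits = wide_to_long(visits, "subject", ["dx1", "dx2", "dx3"])

# Frequencies are now a simple count over a single column.
code_counts = Counter(r["code"] for r in long_visits)
print(code_counts.most_common())
```

Once the codes sit in a single column, counting them no longer
depends on how many diagnosis slots a visit happens to have.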


Data structure analysis means checking that all
the components of a data structure are closely
related, that closely related data are not split
across separate structures, and that the most
suitable type of data structure is being used.
Data are often much easier to manage and
understand when the representation abstracts
their relevant similarities.
In data warehouses, data restructuring often
involves changing some aspects of how the
database is logically or physically arranged.









There are generally four types of data restructuring operations:
Trimming
Flattening
Stretching
Grafting
In trimming, data extracted from the input are placed in the output
without any change to the hierarchical relationships, but with some
unwanted components removed.
In flattening, the operation produces a flat form from a branch of the
input structure by extracting all information at the level of the
values of the branch's basic attributes.
The stretching operation produces an output data structure that has
more hierarchical levels than the input.
Finally, a grafting operation combines two hierarchies horizontally
to form a wider hierarchy by matching common values.
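The four operations can be illustrated on a small nested structure.
The sketch below is one possible reading of the definitions above,
not a standard implementation. The school, student, and district
records and the function names trim, flatten, stretch, and graft
are all hypothetical.

```python
# Hypothetical hierarchy: schools -> students -> test scores.
schools = [
    {"school": "North", "phone": "555-0100",
     "students": [{"name": "Ann", "scores": [81, 90]},
                  {"name": "Bo", "scores": [72]}]},
    {"school": "South", "phone": "555-0101",
     "students": [{"name": "Cy", "scores": [65, 70]}]},
]

# A second hypothetical hierarchy that shares the "school" values.
districts = [{"school": "North", "district": 1},
             {"school": "South", "district": 2}]

def trim(tree, unwanted):
    """Trimming: keep the hierarchy, drop one unwanted component."""
    return [{k: v for k, v in node.items() if k != unwanted} for node in tree]

def flatten(tree):
    """Flattening: one flat row per basic value in the students branch."""
    return [{"school": s["school"], "name": st["name"], "score": sc}
            for s in tree for st in s["students"] for sc in st["scores"]]

def stretch(flat_rows, key):
    """Stretching: add a hierarchical level by grouping rows under key."""
    groups = {}
    for row in flat_rows:
        rest = {k: v for k, v in row.items() if k != key}
        groups.setdefault(row[key], []).append(rest)
    return [{key: value, "rows": rows} for value, rows in groups.items()]

def graft(left, right, key):
    """Grafting: join two hierarchies side by side on matching key values."""
    lookup = {node[key]: node for node in right}
    return [{**node, **lookup[node[key]]}
            for node in left if node[key] in lookup]

trimmed = trim(schools, "phone")
flat = flatten(schools)
stretched = stretch(flat, "school")
grafted = graft(trimmed, districts, "school")

print(len(flat), "flat rows")
print([g["school"] for g in stretched])
```

Here stretching is shown as the inverse of flattening, which is a
convenient way to check that no rows were lost along the way.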









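In SPSS, go to Data > Restructure. This lets you
restructure your data from multiple variables (columns)
in a single case into groups of related cases (rows) or vice
versa, or transpose your data.
In the syntax below, /MAKE stacks the four variables into
trans1, /INDEX adds Index1 numbered 1 to 4, /ID adds an
identifier for the original case, and /NULL=KEEP retains
rows whose stacked value is missing.
SPSS SYNTAX:
* Stack VAR00001 to VAR00004 into trans1, one row per original variable.
VARSTOCASES
  /ID=id
  /MAKE trans1 FROM VAR00001 VAR00002 VAR00003 VAR00004
  /INDEX=Index1(4)
  /KEEP=
  /NULL=KEEP.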
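The "vice versa" direction, from groups of related cases back to
one case with several variables, and a plain transpose can be
sketched in Python as follows. This is not SPSS output: the rows,
the values, and the generated names trans1_1 and trans1_2 are
hypothetical, and only the column names id, Index1, and trans1 are
borrowed from the syntax above.

```python
# Hypothetical long data: one row per (id, Index1) pair.
long_rows = [
    {"id": 1, "Index1": 1, "trans1": 10.0},
    {"id": 1, "Index1": 2, "trans1": 12.5},
    {"id": 2, "Index1": 1, "trans1": 7.0},
    {"id": 2, "Index1": 2, "trans1": None},
]

def cases_to_vars(rows, id_col, index_col, value_col):
    """Collapse related rows into one row per id, one column per index."""
    wide = {}
    for row in rows:
        case = wide.setdefault(row[id_col], {id_col: row[id_col]})
        case[f"{value_col}_{row[index_col]}"] = row[value_col]
    return list(wide.values())

def transpose(rows, columns):
    """Swap rows and columns of a rectangular table."""
    table = [[row[col] for col in columns] for row in rows]
    return [list(column) for column in zip(*table)]

wide_rows = cases_to_vars(long_rows, "id", "Index1", "trans1")
print(wide_rows)

# After transposing, each original column becomes a row.
flipped = transpose(wide_rows, ["trans1_1", "trans1_2"])
print(flipped)
```

Missing values are carried through as None rather than dropped,
which mirrors the intent of /NULL=KEEP in the other direction.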




















You can create the new observations using an array statement and a DO loop, or you can simply transpose the
existing data, as in this SAS program:
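/* Read the tab-delimited file: one row per location, one column per year. */
/* location $ holds at most 8 characters unless a LENGTH statement is added. */
data neonatal;
  infile 'F:\Thesis Docs\Data\neonatal.txt' delimiter='09'x dsd truncover obs=104;
  input location $ _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_ _1999_ _2000_ _2001_ _2002_
        _2003_ _2004_ _2005_ _2006_ _2007_;
run;

/* BY-group processing in PROC TRANSPOSE needs the data sorted by location. */
proc sort data=neonatal;
  by location;
run;

/* One output row per location and year column: the variable year holds the */
/* source column name (for example _1990_) and neonatal1 holds its value.   */
proc transpose data=neonatal
               out=neonatal2
               name=year
               prefix=neonatal;
  by location;
  var _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_ _1999_ _2000_ _2001_ _2002_
      _2003_ _2004_ _2005_ _2006_ _2007_;
run;

/* neonatal2 exists only if a location appears on more than one input row. */
data neonatal3 (drop=neonatal2);
  set neonatal2;
run;

proc print data=neonatal3 noobs;
run;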
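For readers without SAS, the shape of the transposed result can be
sketched in Python. This is a rough analogue, not a translation.
The two regions and their values are made up, the function name
transpose_by is hypothetical, and the sketch converts the year
column names to integers, which the SAS program above does not do.

```python
# Hypothetical wide table with made-up values, one column per year.
wide = [
    {"location": "Region B", "_1990_": 22.3, "_1991_": None},
    {"location": "Region A", "_1990_": 40.1, "_1991_": 39.0},
]

def transpose_by(rows, by, value_name):
    """Return one row per (by value, year column), sorted by the by value."""
    out = []
    for row in sorted(rows, key=lambda r: r[by]):
        for name, value in row.items():
            if name == by:
                continue
            # "_1990_" -> 1990: strip the underscores around the year.
            out.append({by: row[by],
                        "year": int(name.strip("_")),
                        value_name: value})
    return out

long_table = transpose_by(wide, "location", "neonatal")
for record in long_table:
    print(record)
```

A numeric year column is usually easier to sort and plot than the
original column names.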

Restructuring is fun!