Row * Level Metadata

Download Report

Transcript Row * Level Metadata

Row – Level Metadata
Gregory Steffens
Associate Director, Programming
Novartis
Why Do We Need Row-Level Metadata?
If we know why, we will know how to design it and when to use it
 A requirement for describing tall-thin data sets in studies and in data
standards
• Storing data in –TESTCD --ORRES kinds of data set requires more than a
simple metadata that can describe data sets and variables
• These data sets have several variables that the simple metadata cannot
describe, including ORRES, ORRESU, STRESN, STRESC, STRESU,
STRESPOS, etc.
• In the simpler world these test results and attributes would be stored in
short-wide data sets in variables like HEIGHT, HEIGHT_UNIT, WEIGHT,
WEIGHT_UNIT, SYSBP, SYSBP_UNIT, SYSBP_POS
• Storing these test results in ORRES kinds of variables does not mean we
need less metadata, a lesser number of variables does not mean a lesser
amount of metadata. ORRES contains many virtual variables we need to
describe just as if they were in a simple short-wide data set.
 A prerequisite for software to transform data for reporting purposes
2 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
An Example of a Short-Wide Data Set
A variable for each result and result unit
USUBJID
HEIGHT
HEIGHTU WEIGHT
WEIGHTU BMI
1
74
IN
LBS
190
3 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
BMIU
SEX
24.39 KG/M**2 MALE
Some of the Metadata to Describe Short-Wide Data
A simple description of the attributes of these variables
USUBJID
HEIGHT
HEIGHTU WEIGHT
WEIGHTU BMI
1
74
IN
LBS
190
BMIU
SEX
24.39 KG/M**2 MALE
TABLE
COLUMN
CTYPE CLENGTH CLABEL
VS
USUBJID
C
15
Subject ID
VS
HEIGHT
N
8
Subj Height
2.0
VS
HEIGHTU
C
10
Height Unit
HTU
VS
WEIGHT
N
8
Subj Weight
3.0
VS
WEIGHTU
C
12
Weight Unit
WTU
VS
BMI
N
8
Subj BMI
5.2
BMIFORMULA
VS
BMIU
C
7
BMI Unit
BMIU
BMIUNIT
VS
SEX
C
6
Subj Gender
SEX
4 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
CFORMAT
CDERIVATION
USUBID
Same Values in a Tall-Thin Data Set
Results now all in 1 variable and units in 1 other variable
USUBJID
VSTESTCD
VSORRES
VSORRESU
SEX
1
HEIGHT
74
IN
MALE
1
WEIGHT
190
LBS
MALE
1
BMI
24.39
KG/M**2
MALE
5 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
Some Metadata to Describe the Tall-Thin Data Set
Row-level metadata must define all the attributes of a variable but for a
subset of the rows defined by each unique value of xxTESTCD
USUBJID
VSTESTCD
VSORRES
VSORRESU
SEX
1
HEIGHT
74
IN
MALE
1
WEIGHT
190
LBS
MALE
1
BMI
24.39
KG/M**2
MALE
TABLE
COLUMN
PARAM
PARAMREL
CTYPE
CLENGTH
CLABEL
CFORMAT
VS
VSTESTCD
HEIGHT
VSORRES
N
8
Subj Height
2.0
VS
VSTESTCD
HEIGHT
VSORRESU
C
10
Height Unit
HTU
VS
VSTESTCD
WEIGHT
VSORRES
N
8
Subj Weight
3.0
VS
VSTESTCD
WEIGHT
VSORRESU
C
12
Weight Unit
WTU
VS
VSTESTCD
BMI
VSORRES
N
8
Subj BMI
5.2
BMIFORMULA
VS
VSTESTCD
BMI
VSORRESU
C
7
BMI Unit
BMIU
BMIUNIT
6 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
CDERIVATION
Categories of Variables in Tall-Thin Data Sets
Metadata must fully describe all the attributes of all the categories
USUBJID
VSTESTCD
VSORRES
VSORRESU
SEX
1
HEIGHT
74
IN
MALE
1
WEIGHT
190
LBS
MALE
1
BMI
24.39
KG/M**2
MALE
VARIABLE TYPE
EXAMPLES
PRIMARY KEYS
USUBJID, VSTESTCD
PARAMETER VARIABLE NAME
VSTESTCD (the last pkey)
PARAMETER VARIABLE VALUES
HEIGHT, WEIGHT, BMI
PARAMETER-RELATED
VSORRES, VSORRESU
vsstresn, vsstresc, vsstresu, vspos, vsloc
7 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
PARAMETER-NONRELATED
SEX
What row-level metadata is NOT!
Not meant to define other relationships in study metadata
 NOT a list of values, ValueList is not simply a list of values
 Row-level metadata is not designed to define all the other
relationships between study variables
 It is designed as metadata, i.e. to describe the ItemDef
attributes of virtual variables. That is, to describe the
attributes of parameter-related variables for each value of
–TESTCD
 It should not be used for non-metadata purposes
• NOT to define the height unit of measure as being inches in the USA but
centimeters in the EU
• NOT to look for males with positive pregnancy test results
8
• NOT to define all the edit checks. That can be data driven but NOT by row| Presentation
| Presenter Namewhich
| Date | Subject
Business Use Only
level Title
metadata,
is |inadequate
to this task because it only enables
single-domain where conjuncts
Problem Solved
Metadata and a pair of macros enables easy transformation of data
 Transforming data between short-wide and tall-thin data sets is now a very
simple macro call
 %dt_wide2thin(data=vitals,out=vs,mdlib=md)
 %dt_thin2wide(data=vs,out=vitals,mdlib=md)
 The tall-thin and short-wide data structures are not perfect for all uses,
summary tables, listings, deriving new parameter results from mutiple
parameter results, comparing parameter results, etc.
 Tall-thin is very better for storage, summary tables
 Short-wide is better for listings, deriving, comparing
 Define file and data transparency achieved
9 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
Variable Categories Described
 Primary Keys
• Defined in the COLUMNS metadata set
 Parameter Name
• A special primary key that defines the kind of result for the current row
• Defined in the COLUMNS_PARAM metadata set
 Parameter Value
 Parameter-related
• Each non-key variable whose attributes each differ across rows but are the same
attributes for the subset of rows defined by parameter variable xxTESTCD. These are
“virtual variables”.
• Defined in the COLUMNS_PARAM metadata set
 Parameter-nonrelated
• Each non-key variable whose attributes do not differ across rows and are not
dependent on the parameter variable
• Defined in the COLUMNS metadata set
10 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
Columns_param Metadata Set
 The list of attributes in columns_param are identical to the
list in columns. That is to say, everything you need to
describe about a short-wide column must be described
about the tall-thin parameter-related column.
 Storing the study data in tall-thin data sets does not
reduce the amount of metadata definition that is required
 In data set TABLE, when variable COLUMN equals the
value PARAM then the attributes of variable PARAMREL
are described in the columns_param metadata set row
 There are many other variable attributes than in the
example, but these were subsetted to fit in a slide
11 | Presentation Title | Presenter Name | Date | Subject | Business Use Only