Row * Level Metadata
Download
Report
Transcript Row * Level Metadata
Row – Level Metadata
Gregory Steffens
Associate Director, Programming
Novartis
Why Do We Need Row-Level Metadata?
If we know why, we will know how to design it and when to use it
A requirement for describing tall-thin data sets in studies and in data
standards
• Storing data in –TESTCD --ORRES kinds of data set requires more than a
simple metadata that can describe data sets and variables
• These data sets have several variables that the simple metadata cannot
describe, including ORRES, ORRESU, STRESN, STRESC, STRESU,
STRESPOS, etc.
• In the simpler world these test results and attributes would be stored in
short-wide data sets in variables like HEIGHT, HEIGHT_UNIT, WEIGHT,
WEIGHT_UNIT, SYSBP, SYSBP_UNIT, SYSBP_POS
• Storing these test results in ORRES kinds of variables does not mean we
need less metadata, a lesser number of variables does not mean a lesser
amount of metadata. ORRES contains many virtual variables we need to
describe just as if they were in a simple short-wide data set.
A prerequisite for software to transform data for reporting purposes
2 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
An Example of a Short-Wide Data Set
A variable for each result and result unit
USUBJID
HEIGHT
HEIGHTU WEIGHT
WEIGHTU BMI
1
74
IN
LBS
190
3 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
BMIU
SEX
24.39 KG/M**2 MALE
Some of the Metadata to Describe Short-Wide Data
A simple description of the attributes of these variables
USUBJID
HEIGHT
HEIGHTU WEIGHT
WEIGHTU BMI
1
74
IN
LBS
190
BMIU
SEX
24.39 KG/M**2 MALE
TABLE
COLUMN
CTYPE CLENGTH CLABEL
VS
USUBJID
C
15
Subject ID
VS
HEIGHT
N
8
Subj Height
2.0
VS
HEIGHTU
C
10
Height Unit
HTU
VS
WEIGHT
N
8
Subj Weight
3.0
VS
WEIGHTU
C
12
Weight Unit
WTU
VS
BMI
N
8
Subj BMI
5.2
BMIFORMULA
VS
BMIU
C
7
BMI Unit
BMIU
BMIUNIT
VS
SEX
C
6
Subj Gender
SEX
4 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
CFORMAT
CDERIVATION
USUBID
Same Values in a Tall-Thin Data Set
Results now all in 1 variable and units in 1 other variable
USUBJID
VSTESTCD
VSORRES
VSORRESU
SEX
1
HEIGHT
74
IN
MALE
1
WEIGHT
190
LBS
MALE
1
BMI
24.39
KG/M**2
MALE
5 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
Some Metadata to Describe the Tall-Thin Data Set
Row-level metadata must define all the attributes of a variable but for a
subset of the rows defined by each unique value of xxTESTCD
USUBJID
VSTESTCD
VSORRES
VSORRESU
SEX
1
HEIGHT
74
IN
MALE
1
WEIGHT
190
LBS
MALE
1
BMI
24.39
KG/M**2
MALE
TABLE
COLUMN
PARAM
PARAMREL
CTYPE
CLENGTH
CLABEL
CFORMAT
VS
VSTESTCD
HEIGHT
VSORRES
N
8
Subj Height
2.0
VS
VSTESTCD
HEIGHT
VSORRESU
C
10
Height Unit
HTU
VS
VSTESTCD
WEIGHT
VSORRES
N
8
Subj Weight
3.0
VS
VSTESTCD
WEIGHT
VSORRESU
C
12
Weight Unit
WTU
VS
VSTESTCD
BMI
VSORRES
N
8
Subj BMI
5.2
BMIFORMULA
VS
VSTESTCD
BMI
VSORRESU
C
7
BMI Unit
BMIU
BMIUNIT
6 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
CDERIVATION
Categories of Variables in Tall-Thin Data Sets
Metadata must fully describe all the attributes of all the categories
USUBJID
VSTESTCD
VSORRES
VSORRESU
SEX
1
HEIGHT
74
IN
MALE
1
WEIGHT
190
LBS
MALE
1
BMI
24.39
KG/M**2
MALE
VARIABLE TYPE
EXAMPLES
PRIMARY KEYS
USUBJID, VSTESTCD
PARAMETER VARIABLE NAME
VSTESTCD (the last pkey)
PARAMETER VARIABLE VALUES
HEIGHT, WEIGHT, BMI
PARAMETER-RELATED
VSORRES, VSORRESU
vsstresn, vsstresc, vsstresu, vspos, vsloc
7 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
PARAMETER-NONRELATED
SEX
What row-level metadata is NOT!
Not meant to define other relationships in study metadata
NOT a list of values, ValueList is not simply a list of values
Row-level metadata is not designed to define all the other
relationships between study variables
It is designed as metadata, i.e. to describe the ItemDef
attributes of virtual variables. That is, to describe the
attributes of parameter-related variables for each value of
–TESTCD
It should not be used for non-metadata purposes
• NOT to define the height unit of measure as being inches in the USA but
centimeters in the EU
• NOT to look for males with positive pregnancy test results
8
• NOT to define all the edit checks. That can be data driven but NOT by row| Presentation
| Presenter Namewhich
| Date | Subject
Business Use Only
level Title
metadata,
is |inadequate
to this task because it only enables
single-domain where conjuncts
Problem Solved
Metadata and a pair of macros enables easy transformation of data
Transforming data between short-wide and tall-thin data sets is now a very
simple macro call
%dt_wide2thin(data=vitals,out=vs,mdlib=md)
%dt_thin2wide(data=vs,out=vitals,mdlib=md)
The tall-thin and short-wide data structures are not perfect for all uses,
summary tables, listings, deriving new parameter results from mutiple
parameter results, comparing parameter results, etc.
Tall-thin is very better for storage, summary tables
Short-wide is better for listings, deriving, comparing
Define file and data transparency achieved
9 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
Variable Categories Described
Primary Keys
• Defined in the COLUMNS metadata set
Parameter Name
• A special primary key that defines the kind of result for the current row
• Defined in the COLUMNS_PARAM metadata set
Parameter Value
Parameter-related
• Each non-key variable whose attributes each differ across rows but are the same
attributes for the subset of rows defined by parameter variable xxTESTCD. These are
“virtual variables”.
• Defined in the COLUMNS_PARAM metadata set
Parameter-nonrelated
• Each non-key variable whose attributes do not differ across rows and are not
dependent on the parameter variable
• Defined in the COLUMNS metadata set
10 | Presentation Title | Presenter Name | Date | Subject | Business Use Only
Columns_param Metadata Set
The list of attributes in columns_param are identical to the
list in columns. That is to say, everything you need to
describe about a short-wide column must be described
about the tall-thin parameter-related column.
Storing the study data in tall-thin data sets does not
reduce the amount of metadata definition that is required
In data set TABLE, when variable COLUMN equals the
value PARAM then the attributes of variable PARAMREL
are described in the columns_param metadata set row
There are many other variable attributes than in the
example, but these were subsetted to fit in a slide
11 | Presentation Title | Presenter Name | Date | Subject | Business Use Only