Collaborative Data Management for
Longitudinal Studies
Stephen Brehm
[coauthors: L. Philip Schumm & Ronald A. Thisted]
University of Chicago
(Supported by National Institute on Aging Grant P01 AG18911-01A1)
Agenda
1. Background on Study
2. Problem – Data Management Deficiencies
3. Solution – Collaborative Data Management
4. STATA Programs – maketest & makedata
Background on Study
• NIH-funded Longitudinal Study
• Loneliness & Health
• Thousands of Measures
– Loneliness
– Depression
• 230 subjects
• Repeated Yearly
Problem – Data Management Deficiencies
• Code Not Modular
…Difficult to manage the data cleaning code
…Limited code reuse from year to year
…Difficult to collaborate among interns
• No Established Set of Data Cleaning Steps
…Difficult for research assistants (turn-over)
…Inconsistent data cleaning techniques
…Data cleaning code difficult to read
Problem – Data Management Deficiencies
[Diagram: five Research Assistants all working on a single Core File Set]
Solution – Collaborative Data Management
• Process
– Established Steps
– File System Layout
– Automated Tests
– Collaboration
• Concepts
– Module (Ex: loneliness)
– Batch (Ex: yr1, yr2, yr3)
– “Data Certification”
• STATA Programs
– maketest
– makedata
Solution – Collaborative Data Management
Set of Files for Each Module
• Year-Specific (Acquire & Fix)
– acquire-[module].do & fix-[module].do
• Files Shared Between Years (Test, Derive, Label) – 60% Code Reuse
– test-[module].do
– derive-[module].do
– label-[module].do
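A minimal sketch of how the per-module files might be run in sequence for one module in one batch. The directory layout here (a yr1 folder for year-specific files and a shared folder for files reused across years) is an assumption for illustration only; the slides do not show the actual file system layout.

```stata
* Hypothetical driver for one module in one batch.
* The folder names "yr1" and "shared" are assumptions, not from the slides.
local module "loneliness"
local batch  "yr1"

* Year-specific steps: read the raw data, then apply year-specific fixes
do "`batch'/acquire-`module'.do"
do "`batch'/fix-`module'.do"

* Steps shared between years: certify, derive new variables, label
do "shared/test-`module'.do"
do "shared/derive-`module'.do"
do "shared/label-`module'.do"
```

Under this assumed layout, only the first two files would need to be rewritten for a new year, which is consistent with the code reuse the slide describes.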
STATA Program – maketest
• Purpose:
– Auto-generation of Data Certifying Tests
• Functionality:
– Tests Variable Type
– Checks Consistency of Value Labels
– Verifies Existence of Variable
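The three kinds of checks listed above can be written by hand with built-in Stata commands. The sketch below is a hand-written illustration, not actual maketest output; the variable name lonely1, the value label name agree4, and the coding 1 to 4 are hypothetical.

```stata
* Hand-written illustration of a data-certifying test file.
* lonely1, agree4, and the 1-4 coding are hypothetical.

* Verify that the variable exists
confirm variable lonely1

* Test the variable type
confirm numeric variable lonely1

* Check that the expected value label is attached
local lbl : value label lonely1
assert "`lbl'" == "agree4"

* Check that observed values fall within the labeled codes
assert inlist(lonely1, 1, 2, 3, 4) | missing(lonely1)
```

Each confirm or assert stops the do-file with an error when a check fails, so a file like this either runs to completion or reports the first failing check.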
STATA Program – maketest
• Syntax:
– maketest [varlist] using filename, [REQuire(varlist) append replace]
• Example:
– maketest using filename.do, replace
• Options:
– using: specifies file to write
– REQ: requires presence of variables in list
– append: add to existing test .do file
– replace: overwrite existing .do file
STATA Program – makedata
“Bringing it all together”
STATA Program – makedata
• Syntax:
– makedata [namelist], Pattern(string) [replace clear Noisily Batch(namelist) TESTonly]
• Example:
– makedata ats, p("acquire-*.do") b(yr1) clear replace
• Options:
– p (pattern): file naming convention
– replace: overwrite existing data file
– clear: clear current data in memory
– Noisily: full output (default = summary)
– b (batch): year, wave, center
– TESTonly: only run tests step
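As a rough illustration of how a file naming pattern can select do-files for one batch, the sketch below uses Stata's built-in dir macro function. It is not makedata's implementation, which the slides do not show; the assumption that each batch has its own folder (here yr1) is hypothetical.

```stata
* Illustration only: find and run every do-file matching a pattern.
* Assumes a folder named after the batch; this layout is hypothetical.
local batch "yr1"

* Collect the names of all files in the batch folder matching the pattern
local files : dir "`batch'" files "acquire-*.do"

* Run each matching do-file in turn
foreach f of local files {
    display as text "running `batch'/`f'"
    do "`batch'/`f'"
}
```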
Other Applications
• Beyond Longitudinal Data
• Teaching Data Cleaning with STATA
• Contact Information
– Stephen Brehm: [email protected]
– L. Philip Schumm: [email protected]
– Ronald A. Thisted: [email protected]
• Supported by National Institute on Aging Grant P01 AG18911-01A1