Working sideways in Stata
Jakob Hjort
DataManager, MPH
Department of Cardiology
Aarhus University Hospital
DK-8200 Aarhus
Denmark
2014 Nordic and Baltic Stata Users Group Meeting
The rectangular dataset
Statistics
results
"It is not the data we want, it's the essence of data"
The rectangular dataset
Data management
Statistics
- transpose?
The rectangular dataset – subset as a matrix using Mata?
use "family.dta", clear
* Dataset with: fam_name, inc_mother & inc_father
mata
// view onto the two income variables, one row per family
st_view(x=., ., ("inc_mother", "inc_father"))
// transpose, sum down the columns, transpose back: one total per row
income = colsum(x')'
st_addvar("long", "inc_household")
st_store(., "inc_household", income)
end
list fam_name inc_mother inc_father inc_household
The direct approach
generate [type] newvar=exp [if] [in]
Data management
Ex.: generate BMI=Weight/Height^2
(the existing variables Weight and Height produce the new variable BMI)
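A minimal sketch of the BMI example on made-up data. The three rows and the units (kilograms and metres) are assumptions for illustration only; the slide gives no data.

```stata
* Hypothetical toy data: Weight in kg, Height in metres (assumed units)
clear
input Weight Height
70 1.75
85 1.80
60 1.60
end

* A single generate statement fills BMI for every row
generate BMI = Weight/Height^2
list Weight Height BMI
```

The point is that generate works down the rows on its own; no loop is needed as long as the calculation can be written as one expression.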
The direct approach
egen [type] newvar=fcn(arguments) [if] [in] [,options]
Data management
rowtotal, rowmin, rowmax, rowfirst, rowlast, rowmean, rowmedian, rowmiss, rownonmiss,
rowpctile, rowsd, concat, anycount, anymatch, anyvalue, count, diff, fill, group, iqr, kurt,
max, mdev, mean, median, min, mode, mtr, pc, pctile, rank, sd, seq, skew, std, tag, total
Ex.: egen income=rowtotal(inc*)
(the existing variables incJan, incFeb, incMar, … are summed row by row into the new variable income)
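The rowtotal() example can be tried on a small made-up dataset. This is only a sketch: the three monthly variables and their values are hypothetical, chosen to show how missing values are handled.

```stata
* Hypothetical toy data with some missing months
clear
input incJan incFeb incMar
100 200 300
150 .   250
.   .   .
end

* rowtotal() sums across the variables matched by inc*,
* treating missing values as 0
egen income = rowtotal(inc*)
list
```

Two things to watch: a row where every month is missing gets a total of 0 unless the missing option is given, and once income exists the wildcard inc* will match it too, so a second egen call with inc* would include the total itself.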
Looking under the hood – just for inspiration
viewsource _growmin.ado
(the rowmin() function of egen)
program define _growmin
    version 6, missing
    gettoken type 0 : 0
    gettoken g    0 : 0
    gettoken eqs  0 : 0
    syntax varlist [if] [in] [, BY(string)]
    if `"`by'"' != "" {
        _egennoby rowmin() `"`by'"'
    }
    tempvar touse
    mark `touse' `if' `in'
    quietly {
        gen `type' `g' = .
        tokenize `varlist'
        while "`1'"!="" {
            replace `g' = cond(`1' < `g',`1',`g')
            mac shift
        }
    }
end
The code above follows four steps:
1. Initialize target variable: gen `type' `g' = .
2. Prepare the variable-list: tokenize `varlist'
3. Looping: while "`1'"!="" { … mac shift }
4. In-the-loop-commands: replace `g' = cond(`1' < `g',`1',`g')
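The four steps can be tried outside an ado-file. The sketch below borrows the tokenize / while / macro shift pattern from _growmin but runs it in a plain do-file on made-up data; the variable name rmin and the values are assumptions for illustration.

```stata
* Hypothetical toy data
clear
input incJan incFeb incMar
100 200 300
150 .   250
end

* 1. initialize target variable
generate rmin = .
* 2. prepare the variable list: tokenize puts the names in `1', `2', `3'
tokenize incJan incFeb incMar
* 3. loop until the list is used up
while "`1'" != "" {
    * 4. in-the-loop command: keep the smaller of the two values
    replace rmin = cond(`1' < rmin, `1', rmin)
    * move `2' into `1', `3' into `2', and so on
    macro shift
}
list
```

Because missing (.) compares as larger than any number, the first real value always replaces the initial missing, and later missing values never overwrite a real minimum.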
Prepare the variable-list
local vars incJan incFeb incMar incApr incMay incJun ///
    incJul incAug incSep incOct incNov incDec
Every variable is spelled out in full. That is fine with 12 variables, but what about hundreds?
The list is stored in `vars'
. unab vars: inc*
. unab vars: incJan-incDec
Variables can be specified with wildcards or ranges. The expanded list is stored in `vars'
(unab means unabbreviate; the command name itself, however, cannot be written out in full)
. ds inc*
. ds incJan-incDec
incJan incFeb incMar incApr incMay incJun incJul incAug incSep incOct incNov
incDec
Variables can be specified with wildcards or ranges. The list is stored in `r(varlist)'
Nice feature: the expanded list is shown for inspection
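A small sketch comparing the two ways of building the list. The dataset is made up (three income variables plus one unrelated variable called other), and the macro name dsvars is a hypothetical choice.

```stata
* Hypothetical toy data: three inc* variables and one unrelated variable
clear
set obs 2
foreach m in Jan Feb Mar {
    generate inc`m' = 100
}
generate other = 1

* unab expands the wildcard and stores the result in the local macro vars
unab vars : inc*
display "`vars'"

* ds prints the matching names and leaves them in r(varlist)
ds inc*
local dsvars `r(varlist)'
display "`dsvars'"
```

Both approaches return the names in dataset order. Copying r(varlist) into a local macro straight away is a cautious habit, since later commands may overwrite the r() results.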
Looping
"foreach" is the quickest and the most transparent loop command
foreach lvar in incJan incFeb {
    // do stuff with "`lvar'"
}
unab vars: inc*
foreach lvar in `vars' {
    // do stuff with "`lvar'"
}
ds inc*
foreach lvar in `r(varlist)' {
    // do stuff with "`lvar'"
}
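To make "do stuff" concrete, here is a sketch that counts missing values per variable. The data and the counter macro n are made up for illustration. The last loop uses foreach … of varlist, a form not shown on the slides, which expands the wildcard itself.

```stata
* Hypothetical toy data
clear
input incJan incFeb incMar
100 200 300
150 .   250
end

unab vars : inc*
local n = 0
foreach lvar in `vars' {
    * `lvar' holds one variable name per pass through the loop
    quietly count if missing(`lvar')
    display "`lvar': " r(N) " missing value(s)"
    local n = `n' + 1
}
display "looped over `n' variables"

* Alternative form (not from the slides): no separate list-building step
foreach lvar of varlist inc* {
    display "`lvar'"
}
```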
Typing the macro quotes
Left single-quote (`): hold Alt and press 0 9 6 on the numeric keypad
Right single-quote ('): hold Alt and press 0 3 9 on the numeric keypad
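In the loop
Version 1, with cond():
generate minimum=.
unab vars: inc*
foreach lvar in `vars' {
    replace minimum = cond(`lvar' < minimum,`lvar',minimum)
}
Version 2, with the if qualifier:
generate minimum=.
unab vars: inc*
foreach lvar in `vars' {
    replace minimum = `lvar' if `lvar'<minimum
}
Version 3, with the if command (!):
generate minimum=.
unab vars: inc*
foreach lvar in `vars' {
    if `lvar'<minimum {
        replace minimum = `lvar'
    }
}
Warning (!): the if command evaluates its expression only once, using the first observation, so version 3 does not give a row-wise minimum. Use version 1 or version 2.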
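The difference between the if qualifier and the if command is easiest to see on a few rows. The sketch below uses made-up data; the variable names check and wrong are hypothetical. It compares the loop result with egen's rowmin() and then shows what the if command version produces instead.

```stata
* Hypothetical toy data
clear
input incJan incFeb incMar
300 200 100
150 .   250
.   .   .
end

* if qualifier: the comparison is made separately for every row
generate minimum = .
unab vars : inc*
foreach lvar in `vars' {
    replace minimum = `lvar' if `lvar' < minimum
}

* Cross-check against the built-in row function
egen check = rowmin(inc*)
assert minimum == check

* if command: the comparison looks at the first observation only,
* and the replace then overwrites every row
generate wrong = .
foreach lvar in `vars' {
    if `lvar' < wrong {
        replace wrong = `lvar'
    }
}
list
```

In this toy dataset the first row happens to decrease from month to month, so the if command version ends up as a copy of incMar: row 2 gets 250, although its true minimum is 150.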
Some of the Danish participants may know "the DREAM database" and
will probably see how these approaches can be useful when
working with this fantastic but difficult construction.
Thank you very much