Transcript Stata
CCPR Computing Services
Workshop 2: Stata October 20, 2004
Outline
  Converting Data between Statistical Packages
  Stata - Basic Commands
  Command Syntax
  Abbreviations
  Missing Values
  Combining Data
  Using do-files
  Getting Help
  Updating Stata
Converting Data - Windows
Stat/Transfer
  SAS, Stata, S-Plus, SPSS, Excel, and more
  Windows interface
  Enter "in" data and "out" data
  Enter info on other tabs as necessary
  Check results!
Converting Data - Unix
Stat/Transfer
  From within Stat/Transfer
    Invoke Stat/Transfer (the command is specific to the Unix machine)
    At the Stat/Transfer prompt, enter:
      copy datfile1.ext1 datfile2.ext2
    datfile1.ext1 = original file, datfile2.ext2 = new file
  From the Unix prompt
      st datfile1.ext1 datfile2.ext2
    (replace st with the local Stat/Transfer invocation)
  See manual for more info and options
  Check results!
Converting Data - DBMS/Copy
DBMS/Copy for Unix (without xwindows)
From the Unix prompt:
  dbmsnox indatfile.ext1 outdatfile.ext2
ext1 and ext2 are "pseudo" extensions
  spsswin = SPSS for Windows
  stata7 = Stata 7
  sas7sun = SAS for Unix v7
  ssdsun = SAS for Unix v6
Example - Windows SPSS to Stata 7
  dbmsnox mydat.spsswin mydat.stata7
Check results!
Converting Data
See ATS website for transferring files between SAS, Stata, and SPSS http://www.ats.ucla.edu/stat/sas/faq/convert_pkg.htm
Stata - Getting Started
Windows: Programs > Stata8
  Command Window: enter commands
  Results Window
  Other: review, variables, do-editor
Unix:
  Interactive Stata: commands and results show in same window
  Batch Stata:
    nice +10 stata -b do myjob.do
Basic Commands
Handout 1 (green)
Reading raw data
  insheet, input, infix, infile
Using/saving a Stata dataset
  use, webuse
  save
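A minimal sketch of the use/webuse/save cycle. The file name myunion is a hypothetical local copy, not something from the handout; webuse needs an internet connection.

```stata
* Load an example dataset from the Stata website and save a local copy
webuse union, clear
save myunion, replace

* Reload only some variables and observations from the local copy
use idcode year age if year >= 80 using myunion, clear
describe
```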
Basic Commands, cont.
Describing data
  describe
  summarize
  codebook
  inspect
Listing data
  list
Tables of statistics
  table
  tab1 varlist (one-way tabulation of variables)
  tab2 varlist (two-way tabulations of variables)
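The describing, listing, and tabulating commands above can be tried on the auto dataset that ships with Stata. This is only a sketch; the variable choices are illustrative and are not taken from Handout 1.

```stata
sysuse auto, clear

* Describing data
describe
summarize price mpg
codebook rep78
inspect mpg

* Listing data: first five observations of two variables
list make price in 1/5

* Tables of statistics
table foreign
tab1 rep78 foreign
tab2 rep78 foreign
```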
Basic Commands, cont.
Changing data
  drop
  keep
  generate
  encode var, generate(newvar)
  recode
  replace
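A short sketch of the data-changing commands, again on the auto dataset. The new variable names (price_k, rep3, make_id) and the cutoffs are made up for illustration.

```stata
sysuse auto, clear

* keep selected variables, then drop expensive cars
* (price has no missing values in this dataset)
keep make price mpg rep78 foreign
drop if price > 10000

* generate a new variable, then replace its values
generate price_k = price / 1000
replace price_k = round(price_k, 0.1)

* recode into a new variable, leaving rep78 itself unchanged
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* encode turns the string variable make into a labeled numeric variable
encode make, generate(make_id)
```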
Basic Commands, cont.
Labeling data
label variable
label define
label values
label list
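The four label commands usually appear together. A minimal sketch, assuming the auto dataset and a hypothetical indicator variable highmpg with a value label named yesno:

```stata
sysuse auto, clear

* Variable label: describes the variable itself
label variable mpg "Mileage (miles per gallon)"

* Build an indicator that stays missing when mpg is missing
generate byte highmpg = mpg >= 25 if mpg < .
label variable highmpg "Mileage of 25 mpg or more"

* Value label: define the mapping once, then attach it to a variable
label define yesno 0 "No" 1 "Yes"
label values highmpg yesno

* Inspect the mapping and see it used in a table
label list yesno
tabulate highmpg
```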
A few other commands
  compress - saves data more efficiently
  reshape - long/wide
  sort / gsort
  order
  rename
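As a sketch of reshape, the example dataset reshape1 from the Stata website holds one row per id with income and unemployment stored as inc80-inc82 and ue80-ue82. The gsort, order, and rename lines are illustrative additions, and income is a hypothetical new name.

```stata
webuse reshape1, clear
list in 1/3

* wide -> long: one row per id and year
reshape long inc ue, i(id) j(year)
list in 1/6

* sort descending by income, move id and year to the front, rename a variable
gsort -inc
order id year
rename inc income

* store variables in the smallest type that holds them
compress
```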
Stata Syntax
Basic command syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
Brackets = optional portions
Italics = user specified (varlist, exp, range, weighttype=weight)
Stata Syntax, cont.
Example 1 (webuse union)
Stata command:
  . summarize
Result: Summarizes all dataset variables (_all)
Stata Syntax, cont.
Example 2 (webuse union)
Stata command:
  . summarize age
Result: Summarizes variable age
Stata Syntax, cont.
Example 3 (webuse union)
Stata command:
  . summarize age if year >= 80
Result: Summarizes age, includes only observations with year >= 80
Stata Syntax, cont.
Example 4 (webuse union)
Stata command:
  . summarize age if year >= 80 in 1/100
Result: Summarizes variable age, includes only first 100 obs and only obs with year >= 80
Stata Syntax, cont.
Example 5 (webuse union)
Stata command:
  . by black: summarize age if year >= 80
Result: Summarizes age separately for different values of black, including only obs for which year >= 80
Note: by requires the data to be sorted by black already; otherwise sort first or use bysort.
Stata Syntax, cont.
Example 6 (webuse union)
Stata command:
  . bysort black: summarize age if year >= 80, detail
Result: Detailed summaries of variable age, separated over different values of black, includes only obs with year >= 80
Stata Syntax, cont.
Example 7 (webuse union)
Generally [=exp] is used with the commands generate and replace
Stata commands:
  . generate agelt30 = age
  . replace agelt30 = 1 if age < 30
  . replace agelt30 = 0 if age >= 30 & age < .
Result: Variable agelt30 set equal to 1, 0, or missing
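The same 1/0/missing indicator can be built in one step. This is an alternative sketch, not the form used on the slide: the logical expression age < 30 evaluates to 1 or 0, and the if qualifier leaves agelt30 missing wherever age is missing.

```stata
webuse union, clear

* 1 if under 30, 0 if 30 or older, missing if age is missing
generate agelt30 = age < 30 if age < .

* the missing option shows any missing values alongside 0 and 1
tabulate agelt30, missing
```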
Stata Syntax, cont.
Example 8
Stata command:
  . summarize race [pweight=final_wt]
Intended result: Summarizes variable race accounting for the probability weight called final_wt.
Note: There are four different types of weights in Stata (fweight, aweight, pweight, iweight), and not every command accepts every type. Be careful: summarize itself accepts aweights, fweights, and iweights but rejects pweights, so check the help file of each command for the weight types it allows.
Abbreviations in Stata
Abbreviating command, option, and variable names
  shortest uniquely identifying name is sufficient
Example: Variables in use = make, price, mpg
  Stata command, not abbreviated:
    . summarize make price
  Stata command, abbreviated:
    . su ma p
Exceptions
  describe (d), list (l), and some others
  Commands that change/delete
  Functions implemented by ado-files
Missing Values in Stata 8
Stata 8
  27 representations of numerical "missing": ., .a, .b, … , .z
Relational comparisons
  Biggest number < . < .a < .b < … < .z
Mathematical functions
  missing + nonmissing = missing
String missing = empty quotes: ""
Missing Values in Stata - Pitfalls
Pitfall #1: Stata 7 vs. Stata 8 missing values
  Test           Stata 7          Stata 8
  Not missing    varname != .     varname < .
  Missing        varname == .     varname >= .
Pitfall #2: missing values compare as larger than any number
  Do NOT:
    . replace weightlt200 = 0 if weight >= 200
  INSTEAD:
    . replace weightlt200 = 0 if weight >= 200 & weight < .
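Pitfall #2 is easy to see by counting. A small sketch, assuming the auto dataset, where rep78 is missing for a few cars:

```stata
sysuse auto, clear

* Counts cars with rep78 of 4 or 5 AND cars with rep78 missing,
* because every missing value compares as larger than any number
count if rep78 >= 4

* Counts only cars with a nonmissing rep78 of 4 or 5
count if rep78 >= 4 & rep78 < .

* The difference between the two counts is the number of missing values
count if missing(rep78)
```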
Combining Data
Append vs. Merge
  Append - same variables, different observations
  Merge - same or related observations, different variables
Appending data in Stata
  Handout 2
Combining Data - merge and joinby
Demonstrate with two sample datasets: Neighborhood and County samples
  One-to-one merge: Handout 3
  One-to-many merge - use match merge: Handout 4
  Many-to-many merge - use joinby: Handout 5
Combining Data
Variable _merge (generated by merge and joinby)
  _merge                          1     2     3
  Observation in master data      Yes   No    Yes
  Observation in "using" data     No    Yes   Yes
"update" option also includes _merge=4,5
  "update" changes default action when matched observation has missing values in master and nonmissing in "using" data
Pitfalls
  Pitfall_merge1: Handout 6
  Pitfall_merge2: Handout 7
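A sketch of how the _merge codes arise, using two tiny made-up datasets (the variable names county, pop, and nbhd are hypothetical and are not the handout data). It assumes Stata 11 or later, where the merge type is spelled out as 1:1, m:1, or 1:m; in Stata 8 the equivalent is to sort both files by the key and run merge county using filename.

```stata
* Hypothetical county-level file, saved as a temporary "using" dataset
clear
input county pop
1 100
2 250
3 75
end
tempfile countydata
save "`countydata'"

* Hypothetical neighborhood-level master file (several rows per county)
clear
input nbhd county
11 1
12 1
21 2
41 4
end

* Many neighborhoods per county, one county row each
merge m:1 county using "`countydata'"

* County 4 appears only in the master data, county 3 only in the using data
tabulate _merge
list, sepby(county)
```

Tabulating _merge right after every merge is a cheap way to catch keys that failed to match.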
Do-files
What is a do-file?
  Stata commands can be executed interactively or via a do-file
  A do-file is a text file containing commands that can be read by Stata
  Handouts are do-files
Stata command
  . do dofilename.do
Do-files
Why use a do-file?
  Documentation
  Communication
  Reproduce interactive session?
  Interactive vs. do-files
  Record EVERYTHING to recreate results in your do-file!!
Do-files > Header, Version Control
Header
  Include in do-files: name, project, project location, date, purpose, inputs, outputs, special instructions
Version Control
  Include version at top of do-file
  Why?
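One possible layout for such a header, as a sketch only. Every entry in the header is a placeholder, and version 8 is assumed because the workshop uses Stata 8. The version line asks later releases of Stata to interpret the commands the way that release did.

```stata
* Name:      myjob.do
* Project:   (project name)
* Location:  (project directory)
* Date:      (date written or last revised)
* Purpose:   Summarize age by race in the union example data
* Inputs:    union.dta (loaded with webuse)
* Outputs:   none
* Notes:     needs an internet connection for webuse

version 8

webuse union, clear
bysort black: summarize age
```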
Do-file > End of Line Character
Commands requiring multiple lines
  #delimit ;
    This command tells Stata to read semicolons as the end-of-line character instead of the carriage return
  Comment out the carriage return with /* at the end of the line and */ at the beginning of the next
  Comment out the carriage return with ///
Do-files > End of line Character
Example 1: #delimit
  #delimit ;
  keep firstname lastname birth death age weight height;
  #delimit cr
Example 2: /* */
  keep firstname lastname birth /*
  */ age weight height
Example 3: ///
  keep firstname lastname birth ///
  age weight height
Do-files > Comments
Comments
  Lines beginning with * will be ignored
  Words between /* and */ will be ignored (spanning multiple lines ok)
  Words between // and end of line will be ignored
  Words between /// and beginning of next line will be ignored (one way to spread command over two lines)
Do-files > Comments
Comments - example
  *SAMPLE EXCERPT OF STATA DO-FILE
  *This line will be ignored by Stata.
  use mydata.dta /* These words will be ignored */
  do myjob.do //The remainder of this line will be ignored.
  keep age race sex ///The remainder of this line will be ignored, including return
  first_name height weight last_name /*This line is a continuation of the last line*/
Saving output
Work in do-files and log your sessions!
  log using filename [, replace | append]
  log close
Output choices:
  *.log file - ASCII file
  *.smcl file - nicer format for viewing and printing in Stata
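A minimal logging sketch; session.log is a hypothetical file name. The .log extension gives a plain-text log, while a .smcl extension would give the formatted version.

```stata
* Close any log left open by an earlier run; capture hides the error if none is open
capture log close

* Start a fresh text log, overwriting an old file of the same name
log using session.log, replace
sysuse auto, clear
summarize price mpg
log close

* Reopen the same file later and add to the end of it
log using session.log, append
describe
log close
```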
Basic Commands, cont.
Graphs are not saved in log files
Use "saving" option of graph commands
  saving(graph.ext)
Export current graph:
  graph export graph.ext
  Ex: graph export graph.eps
Supported formats: .ps, .eps, .wmf, .emf, .pict
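Both routes in one sketch, assuming the auto dataset and a hypothetical graph name pricempg. The saving() option writes Stata's own .gph file, and graph export writes the format implied by the file extension.

```stata
sysuse auto, clear

* Draw the graph and save it in Stata's own format (pricempg.gph)
scatter price mpg, saving(pricempg, replace)

* Export the graph currently displayed as Encapsulated PostScript
graph export pricempg.eps, replace
```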
Getting Help in Stata
help command_name
  abbreviated version of manual
search
  search keywords, local
  search keywords, net
  search keywords, all
findit keywords
  same as search keywords, all
Search Stata Listserver and Stata FAQ
Stata Resources
www.stata.com > Resources and Support
  Search Stata Listserver
  Search Stata (FAQ)
Stata Journal (SJ)
  articles for subscribers
  programs free
Stata Technical Bulletin (STB)
  replaced with the Stata Journal
  articles available for purchase, programs free
Courses (for fee)
Updating Stata
  help update
  update all