Stata - Welcome to CCPR


CCPR Computing Services
Introduction to Stata
Courtney Engel
October 26, 2007
Outline

Stata
Command Syntax
Basic Commands
Abbreviations
Missing Values
Combining Data
Using do-files
Basic programming
Special Topics
Getting Help
Updating Stata

Stata Syntax

Basic command syntax:
[by varlist:]
command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]


Brackets = optional portions
Italics = user specified
http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/stataslides10.07.log
Stata Syntax, cont.

Complete syntax:
[by varlist:]
command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]

Example 1 (webuse union)

Stata command:
. bysort black: summarize age if year >= 80, detail

Result:
Summarizes age separately for each value of black, including only observations for which year >= 80, and reports extra detail.

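The remaining syntax elements can be tried the same way. This is a minimal sketch, assuming the auto data that ships with Stata (loaded with sysuse) rather than the union data used above; the choice of variables is illustrative only.

```stata
* Sketch only: uses the auto data shipped with Stata, not the union data
sysuse auto, clear

* [by varlist:] prefix, [if exp] qualifier, and an option after the comma
bysort foreign: summarize price if mpg >= 20, detail

* [in range] restricts the command to observations 1 through 10
summarize price in 1/10

* [weighttype=weight]: analytic weights taken from the variable weight
summarize price [aweight = weight]
```

Each bracketed element is independent of the others, so they can be combined or left out as needed.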
Stata Syntax, cont.

Complete syntax:
[by varlist:]
command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]

Example 2 (webuse union)

Stata commands:
. generate agelt30 = age
. replace agelt30 = 1 if age < 30
. replace agelt30 = 0 if age >= 30 & age < .

Result:
Variable agelt30 is set equal to 1, 0, or missing:

Obs #   age   agelt30
1       10    1
2       15    1
3       .     .
4       30    0
5       73    0

Generally, [= exp] is used with the commands generate and replace.

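The same indicator can be built in one line. The sketch below is an alternative to the three commands in Example 2, not a replacement for them; it types in the five illustrative ages from the table rather than loading the union data, and the name agelt30b is made up for the comparison.

```stata
* Five illustrative observations, one of them missing
clear
input age
10
15
.
30
73
end

* Three-step version from Example 2
generate agelt30 = age
replace agelt30 = 1 if age < 30
replace agelt30 = 0 if age >= 30 & age < .

* One-line version: (age < 30) evaluates to 1 or 0,
* and the if qualifier leaves missing ages as missing
generate agelt30b = (age < 30) if !missing(age)

list
```

Without the if qualifier, the missing age would be coded 0, because a missing value is not less than 30.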
Basic Commands –
Load “auto” data and look at some vars

Load data from Stata’s website
webuse auto.dta

Look at dataset
describe

Summarize some variables
codebook make headroom, header
inspect weight length
Basic Commands –
Load “auto” data and look at some vars

Look at first and last observation
list make price mpg rep78 if _n==1
list make price mpg rep78 if _n==_N

Summarize a variable in a table
table foreign
table foreign, c(mean mpg sd mpg)
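The c() option shown above reflects the table command as it existed when these slides were written; the command was redesigned in later Stata releases, so the option may not be accepted there. A sketch of an alternative using tabstat, which reports the same statistics by group, assuming the auto data loaded with sysuse:

```stata
* Sketch only: sysuse loads the auto data shipped with Stata
sysuse auto, clear

* Mean and standard deviation of mpg for each value of foreign
tabstat mpg, by(foreign) statistics(mean sd)

* The same statistics for several variables at once, without the overall row
tabstat mpg price weight, by(foreign) statistics(mean sd) nototal
```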
Keep/Save a Subset of the Data

“Keep” a subset of the variables in memory
keep make headroom trunk weight length price

List variables in current dataset
ds

List string variables in current dataset
ds, has(type string)

Save current dataset
save autokeep, replace

Generating New Variables

Create new variable = headroom squared
generate headroom2 = headroom^2

Obs #   headroom   headroom2
1       10         100
2       9          81
3       4          16

Generate numeric from string variable
encode make, generate(makeNum)
list make makeNum in 1/5

The listing does not reveal that makeNum is numeric (list displays its value labels), but the “storage type” column in describe does:
describe make makeNum

Generating New Variables, cont.

Create categorical variable from continuous variable
“price” is integer-valued with minimum 3291 and max 15906

Generate categorical version - Method 1:
generate priceCat = 0
replace priceCat = 1 if price < 5000
replace priceCat = 2 if price >= 5000 & price < 10000
replace priceCat = 3 if price >= 10000 & price < .

Generating New Variables, cont.

Generate categorical version of numerical variable - Method 2:
generate priceCat2 = price
recode priceCat2 (min/5000 = 1) (5000/10000=2) (10000/max=3)

Compare price, priceCat, and priceCat2
table price priceCat
table priceCat priceCat2

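The two methods can also be compared programmatically. This is a sketch under stated assumptions: it uses the auto data loaded with sysuse, and the name priceCat3 and the non-overlapping boundaries 4999 and 9999 are choices made here, not part of the slides. In Method 2 the values 5000 and 10000 each appear in two rules; recode is documented to apply the first rule that matches, so a price of exactly 5000 would be coded 1 there but 2 under Method 1. Since price is integer-valued, non-overlapping ranges avoid the question.

```stata
sysuse auto, clear

* Method 1 from the slides
generate priceCat = 0
replace priceCat = 1 if price < 5000
replace priceCat = 2 if price >= 5000 & price < 10000
replace priceCat = 3 if price >= 10000 & price < .

* recode with generate() leaves price untouched and creates priceCat3
recode price (min/4999 = 1) (5000/9999 = 2) (10000/max = 3), generate(priceCat3)

* assert stops with an error if the two versions ever disagree
assert priceCat3 == priceCat
tabulate priceCat priceCat3
```

Using generate() also removes the need to copy price into a new variable before recoding.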
Variable Labels and Value Labels

Create a description for a variable:
label variable priceCat "Categorical price"

Create labels to represent variable values:
label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
label values priceCat priceCatLabels

View results:
describe
list price priceCat in 1/10

Reshape > Wide to Long

Wide format:
year   session   order   author1     author2
2006   P01       3       Biddlecom   Bankole
2006   P01       4       Anyara      Hinde
2006   P01       5       Amouzou     Becker

Wide -> Long:
reshape long author, i(year session order) j(count)

long - reshape from wide to long
author - stem of the variables going from wide to long
i(year session order) - uniquely identifies an observation in wide form
j(count) - variable that will be created to contain the suffix of author, i.e. (1 2)

Reshape > Long to Wide

Long format:
year   session   order   author      count
2006   P01       3       Biddlecom   1
2006   P01       3       Bankole     2
2006   P01       4       Anyara      1
2006   P01       4       Hinde       2
2006   P01       5       Amouzou     1
2006   P01       5       Becker      2

Long -> Wide:
reshape wide author, i(year session order) j(count)

wide - reshape from long to wide
author - variable to be converted from long to wide
i(year session order) - variables that uniquely identify observations in wide form
j(count) - variable that gives the suffix of author, i.e. (1 2)

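The round trip between the two layouts can be reproduced with the three rows from the tables. This is a minimal sketch that types the data in with input; the storage types (int, str3, byte, str9) are assumptions chosen to fit the values shown.

```stata
* Wide layout: one row per paper, one column per author
clear
input int year str3 session byte order str9 author1 str9 author2
2006 "P01" 3 "Biddlecom" "Bankole"
2006 "P01" 4 "Anyara"    "Hinde"
2006 "P01" 5 "Amouzou"   "Becker"
end

* Wide -> long: author1 and author2 become one variable, author,
* and count records which suffix (1 or 2) each row came from
reshape long author, i(year session order) j(count)
list, sepby(order)

* Long -> wide: back to one row per paper
reshape wide author, i(year session order) j(count)
list
```

After the first reshape there are six observations (two per paper); after the second there are three again.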
A few other commands

compress - saves data more efficiently
sort / gsort - sort observations (sort: ascending only; gsort: ascending or descending)
order - change the order of variables
rename - rename variables
set more on/off - pause (on) or do not pause (off) when results fill the screen

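A short sketch of these commands in use, assuming the auto data loaded with sysuse; the new variable name repair78 is made up for the example.

```stata
sysuse auto, clear

* Store each variable in the smallest type that holds its values
compress

* sort is ascending only; gsort takes a minus sign for descending order
sort price
gsort -price

* Move make and price to the front of the variable list
order make price

* rename old_name new_name
rename rep78 repair78

* Do not pause when output fills the Results window
set more off

list make price repair78 in 1/5
```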
Abbreviations in Stata

Abbreviating command, option, and variable names:
The shortest uniquely identifying name is sufficient

Example:
Assume three variables are in use: make, price, mpg

“UN-abbreviated” Stata command:
. summarize make price

Abbreviated Stata command:
. su ma p

Exceptions:
describe (d), list (l), and some others
Commands that change/delete
Functions implemented by ado-files

Missing Values in Stata 8-10

Stata 8 and later versions:
27 representations of numerical “missing”: ., .a, .b, … , .z

Relational comparisons:
Biggest number < . < .a < .b < … < .z

Mathematical functions:
missing + nonmissing = missing

String missing:
Empty quote: ""

Missing Values in Stata - Pitfalls

Pitfall #1
Missing values changed after Stata 7:

Stata 7          Stata 8 and later
varname != .     varname < .
varname == .     varname >= .

Pitfall #2
Do NOT:
. replace weightlt200 = 0 if weight >= 200

INSTEAD:
. replace weightlt200 = 0 if weight >= 200 & weight < .

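Pitfall #2 can be seen directly by counting. The sketch below assumes the auto data loaded with sysuse, in which rep78 has some missing values; the scalar names are made up for the example.

```stata
sysuse auto, clear

* missing() is true for ., .a, ..., .z
count if missing(rep78)
scalar n_missing = r(N)

* Pitfall: missing values are larger than any number, so they satisfy "> 3"
count if rep78 > 3
scalar n_naive = r(N)

* Excluding missing values explicitly
count if rep78 > 3 & rep78 < .
scalar n_careful = r(N)

display "naive = " n_naive ", careful = " n_careful ", missing = " n_missing
```

The naive count exceeds the careful count by exactly the number of missing values.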
Combining Data

Append vs. Merge:
Append – two datasets with same variables, different observations
Merge – two datasets with same or related observations, different variables

Appending data in Stata
Example: append.do
http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/append10.07.log

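A self-contained sketch of append, separate from the append.do example linked above. The two small datasets, their values, and the names id, score, and first are all hypothetical.

```stata
* First dataset (hypothetical values), saved to a temporary file
clear
input byte id float score
1 1.5
2 2.5
end
tempfile first
save "`first'"

* Second dataset: same variables, different observations
clear
input byte id float score
3 3.5
4 4.5
end

* Add the observations of the first dataset to those in memory
append using "`first'"
sort id
list
```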
Combining Data - merge and joinby

Demonstrate with two sample datasets: Neighborhood and County samples

One-to-one merge
onetoone.do
http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/onetoone10.07.log

One-to-many merge – use match merge
onetomany.do
http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/onetomany10.07.log

Many-to-many merge – use joinby
manytomany.do
http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/manytomany10.07.log

Combining Data

Variable _merge (generated by merge and joinby):

_merge   Observation in master data   Observation in “using” data
1        Yes                          No
2        No                           Yes
3        Yes                          Yes

Pitfalls:
Merging unsorted data
Many-to-many using merge instead of joinby

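A sketch of a match merge and the resulting _merge values. It assumes Stata 11 or later, where the merge type (1:1, m:1, 1:m) is written into the command and the data need not be sorted first; the slides were written for an earlier syntax. The neighborhood and county values below are hypothetical and are not the sample datasets from the linked logs.

```stata
* County-level data (hypothetical values), saved to a temporary file
clear
input byte county str5 cname
1 "Alpha"
2 "Beta"
3 "Gamma"
end
tempfile counties
save "`counties'"

* Neighborhood-level data: several neighborhoods per county
clear
input byte nbhd byte county
1 1
2 1
3 2
4 4
end

* Many neighborhoods match one county record; the key is county
merge m:1 county using "`counties'"

* _merge: 1 = master only, 2 = using only, 3 = in both
tabulate _merge
sort county nbhd
list, sepby(county)
```

Neighborhood 4 has no county record (_merge 1), county 3 has no neighborhood (_merge 2), and the other three neighborhoods match (_merge 3).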
Do-files

What is a do-file?
Stata commands can be executed interactively or via a do-file
A do-file is a text file containing commands that can be read by Stata

Running a do-file within Stata:
. do dofilename.do

Do-files

Why use a do-file?
Documentation
Communication
Reproduce interactive session?

Interactive vs. do-files:
Record EVERYTHING to recreate results in your do-file!

Do-files > Documentation Header

File header includes:
Name (email)
Project
Project location
Date
Software Version
Purpose of program
Inputs
Outputs
Special Instructions

* Josie Bruin ([email protected])
* HRS project
* /u/socio/jbruin/HRS/
* October 5, 2007
* Stata version 8
* Purpose: Create and merge two datasets in Stata,
*   then convert data to SAS
* Input programs:
*   HRS/staprog/H2002.do,
*   HRS/staprog/x2002.do,
*   HRS/staprog/mergeFiles.do
* Output:
*   HRS/stalog/H2002.log,
*   HRS/stalog/x2002.log,
*   HRS/stalog/mergeFiles.log,
*   HRS/stadata/Hx2002.dta
*   HRS/sasdata/Hx2002.sas
* Special instructions: Check log files for errors
*   check for duplicates upon new data release

Do-files > Comments

Comments:
Lines beginning with * will be ignored
Words between // and end of line will be ignored

Spanning commands over two lines:
Words between /* and */ will be ignored, including the end-of-line character
Words between /// and the beginning of the next line will be ignored

Do-files > End of Line Character

Commands requiring multiple lines:

#delimit ;
This command tells Stata to read semi-colons as the end-of-line character instead of the carriage return

Comment out the carriage return with /* at the end of the line and */ at the beginning of the next

Comment out the carriage return with ///

Do-files > Examples
webuse auto, clear
*this is a comment
#delimit ;
summarize price mpg rep78
headroom trunk weight;
#delimit cr
summarize price mpg rep78 headroom trunk weight //this is a comment
summarize price mpg rep78 ///
headroom trunk weight
summarize price mpg rep78 /*
*/ headroom trunk weight
Saving output

Work in do-files and log your sessions!
log using filename
Options: replace or append
log close

Output choices:
*.log file - ASCII file (text)
*.smcl file - nicer format for viewing and printing in Stata

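Put together, a do-file that logs its own output might look like the sketch below. The log name example_session is hypothetical, and the analysis commands are placeholders using the auto data loaded with sysuse.

```stata
* Close any log left open by an earlier run;
* capture suppresses the error if none is open
capture log close

* Start a plain-text log, overwriting an existing file of the same name
log using example_session, replace text

* Do not pause when output fills the screen
set more off

sysuse auto, clear
summarize price mpg

* Stop logging
log close
```

The text option produces a *.log file; leaving it out produces a *.smcl file instead.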
Saving Output, cont.

Graphs are not saved in log files

Export current graph:
graph export graph.ext
Ex: graph export graph.eps

Supported formats:
.ps, .eps, .wmf, .emf, .pict

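A minimal sketch of drawing and exporting a graph, assuming the auto data loaded with sysuse; the file name price_mpg.eps is made up for the example.

```stata
sysuse auto, clear

* Draw a graph; it becomes the current graph
scatter price mpg

* The file extension selects the format; replace overwrites an existing file
graph export price_mpg.eps, replace
```

Which formats are available depends on the operating system and the Stata release.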
Example using local macro
. local mypath "C:\Documents and Settings\MyStata"
. display `mypath'
C:\Documents invalid name
r(198);
. display C:\Documents and Settings\MyStata
C:\Documents invalid name
r(198);
. display "`mypath'"
C:\Documents and Settings\MyStata
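In the session above, Stata substitutes the contents of the macro before running the command, so display without quotes sees bare text and tries to evaluate it as an expression. A related point when building file names from a macro: a backslash placed directly before the opening quote of a macro reference stops the macro from being expanded, so forward slashes are the safer separator. A sketch, reusing the path from the example and the autokeep.dta file name saved earlier:

```stata
* Forward slashes work in Stata paths on Windows as well
local mypath "C:/Documents and Settings/MyStata"

* Quotes make display treat the expanded macro as a string
display "`mypath'"

* Building a file name from the macro
local myfile "`mypath'/autokeep.dta"
display "`myfile'"
```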
Example – foreach, return, display
foreach var of varlist tenure-ln_wage {
    quietly summarize `var'
    local varmean = r(mean)
    display "Variable `var' has mean `varmean'"
}

First five observations of tenure through ln_wage:
     +----------------------------------------+
     |   tenure   hours   wks_work    ln_wage |
     |----------------------------------------|
  1. | .0833333      20         27   1.451214 |
  2. | .1666667      15         27    2.09457 |
  3. |      .25      40         27   1.790204 |
  4. | .0833333      44         10    1.02862 |
  5. | .0833333      20         10   .7409375 |
     +----------------------------------------+
http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/constructs10.07.log

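The r(mean) used in the loop is one of several results that summarize leaves behind; return list shows all of them. A sketch of the same loop on the auto data loaded with sysuse, with a display format added; the variable list and the format are choices made for this example.

```stata
sysuse auto, clear

* Show everything summarize stores in r()
quietly summarize price
return list

foreach var of varlist price mpg weight {
    quietly summarize `var'
    local varmean = r(mean)
    * %9.2f prints the mean with two decimal places
    display "Variable `var' has mean " %9.2f `varmean' " over " r(N) " observations"
}
```

Stored results are overwritten by the next command that stores its own, which is why the loop copies r(mean) into a local macro right away.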
Example using forvalues, display
forvalues counter = 1/10 {
    display `counter'
}
forvalues counter = 0(2)10 {
    display `counter'
}

Example: forvalues, generating random variables
forvalues j = 1/3 {
    generate x`j' = uniform()
    generate y`j' = invnormal(uniform())
}
foreach x of varlist x1-x3 y1-y3 {
    summarize `x'
}

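A runnable variant of this example, as a sketch. It assumes a Stata release that provides runiform() and rnormal(), the newer names for these random-number functions; it also adds an empty dataset of 100 observations and a seed, neither of which appears in the slide.

```stata
* Start from an empty dataset with 100 observations
clear
set obs 100

* Fix the seed so the draws can be reproduced
set seed 12345

forvalues j = 1/3 {
    * uniform draws on the interval (0,1)
    generate x`j' = runiform()
    * standard normal draws
    generate y`j' = rnormal()
}

summarize x1-x3 y1-y3
```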
Example – if/else
foreach var of varlist tenure-ln_wage {
    quietly summarize `var'
    local varmean = r(mean)
    if `varmean' > 10 {
        display "`var' has mean greater than 10"
    }
    else {
        display "`var' has mean less than or equal to 10"
    }
}

Special Topic: regular expressions

webuse auto

List all values of make starting with a capital and containing an additional capital:
list make if regexm(make, "^[A-Z].+[A-Z].+")

AND ending in a number:
list make if regexm(make, "^[A-Z].+[A-Z].+[0-9]$")

+---------------+
| make          |
|---------------|
| Merc. XR-7    |
| Olds Delta 88 |
+---------------+

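Besides testing for a match with regexm(), the matched text can be pulled out with regexs(). A sketch assuming the auto data loaded with sysuse; the variable names capnum and brand and the first-word pattern are made up for the example.

```stata
sysuse auto, clear

* Flag makes that match the second pattern above
generate byte capnum = regexm(make, "^[A-Z].+[A-Z].+[0-9]$")
list make if capnum

* regexs(1) returns the text matched by the first parenthesized group,
* here everything up to the first space
generate brand = regexs(1) if regexm(make, "^([^ ]+)")
list make brand in 1/5
```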
Special Topic: Exporting results using outreg

User-written program called outreg
From within Stata, type findit outreg
Very simple!!
Basically add one line of code after each regression to export results
For an example of code, see
http://www.ats.ucla.edu/stat/stata/faq/outreg.htm

Getting Help in Stata

help command_name
abbreviated version of manual

search
search keywords, local
search keywords, net
search keywords, all

findit keywords
same as search keywords, all

Search Stata Listserver and Stata FAQ

Stata Resources

www.stata.com > Resources and Support:
Search Stata Listserver
Search Stata (FAQ)

Stata Journal (SJ):
articles for subscribers
programs free

Stata Technical Bulletin (STB):
replaced with the Stata Journal
Articles available for purchase, programs free

Courses (for fee)

Updating Stata

help update
update all

CCPR’s Cluster and helping your research

Software and Data:
STATA, SAS, R, Compilers, text editors, etc
HRS, CPS (Unicon version), AddHealth, IFLS, etc

Efficiency:
Your PC is available for other work when you submit a job to the cluster
Faster processors
More RAM
Easy to share data, programs, etc. with colleagues via the cluster

Obtain access by requesting an account
http://lexis.ccpr.ucla.edu/account/request/

Questions/Feedback

Please email me if you need help in the future

[email protected]