Advanced Stata Workshop
FHSS Research Support Center
Presentation Layout
• Visualization and Graphing
• Macros and Looping
• Panel and Survey Data
• Postestimation
Visualization and Graphing in Stata
[Figure: "Life expectancy at birth vs. GNP per capita". Three combined panels: Life expectancy at birth plotted against loggnp, with a histogram (Fraction) of each variable alongside. Source: 1998 data from The World Bank Group]
Intro To Graphing In Stata
. sysuse auto, clear
. graph twoway scatter mpg weight //Note that you don't need to type graph or twoway
. scatter mpg weight
“graph” is often optional. So is “twoway” in this case.
[Figure: scatterplot; the only surviving axis title is Price]
Note: Nearly all graphing commands start with “graph”, and “twoway” is a large family of graphs.
Creating Multiple Graphs with “by():”
. twoway scatter mpg weight, by(foreign)
[Figure: two scatterplots of mpg against Weight (lbs.), one panel titled Domestic and one titled Foreign, with the note "Graphs by Car type"]
Note that the value labels (Domestic, Foreign) are displayed above the graphs, and the variable label (Car type) is displayed in the note below them.
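The by() option also takes suboptions of its own. A minimal sketch, assuming the auto data loaded above: total adds a panel for all cars together, and rows(1) lays the panels out side by side. Neither suboption appears on the slide; they are shown only as an illustration.

```stata
sysuse auto, clear
* total adds an extra "Total" panel; rows(1) puts all panels in one row
twoway scatter mpg weight, by(foreign, total rows(1))
```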
Overlaying “twoway” graphs
. twoway scatter mpg weight || lfit mpg weight
The || tells Stata to put the second graph on top of the first one, so order matters! You don’t need to type “twoway” twice; it applies to both.
. twoway (scatter mpg weight) (lfit mpg weight)
This is another way of writing the command. It doesn’t matter which one you use.
[Figure: both commands draw the same graph, a scatterplot of Mileage (mpg) against Weight (lbs.) with a line of Fitted values on top]
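To see why order matters, a minimal sketch using the same auto data draws the two plots in both orders. The graph names scatter_first and line_first are made up for this illustration.

```stata
sysuse auto, clear
* Scatter first, fitted line second: the line is drawn over the markers
twoway (scatter mpg weight) (lfit mpg weight), name(scatter_first, replace)
* Fitted line first, scatter second: the markers are drawn over the line
twoway (lfit mpg weight) (scatter mpg weight), name(line_first, replace)
```

With only a thin line the difference is small, but with filled plots such as a confidence band, whatever is drawn first can be hidden by what is drawn second.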
"by()" statements with overlaid graphs
. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)
“qfitci” is a type of graph which plots the prediction line from a quadratic regression, and adds a confidence interval. The “stdf” option specifies that the confidence interval be created on the basis of the standard error of the forecast.
stdf is an option of qfitci. by(foreign) is an option of twoway.
[Figure: Domestic and Foreign panels, each showing Mileage (mpg) against Weight (lbs.) with Fitted values and a 95% CI band. Note: Graphs by Car type]
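Where an option is typed decides what it applies to. A small sketch of the same command with one extra plot-level option, msymbol(Oh) for hollow circles, which is added here only for illustration:

```stata
sysuse auto, clear
* stdf and msymbol() sit inside the parentheses, so each applies to one plot only;
* by(foreign) follows the last plot, so it applies to the whole twoway graph
twoway (qfitci mpg weight, stdf) ///
       (scatter mpg weight, msymbol(Oh)), ///
       by(foreign)
```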
"by()" statements with overlaid graphs
Another way of writing the previous command is:
. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)
[Figure: Domestic and Foreign panels of Mileage (mpg) against Weight (lbs.) with Fitted values and a 95% CI band. Note: Graphs by Car type]
So:
This way is easier to read:
. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)
This way is easier to type:
. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)
Graphs with Many Options and Overlays
You can make pretty impressive graphs just from code, if you overlay the graphs and
specify certain options like: multiple axes, notes, titles and subtitles, axis titles and
labels, and legends.
Code for Previous Graph
. use http://www.stata-press.com/data/r12/uslifeexp, clear
. generate diff = le_wm - le_bm
. label var diff "Difference"
. #delimit ;
. twoway line le_wm year, yaxis(1 2) xaxis(1 2)
> || line le_bm year
> || line diff year
> || lfit diff year
> ||,
> ytitle( "", axis(2) )
> xtitle( "", axis(2) )
> xlabel( 1918, axis(2) )
> ylabel( 0(5)20, axis(2) grid gmin angle(horizontal) )
> ylabel( 0 20(10)80, gmax angle(horizontal) )
> ytitle( "Life expectancy at birth (years)" )
> title( "White and black life expectancy" )
> subtitle( "USA, 1900-1999" )
> note( "Source: National Vital Statistics, Vol 50, No. 6"
> "(1918 dip caused by 1918 Influenza Pandemic)" )
> legend(label(1 "White males") label(2 "Black males") );
. #delimit cr
This may look scary, but it is actually
fairly straightforward. See the
accompanying do-file for
explanation of each component.
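If the semicolon delimiter feels awkward, a do-file can instead continue a command across lines with ///. The sketch below is a cut-down version of the command above, not the full graph: it keeps only two of the lines and three of the options. Note that /// works in do-files, not in the Command window.

```stata
use http://www.stata-press.com/data/r12/uslifeexp, clear
* Each /// tells Stata that the command continues on the next line
twoway line le_wm year                                      ///
    || line le_bm year                                      ///
    ||, ytitle("Life expectancy at birth (years)")          ///
        title("White and black life expectancy")            ///
        legend(label(1 "White males") label(2 "Black males"))
```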
Using the Graph Editor
It is often easier to make changes in the
graph editor than to specify all the
options in code.
. tsline nci abc
[Figure, graph 1: default tsline output. x axis: date (01oct2009 to 01jul2010); legend: NASDAQ Composite Index, ABC.com, Inc. share price]
[Figure, graph 2: edited version. Title: ABC.com Inc.; subtitle: Closing Share Price vs. Nasdaq Composite Index, Sep 24, 2009 - June 7, 2010; y axis: Share Price (USD), 0 to 16; x axis labels: Oct 1, 2009 through Jun 1, 2010; legend: NASDAQ Composite Index, ABC.com, Inc. share price; Source: CRSP, Bloomberg]
Let’s make graph 1 into graph 2 by using the
graph editor tools.
Recording Edits in the Graph Editor
Before you start making changes, click the record button. After
you are done, click it again, and save your changes as a
recording so you can “play” them back later. We will save this
recording as advanced_workshop_1.
Graph Element: Change
Graph Title: Enter title using quotes to separate lines, color = black
Graph Subtitle: Enter subtitle
Graph Region: Color = Bluish-gray
Y-Axis: Range = 0 to 16 by 2, axis line = medium thick, add title, label angle = horizontal, grid lines = off
X-Axis: title = off, minor ticks = off, suggest # of ticks = 8, alternate spacing of adjacent labels = on, change label format, label size = small, axis line = medium thick
Plot 1 line: color = green, width = thick
Plot 2 line: color = blue, width = thick
Caption: Add caption
Play Your Graph Recording
You can create a graph, open the graph editor, click the green play button, and then play
back your recorded edits.
Or, you can play your edits right from the code:
. tsline nci abc, play(advanced_workshop_1)
You can run your recorded edits on a graph of a different type, though in this case not all of your edits will make sense:
. twoway (scatter nci date) (scatter abc date) ///
> , play(advanced_workshop_1)
You can also run all of your recorded edits on a different graph, and just change the title:
. tsline comp_world comp_planet , play(advanced_workshop_1)
[Figure: the scatter version. Title: ABC.com Inc.; subtitle: Closing Share Price vs. Nasdaq Composite Index, Sep 24, 2009 - June 7, 2010; y axis: Share Price (USD), 0 to 16; legend: NASDAQ Composite Index, ABC.com, Inc. share price; Source: CRSP, Bloomberg]
[Figure: the comp_world and comp_planet version, with the same title, subtitle, axes, and source; legend: Computer World share price, Computer Planet share price]
Storing and Moving Your Recordings
Graph recordings are stored as .grec files in your “personal”
folder, under the “grec” folder. Type “personal” to see where
this is; normally it is C:\ado\personal. So by default Stata
should store your .grec files in C:\ado\personal\grec.
. personal
your personal ado-directory is c:\ado\personal\
. dir c:\ado\personal\grec\
   0.4k   2/21/13  9:12  advanced_workshop_1.grec
   0.7k   3/01/12  9:48  jeff_test_recording_graph_edits.grec
   0.9k   5/17/12 15:47  line..grec
   1.3k  11/21/12 10:12  x grid.grec
Unfortunately, if you are not faculty, you are probably using lab computers to use
Stata, and when they are re-imaged, you will lose the files in your grec folder. So
you can store the recordings on your flash drive by clicking the Browse button
when you save your recording. Now, when you are in the graph editor and click
the play button, your recording will not appear in the list because it is not stored
where Stata knows to look for it. Never fear, just click Browse, and navigate to
where your .grec file is. If you want your recording to be available right from code,
as in play(advanced_workshop_1), you will need to move it (at least temporarily)
to the “grec” folder, or write the directory location in the code:
play(E:\flashdrive\Graph Recordings\advanced_workshop_1)
Using Schemes in Graphing
Recordings are great if you are going to be making the same kind of graph a lot. But a
recording for a scatter plot will hardly affect a histogram at all, and might even make it
look terrible. If you want to change the look of all graphs that you make, you may want to
make a scheme. Schemes are text files which tell Stata how to draw graphs.
. sysuse uslifeexp2, clear
. scatter le year
. scatter le year, scheme(economist)
[Figure: the same scatterplot of life expectancy against Year (1900 to 1940), drawn once with the default scheme and once with the economist scheme]
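Typing scheme() on every graph gets old quickly. A minimal sketch, assuming you want one look for a whole session: set scheme changes the scheme used by every later graph until you set it again.

```stata
sysuse uslifeexp2, clear
* Make economist the scheme for every graph drawn from here on
set scheme economist
scatter le year
* Switch to s2color, one of the schemes that ships with Stata
set scheme s2color
scatter le year
```

Adding the permanently option (set scheme economist, permanently) keeps the setting across sessions on that computer.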
More on Schemes
. graph query, schemes
Available schemes are
    s2color     see help scheme_s2color
    s2mono      see help scheme_s2mono
    s2manual    see help scheme_s2manual
    s2gmanual
    s2gcolor
    s1color     see help scheme_s1color
    s1mono      see help scheme_s1mono
    s1rcolor    see help scheme_s1rcolor
    s1manual    see help scheme_s1manual
    sj          see help scheme_sj
    economist   see help scheme_economist
Schemes are very powerful, because they let you implement a certain look without
specifying a long series of options in every graph, or running every graph through the
graph editor. However, creating schemes is fairly time consuming.
For more on creating your own schemes, see:
http://www3.eeg.uminho.pt/economia/nipe/2010_Stata_UGM/papers/Rising.pdf
And http://www.ats.ucla.edu/stat/stata/seminars/stata_graph/graphsem.txt
Manipulating Graphs: Memory vs. Disk
When you draw a graph, it is stored in memory, under the name Graph.
. sysuse auto, clear
. scatter price mpg
If you draw another graph, it replaces the previous one in memory, and is now called Graph.
. scatter price length
If you want to have multiple graphs up at the same time, you can use the name option.
. scatter price mpg, name(scatter1)
graph save writes a copy of your graph from memory to disk, saving it as a .gph file. The graph also stays in memory.
. cd C:\Users\nickj22\Downloads\
. graph save scatter1 mygraph1.gph
graph dir lists all graphs in memory and on disk (in the current directory)
. graph dir
Graph
scatter1
mygraph1.gph
graph drop drops a graph from memory. Graphs contain the data they represent, so if
the dataset is large, they can actually take up quite a bit of memory.
. graph drop scatter1
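Going the other way, from disk back to memory, is done with graph use. A short sketch of the round trip, reusing the names from above; the name from_disk is made up for this illustration, and the file is written to whatever the current directory is.

```stata
sysuse auto, clear
scatter price mpg, name(scatter1, replace)
* Write a copy to disk, then drop the copy in memory
graph save scatter1 mygraph1.gph, replace
graph drop scatter1
* Load the saved file back into memory under a new name
graph use mygraph1.gph, name(from_disk, replace)
* Redisplay any graph that is in memory
graph display from_disk
```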
Manipulating Graphs Demo
See do file for demo
Note: Annotated code is in the do file for all of these
More Example Graphs
Histogram, with overlaid normal distribution
[Figure: histograms of average education level, one panel per Census region (NE, N Cntrl, South, West). y axis: Percent; legend: Percent, normal educ. Note: Graphs by Census region. Source: US Census, 1980 and 1990]
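The annotated code for this graph is in the do file. As a rough stand-in, here is a minimal sketch of the same kind of graph using the auto data instead of the census data, so the variables differ from the slide.

```stata
sysuse auto, clear
* percent scales the bars as percentages; normal overlays a normal density;
* by(foreign) draws one histogram per car type
histogram mpg, percent normal by(foreign)
```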
More Example Graphs
Use graph bar to make bar graphs
[Figure: bar graph. Title: Average July and January temperatures; subtitle: by regions of the United States. Labeled July and January bars for N.E., N. Central, South, and West. Source: U.S. Census Bureau, U.S. Dept. of Commerce]
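A minimal sketch of a bar graph like this one, assuming the citytemp dataset that ships with Stata (the do file may use different data or options):

```stata
sysuse citytemp, clear
* (mean) is the statistic shown by each bar; over(region) makes one group per region;
* blabel(bar) prints each bar's height on the bar
graph bar (mean) tempjuly tempjan, over(region)       ///
    blabel(bar, format(%9.1f))                         ///
    legend(label(1 "July") label(2 "January"))         ///
    title("Average July and January temperatures")
```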
More Example Graphs
Use graph combine to combine 3 graphs into one:
[Figure: "Life expectancy at birth vs. GNP per capita". Three combined panels: Life expectancy at birth plotted against loggnp, with a histogram (Fraction) of each variable alongside. Source: 1998 data from The World Bank Group]
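A stripped-down sketch of how three graphs can be combined, assuming the lifeexp dataset that ships with Stata. The graph names yx, hy, and hx are made up here, and the axis fine-tuning needed to match the slide is left out.

```stata
sysuse lifeexp, clear
generate loggnp = log10(gnppc)
* Build the three pieces without displaying them, keeping each in memory by name
scatter lexp loggnp, name(yx, replace) nodraw
histogram lexp, fraction horizontal name(hy, replace) nodraw
histogram loggnp, fraction name(hx, replace) nodraw
* hole(3) leaves the third cell of the 2 x 2 grid empty
graph combine hy yx hx, hole(3)                        ///
    title("Life expectancy at birth vs. GNP per capita")
```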
More Example Graphs
Graph matrix is a great alternative to a correlation matrix to
investigate relationships between variables
[Figure: scatterplot matrix. Title: Correlations among 1998 life-expectancy data. Variables: Avg. annual % growth, Life expectancy at birth, Log GNP per capita, safewater. Source: The World Bank Group]
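A minimal sketch of a scatterplot matrix next to the correlation matrix it complements, again assuming the lifeexp dataset that ships with Stata. The variable name lgnppc is made up for this illustration.

```stata
sysuse lifeexp, clear
generate lgnppc = ln(gnppc)
label variable lgnppc "Log GNP per capita"
* half draws only the lower triangle, since the upper one mirrors it
graph matrix popgrowth lexp lgnppc safewater, half
* The numeric counterpart for the same four variables
correlate popgrowth lexp lgnppc safewater
```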
More Example Graphs
Get data labels (called marker labels in Stata) from the values of
another variable
[Figure: scatterplot with each point labeled by country name. Title: Life expectancy vs. GNP per capita; subtitle: North, Central, and South America. x axis: GNP per capita (thousands of dollars). Data source: World Bank, 1998]
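The option that does this is mlabel(). A small sketch using the auto data rather than the World Bank data on the slide, limited to the first 15 cars so the labels stay readable:

```stata
sysuse auto, clear
* mlabel() takes the text for each marker from another variable;
* mlabposition(3) puts the label at the 3 o'clock position
scatter mpg weight in 1/15, mlabel(make) mlabposition(3)
```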
More Example Graphs
xtline, run on a panel data set, can overlay a line for each value of the
panel variable. The labels on the x-axis are often a bit off to start
though, as shown.
[Figure: Calories Consumed by Subject, Jan 1 2002 - Jan 1 2003. y axis: Calories consumed (3500 to 5000); x axis: Date (01jan2002 to 01jan2003); one line each for Tess, Arnold, and Sam]
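A minimal sketch, assuming the xtline1 example dataset that ships with Stata, which appears to be the data behind this graph. The tlabel(#3) option is one possible way to tidy the date labels; the do file may handle them differently.

```stata
sysuse xtline1, clear
* Declare the panel variable and the time variable
xtset person day
* overlay puts every subject's line in one graph instead of one graph per subject
xtline calories, overlay
* tlabel(#3) asks for about three labels on the time axis
xtline calories, overlay tlabel(#3)
```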