Advanced Stata Workshop
FHSS Research Support Center

Presentation Layout
• Visualization and Graphing
• Macros and Looping
• Panel and Survey Data
• Postestimation

Visualization and Graphing in Stata

[Figure: "Life expectancy at birth vs. GNP per capita": a scatter plot of life expectancy at birth against loggnp, combined with a histogram (Fraction) of each variable. Source: 1998 data from The World Bank Group]

Intro To Graphing In Stata

. sysuse auto, clear
. graph twoway scatter mpg weight //Note that you don't need to type graph or twoway
. scatter mpg weight

“graph” is often optional. So is “twoway” in this case, so both scatter commands above draw the same graph.

[Figure: the scatter plot produced by the commands above]

Note: Nearly all graphing commands start with “graph”, and “twoway” is a large family of graphs.

Creating Multiple Graphs with “by():”

. twoway scatter mpg weight, by(foreign)

[Figure: scatter plots of mpg against Weight (lbs.) in two panels, Domestic and Foreign, with the note "Graphs by Car type"]

Note that each value label (Domestic, Foreign) is displayed above its graph, and the variable label is displayed in the note below the graphs ("Graphs by Car type").

Overlaying “twoway” graphs

. twoway scatter mpg weight || lfit mpg weight

The || tells Stata to put the second graph on top of the first one, so order matters! You don’t need to type “twoway” twice; it applies to both.

. twoway (scatter mpg weight) (lfit mpg weight)

This is another way of writing the command. It doesn’t matter which one you use.

[Figure: both commands draw the same graph, a scatter of Mileage (mpg) against Weight (lbs.) with a line of Fitted values on top]

"by()" statements with overlaid graphs

. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)

“qfitci” is a type of graph which plots the prediction line from a quadratic regression, and adds a confidence interval.
The “stdf” option specifies that the confidence interval be created on the basis of the standard error of the forecast.

[Figure: quadratic fit with 95% CI, overlaid with a scatter of Mileage (mpg) against Weight (lbs.), in Domestic and Foreign panels. Legend: 95% CI, Fitted values, Mileage (mpg). Note: Graphs by Car type]

. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)

stdf is an option of qfitci. by(foreign) is an option of twoway.

"by()" statements with overlaid graphs

Another way of writing the previous command is:

. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)

[Figure: the same graph as produced by the previous command]

So:

. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)

This way is easier to read.

. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)

This way is easier to type.

Graphs with Many Options and Overlays

You can make pretty impressive graphs just from code, if you overlay the graphs and specify certain options like: multiple axes, notes, titles and subtitles, axis titles and labels, and legends.

Code for Previous Graph

. use http://www.stata-press.com/data/r12/uslifeexp, clear
. generate diff = le_wm - le_bm
. label var diff "Difference"
. #delimit ;
. twoway line le_wm year, yaxis(1 2) xaxis(1 2)
>     || line le_bm year
>     || line diff year
>     || lfit diff year
>     ||,
>     ytitle( "", axis(2) )
>     xtitle( "", axis(2) )
>     xlabel( 1918, axis(2) )
>     ylabel( 0(5)20, axis(2) grid gmin angle(horizontal) )
>     ylabel( 0 20(10)80, gmax angle(horizontal) )
>     ytitle( "Life expectancy at birth (years)" )
>     title( "White and black life expectancy" )
>     subtitle( "USA, 1900-1999" )
>     note( "Source: National Vital Statistics, Vol 50, No. 6"
>           "(1918 dip caused by 1918 Influenza Pandemic)" )
>     legend(label(1 "White males") label(2 "Black males") );
. #delimit cr

This may look scary, but it is actually fairly straightforward. See the accompanying do-file for an explanation of each component.
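One way to get comfortable with a long command like this is to build a smaller one from the same kinds of pieces. The sketch below is not taken from the workshop do-file; it applies a few of the same option types (an axis title, title, subtitle, note, and legend labels) to a simple overlay on the auto dataset, and the title, subtitle, and note text are made up for illustration.

```stata
* A smaller overlay that uses a few of the same option types.
* The title, subtitle, and note text are illustrative only.
sysuse auto, clear

#delimit ;
twoway scatter mpg weight
    || lfit mpg weight
    ||,
    ytitle("Mileage (mpg)")
    title("Mileage vs. weight")
    subtitle("1978 automobile data")
    note("Source: auto.dta, shipped with Stata")
    legend(label(1 "Observed") label(2 "Linear fit"))
    ;
#delimit cr
```

Adding one option at a time and redrawing makes it easy to see what each piece contributes.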
Using the Graph Editor

It is often easier to make changes in the graph editor than to specify all the options in code.

. tsline nci abc

[Figure: graph 1, the default tsline plot of nci and abc against date (01oct2009 to 01jul2010). Legend: NASDAQ Composite Index, ABC.com, Inc. share price]

[Figure: graph 2, the edited version. Title: "ABC.com Inc. Closing Share Price vs. Nasdaq Composite Index"; subtitle: "Sep 24, 2009 - June 7, 2010"; y-axis: Share Price (USD), 0 to 16; x-axis: Oct 1, 2009 through Jun 1, 2010 with alternating label heights; caption: "Source: CRSP, Bloomberg"]

Let’s make graph 1 into graph 2 by using the graph editor tools.

Recording Edits in the Graph Editor

Before you start making changes, click the record button. After you are done, click it again, and save your changes as a recording so you can “play” them back later. We will save this recording as advanced_workshop_1.

Graph Element: Change
Graph Title: Enter title using quotes to separate lines, color = black
Graph Subtitle: Enter subtitle
Graph Region: Color = bluish-gray
Y-Axis: Range = 0 to 16 by 2, axis line = medium thick, add title, label angle = horizontal, grid lines = off
X-Axis: Title = off, minor ticks = off, suggest # of ticks = 8, alternate spacing of adjacent labels = on, change label format, label size = small, axis line = medium thick
Plot 1: Line color = green, width = thick
Plot 2: Line color = blue, width = thick
Caption: Add caption

Play Your Graph Recording

You can create a graph, open the graph editor, click the green play button, and then play back your recorded edits. Or, you can play your edits right from the code:

. tsline nci abc, play(advanced_workshop_1)

You can run your recorded edits on a graph of a different type, though in this case not all of your edits will make sense:

. twoway (scatter nci date) (scatter abc date) ///
>     , play(advanced_workshop_1)

You can also run all of your recorded edits on a different graph, and just change the title:

. tsline comp_world comp_planet, play(advanced_workshop_1)

[Figure: the two resulting graphs. Both carry the recorded title, subtitle, axes, and caption ("Source: CRSP, Bloomberg"). One plots NASDAQ Composite Index and ABC.com, Inc. share price; the other plots Computer World share price and Computer Planet share price]

Storing and Moving Your Recordings

Graph recordings are stored as .grec files in your “personal” folder, under the “grec” folder. Type “personal” to see where this is; normally it is C:\ado\personal. So by default Stata should store your .grec files in C:\ado\personal\grec.

. personal
your personal ado-directory is c:\ado\personal\

. dir c:\ado\personal\grec\
   0.4k   2/21/13  9:12  advanced_workshop_1.grec
   0.7k   3/01/12  9:48  jeff_test_recording_graph_edits.grec
   0.9k   5/17/12 15:47  line..grec
   1.3k  11/21/12 10:12  x grid.grec

Unfortunately, if you are not faculty, you are probably using lab computers to use Stata, and when they are re-imaged, you will lose the files in your grec folder. So you can store the recordings on your flash drive by clicking the Browse button when you save your recording.

Now, when you are in the graph editor and click the play button, your recording will not appear in the list because it is not stored where Stata knows to look for it. Never fear, just click Browse, and navigate to where your .grec file is.
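The location of the personal folder can also be checked from code rather than typed by hand. This is a minimal sketch, assuming recordings live in a grec subfolder of the personal directory as described above; the local macro name grecdir is hypothetical.

```stata
* List every system directory; PERSONAL is the folder described above
sysdir

* The same path is available as a c-class value
display "`c(sysdir_personal)'"

* Build the path of the grec subfolder (grecdir is an illustrative name)
local grecdir "`c(sysdir_personal)'grec"
display "`grecdir'"
```

On a lab computer, this shows which folder would need backing up before the machine is re-imaged.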
If you want your recording to be available right from code, as in play(advanced_workshop_1), you will need to move it (at least temporarily) to the “grec” folder, or write the directory location in the code, in quotes because this path contains a space:

play("E:\flashdrive\Graph Recordings\advanced_workshop_1")

Using Schemes in Graphing

Recordings are great if you are going to be making the same kind of graph a lot. But a recording for a scatter plot will hardly affect a histogram at all, and might even make it look terrible. If you want to change the look of all graphs that you make, you may want to make a scheme. Schemes are text files which tell Stata how to draw graphs.

. sysuse uslifeexp2, clear
. scatter le year
. scatter le year, scheme(economist)

[Figure: the same scatter of life expectancy against Year (1900 to 1940), drawn once with the default scheme and once with the economist scheme]

More on Schemes

. graph query, schemes

Available schemes are

    s2color      see help scheme_s2color
    s2mono       see help scheme_s2mono
    s2manual     see help scheme_s2manual
    s2gmanual
    s2gcolor
    s1color      see help scheme_s1color
    s1mono       see help scheme_s1mono
    s1rcolor     see help scheme_s1rcolor
    s1manual     see help scheme_s1manual
    sj           see help scheme_sj
    economist    see help scheme_economist

Schemes are very powerful, because they let you implement a certain look without specifying a long series of options in every graph, or running every graph through the graph editor. However, creating schemes is fairly time consuming. For more on creating your own schemes, see:

http://www3.eeg.uminho.pt/economia/nipe/2010_Stata_UGM/papers/Rising.pdf

and

http://www.ats.ucla.edu/stat/stata/seminars/stata_graph/graphsem.txt

Manipulating Graphs: Memory vs. Disk

When you draw a graph, it is stored in memory, under the name Graph.

. sysuse auto, clear
. scatter price mpg

If you draw another graph, it replaces the previous one in memory, and is now called Graph.

. scatter price length

If you want to have multiple graphs up at the same time, you can use the name option.

. scatter price mpg, name(scatter1)

graph save writes a graph from memory to disk as a .gph file. The graph also stays in memory, as the graph dir listing below shows.

. cd C:\Users\nickj22\Downloads\
. graph save scatter1 mygraph1.gph

graph dir lists all graphs in memory and on disk (in the current directory).

. graph dir
   Graph   scatter1   mygraph1.gph

graph drop drops a graph from memory. Graphs contain the data they represent, so if the dataset is large, they can actually take up quite a bit of memory.

. graph drop scatter1

Manipulating Graphs Demo

See do file for demo

More Example Graphs

Note: Annotated code is in the do file for all of these

Histogram, with overlaid normal distribution

[Figure: histograms (Percent) of average education level with an overlaid normal curve, in four panels: NE, N Cntrl, South, and West. Note: Graphs by Census region. Source: US Census, 1980 and 1990]

More Example Graphs

Use graph bar to make bar graphs

[Figure: bar graph "Average July and January temperatures by regions of the United States", with labeled July and January bars for N.E., N. Central, South, and West. Source: U.S. Census Bureau, U.S. Dept. of Commerce]

More Example Graphs

Use graph combine to combine 3 graphs into one:

[Figure: "Life expectancy at birth vs. GNP per capita": a scatter plot of life expectancy at birth against loggnp, combined with a histogram (Fraction) of each variable. Source: 1998 data from The World Bank Group]

More Example Graphs

Graph matrix is a great alternative to a correlation matrix to investigate relationships between variables. The example graph is titled "Correlations among 1998 life-expectancy data".
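The graph matrix and graph combine examples pair naturally with the memory-versus-disk commands covered earlier. The following is a minimal sketch on the auto dataset rather than the workshop's World Bank data; the graph names (g_matrix, g_scatter, g_hist, g_both) and the file name combined_example.gph are hypothetical.

```stata
* Sketch on the auto dataset; graph and file names are illustrative only
sysuse auto, clear

* Scatterplot matrix: one panel for each pair of variables (lower half only)
graph matrix mpg weight length price, half name(g_matrix, replace)

* Two named graphs held in memory at the same time
scatter mpg weight, name(g_scatter, replace)
histogram mpg, fraction name(g_hist, replace)

* Combine the named graphs into a single graph
graph combine g_scatter g_hist, name(g_both, replace)

* Save the combined graph to disk, list what exists, then free some memory
graph save g_both combined_example.gph, replace
graph dir
graph drop g_scatter g_hist
```

Naming each graph keeps the pieces available for graph combine; once the combined graph is saved, dropping the pieces frees the memory they were using.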
[Figure: scatterplot matrix of avg. annual % growth, life expectancy at birth, log GNP per capita, and safewater. Source: The World Bank Group]

More Example Graphs

Get data labels (called marker labels in Stata) from the values of another variable.

[Figure: "Life expectancy vs. GNP per capita, North, Central, and South America": scatter of life expectancy against GNP per capita (thousands of dollars), with each marker labeled by country name, from Canada and the United States at the top to Bolivia and Haiti at the bottom. Data source: World Bank, 1998]

More Example Graphs

Xtline from a panel data set can overlay lines for each value of the panel variable. The labels on the x-axis are often a bit off to start though, as shown.

[Figure: "Calories Consumed by Subject, Jan 1 2002 - Jan 1 2003": overlaid lines of Calories consumed against Date for Tess, Arnold, and Sam, with x-axis labels 01jan2002, 01apr2002, 01jul2002, 01oct2002, and 01jan2003]
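A graph like this one can be sketched with the xtline1 example dataset from Stata's xtline documentation, which appears to match the subjects shown here. Using tlabel(#3) to thin the date labels is one possible fix for the x-axis, not necessarily the one used in the workshop do-file.

```stata
* xtline1 is the example dataset used in Stata's xtline documentation
webuse xtline1, clear
xtset person day

* One graph with an overlaid line for each person
xtline calories, overlay

* Ask for about three date labels so they are less crowded
xtline calories, overlay tlabel(#3)
```

Without overlay, xtline draws a separate panel for each person instead of a single graph.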