Transcript Slide 1
INTRODUCTION TO STATA
Third group training course in application of information and communication technology to production and dissemination of official statistics
10 May – 11 July 2007
Gereltuya Altankhuyag, Lecturer/Statistician, UNSIAP [email protected]
7/18/2015

Objectives
Gain experience in using STATA to:
- Obtain descriptive statistics
- Tabulate
- Create graphs
- Convert data files
- Create “Do-files”
- Create “Log-files”
- Use user-written programs: “Ado-files”

Method of teaching
- Lectures using PowerPoint slides
- Performance of in-class practical exercises using real survey datasets
- Group assignment*
- Presentation of group assignment*
Note: * Subject to availability of datasets

Contents
- Resources for learning and using Stata
- Getting started with Stata
- Basic commands to inspect datasets
- Basic commands to create and change variables, labeling etc.
- Basic commands to reorganize datasets
- Advanced commands: merging and appending

Contents (continued)
- Basic commands of statistics
- Basic commands of graphics
- Programming using “do” files
- Use of “ado-files”
- Creating “log-files”
- Inputting data from keyboard, a file or spreadsheet

Resources for learning and using STATA

STATA
- Is a statistical package for managing, analyzing and graphing data.
- Has both a command-driven and a menu-driven interface.
- Is designed for research.
- Has cross-platform compatibility: Windows, MacOS, Unix, Solaris.

STATA has three versions:
- Small (restricted version): not available for Unix
- Intercooled (full version)
- Special Edition: STATA/SE
Newest release: Version 9, which we will use.

Comparison of versions

                      Stata/SE           Intercooled Stata   Small Stata
    Speed             Fastest            Very fast           Fast
    No. of variables  32766              2047                99
    Observations      Memory dependent   Memory dependent    1000 (approx.)
    String variable   244 chars          80 chars            80 chars
    Matrices          11000 x 11000      800 x 800           40 x 40
    Version           Professional       Professional        Small computers

Manuals
- Getting Started
- User’s Guide
- Reference
The Stata website: http://www.stata.com
The Stata Press website, which contains the datasets used throughout the Stata manuals: http://www.stata-press.com

Additional subject-specific volumes may be purchased separately. These include:
- Longitudinal/Panel Data Reference Manual
- Mata Reference Manual
- Multivariate Statistics Reference Manual
- Programming Reference Manual
- Survey Data Reference Manual
- Survival Analysis and Epidemiological Tables Reference Manual
- Time-Series Reference Manual

Other resources
- The Stata listserver: an active group of Stata users who communicate over the Internet
- The Stata Journal: reviewed papers, regular columns, user-written software (http://www.stata-journal.com/)
- NetCourses: training offered via the Internet
- Books and other support materials

Technical support
- By email, phone or fax: [email protected]
- To subscribe to Statalist, send email to [email protected] with the body of the message: subscribe statalist

Searchable Statalist archives at: http://www.stata.com/statalist/archive
These include requests for programs, solutions or advice, as well as answers and general discussions.

STATA Features
- Command prompt driven: batch mode and interactive mode
- Modular by nature: Stata code can be shared and reused, and extensions are easy to write.
- Can incorporate survey design into the estimation process.

STATA Capabilities
- Elementary and specialized statistical analysis
- Graphics: most 2D
- Data management: user-friendly
- Matrix operations

Why STATA?
- Precise estimation for complex surveys
- User-written programs for non-standard estimation
- Excellent tools for panel data analysis
- Many econometric routines
- Command driven (in the new version, menu-driven as well)

Why STATA? (continued)
- Runs efficiently on many platforms
- Concise and clear documentation (with friendly technical support)
- Costs less to buy
Contact: [email protected] or visit: http://www.stata.com

Getting Started with STATA

Getting Started: starting up
- Click Start ► Programs ► Stata ► StataSE 9
- Alternatively, from Windows Explorer, go to folder c:\stata9 and double-click wstata.exe

Getting Started: verifying the version and installation of Stata
Command: “verinst”
Syntax:
    verinst
Result:
    You are running Stata/SE 9.2 for Windows.
    Stata is correctly installed.
    You can type exit to exit Stata.

Getting Started: comparing updates
Command: “update”
Result:
    Stata executable
        folder:               \\Unitednations\Stata9\
        name of file:         wsestata.exe
        currently installed:  21 Nov 2006
    Ado-file updates
        folder:               \\Unitednations\Stata9\ado\updates\
        names of files:       (various)
        currently installed:  21 Nov 2006
    Recommendation
        Type -update query- to compare these dates with what is available from http://www.stata.com.

Getting Started: the four windows
- RESULTS WINDOW: results and commands are displayed here
- REVIEW WINDOW: past commands appear here
- VARIABLE WINDOW: the variable list is shown here
- COMMAND WINDOW: commands are typed here

Getting Started
- If one of the four windows is not displayed, say the VARIABLE WINDOW, click on Window ► Variables or press CTRL-6.
- You can type only in the Command window.
- You cannot close the Results and Command windows.

Getting Started: window colors
Click on Prefs ► General Preferences

Getting Started: fonts of windows
The font or font size may be changed in each window by clicking the upper-left window button and then clicking on Font.

Getting Started: the Command window
- Page Up: steps backwards through the command history
- Page Down: steps forward through the command history
- Tab: auto-completes a partially typed variable name

Getting Started: the Review window
To enter a command from the Review window:
- Click once on a past command to copy it to the Command window
- Double-click on a past command to copy it to the Command window and execute it

Right-clicking on the Review window displays:
- Save Review Contents
- Copy Review Contents to Clipboard
- Font

Getting Started: the Variables window
To enter a variable from the Variables window:
- Click once on a variable to copy it to the Command window
- Double-click on a variable and the variable will be copied twice
Right-clicking on the Variables window displays a menu:
- Define Notes for Variable “varname” … opens the Notes dialog for variable “varname”
- Font

Getting Started: MENU BAR and TOOL BAR (shown on the slide)

Getting Started: notes for the instructor
- Ask participants to open STATA.
- Ask participants to open each command of the menu bar and explain it.
- Ask participants to point the cursor at each command of the tool bar and explain it.

Getting Started: the Help option of the menu bar
- Contents (for beginners unfamiliar with STATA commands)
- Search (for users who know the name of the command or topic they wish to search)

Getting Started: obtaining online help on commands
For more information on the regress command, enter
    help regress
or use the menu bar: Help ► STATA Command

Getting Started: obtaining a topic search
We can search from the menu bar: Help ► Search
To learn about regression, type
    search regression
or, to learn about memory management,
    search memory

Getting Started: obtaining a net search
In the Search pop-up window, we can also do an Internet search. Alternatively, to search on regression, we can issue the command
    net search regression

Getting Started: four ways of quitting Stata
- Enter in the Command window: exit
- Press ALT-F4
- Click on File ► Exit/Clear
- Click on the Close button (the X at the upper right-hand corner of the Stata window)

Getting Started: reading a pre-existing Stata dataset
(1) When the dataset is very large, we may want to enter
    set mem 64m
Results:
    Current memory allocation

                        current                                 memory usage
        settable        value     description                   (1M = 1024k)
        --------------------------------------------------------------------
        set maxvar      5000      max. variables allowed           1.733M
        set memory      64M       max. data space                 64.000M
        set matsize     400       max. RHS vars in models          1.254M
                                                                 -----------
                                                                  66.987M

(2) STATA can hold only one dataset in memory at a time.
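
The memory step above can be sketched as a short do-file. This is a minimal sketch that assumes a Stata 9-era installation like the one used in this course; in more recent releases (Stata 12 onward, to the best of our knowledge) memory is allocated automatically and the step is unnecessary.

```stata
* Sketch only: assumes Stata 9-era memory management.
* Memory can be resized only when no data are loaded, so clear first.
clear

* Request 64 MB of data space (set mem is an abbreviation of set memory).
set memory 64m

* Display the current memory-related settings to confirm the change.
query memory
```

Running clear first matters because Stata refuses to change the memory allocation while a dataset is in memory.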

Getting Started: reading a pre-existing Stata dataset (continued)
In folder c:\intropov\data we have three Stata files. Suppose we wish to read “hh.dta”; then enter
    use c:\intropov\data\hh.dta
or, alternatively, issue the two commands:
    cd c:\intropov\data
    use hh
NOTE: The default folder is c:\DATA. We use the cd command to change directory.
Stata datasets have the file extension “.dta”.

Getting Started: “clear”
- Deletes all contents (data, variables, labels) from STATA memory
- Does not delete any data already saved to the hard disk
- Does not clear the Review window contents
- Does not need any arguments
Syntax:
    clear

Getting Started: use of the “clear” command
    cd c:\intropov\data
    clear
    use hh
or
    use hh, clear
or
    use c:\intropov\data\hh.dta, clear

Getting Started: arithmetic operators
    +     addition
    -     subtraction
    *     multiplication
    /     division
    ^     power

Getting Started: relational operators
    >     greater than
    <     less than
    >=    greater than or equal
    <=    less than or equal
    ==    equal
    ~=    not equal
    !=    not equal

Getting Started: logical operators
    ~     not
    &     and
    |     or

Getting Started: numlist
A numlist is a list of numbers.
    1/3               three numbers: 1, 2, 3
    3/1               the same three numbers in reverse order
    -8/-5             four numbers: -8, -7, -6, -5
    1 2 to 4          four numbers: 1, 2, 3, 4
    10 15 to 30       five numbers: 10, 15, 20, 25, 30
    1 2:4             same as 1 2 to 4
    10 15:30          same as 10 15 to 30
    1(1)3             three numbers: 1, 2, 3
    1(2)9             five numbers: 1, 3, 5, 7, 9
    9(-2)1            five numbers: 9, 7, 5, 3, 1
    1 2 3/5 8(2)12    eight numbers: 1, 2, 3, 4, 5, 8, 10, 12

Getting Started: syntax
The basic Stata language syntax is:
    [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]

Getting Started: elements of the syntax
- varlist: denotes a list of variable names
- =exp: denotes an algebraic expression
- command: denotes a Stata command
- options: denotes a list of options. Many commands take command-specific options.
Options are indicated by typing a comma at the end of the command, followed by the options you want to use. For instance:
    sum, detail

Getting Started: if / by / in
These are not commands in themselves; they are attached to commands when a condition needs to be satisfied.

Getting Started: if exp
- Restricts the scope of a command to those observations for which the value of the expression is true
- Is added at the end of a command, after the associated variables, if any
Syntax:
    command … if sex == "male"
(The quotes are needed when sex is a string variable.)

Getting Started: by varlist
- Asks Stata to repeat a command for each subset of the data for which the values of the variables in the varlist are equal
- Is added before the command: by, then the variable name(s), then a colon, then the command
Syntax:
    by sex: command …
Note: sort the dataset by “sex” before running this syntax.

Getting Started: in range
- Restricts the scope of the command to a specific observation range
- Is added at the end of a command, after the associated variables, if any
Syntax:
    command … in 1/100

Getting Started: abbreviations and case
Command, option and variable names may be abbreviated to the shortest string of characters that uniquely identifies them:
    . summarize region, detail
    . sum reg,d
Stata respects case:
- Stata commands are lowercase
- “Summarize”, “SUMMARIZE” and “summarize” are three distinct names

Getting Started: naming
A name is a sequence of one to 32 letters (A-Z and a-z), digits (0-9) and underscores (_).
The first character of a name must be a letter or an underscore.
Do not begin variable names with an underscore: all of Stata’s built-in variables begin with an underscore.

Getting Started: naming (reserved names)
Stata reserves the following names:
    _all     double   long     _rc
    _b       float    _n       _se
    byte     if       _N       _skip
    _coef    in       _pi      using
    _cons    int      _pred    with

Getting Started: prefix commands
Prefix commands are placed in front of other Stata commands.
Syntax of an example prefix command:
    by varlist[, option]:
Example:
    by region, sort: sum educhead agehead
Here region is by’s varlist and sort is by’s option.

Examples of prefix commands in Stata:
    Prefix command   Description
    by               run command on subsets of the data
    svy              run command and adjust results for survey sampling
    stepwise         run command with stepwise variable inclusion/exclusion
    capture          run command and capture its return code

Getting Started: weight
Weights are used to:
- Estimate population quantities from a sample
- Compensate for under- or over-representation of households in a sample

Weight indicates the weight to be attached to each observation. The syntax of weight is:
    [weightword=exp]
“weightword” is not a Stata command; it stands for one of the following:

    Weightword   Meaning
    weight       default treatment of weights
    fweight      frequency weights
    pweight      sampling weights
    aweight      analytic weights
    iweight      importance weights

Frequency weight (fweight): indicates the number of duplicated observations. It must take integer values.
For instance, if the fweight associated with an observation is 5, there are 5 such observations, each identical.
Syntax:
    command var [weightword=weightvar]

Sampling weight (pweight): the inverse of the probability that this observation was included in the sample.
For instance, a pweight of 100 indicates that this observation is representative of 100 subjects.
Syntax:
    command varname [weightword=weightvar]

Analytic weight (aweight): inversely proportional to the variance of an observation. That is, the variance of the jth observation is assumed to be σ²/w_j, where w_j is the weight. Useful when working with data that contain averages.
Syntax:
    command varname [weightword=weightvar]

Importance weight (iweight): no formal definition is available; it indicates the relative “importance” of the observation.
Syntax:
    command varname [weightword=weightvar]

To be continued …
END of Introduction to STATA
Now please proceed to perform EXERCISE 1.
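
Before starting the exercise, the syntax elements introduced in this session (options, if, in, by, operators, numlists and weights) can be tried out together. The sketch below is illustrative only: it uses the auto dataset that ships with Stata rather than the course data, and the variables copies and wt are hypothetical weights created just to show the weight syntax.

```stata
* Sketch only: auto.dta ships with Stata; it is not the course dataset.
sysuse auto, clear

* Options follow a comma: detailed summary statistics for one variable.
summarize price, detail

* if exp: restrict the command to foreign cars (foreign is numeric, 1 = foreign).
summarize price if foreign == 1

* Relational and logical operators combined in one condition.
count if price > 5000 & mpg >= 20

* in range: restrict the command to observations 1 through 10.
list make price in 1/10

* by varlist: repeat the command for each value of foreign;
* the sort option sorts the data by foreign first.
by foreign, sort: summarize price mpg

* numlist: expand a number list and display the expanded result.
numlist "1 2 3/5 8(2)12"
display "`r(numlist)'"

* fweight: hypothetical weight, each row standing for 2 identical rows.
generate copies = 2
tabulate foreign [fweight = copies]

* pweight: hypothetical sampling weight of 100 for every observation.
generate wt = 100
mean price [pweight = wt]

* aweight: vehicle weight used as an analytic weight, to show the syntax only.
summarize mpg [aweight = weight]
```

With real survey data the weight variable would come from the sample design rather than being generated as a constant; the constants here only make the bracket syntax visible.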