Introduction to STATA

STATA Workshop
September 6, 2012
1:30 p.m.-2:30 p.m.
Workshop Outline
Opening STATA
How STATA thinks
Using commands
Tracking your work
   -- Do files
   -- Logs
Importing data from Excel (two ways)
Generating new variables
Running OLS regressions
Regression tests
Manipulating your data
Copying results over to Word
Saving your data and work
Opening STATA
• Econ lab computers: Stata 10 is on all the lab computers, and Stata 11 is on the newer lab computers.
• All campus computers: both Stata 10 and 11 can be found on the G drive, which can be accessed from a remote desktop connection.
• To map your G drive: go to Computer, click on "Map network drive", and type \\student1\win95library into the server field.
• For remote desktop connections, open "Remote Desktop Connection" and connect to "sas.coloradocollege.edu".
Thinking in STATA
Stata works with data much the way a word processor works with a document:
• You work with a copy of your data that is loaded into memory. The copy on the disk does not change unless you explicitly replace the file.
• Stata is connected both to the web and to your folders.
• Stata uses commands.
• Stata can save several different file types:
   -- .do files: text files with your commands, for future reference and editing
   -- .log files: text files with your output, for future reference and printing
   -- .dta files: data files in Stata format
   -- .gph files: graph files in Stata format
   -- .ado files: programs in Stata
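The point that Stata works on an in-memory copy can be seen in a few commands. This is a minimal sketch that assumes Stata's built-in example dataset (loaded with sysuse auto) as a stand-in for your own data; "mycopy" is a hypothetical file name.

```stata
* Assumption: Stata's built-in auto.dta stands in for your own data file.
sysuse auto, clear             // load a copy of the data into memory
save mycopy, replace           // write that copy to disk as mycopy.dta
replace price = price * 2      // changes only the copy in memory
use mycopy, clear              // reload from disk; the doubling is gone
list make price in 1           // shows the original, undoubled price
```

Nothing you do between save and use touches mycopy.dta; only another save (with replace) would overwrite it.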
[Figure: screenshot of the Stata window, labeled Command Results (the main place to monitor your work), Command Summary, Command Window, and Data Summary.]
Commands
• Syntax: command varlist if exp in range
   -- command: the Stata command you want to run
   -- varlist: a list of variables
   -- if exp: an "if" expression, set with a qualifier such as >5 (meaning greater than five) or ==20 (meaning equal to twenty)
   -- in range: observation numbers, written first#/last#; for example, 1/10
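Here is a short sketch of the varlist, if, and in parts working together. It assumes Stata's built-in auto.dta as a stand-in for the workshop data; note that equality inside an if expression is written ==, not =.

```stata
* Assumption: Stata's built-in auto.dta stands in for the workshop data.
sysuse auto, clear
list make price mpg if mpg > 30       // if exp: only cars with mpg above 30
summarize price in 1/10               // in range: observations 1 through 10
count if foreign == 1 & price > 5000  // == tests equality; & combines conditions
```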
Most Common Commands
Category: Stata commands
Getting online help: search, findit, help
Operating system interface: pwd, cd, sysdir, mkdir, rmdir, dir, erase, copy, type
Using and saving data from disk: use, save, append, merge, compress
Inputting data into Stata: input, edit, infile, infix, insheet
The Internet and updating Stata: update, net, ado, news
Basic data reporting: describe, codebook, list, browse, count, inspect, summarize, table, tabulate
Data manipulation: generate, replace, egen, rename, drop, keep, sort, encode, decode, order, by, reshape, collapse
Formatting: format, label
Keeping track of your work: log, notes
Convenience: display
Getting Help
• Stata provides information when an error occurs.
   -- Just click on the blue error message to get more information.
   -- A Viewer window will pop up with the reason for the error.
• Search
   -- To look up a command, type "help" followed by the command name into your Command window.
Working with Directories
• Stata is interactively connected to your folders.
• You can open or save files directly from anywhere on your computer.
   -- pwd tells you what directory you are currently working in
   -- use filename opens a file saved in that directory
   -- save filename saves a file in Stata format
   -- save filename, replace overwrites an existing dataset
   -- mkdir makes a new directory (a new folder)
   -- cd changes your directory
You can get to my directory by typing: cd "C:\users\cbenson\workshops"
*In general, do not save in the Stata directory; save your work files elsewhere, such as your H: drive.
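A minimal sketch of the directory commands in sequence. The folder name stata_work is hypothetical, and auto.dta stands in for your data; put any path that contains spaces in double quotes, as in the cd example above.

```stata
* Assumption: "stata_work" is a hypothetical folder name; substitute your own path.
pwd                                  // show the current working directory
capture mkdir stata_work             // create the folder (capture: no error if it exists)
cd stata_work                        // move into it
sysuse auto, clear
save auto_copy, replace              // writes auto_copy.dta inside stata_work
use auto_copy, clear                 // reopens it from the current directory
cd ..                                // go back up one level
```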

Exercise 1
• Create a directory for your STATA work in your personal folder on the network.
Tracking your work
• Logs keep track of all your commands and results.
• Do files keep your commands and allow you to re-execute your work.
Logs
• A log saves your Results window.
• Create a log by clicking on the notebook button (no pencil), or by typing "log using filename"; the log is saved in the current directory.
   -- Suspend a log by typing "log off".
   -- Re-open a suspended log by typing "log on".
   -- Close a log by typing "log close".
   -- Add to a closed log by typing "log using filename, append".
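The log commands above can be strung together as follows. This is a sketch assuming a hypothetical log name, workshop_log, and the text option so the file is saved as plain text with a .log extension (without it, Stata writes its own .smcl format).

```stata
* Assumption: "workshop_log" is a hypothetical file name.
capture log close                        // close any log that is already open
log using workshop_log, replace text     // start a plain-text log (workshop_log.log)
sysuse auto, clear
summarize price
log off                                  // suspend: the next output is not logged
describe
log on                                   // resume logging
log close
log using workshop_log, append text      // reopen the closed log and add to it
display "This line is appended to the same log."
log close
```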
Do Files
• You'll want do-files for your thesis!
• Do-files keep your commands so that you can re-run your work at a later date.
• They are very helpful for generating new variables, multi-step data manipulation, and tedious, repetitive commands.
• To start a do-file, click on the notebook-with-a-pencil button, or go to Window > Do-file Editor > New Do-file.
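A do-file is just these commands saved in a text file. The sketch below is one possible layout, assuming a hypothetical file name (example_analysis.do) and auto.dta as a stand-in for your data.

```stata
* example_analysis.do  (hypothetical file name)
* Running this file redoes every step, from loading the data to the regression.
version 10                        // interpret commands as Stata 10 would
capture log close                 // close any log left open
log using example_analysis, replace text
sysuse auto, clear                // stand-in for your own data
generate lnprice = ln(price)
regress lnprice mpg weight
log close
```

Save the file in your working directory and type do example_analysis in the Command window to re-run the whole analysis.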
Exercise 2
• Create a log for today's workshop.
Importing Data from Excel
• Copy and paste
   -- In Excel, copy your full data set.
   -- Open your Data Editor by clicking "Data", then "Data Editor".
   -- Click on the first cell, and then "Paste".
   -- Use the first row as variable names.
   -- Save as a ".dta" file.
• Save the Excel file as CSV (comma delimited) and import it into Stata
   -- In Excel, save as a CSV (comma delimited) file.
   -- Open Stata and go to File > Import > ASCII data created by a spreadsheet.
   -- Browse for the CSV file and import it.
   -- Save as a ".dta" file.
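The menu route for CSV files corresponds to the insheet command. To keep this sketch self-contained it first writes auto.dta out as a hypothetical auto_demo.csv; with your own spreadsheet you would start at the insheet line and use your file's name.

```stata
* Assumption: auto.dta is exported to a CSV first so the example runs on its own.
sysuse auto, clear
outsheet make price mpg foreign using auto_demo.csv, comma replace
clear                                           // empty memory before importing
insheet using auto_demo.csv, comma names clear  // first row holds variable names
save auto_demo, replace                         // keep a Stata-format (.dta) copy
```

Because outsheet writes value labels by default, foreign comes back as a string ("Domestic"/"Foreign"), much as it would from an Excel sheet.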
Clearing Data
 .clear removes any data that you might be working on, unless
you have saved the data, none of the changes you made will affect
the data set.
 This is important to do before you import new data
 Memory Space
 Stata likes to use as little space on your computer as possible
 set mem xxxm expands the memory size
 compress compresses your current data
 set matsize xxx expands the number of variables you can have in your set.
 Dictionaries
 Can specify how you want to import data (search “dictionaries” to
learn more
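A small sketch of compress and clear, using auto.dta as an assumed stand-in; describe, short reports the memory used before and after compressing.

```stata
sysuse auto, clear
describe, short      // observations, variables, and memory used
compress             // store each variable in the smallest type that holds it
describe, short      // size after compressing
clear                // remove the data from memory; auto.dta on disk is unchanged
```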
Exercise 3
• Reset your memory to 100m.
• Reset your matsize to 800.
• Import the data set "STATA_Lab_DATA" from Excel.
• Compress your data.
DATA Reporting
 Describe basic information on variables
 Summarize basic descriptive statistics
 Codebook descriptive statistics, lots of information
 List spreadsheet form
 Label create variable labels and values
 Table frequency table
 q  stops STATA in whatever it is running
 Inspect displays simple summary of data’s attributes
 Tabulate table of frequencies
 Count count observations satisfying specified conditions
Exercise 4
• Find the summary statistics, including the mean, min, max, and standard deviation, of each variable.
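One way to approach Exercise 4, sketched with auto.dta as an assumed stand-in: summarize with no varlist covers every variable, tabstat lets you pick the statistics, and summarize leaves its results in r() for later use.

```stata
sysuse auto, clear
summarize                       // mean, std. dev., min, and max of every variable
tabstat price mpg weight, statistics(mean sd min max)
summarize price                 // results are also stored in r()
display "mean = " r(mean) ", sd = " r(sd)
```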
Generating New Variables
• To generate a new variable, go to Data > Create or change data > New variable. You'll get a screen like the one on the side.
• Type in the expression that you want to generate.
• Alternatively, you could type the command "generate new_variable_name = expression".
Exercise 5
1. Generate a variable named lnprice = ln(price).
2. Generate a variable that is an indicator variable for domestic cars (there are additional ways to go about this; I've included one below, which assumes foreign was imported as a string variable):
   -- generate domestic=0
   -- replace domestic=1 if foreign=="Domestic"
3. Generate an indicator variable for fuel-efficient cars:
   -- generate fuelefficient=(mpg>25) if !missing(mpg)
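A sketch of Exercise 5 that runs on Stata's auto.dta (an assumption; your imported sheet may differ). In auto.dta, foreign is numeric with value labels, so the indicator compares with 0 instead of the string "Domestic". A logical expression such as (mpg > 25) evaluates to 1 or 0, which builds the whole indicator in one line.

```stata
* Assumption: in auto.dta, foreign is numeric (0 = Domestic, 1 = Foreign).
sysuse auto, clear
generate lnprice = ln(price)
generate domestic = (foreign == 0) if !missing(foreign)
* Stata treats missing values as larger than any number, so the if-qualifier
* keeps missing mpg as missing instead of coding it as fuel efficient.
generate fuelefficient = (mpg > 25) if !missing(mpg)
tabulate domestic fuelefficient
```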
Running OLS Regressions
• To run a basic OLS regression, go to Statistics > Linear models and related > Linear regression. You'll end up with a window like the one on the right.
• Insert your dependent variable and independent variables from the two drop-down menus.
Alternatively, you can type "regress dependent_variable independent_variable_names".
OLS Continued: The Shortcut (ish)
• Using your Command window:
regress depvar indepvars [if] [in] [weight] [, options]
Exercise 6
• Run a model using several variables in your data set.
• Example: "regress price mpg headroom trunk weight"
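The example model can be run on auto.dta (assumed here; it appears to match the workshop data), and regress leaves its results in e() and _b[] so you can reuse them instead of retyping numbers.

```stata
sysuse auto, clear
regress price mpg headroom trunk weight
display "R-squared = " e(r2)              // stored results live in e()
display "weight coefficient = " _b[weight]
display "its standard error = " _se[weight]
```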

Econometric Tests and Corrections
• Heteroskedasticity
• Normality
• Multicollinearity and high correlation
• Serial correlation/autocorrelation
Testing for Heteroskedasticity (1)
• The null hypothesis is that the error terms are homoskedastic (they have a constant variance).
• If you do have heteroskedasticity, your OLS standard errors are not reliable.
• To test for heteroskedasticity:
   -- Directly after your regression, use the command imtest, white, which shows the White test for heteroskedasticity.
Correcting Heteroskedasticity
• If you find that you have heteroskedasticity (the White test's p-value is less than your chosen significance level, such as 0.1), then you can run your regression with robust standard errors:
regress price mpg headroom trunk, robust
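A sketch of the test-then-correct sequence on auto.dta (assumed stand-in). It uses estat imtest, white, the postestimation form of the slide's imtest, white; the robust run keeps the same coefficients and changes only the standard errors.

```stata
sysuse auto, clear
regress price mpg headroom trunk
estat imtest, white                        // White's test; H0: constant error variance
regress price mpg headroom trunk, robust   // same coefficients, robust standard errors
```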
Testing for Heteroskedasticity (2)
• You can also look at the residuals of your regression to see whether their spread changes across observations.
• Commands
   -- predict resid, residuals creates the residuals and saves them as "resid"
   -- scatter resid dependent_variable graphs the residuals against the dependent variable
Test for Skewness of Residuals
• Run a skewness/kurtosis test to check whether the residuals are normally distributed:
   -- predict resid, residuals
   -- sktest resid calculates the skewness/kurtosis test
Exercise 7
• Check to see if you have heteroskedasticity.
Detecting Multicollinearity
• To check for multicollinearity, run a correlation matrix and see whether any two variables have a high correlation (rho).
   -- correlate varlist runs a correlation matrix of all the variables specified
• Typically, correlations greater than 0.6 in absolute value should be looked at with caution.
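A sketch of the correlation check on auto.dta (assumed stand-in). correlate leaves the matrix in r(C), so a single pair can be pulled out by row and column in the order the variables were listed.

```stata
sysuse auto, clear
correlate price mpg headroom trunk weight
matrix C = r(C)                           // correlation matrix left behind by correlate
display "corr(mpg, weight) = " C[2,5]     // row 2 = mpg, column 5 = weight
```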
Detecting Serial Correlation
• Autocorrelation is common in time-series data sets.
• To test for serial correlation, you can use a Durbin-Watson test.
• For the Durbin-Watson test you need to time-set your data:
   -- tsset time_variable tells Stata your data are a time series (for panel data, use xtset panel_variable time_variable)
   -- dwstat, run after regress, finds the Durbin-Watson statistic (estat dwatson is the equivalent postestimation command)
Exercise 8
• Create a correlation matrix of your data.
• Create a time variable.
• Time-set your data.
• Test for autocorrelation.
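The Durbin-Watson steps and Exercise 8 can be sketched as below. Assumption: auto.dta is cross-sectional, so the time index t is made up purely to show the commands, and the resulting statistic should not be interpreted.

```stata
* Assumption: t is an artificial time index; auto.dta is not a real time series.
sysuse auto, clear
correlate price mpg weight        // step 1: correlation matrix
generate t = _n                   // step 2: create a time variable
tsset t                           // step 3: declare the data a time series
regress price mpg weight
estat dwatson                     // step 4: Durbin-Watson statistic (dwstat on the slide)
```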
Other Data Manipulation
rename rename a variable
--rename old_name new_name
-drop delete a variable or observations
-keep keep a variable or observation
-replace replace a variable with a another (replace observations)
-sort sort variables in ascending order
-gsort sort variables in ascending or descending order
-encode change a string to numeric
-decode change a numeric variable to a string
-byruns
-mvdecodechanges occurences of numlist to a missing value code
-mvencodechanges missing to specified numbers
Getting Help
• help command: information on a command
• search keyword: searches Stata's keyword database (help files, FAQs, and journals)
• search keyword, net: searches only the Internet
• findit keyword: searches unofficial sites as well
• You can also Google any problem you are having; you'll likely pull up a Stata forum at stata.com.
Neatly Putting Results into Word
• You want your results to be easy to read in a Word document.
• The easiest and quickest way to copy your results into a Word document is to:
1. Highlight the portion you want.
2. Right-click on the highlighted portion.
3. Click "Copy as Picture".
4. Paste (Ctrl+V) into a Word document.
Practice—copy as picture and paste
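• You should end up with something that looks pretty, like this:

. regress price mpg headroom trunk weight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    69) =    8.21
       Model |   204838391     4  51209597.9           Prob > F      =  0.0000
    Residual |   430227005    69  6235173.98           R-squared     =  0.3225
-------------+------------------------------           Adj R-squared =  0.2833
       Total |   635065396    73  8699525.97           Root MSE      =    2497

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -54.79153   85.91635    -0.64   0.526      -226.19    116.6069
    headroom |  -726.5434   462.0322    -1.57   0.120    -1648.272    195.1856
       trunk |   23.04248   108.3649     0.21   0.832    -193.1396    239.2246
      weight |   2.011936   .7036432     2.86   0.006     .6082062    3.415666
       _cons |    3114.94    3648.08     0.85   0.396    -4162.779    10392.66
------------------------------------------------------------------------------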
Saving your Data and Work
• To save your work, close your log by typing "log close".
• To save your data, go to File > Save As and name your .dta file. Please note that saving only saves the data, not your commands or log.
Conclusion
• This was a brief introduction to Stata. We covered the basics of opening Stata, importing data, generating new variables, running a basic regression, common problems and fixes, and saving your work in Stata and Word.
• The best advice for each of you is to go play around with Stata and have fun.
• If you need or want help, I'm happy to help you.
Questions?
If you have additional questions at a later date, please stop by this lab or email
[email protected]