Transcript Slide 1

INTRODUCTION TO STATA
Third group training course in application of information
and communication technology to production and
dissemination of official statistics
10 May – 11 July 2007
Gereltuya Altankhuyag, Lecturer/Statistician, UNSIAP
[email protected]
7/18/2015
1
Objectives

Gain experience in using STATA to:
- Obtain descriptive statistics
- Tabulate
- Create graphs
- Convert data files
- Create “do-files”
- Create “log-files”
- Use user-written programs (“ado-files”)
Method of teaching:
- Lectures using PowerPoint slides
- In-class practical exercises using real survey datasets
- Group assignment*
- Presentation of group assignment*
Note: * Subject to availability of datasets
Contents
- Resources for learning and using Stata
- Getting started with Stata
- Basic commands to inspect datasets
- Basic commands to create and change variables, labeling etc.
- Basic commands to reorganize datasets
- Advanced commands: merging and appending

Contents
- Basic commands of statistics
- Basic commands of graphics
- Programming using “do-files”
- Use of “ado-files”
- Creating “log-files”
- Inputting data from keyboard, a file or spreadsheet
Resources for learning and using STATA
Resources for learning and using STATA
STATA
- Is a statistical package for managing, analyzing and graphing data.
- Has both a command-driven and a menu-driven interface.
- Is designed for research.
- Has cross-platform compatibility: Windows, MacOS, Unix, Solaris.
Resources for learning and using STATA
STATA
Has three versions:
- Small (restricted version) – not available for Unix
- Intercooled (full version)
- Special Edition – STATA/SE
Newest release: Version 9; we will use
Resources for learning and using STATA

                     Stata/SE          Intercooled Stata   Small Stata
Speed                Fastest           Very fast           Fast
No. of variables     32766             2047                99
Observations         Memory dependent  Memory dependent    1000 (approx.)
String variable      244 chars         80 chars            80 chars
Matrices             11000 x 11000     800 x 800           40 x 40
Version              Professional      Professional        Small computers
Resources for learning and using STATA
- Manuals
  - Getting Started
  - User’s Guide
  - Reference
- The Stata website: http://www.stata.com
- The Stata Press website, which contains the datasets used throughout the Stata manuals: http://www.stata-press.com
Resources for learning and using STATA
Additional subject-specific volumes may be purchased separately. These include:
- Longitudinal/Panel Data Reference Manual
- Mata Reference Manual
- Multivariate Statistics Reference Manual
- Programming Reference Manual
- Survey Data Reference Manual
- Survival Analysis and Epidemiological Tables Reference Manual
- Time-Series Reference Manual
Resources for learning and using STATA
- The Stata listserver – an active group of Stata users who communicate over the Internet
- The Stata Journal – reviewed papers, regular columns, user-written software: http://www.stata-journal.com/
- NetCourses – training offered via the Internet
- Books and other support materials
Resources for learning and using STATA
- Technical support: by email, phone or fax
  [email protected]
- To subscribe to Statalist, send an email to
  [email protected]
  with the message body: subscribe statalist
Resources for learning and using STATA
- Searchable Statalist archives at: http://www.stata.com/statalist/archive
  These include requests for programs, solutions or advice, as well as answers and general discussions.
Resources for learning and using STATA
STATA Features
- Command-prompt driven:
  - Batch mode
  - Interactive mode
- Modular by nature: Stata code can be shared and reused, and extensions are easy to write.
- Can incorporate the survey design into the estimation process.
Resources for learning and using STATA
STATA Capabilities
- Elementary and specialized statistical analysis
- Graphics: mostly 2D
- Data management – user-friendly
- Matrix operations
Resources for learning and using STATA
Why STATA?
- Precise estimation for complex surveys
- User-written programs for non-standard estimation
- Excellent tools for panel data analysis
- Many econometric routines
- Command driven (in the new version, menu driven as well)
Resources for learning and using STATA
Why STATA?
- Runs efficiently on many platforms
- Concise and clear documentation (with friendly technical support)
- Costs less to buy
- Contact: [email protected]
  or visit: http://www.stata.com
Getting Started with STATA
Getting Started
STARTING UP
- Click Start ► Programs ► Stata ► StataSE 9
- Alternatively, from Windows Explorer, go to the folder c:\stata9 and double-click wstata.exe
Getting Started
Getting Started
Getting Started
Verifying the version and installation of Stata:
- Command: verinst
- Syntax: verinst
- Result:
    . verinst
    You are running Stata/SE 9.2 for Windows.
    Stata is correctly installed.
    You can type exit to exit Stata.
Getting Started
Comparing updates:
    Update
        Stata executable
            folder:               \\Unitednations\Stata9\
            name of file:         wsestata.exe
            currently installed:  21 Nov 2006
        Ado-file updates
            folder:               \\Unitednations\Stata9\ado\updates\
            names of files:       (various)
            currently installed:  21 Nov 2006
        Recommendation
            Type -update query- to compare these dates with what is available
            from http://www.stata.com.
Getting Started
- RESULTS WINDOW: results and commands displayed here
- REVIEW WINDOW: past commands appear here
- VARIABLE WINDOW: variable list shown here
- COMMAND WINDOW: commands typed here
Getting Started
If one of the four windows is not displayed (say, the Variables window), click
Window ► Variables
or press Ctrl+6
- You can type only in the Command window
- You cannot close the Results and Command windows
Getting Started
Window Colors
- Click on Prefs ► General Preferences
Getting Started
Fonts of Windows
- The font or font size of each window can be changed by clicking the button at the upper left of the window and then clicking Font.
Getting Started
The Command window:
- Page Up – steps backward through the command history
- Page Down – steps forward through the command history
- Tab – auto-completes a partially typed variable name
Getting Started
The Review window:
- To enter a command from the Review window:
  - Click once on a past command to copy it to the Command window
  - Double-click on a past command to copy it to the Command window and execute it
Getting Started
The Review window:
- Right-clicking on the Review window displays:
  - Save Review Contents
  - Copy Review Contents to Clipboard
  - Font
Getting Started
The Variables window:
- To enter a variable from the Variables window:
  - Click once on a variable to copy it to the Command window
  - Double-click on a variable and the variable will be copied twice
- Right-clicking on the Variables window displays a menu:
  - Define Notes for Variable “varname” … opens the Notes dialog for variable “varname”
  - Font
Getting Started
MENU BAR:
TOOL BAR:
Getting Started
- Ask participants to open STATA
- Ask participants to open each menu of the menu bar and explain it
- Ask participants to point the cursor at each button of the toolbar and explain it
Getting Started
In the Help menu of the menu bar:
- Contents (for beginners unfamiliar with STATA commands)
- Search (for users who know the name of the command or topic they wish to search for)
Getting Started
Obtaining Online Help on Commands
- For more information on the regress command, enter
    help regress
  or use the menu bar:
    Help ► STATA Command
Getting Started
Obtaining Topic Search
- We can do a search with the menu bar:
    Help ► Search
- If you want to learn about regression, type
    search regression
  or, about memory management,
    search memory
Getting Started
Obtaining Net Search
- In the Search pop-up window, we can also do an Internet search.
- Alternatively, we can issue the command
    net search regression
  to search on regression
Getting Started
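Four ways of quitting Stata
- Enter in the Command window:
    exit
  (if the data in memory have changed since they were last saved, Stata refuses; type exit, clear to quit anyway)
- Press ALT+F4
- Click on File ► Exit/Clear
- Click on the Close button (X at the upper right-hand corner of the Stata window)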
Getting Started
Reading Pre-Existing Stata Dataset
(1) When the dataset is very large, we may want to enter
    set mem 64m
Results:
    Current memory allocation

                     current                                 memory usage
        settable       value     description                  (1M = 1024k)
        --------------------------------------------------------------------
        set maxvar      5000     max. variables allowed              1.733M
        set memory       64M     max. data space                    64.000M
        set matsize      400     max. RHS vars in models             1.254M
                                                                 -----------
                                                                    66.987M
(2) STATA can read only one dataset at a time.
Getting Started
Reading Pre-Existing Stata Dataset
- In folder c:\intropov\data we have three Stata files. Suppose we wish to read “hh.dta”; then enter
    use c:\intropov\data\hh.dta
  or, alternatively, issue the two commands:
    cd c:\intropov\data
    use hh
  NOTE: The default folder is c:\DATA; we use the cd command to change directory.
- Stata datasets have the extension “.dta”
Getting Started – “clear”
- Deletes all contents (data, variables, labels) from Stata’s memory
- Does not delete any data already saved to the hard disk
- Does not clear the Review window contents
- Does not need any arguments
- Syntax:
    clear
Getting Started – “clear”
Use of the “clear” command:
    cd c:\intropov\data
    clear
    use hh
or
    use hh, clear
or
    use c:\intropov\data\hh.dta, clear
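The same clear/use cycle can be tried without the course files. A minimal do-file sketch, assuming the auto.dta example dataset that ships with Stata (loaded with sysuse) and a temporary file in place of c:\intropov\data\hh.dta; the macro names are illustrative:

```stata
* Load the example dataset shipped with Stata, replacing anything in memory
sysuse auto, clear

* Save a copy to a temporary file on disk
tempfile autocopy
save `autocopy'

* clear empties memory, but the copy saved on disk is untouched
clear
local n_after_clear = _N

* Read the saved copy back; the clear option discards data in memory first
use `autocopy', clear
local n_after_use = _N
display "obs after clear: `n_after_clear'; obs after use: `n_after_use'"
```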
Getting Started - Arithmetic operators
    +    addition
    -    subtraction
    *    multiplication
    /    division
    ^    power
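A short do-file sketch of these operators, assuming the auto.dta example dataset shipped with Stata; the new variable names price_per_lb and disp_sq are hypothetical. As in ordinary algebra, ^ is evaluated before * and /, which are evaluated before + and -:

```stata
sysuse auto, clear

* Division: price per pound of vehicle weight
generate double price_per_lb = price / weight

* Power: engine displacement squared
generate double disp_sq = displacement^2

* Precedence: ^ first, then * and /, then + and -  (displays 11.5)
display 2 + 3*4 - 10/2^2
local result = 2 + 3*4 - 10/2^2
```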
Getting Started - Relational operators
    >     greater than
    <     less than
    >=    greater than or equal to
    <=    less than or equal to
    ==    equal to
    ~=    not equal to
    !=    not equal to
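A sketch of relational operators inside if conditions, assuming the auto.dta example dataset (74 cars, 22 of them foreign, 5 with a missing repair record rep78). Two points are worth seeing in practice: == tests equality while a single = assigns, and a missing value (.) is treated as larger than any number:

```stata
sysuse auto, clear

* == tests equality (a single = is for assignment, as in generate)
count if foreign == 1
local n_foreign = r(N)

* != and ~= both mean "not equal"
count if foreign != 1
local n_domestic = r(N)

* Missing (.) is larger than every number, so this also counts missing rep78
count if rep78 > 3
local n_gt3_with_missing = r(N)

* Exclude missing values explicitly
count if rep78 > 3 & !missing(rep78)
local n_gt3 = r(N)
```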
Getting Started - Logical operators
    ~    not (! may be used instead)
    &    and
    |    or
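A sketch combining conditions, again assuming the auto.dta example dataset. Note that & is evaluated before |, so parentheses are the safest way to state the intended grouping:

```stata
sysuse auto, clear

* | (or): true if at least one side is true
count if foreign == 1 | foreign == 0
local n_either = r(N)

* ~ (or !) negates a condition
count if ~(foreign == 1)
local n_not_foreign = r(N)

* & (and): both sides must be true, so it can only shrink a count
count if foreign == 1 & mpg > 25
local n_both = r(N)

* Parentheses make the grouping explicit (& would otherwise bind first)
count if (foreign == 1 | mpg > 25) & price < 5000
```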
Getting Started - Numlist
Numlist – a list of numbers.
    1/3               three numbers: 1, 2, 3
    3/1               the same three numbers in reverse order
    -8/-5             four numbers: -8, -7, -6, -5
    1 2 to 4          four numbers: 1, 2, 3, 4
    10 15 to 30       five numbers: 10, 15, 20, 25, 30
    1 2:4             same as 1 2 to 4
    10 15:30          same as 10 15 to 30
    1(1)3             three numbers: 1, 2, 3
    1(2)9             five numbers: 1, 3, 5, 7, 9
    9(-2)1            five numbers: 9, 7, 5, 3, 1
    1 2 3/5 8(2)12    eight numbers: 1, 2, 3, 4, 5, 8, 10, 12
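The expansions in the table can be checked with Stata's numlist command, which stores the expanded list in r(numlist); numlists are also what foreach ... of numlist loops over. A minimal sketch (the macro names are illustrative):

```stata
* Expand a numlist; the result is returned in r(numlist)
numlist "1 2 3/5 8(2)12"
local expanded "`r(numlist)'"
display "`expanded'"

* Count the elements of the expanded list
local n_items : word count `expanded'

* Loop over a numlist: 9, 7, 5, 3, 1
foreach k of numlist 9(-2)1 {
    display `k'
}
```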
Getting Started - Syntax
The basic Stata language syntax is:
    [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]
Getting Started
- varlist – denotes a list of variable names
- =exp – denotes an algebraic expression
- command – denotes a Stata command
- options – denotes a list of options. Many commands take command-specific options. Options are indicated by typing a comma at the end of the command, followed by the options you want to use. For instance: sum, detail
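Several elements of the syntax template can appear in a single command. A sketch assuming the auto.dta example dataset, so the variable names differ from the course data; the new variable price_k is hypothetical. The detail option adds percentiles to summarize's output, and r(p50), the median, is returned only when detail is specified:

```stata
sysuse auto, clear

* by prefix + command + varlist + if qualifier + option
by foreign, sort: summarize price mpg if !missing(rep78), detail

* The =exp element appears in commands such as generate
generate double price_k = price / 1000

* With the detail option, summarize also returns the median in r(p50)
summarize price, detail
local median_price = r(p50)
display "median price: `median_price'"
```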
Getting Started
if / by / in:
- These are not commands; they are attached to commands when a condition needs to be satisfied
Getting Started
- if exp – restricts the scope of a command to those observations for which the value of the expression is true
- if exp – is added after the command and its variables, if any (and before any options)
- Syntax:
    command … if sex == "male"
  (this form is for a string variable; if sex is coded numerically, compare with the code instead, e.g. if sex == 1)
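A sketch of if qualifiers, assuming the auto.dta example dataset, where foreign is coded 0 = Domestic and 1 = Foreign and make is a string variable:

```stata
sysuse auto, clear

* Restrict summarize to foreign cars
summarize price if foreign == 1
local n_foreign = r(N)

* String values must be written in double quotes
list make price if make == "VW Diesel"

* if goes after the varlist and before any options
summarize price if foreign == 0, detail
local n_domestic = r(N)
```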
Getting Started
- by varlist – asks Stata to repeat a command for each subset of the data for which the values of the variables in varlist are equal
- by varlist – is placed before a command, followed by a colon and then the command
- Syntax:
    by sex: command …
  Note: sort the dataset by “sex” before running this syntax (or use by sex, sort: to sort and run in one step).
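Both forms of by can be compared on the auto.dta example dataset (assumed here; n_in_group is a hypothetical variable name). Under by, _N refers to the number of observations in the current group rather than in the whole dataset:

```stata
sysuse auto, clear

* Two-step form: sort first, then by
sort foreign
by foreign: summarize price

* One-step form: the sort option sorts before running (bysort is shorthand)
by foreign, sort: summarize mpg

* Under by, _N is the number of observations in the current group
by foreign: generate long n_in_group = _N
```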
Getting Started
- in range – restricts the scope of the command to a specific range of observations
- in range – is added after the command and its variables, if any
- Syntax:
    command … in 1/100
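A sketch of in ranges on the auto.dta example dataset (assumed). Because in refers to positions in the current sort order, the same range selects different observations after the data are re-sorted:

```stata
sysuse auto, clear

* First five observations
list make price in 1/5

* First 20 observations only
summarize price in 1/20
local n_first20 = r(N)

* Negative numbers count from the end; l means the last observation
summarize price in -10/l
local n_last10 = r(N)
```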
Getting Started
- Command, option and variable names may be abbreviated to the shortest unambiguous string of characters:
    . summarize region, detail
    . sum reg, d
- Stata respects case: Stata commands are lowercase
- “Summarize”, “SUMMARIZE” and “summarize” are three distinct names
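Abbreviation and case sensitivity can be checked side by side. A sketch assuming the auto.dta example dataset, where price is the only variable whose name starts with "pri"; capture (a prefix command covered below) keeps the deliberately wrong uppercase command from stopping the do-file:

```stata
sysuse auto, clear

* Full names
summarize price, detail
local median_full = r(p50)

* Abbreviated: sum = summarize, pri = price, d = detail
sum pri, d
local median_abbrev = r(p50)

* Case matters: SUMMARIZE is an unrecognized command (return code 199)
capture SUMMARIZE price
local rc_upper = _rc
```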
Getting Started - naming
- A name is a sequence of one to 32 letters (A-Z and a-z), digits (0-9) and underscores (_).
- The first character of a name must be a letter or an underscore.
- Do not begin variable names with an underscore: all of Stata’s built-in variables begin with an underscore.
Getting Started - naming
Stata reserves the following names:
    _all     double   long     _rc
    _b       float    _n       _se
    byte     if       _N       _skip
    _coef    in       _pi      using
    _cons    int      _pred    with
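Several reserved names are built-in system variables. A sketch, assuming the auto.dta example dataset, of _n (the current observation number), _N (the total number of observations) and _pi; the new variable names obs_id, n_total and prev_price are hypothetical:

```stata
sysuse auto, clear

* _n is the current observation number, _N the total number of observations
generate long obs_id = _n
generate long n_total = _N

* _n can index other observations: the previous car's price (missing for obs 1)
generate double prev_price = price[_n-1]

* _pi holds the value of pi
display _pi
```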
Getting Started – prefix commands
- Prefix commands are placed in front of other Stata commands.
- An example of prefix command syntax:
    by varlist[, option]:
    by region, sort: sum educhead agehead
  region is by’s varlist; sort is by’s option.
Getting Started – prefix commands
Examples of prefix commands in Stata are:
    Prefix command   Description
    by               run command on subsets of data
    svy              run command and adjust results for survey sampling
    stepwise         run command with stepwise variable inclusion/exclusion
    capture          run command and capture its return code
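capture is useful in do-files that must not stop on an error. A minimal sketch, assuming the auto.dta example dataset; the variable name income is hypothetical and deliberately absent from that dataset. confirm variable fails with return code 111 when a variable does not exist, and capture stores that code in _rc instead of halting:

```stata
sysuse auto, clear

* Without capture, this line would stop the do-file ("variable income not found")
capture confirm variable income
local rc_missing = _rc

* A successful command leaves _rc at 0
capture confirm variable price
local rc_found = _rc

* Branch on the return code
if `rc_missing' != 0 {
    display "income not found (return code `rc_missing')"
}
```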
Getting Started - weight
Weights are used for:
- Estimating population quantities from a sample
- Compensating for under- or over-representation of households (HH) in a sample
Getting Started - weight
- weight indicates the weight to be attached to each observation. The syntax of weight is:
    [weightword=exp]
- “weightword” is not a Stata command
- where weightword is one of:
Getting Started - weight
    Weightword   Meaning
    weight       default treatment of weights
    fweight      frequency weights
    pweight      sampling weights
    aweight      analytic weights
    iweight      importance weights
Getting Started - weight
Frequency weight (fweight):
- Indicates the number of duplicated observations; must take integer values.
- For instance: if the fweight associated with an observation is 5, there are 5 identical observations of that kind.
- Syntax:
    command varname [fweight=weightvar]
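Frequency weights behave exactly like physically duplicated rows. A sketch assuming the auto.dta example dataset and a hypothetical weight variable copies = 2 (as if every car appeared twice); the weighted summary is compared with one computed after expand actually duplicates the observations:

```stata
sysuse auto, clear

* Hypothetical frequency weight: every car counted twice
generate byte copies = 2

* Frequency-weighted summary
summarize mpg [fweight = copies]
local n_fw  = r(N)
local sd_fw = r(sd)

* Same result by physically duplicating the observations
preserve
expand copies
summarize mpg
local n_exp  = r(N)
local sd_exp = r(sd)
restore
```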
Getting Started - weight
Sampling weight (pweight):
- The inverse of the probability that the observation was included in the sample.
- For instance: a pweight of 100 indicates that this observation is representative of 100 subjects.
- Syntax:
    command varname [pweight=weightvar]
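summarize does not accept pweights, but estimation commands such as mean and total do. A sketch assuming the auto.dta example dataset and a hypothetical sampling weight pw = 100 (each sampled car taken to represent 100 cars): total then estimates the population total as 100 times the sample total, while a constant weight leaves the mean unchanged:

```stata
sysuse auto, clear

* Hypothetical sampling weight: each sampled car represents 100 cars
generate int pw = 100

* Unweighted sample total of mpg
quietly summarize mpg
local sample_sum = r(sum)

* Estimated population total: each observation counts pw times
total mpg [pweight = pw]
local pop_total = _b[mpg]

* With a constant weight the estimated mean equals the unweighted mean
mean mpg [pweight = pw]
local pop_mean = _b[mpg]
```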
Getting Started - weight
Analytic weight (aweight):
- Inversely proportional to the variance of an observation: the variance of the jth observation is assumed to be $\sigma^2 / w_j$, where $w_j$ is its weight.
- Useful when working with data that contain averages.
- Syntax:
    command varname [aweight=weightvar]
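aweights fit data in which each observation is itself an average. A sketch assuming the auto.dta example dataset: collapse the cars to mean mpg by origin (foreign), keep each group's size in the hypothetical variable n_cars, then weight each group mean by its size. The aweighted mean reproduces the mean over all 74 cars, up to float rounding in the collapsed means:

```stata
sysuse auto, clear

* Mean mpg over all 74 cars, for comparison
quietly summarize mpg
local mean_all = r(mean)

* One row per group: group averages plus group sizes
collapse (mean) mpg (count) n_cars = mpg, by(foreign)
list

* Weighting each group average by its size recovers the overall mean
summarize mpg [aweight = n_cars]
local mean_grouped = r(mean)
```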
Getting Started - weight
Importance weight (iweight):
- No formal statistical definition
- Indicates the relative “importance” of the observation
- Syntax:
    command varname [iweight=weightvar]
To be continued …
END
Introduction to STATA
Now please proceed to perform
EXERCISE 1