Transcript Stata

CCPR Computing Services

Workshop 2: Stata October 20, 2004

Outline

- Converting Data between Statistical Packages
- Stata
  - Basic Commands
  - Command Syntax
  - Abbreviations
  - Missing Values
  - Combining Data
  - Using do-files
  - Getting Help
  - Updating Stata

Converting Data – Windows Stat/Transfer

- SAS, Stata, S-Plus, SPSS, Excel, and more
- Windows interface
  - Enter “in” data and “out” data
  - Enter info on other tabs as necessary
- Check results!

Converting Data – Unix Stat/Transfer

- From within Stat/Transfer
  - Invoke Stat/Transfer (the invocation is specific to the Unix machine)
  - At the Stat/Transfer prompt, enter:
    copy datfile1.ext1 datfile2.ext2
  - datfile1.ext1 = original file, datfile2.ext2 = new file
- From the Unix prompt
    st datfile1.ext1 datfile2.ext2
  (replace st with the local Stat/Transfer invocation)
- See the manual for more info and options
- Check results!

Converting Data – DBMS/Copy

- DBMS/Copy for Unix (without X Windows)
  - From the Unix prompt:
    dbmsnox indatfile.ext1 outdatfile.ext2
  - ext1 and ext2 are “pseudo” extensions
    - spsswin = SPSS for Windows
    - stata7 = Stata 7
    - sas7sun = SAS for Unix v7
    - ssdsun = SAS for Unix v6
  - Example (Windows SPSS to Stata 7):
    dbmsnox mydat.spsswin mydat.stata7
- Check results!

Converting Data

- See the ATS website for transferring files between SAS, Stata, and SPSS
  - http://www.ats.ucla.edu/stat/sas/faq/convert_pkg.htm

Stata - Getting Started

- Windows: Programs > Stata8
  - Command Window: enter commands
  - Results Window
  - Other: review, variables, do-editor
- Unix:
  - Interactive Stata
    - commands and results show in the same window
  - Batch Stata
    - nice +10 stata -b do myjob.do

Basic Commands

- Handout 1 (green)
- Reading raw data
  - insheet, input, infix, infile
- Using/saving a Stata dataset
  - use, webuse
  - save
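
The read and save commands above can be tried without any files of your own. This is only a sketch: it assumes the auto dataset that ships with Stata, and auto_demo is a made-up file name written to the current directory. Newer Stata releases favor import delimited and export delimited, but insheet and outsheet still run.

```stata
* Load a dataset that ships with Stata and write three variables as CSV text
sysuse auto, clear
outsheet make price mpg using auto_demo.csv, comma replace

* Read the comma-separated file back in as raw data
insheet using auto_demo.csv, comma clear

* Save the result as a Stata dataset, then load it again
save auto_demo, replace
use auto_demo, clear
```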

Basic Commands, cont.

- Describing data
  - describe
  - summarize
  - codebook
  - inspect
- Listing data
  - list
- Tables of statistics
  - table
  - tab1 varlist (one-way tabulation of variables)
  - tab2 varlist (two-way tabulations of variables)
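
A short session using these commands might look like the sketch below. It assumes the auto dataset shipped with Stata rather than the workshop handout data.

```stata
sysuse auto, clear

* Describing data
describe
summarize price mpg
codebook foreign
inspect mpg

* Listing the first five observations
list make price mpg in 1/5

* Tables of statistics
table foreign              // frequency table
tab1 rep78 foreign         // one-way table for each variable
tab2 rep78 foreign         // two-way table for the pair
```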

Basic Commands, cont.

- Changing data
  - drop
  - keep
  - generate
  - encode var, generate(newvar)
  - recode
  - replace
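
As an illustration of the data-changing commands, here is a minimal sketch on the auto dataset shipped with Stata. The new variable names (price_k, make_id, rep3) are invented for the example.

```stata
sysuse auto, clear

keep make price mpg rep78 foreign      // keep only these variables
drop if price > 10000                  // drop observations meeting a condition

generate price_k = price / 1000        // new variable from an expression
encode make, generate(make_id)         // string variable -> labeled numeric

* Collapse the five repair categories into three
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

replace price_k = round(price_k, 0.1)  // change values of an existing variable
```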

Basic Commands, cont.

- Labeling data
  - label variable
  - label define
  - label values
  - label list
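
The four label commands usually work together. The sketch below assumes the auto dataset shipped with Stata; the indicator cheap and the value label yesno are hypothetical names.

```stata
sysuse auto, clear

* Describe what a variable measures
label variable mpg "Mileage (miles per gallon)"

* Define a set of value labels, then attach it to a variable
label define yesno 0 "No" 1 "Yes"
generate byte cheap = price < 5000
label values cheap yesno

* Inspect the definition and see it used in output
label list yesno
tabulate cheap
```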

A few other commands

- compress - saves data more efficiently
- reshape - long/wide
- sort / gsort
- order
- rename
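
A brief sketch of these commands follows. The first part assumes the auto dataset shipped with Stata; the second part types in a tiny made-up income panel (id, inc2003, inc2004) to show reshape.

```stata
sysuse auto, clear
compress                         // store variables in the smallest type that fits
sort price                       // ascending order
gsort -price make                // descending price, then ascending make
order make foreign price         // move these variables to the front
rename mpg miles_per_gallon

* reshape: wide (one row per id) to long (one row per id-year) and back
clear
input id inc2003 inc2004
1 100 110
2 200 190
end
reshape long inc, i(id) j(year)
list
reshape wide inc, i(id) j(year)
```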

Stata Syntax

- Basic command syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Brackets = optional portions
- Italics = user specified (varlist, exp, range, weighttype, weight)

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 1 (webuse union)
  - Stata command: .summarize
  - Result: Summarizes all dataset variables (_all)

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 2 (webuse union)
  - Stata command: .summarize age
  - Result: Summarizes variable age

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 3 (webuse union)
  - Stata command: .summarize age if year >= 80
  - Result: Summarizes age, includes only observations with year >= 80

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 4 (webuse union)
  - Stata command: .summarize age if year >= 80 in 1/100
  - Result: Summarizes variable age, includes only the first 100 obs and only obs with year >= 80

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 5 (webuse union)
  - Stata command: .by black: summarize age if year >= 80
  - Result: Summarizes age separately for different values of black, including only obs for which year >= 80

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 6 (webuse union)
  - Stata command: .bysort black: summarize age if year >= 80, detail
  - Result: Detailed summaries of variable age, separated over different values of black, includes only obs with year >= 80

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 7 (webuse union)
  - Generally, [=exp] is used with the commands generate and replace
  - Stata commands:
    .generate agelt30 = age
    .replace agelt30 = 1 if age < 30
    .replace agelt30 = 0 if age >= 30 & age < .
  - Result: Variable agelt30 set equal to 1, 0, or missing
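
The same 1/0/missing indicator can also be built in a single step, which some people find easier to check. This is a sketch, not the handout's method: the logical expression (age < 30) evaluates to 1 or 0, and the if qualifier leaves agelt30 missing wherever age is missing.

```stata
webuse union, clear

* 1 if age < 30, 0 if age >= 30, missing if age is missing
generate agelt30 = (age < 30) if age < .

* The missing option shows any missing values as their own row
tabulate agelt30, missing
```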

Stata Syntax, cont.

- Complete syntax:
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 8
  - Stata command: .summarize race [aweight=final_wt]
  - Result: Summarizes variable race, weighting by final_wt
  - Note: There are four different types of weights in Stata (fweight, aweight, pweight, iweight), and not every command accepts every type; summarize, for instance, does not allow pweights. Be careful.

Abbreviations in Stata

- Abbreviating command, option, and variable names
  - shortest uniquely identifying name is sufficient
- Example:
  - Variables in use = make, price, mpg
  - Stata command, not abbreviated: .summarize make price
  - Stata command, abbreviated: .su ma p
- Exceptions
  - describe (d), list (l), and some others
  - Commands that change/delete
  - Functions implemented by ado-files

Missing Values in Stata 8

- Stata 8
  - 27 representations of numerical “missing”: ., .a, .b, …, .z
- Relational comparisons
  - Biggest number < . < .a < .b < … < .z
- Mathematical functions
  - missing + nonmissing = missing
- String missing = empty quotes: ""

Missing Values in Stata - Pitfalls

Pitfall #1

- Stata 7 vs. Stata 8 missing values:

  Stata 7         Stata 8
  varname != .    varname < .
  varname == .    varname >= .

Pitfall #2

- Do NOT: .replace weightlt200 = 0 if weight >= 200
- INSTEAD: .replace weightlt200 = 0 if weight >= 200 & weight < .
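
Because missing values sort above every number, a condition such as >= 200 is also true for missing values. The sketch below shows the effect with counts; it assumes the auto dataset shipped with Stata, where rep78 is missing for a few cars.

```stata
sysuse auto, clear

* Missing values are larger than any number, so this count includes them
count if rep78 >= 4

* Adding "& rep78 < ." restricts the count to nonmissing values
count if rep78 >= 4 & rep78 < .

* The difference between the two counts is the number of missing values
count if rep78 >= .
```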

Combining Data

- Append vs. Merge
  - Append - same variables, different observations
  - Merge - same or related observations, different variables
- Appending data in Stata
  - Handout 2
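
Handout 2 is not reproduced in this transcript, so here is a minimal stand-in for appending. The two small datasets and the file name part1_demo are invented for illustration.

```stata
* First batch of observations, saved to disk
clear
input id score
1 90
2 85
end
save part1_demo, replace

* Second batch with the same variables, held in memory
clear
input id score
3 70
4 95
end

* Stack the saved observations underneath the ones in memory
append using part1_demo
list
```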

Combining Data - merge and joinby

- Demonstrate with two sample datasets:
  - Neighborhood and County samples
- One-to-one merge
  - Handout 3
- One-to-many merge - use match merge
  - Handout 4
- Many-to-many merge - use joinby
  - Handout 5

Combining Data

- Variable _merge (generated by merge and joinby)

  _merge                         1     2     3
  Observation in master data     Yes   No    Yes
  Observation in “using” data    No    Yes   Yes

- “update” option also includes _merge = 4, 5
- “update” changes the default action when a matched observation has missing values in master and nonmissing values in “using” data
- Pitfalls
  - Pitfall_merge1 (Handout 6)
  - Pitfall_merge2 (Handout 7)
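
To see where the _merge codes come from, the sketch below merges a made-up neighborhood file with a made-up county file. All names and values are hypothetical. It uses the merge m:1 syntax of Stata 11 and later; the workshop-era Stata 8 form would be merge county using county_demo, with both datasets sorted by county first.

```stata
* Hypothetical county data: one row per county
clear
input county str10 cname
1 "Alpha"
2 "Beta"
3 "Gamma"
end
sort county
save county_demo, replace

* Hypothetical neighborhood data: several rows per county
clear
input nbhd county
101 1
102 1
201 2
401 4
end

* Many neighborhoods match one county record
merge m:1 county using county_demo

* 1 = master only (county 4), 2 = using only (county 3), 3 = in both
tabulate _merge
list, sepby(county)
```

Tabulating _merge right after every merge is a cheap way to catch unmatched observations before they affect later results.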

Do-files

- What is a do-file?
  - Stata commands can be executed interactively or via a do-file
  - A do-file is a text file containing commands that can be read by Stata
  - Handouts are do-files
- Stata command: .do dofilename.do

Do-files

- Why use a do-file?
  - Documentation
  - Communication
  - Reproduce interactive session?
- Interactive vs. do-files
  - Record EVERYTHING to recreate results in your do-file!!

Do-files > Header, Version Control

- Header
  - Include in do-files: name, project, project location, date, purpose, inputs, outputs, special instructions
- Version Control
  - Include version at the top of the do-file
  - Why?
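
A do-file skeleton with a header and a version statement could look like the sketch below. The header entries are placeholders, and the body uses the auto dataset shipped with Stata so that it runs anywhere. The version line tells Stata to interpret the commands as that release did, so the do-file keeps working after Stata is upgraded.

```stata
* Name:      myjob.do
* Project:   (project name and location)
* Date:      (date written or last revised)
* Purpose:   (what this do-file produces)
* Inputs:    auto.dta (shipped with Stata)
* Outputs:   myjob.log
* Special instructions: none

version 8                        // interpret commands as Stata 8 would

capture log close                // close any log left open by an earlier run
log using myjob, text replace
sysuse auto, clear
summarize price mpg
log close
```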

Do-files > End of Line Character

- Commands requiring multiple lines
  - #delimit ;
    - This command tells Stata to read semicolons as the end-of-line character instead of the carriage return
  - Comment out the carriage return with /* at the end of the line and */ at the beginning of the next
  - Comment out the carriage return with ///

Do-files > End of Line Character

- Example 1: #delimit
    #delimit ;
    keep firstname lastname birth death
         age weight height;
    #delimit cr
- Example 2: /* */
    keep firstname lastname birth /*
    */ age weight height
- Example 3: ///
    keep firstname lastname birth ///
         age weight height

Do-files > Comments

- Comments
  - Lines beginning with * will be ignored
  - Words between /* and */ will be ignored (spanning multiple lines is ok)
  - Words between // and the end of the line will be ignored
  - Words between /// and the beginning of the next line will be ignored (one way to spread a command over two lines)

Do-files > Comments

- Comments - example

  *SAMPLE EXCERPT OF STATA DO-FILE
  *This line will be ignored by Stata.
  use mydata.dta /* These words will be ignored */
  do myjob.do //The remainder of this line will be ignored.
  keep age race sex ///The remainder of this line will be ignored, including return
  first_name height weight last_name /*This line is a continuation of the last line*/

Saving output

- Work in do-files and log your sessions!
- log using filename
  - options: replace, append
- log close
- Output choices:
  - *.log file - ASCII file
  - *.smcl file - nicer format for viewing and printing in Stata
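
The logging commands can be combined as in the sketch below. The log name session_demo is invented, and the example uses the auto dataset shipped with Stata. The text option requests a plain ASCII .log file; leaving it off produces a .smcl file.

```stata
capture log close                     // in case a log is already open

* Start a new plain-text log, overwriting any earlier copy
log using session_demo, text replace
sysuse auto, clear
summarize price
log close

* Reopen the same file later and add to the end of it
log using session_demo, text append
tabulate foreign
log close
```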

Basic Commands, cont.

- Graphs are not saved in log files
- Use the “saving” option of graph commands
  - saving(graph.ext)
- Export current graph:
  - graph export graph.ext
  - Ex: graph export graph.eps
- Supported formats: .ps, .eps, .wmf, .emf, .pict
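
Both ways of keeping a graph are shown in this sketch, which assumes the auto dataset shipped with Stata. The file name price_mpg is made up.

```stata
sysuse auto, clear

* Draw a scatterplot and save it in Stata's own graph format (.gph)
scatter price mpg, saving(price_mpg, replace)

* Export the graph currently displayed to Encapsulated PostScript
graph export price_mpg.eps, replace
```

A saved .gph file can be redisplayed later with graph use, while the exported file is meant for other programs.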

Getting Help in Stata

- help command_name
  - abbreviated version of manual
- search
  - search keywords, local
  - search keywords, net
  - search keywords, all
- findit keywords
  - same as search keywords, all
- Search Stata Listserver and Stata FAQ

Stata Resources

- www.stata.com > Resources and Support
  - Search Stata Listserver
  - Search Stata (FAQ)
  - Stata Journal (SJ)
    - articles for subscribers
    - programs free
  - Stata Technical Bulletin (STB)
    - replaced with the Stata Journal
    - articles available for purchase, programs free
  - Courses (for fee)

Updating Stata

- help update
- update all