RS_01_Statistics - Graduate Institute of International and

Statistical Methods for the Social Sciences
Rahul Mukherjee
REVIEW SESSION 01
TA: Marcio Cruz
Office hours
• Wednesdays 09:00-11:00
• Rigot 10
Basic data management
1. Where do I find interesting data? GOOGLE!!! ;-)
Some interesting links for MDEV and MIA students:
MACRO (aggregate variables/different countries)
• World Development Indicators (World Bank)
http://data.worldbank.org/data-catalog/world-development-indicators
• World Economic Outlook Database (IMF)
http://www.imf.org/external/pubs/ft/weo/2011/02/weodata/index.aspx
• Statistics Database (WTO)
http://stat.wto.org/Home/WSDBHome.aspx
By region:
• US Economy - Federal Reserve Economic Data
http://research.stlouisfed.org/fred2/
• European Union Economy - ECB statistics
http://sdw.ecb.europa.eu/
• China – National Bureau of Statistics of China
http://www.stats.gov.cn/english/statisticaldata/
• Mexico – Banco de México
http://www.banxico.org.mx/estadisticas/index.html
MACRO DATA: You can find macro datasets for most
countries on the webpages of their central banks and
national statistics bureaus.
• Central Banks
http://www.bis.org/cbanks.htm
• Official National Bureau of Statistics
MICRO (household surveys, firm-level data, etc.)
• Official National Bureaus of Statistics
http://www.census.gov/acs/www/data_documentation/public_use_microdata_sample/
http://epp.eurostat.ec.europa.eu/portal/page/portal/microdata/introduction
http://www.esds.ac.uk/international/access/micro.asp
• International Organizations
http://microdata.worldbank.org/index.php/home
• Some blogs provide good links:
https://sites.google.com/site/medevecon/development-economics/devecondata/micro
http://openmicrodata.wordpress.com/
• Faculty webpages
http://dvn.iq.harvard.edu/dvn/dv/JAngrist
Basic Excel
2. How should I download this data?
Let us start with an example using MACRO data
(from WDI).
• .csv, .txt or .xls? What is the difference?
• How to manage this data in Excel?
• How to sort this data?
• How to do basic math operations in Excel?
• How to get basic descriptive statistics in Excel?
• How to generate a graph?
Statistical packages
Why should I manage data using a statistical package?
It gives you more flexibility, and you keep a record of what
you did in your research!
Some examples of statistical packages:
http://en.wikipedia.org/wiki/List_of_statistical_packages
SPSS – comprehensive statistics package
EViews – for econometric analysis
Stata – comprehensive statistics package
SAS – comprehensive statistics package
MATLAB – programming language with statistical features
R – a free implementation of the S language
S-PLUS – general statistics package
Basic STATA
3. Where can I find resources and tips for learning STATA?
GOOGLE!!! ;-)
• Stata webpage, university webpages, etc.
• Resources for learning Stata
http://www.stata.com/links/resources1.html
• Stata Starter Kit: Learning Modules
http://www.ats.ucla.edu/stat/stata/sk/modules_sk.htm
• Getting Started in Data Analysis
http://dss.princeton.edu/training/
This link provides some exercises from the course's textbook:
Statistical Methods for the Social Sciences, 3rd edition
by Alan Agresti & Barbara Finlay
http://www.ats.ucla.edu/stat/examples/smss/default.htm
Textbook Examples:
Introduction to the Practice of Statistics
by David Moore and George McCabe
http://www.ats.ucla.edu/stat/examples/mm/default.htm
• How to start with STATA?
• What are .do, .dta, and .log files?
• USE .do FILES!!! Why? You keep a record of
everything you have done!
• If you need to manage data: use a .do file!
.do FILE
• How to use a .do file?
1. Open STATA.
2. Open a new .do file in the do-file editor.
3. Set memory. This can improve the performance of STATA, but it depends
on the capacity of your computer, so if it does not work, request less
memory. (You don't need to use this command: Stata 12 and later manage
memory automatically.)
ex: set memory 1200m
4. Define the directory in which you will work:
cd "C:\Users\My Documents… "
See example: "rs01_example01.do"
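The steps above can be sketched as a short .do file. This is only an illustration, not the course's rs01_example01.do: the file names rs01_sketch.do and rs01_sketch.log are hypothetical, the cd line is left as a comment for you to fill in, and the auto dataset that ships with Stata stands in for your own data.

```stata
* rs01_sketch.do (hypothetical name, used only for this illustration)

* Start from a clean session
clear all
set more off

* Close any log left open by an earlier run (no error if there is none)
capture log close

* Set your own working directory here, for example:
* cd "path/to/your/folder"

* Keep a record of all the output produced by this .do file
log using "rs01_sketch.log", replace text

* Load an example dataset shipped with Stata and summarize two variables
sysuse auto, clear
summarize price mpg

log close
```

Running the whole file again reproduces exactly the same steps, which is the main reason to work from a .do file rather than typing commands one by one.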
Importing data to STATA
4. How to import data from Excel to STATA?
Importing data from Excel:
Source: http://www.stata.com/support/faqs/data/newexcel.html
1. A rule to remember
Stata expects one matrix or table of data from one sheet, with at most one line of text at the
start defining the contents of the columns.
2. How to get information from Excel into Stata
• Start Excel.
• Enter data in rows and columns or read in a previously saved file.
• Highlight the data of interest, and then select Edit and click Copy.
• Start Stata and open the Data Editor (type edit at the Stata dot prompt).
• Paste data into editor by selecting Edit and clicking Paste.
You can do this (2), but it is better to avoid it! Why???
INSHEET COMMAND
THE BEST WAY TO IMPORT DATA FROM EXCEL!!!
3.1 insheet command
• Launch Excel and read in your Excel file.
• Save as a text file (tab delimited or comma delimited) by selecting File and clicking Save
As. If the original filename is filename.xls, then save the file under the
name filename.txt or filename.csv. (Use the Save as type list; specifying an extension
such as .txt is not sufficient to produce a text file.)
• Quit Excel if you wish.
• Launch Stata if it is not already running. (If Stata is already running, then
either save or clear your current data.)
• In Stata, type insheet using filename.ext, where filename.ext is the name of the file that
you just saved in Excel. Give the complete filename, including the extension.
• In Stata, type compress.
• Save the data as a Stata dataset using the save command.
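The insheet steps can be tried end to end with the sketch below. To keep it self-contained, the Excel "Save As" step is replaced by outsheet, which writes a small comma-delimited file from the auto dataset that ships with Stata. The file names rs01_demo.csv and rs01_demo.dta are hypothetical; with your own data you would skip the first block and point insheet at the file you saved from Excel.

```stata
* Stand-in for the Excel "Save As" step: write a small comma-delimited file.
* rs01_demo.csv and rs01_demo.dta are hypothetical names used only here.
sysuse auto, clear
outsheet make price mpg using "rs01_demo.csv", comma replace

* Clear the data in memory, then read the text file.
* "names" tells Stata that the first line holds the variable names.
clear
insheet using "rs01_demo.csv", comma names

* In Stata 13 and later, import delimited supersedes insheet:
* import delimited using "rs01_demo.csv", varnames(1) clear

* Store each variable in the smallest type that holds it, check the result,
* and save it as a Stata dataset
compress
describe
save "rs01_demo.dta", replace
```

Looking at the output of describe right after importing is a quick way to spot variables that were read as strings by mistake.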
Common problems
5.1 Nonnumeric characters
• One cell containing a nonnumeric character, such as a letter, within a
column of data is enough for Stata to make that variable a string variable.
5.2 Spaces
• What appear to be purely numeric data in Excel are often treated by Stata
as string variables because they include spaces.
5.3 Cell formats
• Much formatting within Excel interferes with Stata's ability to interpret the
data reasonably. Just before saving the data as a text file, make sure that
all formatting is turned off, at least temporarily. You can do this by
highlighting the entire spreadsheet, selecting Format, and then Cells, and
clicking General.
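Problems 5.1 and 5.2 can often be repaired inside Stata after the import. The sketch below uses made-up toy data (the variable names gdp_raw and gdp are hypothetical) to show a column that was read as a string because of a space and a nonnumeric entry, and one possible way to convert it with destring.

```stata
* Toy data: numbers stored as text, with a stray space and a text entry
clear
input str10 gdp_raw
"1 250"
"980"
"n/a"
"1100"
end

* The storage type shown here is str10, so gdp_raw is a string variable
describe gdp_raw

* ignore(" ") strips the spaces before converting;
* force turns anything still nonnumeric (here "n/a") into a missing value
destring gdp_raw, generate(gdp) ignore(" ") force

list
```

The force option silently discards information, so it is worth listing the observations that became missing before relying on the new variable.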
5.4 Variable names
• Stata limits variable names to 32 characters and does not allow within such names any
characters that it uses as operators or delimiters. Also, variable names should start with
a letter.
5.5 Missing rows and columns
• Completely empty rows in a spreadsheet are ignored by Stata, but completely empty
columns are not. A completely empty column gets read in as a variable with missing
values for every observation.
5.6 Leading zeros
• With integer-like codes, such as ICD-9 codes or U.S. Social Security numbers, that do
not contain a dash, leading zeros will get dropped when pasted into Stata from Excel.
One solution is to flag within the first line that the variable is string: add a nonnumeric
character in Excel on that line, and then remove it in Stata.
5.7 Filename and folder
• Confirm the filename and location of the file you are trying to read. Use Explorer or its
equivalent to check.
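For problem 5.6, a different technique from the Excel flag described above is to rebuild the code in Stata once the zeros have been lost. This only works when you know how many digits the code should have. The sketch assumes a hypothetical five-digit code stored in a variable called zip_num.

```stata
* Toy data: five-digit codes that lost their leading zeros on import
clear
input long zip_num
2138
10027
501
end

* Rebuild a 5-character string, padding with zeros on the left
generate str5 zip = string(zip_num, "%05.0f")

list
```

If you import with import delimited (Stata 13 and later), its stringcols() option can instead read a chosen column as a string from the start, so the zeros are never dropped.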
STATA - data types
• Numeric variables
• String variables
• What is a ‘STRING’ variable? How to deal with it?
Some basic commands
• Summary: sum
• Conditions: if, &, |
• Sort observations (by variables): sort
• Order variables (columns): order
• Generate variables: gen var
• Drop variables (columns): drop
• Drop rows: drop in
• Concatenate variables: egen newvar = concat(var1 var2)
• Destring variables: destring var, replace
• Generate numerical (dummy) variables from a string variable: tab var, gen(newvar)
• Basic math operations: /, *, -, +, or egen newvar = rsum(var1 var2 … varn) (called rowtotal() in recent versions of Stata)
• Replace: replace var
• Collapse: collapse (sum) var, by(var) – see help collapse
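A short sketch that runs through the commands listed above, using the auto dataset that ships with Stata. All new variable names (price_k, room, make_origin, rep78_txt, origin_) are made up for this illustration.

```stata
sysuse auto, clear

* Summary statistics, with and without conditions
summarize price mpg
summarize price if foreign == 1 & mpg > 25
summarize price if rep78 == 1 | rep78 == 5

* Sort observations by price, then put three variables first
sort price
order make price mpg

* Generate a variable, then replace its values
generate price_k = price / 1000
replace price_k = round(price_k, 0.1)

* Row-wise sum of two variables, then drop that variable (a column)
egen room = rowtotal(headroom trunk)
drop room

* Drop the first five observations (rows)
drop in 1/5

* Concatenate two variables into one string
egen make_origin = concat(make foreign), punct(" / ")

* Make a string copy of a numeric variable, then destring it again
tostring rep78, generate(rep78_txt)
destring rep78_txt, replace

* One 0/1 variable per category: origin_1 and origin_2
tabulate foreign, generate(origin_)

* Collapse to one row per group (this replaces the data in memory)
collapse (mean) price mpg, by(foreign)
list
```

collapse comes last because it overwrites the dataset in memory; save your data first if you still need the individual observations.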
Linking with class notes…
How to generate a quantitative variable from a
categorical variable?
For example: favorite music type (rock, jazz, folk, classical).
Command in STATA:
tab var, gen(stub for the names of the new variables)
For a variable called music:
tab music, gen(music)
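A small sketch of this command on made-up data. The variable music and its five observations are invented for the illustration, and the stub music_ is just one possible choice. The encode line shows an alternative that produces a single numeric code rather than one 0/1 variable per category.

```stata
* Toy data: one string variable with four categories
clear
input str10 music
"rock"
"jazz"
"folk"
"classical"
"rock"
end

* One 0/1 variable per category, numbered in alphabetical order:
* music_1 (classical), music_2 (folk), music_3 (jazz), music_4 (rock)
tabulate music, generate(music_)
describe music_*

* Alternative: a single numeric variable with value labels
encode music, generate(music_code)

list, nolabel
```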
EXERCISE
The slide on page 30 of the first class notes is the following:
www.stat.ufl.edu/~aa/social/data.html
Access this webpage (www.stat.ufl.edu/~aa/social/data.html) and do the
following procedure:
1. Download the data in Excel;
2. Plot a graph showing the age of students (on the x-axis) and the time they
spend watching TV (on the y-axis);
3. Plot a pie graph showing the number of males and females;
4. Save this data as .csv;
5. Transfer this data to STATA;
6. Identify which variables are numerical and which ones are string;
7. Plot a graph showing the age of students (on the x-axis) and the time they
spend watching TV (on the y-axis);
8. Plot a pie graph showing the number of males and females;
9. How many of these students are:
D = Democrat, R = Republican, I = Independent?
10. Generate a variable called average_gpa that is:
average_gpa = (high school GPA (on a four-point scale) + college GPA)/2
I have a problem on STATA…
• If you are unsure how to use a specific procedure in STATA,
what should you do?
• 1. Google!!! ;-) …. If this doesn’t work:
• 2. Google!!! Try again, maybe you haven’t searched properly… but, if this
doesn’t work:
• 3. Google!!! Try once more, just in case.
• 4. Use the help command in STATA.
• 5. Send your questions to Statalist: http://www.stata.com/statalist/
• 6. Talk to your TA.
• 7. Talk to your Professor.
• You can talk to your TA whenever you want, but try at least the first 4 steps beforehand.
This will be important for developing your Stata skills! ;-)
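For step 4, these are a few ways to consult Stata's built-in documentation. The commands looked up here are only examples; type them in the Command window.

```stata
* Open the help file for a command you already know by name
help destring
help egen

* Search the documentation by keyword when you do not know the command name
search leading zeros
```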