An Introduction to Stata I
Prof. Dr. Herbert Brücker
University of Bamberg
Seminar “Migration and the Labour Market”
Session 3, June 9, 2011
Contents
1 Introduction to the workplan
2 Introduction to the dataset
3 Introduction to STATA I
• Overview of working with STATA
• Menus and editors
  • General editor
  • Data editor
  • Do File editor
• The Grammar of STATA
  • commands
  • loading data
  • describing data
  • graphs
• Working with Do-Files
1 Workplan
• Forming four teams of 4-5 students
• Introduction and outline of research question
• Review of literature on labour market effects of migration (3-5 pages)
• Description of the dataset
  • Data sources and caveats
  • Descriptive statistics and graphs
• Presenting the empirical model
• Presenting and discussing the regression results
• Conclusions
• Presenting the papers in class
2 The dataset: general information
• The IAB employment sample (IABS)
• 2% random sample of all employees obliged to pay social security contributions and recipients of unemployment benefits (e.g. SGB II and III)
• Precise information on wages and unemployment spells
• Information on education and work experience
• Period: 1974-2004 (meanwhile until 2008)
• Here we use 1980-2004 since information at the beginning of the sample period is less reliable
• Focus on Western Germany excl. (West-)Berlin due to unification
2 The dataset: Caveats I
• Identification of foreigners by nationality
  • We use the nationality of the first spell to control for naturalisations
• Problem of identifying the immigration of ethnic Germans (Spätaussiedler)
  • We try to identify them via programme participation
• No civil servants ("Beamte") and no self-employed
  • There is nothing we can do about this.
• Wages are censored at the legal pension threshold level (66,000 euros)
  • We impute wages above the threshold level
2 The dataset: Caveats II
• Missing education information (17% overall, about 35% of foreigners)
  • We impute education information
• We have only daily wages (not hourly wages)
  • We exclude all part-time workers
• See Brücker/Jahn (2011), Data Section, and the FDZ at the IAB for a description of the dataset
2 The dataset: Organisation
• We distinguish 25 years (1980-2004)
• We distinguish 64 labour market cells by education (4), work experience (8) and nationality (2)
  • 4 x 8 x 2 = 64
• We use the following indexes:
  • h = native (German)
  • f = foreigner
  • q = education
  • k = work experience
  • t = time
• Note that the dataset also contains aggregates (e.g. wt, wqt, wqkt and not only whqkt, wfqkt)
General overview of STATA
The desktop of STATA is divided into four different parts:
1. Review window: lists the commands you have executed
2. Results window: shows the results of your commands
3. Variables window: shows the current list of variables in the dataset
4. Command window: here you type your commands
STATA has the following menus/editors you can work with:
1. The desktop menu: you can run all commands here.
2. The data editor: here you can edit the data you have loaded.
3. The data browser: here you can browse the data you have loaded, but not edit it.
4. The do file editor: the do file is a file where you can edit and execute all types of commands. It is very useful for replicating and keeping a record of what you have done. We will come back to this.
The Data Editor: you can change each cell by hand.
The Data Browser looks similar, but you cannot edit the data.
The Do File Editor: you can type and execute your commands there.
(Lines that begin with a star are treated as comments, not as commands, e.g. * Note that …)
The Grammar of STATA
General structure of STATA
[prefix :] command [varlist] [if] [in] [weight] [, options]
We will concentrate first on the command: what do you want to do?
First step: how to load data
> use "Filename", clear
Practice:
> use "C:\EigeneDateien\Stata\data1.dta", clear
Another way to load data:
-> File -> Open -> choose your data file
There are two types of variables (data):
numerical variables, e.g. 0, 1, 501, 0.5, -12
string variables, e.g. "no voc train", "male", "female"
How to deal with the data types:
Numerical variables: you can do all mathematical operations,
e.g. var1 + var2, var1/var2, var1*var2 etc.
String variables: you have to put the value in quotation marks, e.g.
> replace var1 = 1 if sex == "female"
In the Data Editor, numerical variables are shown in black and string variables in red.
Now that you have loaded the data, how do you get an overview of it?
> describe
gives general information about the data, such as the number of observations, the number of variables, and the names and labels of the variables.
> list
lists the values of every single observation (e.g. persons, groups, classes) in the dataset.
Attention: your data might be really large! "-more-" indicates that more output is available. Press any key to continue or "q" to quit.
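A minimal sketch of these two commands. It assumes the example dataset auto that ships with STATA (loaded with sysuse) rather than the seminar data, so the variable names make, price and mpg belong to that example only.

```stata
* load an example dataset that ships with Stata (not the IABS data)
sysuse auto, clear

* general information: number of observations and variables, names, labels
describe

* list only the first five observations to avoid long output
list make price mpg in 1/5
```

The "in 1/5" part restricts list to the first five observations, which keeps the output short for large datasets.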
Next we look at the varlist: which variables are concerned?
[varlist] stands for either a list of variables or a single variable to which the command is applied.
[varlist] is set in brackets because it is optional. If no [varlist] is specified, many commands (e.g. list, describe, summarize) are executed for all variables.
Practice:
To get information only about education and wages in the dataset:
> list ed whqkt
Further commands to describe the dataset I:
> tabstat
gives a table with the mean of the variable(s)
> codebook
describes the coding of each variable, with information on the data type, range, units, number of unique values, missing values, mean, standard deviation and percentiles
In practice:
> tabstat whqkt wfqkt
> codebook
> tabstat whqkt
Further commands to describe the dataset II:
> summarize
gives the number of observations, the mean, the standard deviation, the minimum and the maximum of a variable
> tabulate
gives a table with the absolute and relative frequencies of a variable
In practice:
> sum whqkt wfqkt
> tab whqkt wfqkt
Note that tabulate with two variables produces a two-way cross-tabulation, which is only readable for variables with few distinct values.
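The descriptive commands can be tried out as follows. This is only a sketch on STATA's shipped example dataset auto, not on the seminar data. The variables price, mpg, rep78 and foreign are taken from that example dataset.

```stata
* example dataset that ships with Stata (not the IABS data)
sysuse auto, clear

* observations, mean, standard deviation, minimum and maximum
summarize price mpg

* the same statistics in one compact table
tabstat price mpg, statistics(n mean sd min max)

* one-way frequency table for a variable with few distinct values
tabulate foreign

* two-way cross-tabulation of two such variables
tabulate rep78 foreign
```

Without the statistics() option, tabstat reports only the mean.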
Practice: use the descriptive statistics to find
- the number of observations
- the mean of earnings or of the unemployment rate
- the standard deviation of earnings and of the unemployment rate
- the range of observations (minimum and maximum wage and unemployment rate)
Note that descriptive statistics already provide interesting information about the data. They help to detect outliers and measurement error, and they help to interpret regression results (most results refer to the sample mean).
Next we look at [if]: under which condition?
With [if] you can set a condition or restrict the sample,
e.g. to obtain the average income only of migrants with the lowest education (no vocational training):
> summarize wfqkt if ed == "no voc train"
"no voc train" is a value of the string variable ed (hence the quotation marks) and indicates that an individual has no vocational training.
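A short sketch of if conditions on string and numerical variables. It again assumes STATA's shipped example dataset auto instead of the seminar data, so make, price, mpg and foreign are names from that example only.

```stata
* example dataset that ships with Stata (not the IABS data)
sysuse auto, clear

* condition on a string variable: the value goes in quotation marks
summarize price if make == "VW Rabbit"

* condition on a numerical variable: no quotation marks
summarize price if foreign == 1

* conditions can be combined with & (and) or | (or)
summarize price if foreign == 1 & mpg > 25
```

Note the double equals sign (==) for comparisons. A single equals sign is used only for assignments, as in generate and replace.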
How to create dummies?
What is a dummy variable? A dummy variable takes a value of either 0 or 1.
With STATA you can also create new variables from the data.
To do so you need the commands "generate" and "replace":
> gen ed1 = 0
> replace ed1 = 1 if ed == "no voc train"
Another example:
> gen ex1 = 0
> replace ex1 = 1 if ex == 1
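The two-step approach can be tried on a tiny made-up dataset. This is a sketch only: the three observations below are invented for illustration and are not taken from the IABS, and the variable ed1_alt is a hypothetical name for a one-line alternative.

```stata
* build a small hypothetical dataset by hand
clear
input str12 ed ex
"no voc train" 1
"voc train"    2
"no voc train" 3
end

* two-step dummy: start with 0, then set to 1 where the condition holds
generate ed1 = 0
replace ed1 = 1 if ed == "no voc train"

* one-line alternative: a logical expression evaluates to 0 or 1
generate ed1_alt = (ed == "no voc train")

* both versions should agree in every observation
assert ed1 == ed1_alt
tabulate ed1
```

The one-line form is shorter, while the two-step form makes it easier to see which observations are switched to 1.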
How to calculate and transform numerical variables
> generate newvar = var1 - var2
STATA supports the usual arithmetic operators and functions (+, -, *, /, ln() etc.)
Practice: Create the log wage:
> generate ln_whqkt = ln(whqkt)
How to modify variables/dummies?
> replace var = (var1 - var2)/2
replace accepts the same operators and functions as generate, but the variable must already exist.
Practice: Replace the wage by the log wage only for the low skilled:
> replace ln_whqkt = ln(whqkt) if ed == "no voc train"
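A minimal sketch of generate and replace on invented numbers. The variables wage_native, wage_foreign, wage_gap and ln_wage_native are hypothetical names chosen for illustration, not variables of the seminar dataset.

```stata
* build a small hypothetical dataset by hand
clear
input wage_native wage_foreign
80 70
0  60
95 85
end

* new variable from an arithmetic expression
generate wage_gap = wage_native - wage_foreign

* ln() of zero returns a missing value (.), so check after transforming
generate ln_wage_native = ln(wage_native)
count if missing(ln_wage_native)

* overwrite an existing variable, only where the condition holds
replace wage_gap = (wage_native - wage_foreign)/2 if wage_foreign > 65
```

Counting missing values after a log transformation is a quick way to spot zero or negative wages in the data.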
How to create graphics?
> graph twoway line var1 year [if] [in]
STATA produces two-dimensional graphs with lines, bars, dots, scatter plots etc. with the "graph twoway" command. The type of graph is specified directly after it, e.g. "line".
Practice:
Graph the development of native and foreign wages over the years in our sample for a given education and experience group.
> graph twoway line whqkt wfqkt year if ed == "no voc train" & ex == 1
> graph twoway scatter whqkt wfqkt if ed == "no voc train" & ex == 1
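The graph commands can be tried without the seminar data. The sketch below assumes STATA's shipped example dataset uslifeexp, whose variables le_male, le_female and year stand in for the wage series and the year variable of our dataset.

```stata
* example dataset that ships with Stata (not the IABS data)
sysuse uslifeexp, clear

* two lines over time: the last variable in the list is the x-axis
graph twoway line le_male le_female year

* restrict the plot with an if condition
graph twoway line le_male le_female year if year >= 1950

* scatter plot of one series against the other
graph twoway scatter le_male le_female
```

Each graph command replaces the previous graph in the graph window, so run the lines one at a time to see each plot.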
The do-file
STATA also provides a do-file editor (= text editor), into which the commands can be written.
- the do-file editor can be opened with the command "doedit", by pressing "Ctrl + 8" or by clicking on the do-file button in the toolbar.
How to execute commands in a do-file?
- write the commands into the text editor, then mark the text and press "Ctrl + D"
- if no text is marked, the whole do-file is executed. This can cause trouble if there is a mistake somewhere in your list of commands. (That happens in most cases.)
Reasons to use a do-file:
- your work is documented and reproducible!
- you can include comments in your work by putting a "*" at the very beginning of the line (they are automatically shown in green), e.g.
> * load data
> use "C:\User\...data1.dta", clear
> * get an overview
> describe
- you can save your do-file -> File -> Save
- and you can also open do-files -> File -> Open
- do-files have the extension ".do"
This is an example of a do-file.
First I "set more off" and load the data.
Second I use a command for panel regressions.
Third I generate some variables.
The comments marked with stars explain what I am doing.
Now I mark the lines with the commands I want to execute.
Then I press the execute button.
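A do-file with this structure might look like the sketch below. It is not the do-file shown on the slide: the six observations are invented, the variable names group, year, wage and ln_wage are hypothetical, and xtset is used here only to declare the panel structure, as a stand-in for the unspecified panel command.

```stata
* Sketch of a do-file (hypothetical data, not the seminar dataset)
set more off

* build a tiny panel: 2 groups observed in 3 years
clear
input group year wage
1 1980 50
1 1981 52
1 1982 55
2 1980 40
2 1981 41
2 1982 43
end

* declare the panel structure (group identifier and time variable)
xtset group year

* generate the log wage
generate ln_wage = ln(wage)

* get an overview
describe
summarize wage ln_wage
```

Saved with the extension ".do", the file can be rerun at any time and reproduces every step from loading the data to the summary table.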
Next Meeting:
June 30, Room RZ 1.03!