QuantsMeths - Inferential Statistics.ppt

Quantitative Methods
INFERENTIAL STATISTICS
Inferential statistics
Main goal: draw conclusions about the existence
(likelihood) of relationships among variables
based on data subject to random variation
Does A affect B?
Is individuals' position on B affected by
their position on A?
Crosstabs [i]
Variable A (independent variable) → causal effect → Variable B (dependent variable)
Examples of theoretical questions:
- Does age affect the forms of political participation?
- Are more educated citizens more likely to vote
for more progressive parties?
- Across European countries, do different social security
systems have a different effect on trust in Government?
Crosstabs [ii]
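Different statistical tools for uncovering the existence of a
causal relationship between different types of variables
Tool by level of measurement of the independent and dependent variables:
- Independent nominal/ordinal, dependent nominal/ordinal: Crosstab
- Independent scale, dependent nominal/ordinal: Recoding is necessary!
- Independent nominal/ordinal, dependent scale: Analysis of variance
- Independent scale, dependent scale: Regression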
Crosstabs [iii]
Attention!
The fact that two variables vary together (e.g. interest
in politics and level of education) does not necessarily mean
that they are causally related!
A causal relationship between two variables can only be confirmed
through theoretical considerations
Crosstabs [iv]
Example:
The relationship between potato consumption and
criminality
An empirical study shows that the higher an individual's
consumption of potatoes, the higher the likelihood of
committing a crime
Do potatoes push individuals to act against the law?
! Presence of a hidden variable
Crosstabs [v]
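Diagram: Income (hidden variable, also called confounding factor or
lurking variable) is linked (+) to both potato consumption and the
likelihood of committing a crime, which produces the apparent (+) link
between potato consumption and crime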
Crosstabs [vi]
Spurious relationship
A relationship that appears statistically established is contradicted by
strong logical considerations, and is explained by the
presence of a « lurking variable ».
Therefore, hypotheses are never « true » or « false ». They
may only be empirically verified (or falsified).
Significance tests and measures of association [i/xvi]
In crosstabs, hypotheses may be verified through observation
But:
- How to assess the validity of our observation?
- How to compare different relationships?
Solution: compute statistical indicators to assess the
significance and strength of the relationship:
- Chi-square test: level of significance
- Measures of association: strength (and direction)
Significance tests and measures of association [ii/xvi]
The Chi-square test
Helps to estimate whether a relationship between two different
variables is statistically significant
Statistical significance: the relationship between the two
variables is strong enough, given the sample, that it is unlikely to be
due to chance alone and can be generalized to the
population from which the sample was drawn
p-value (level of significance)
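To make the chi-square logic concrete, here is a minimal sketch in Python for a 2x2 crosstab. The counts are hypothetical (not taken from any survey in these slides), the function name is an assumption, and no continuity correction is applied. Observed counts are compared with the counts expected if the two variables were independent; with one degree of freedom the p-value has a closed form.

```python
import math

def chi_square_2x2(table):
    """Return (chi2, p_value) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = education (low, high),
# columns = interest in politics (low, high)
observed = [[30, 20], [20, 30]]
chi2, p = chi_square_2x2(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

For larger tables the same statistic applies, but the p-value needs the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, which statistical packages provide.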
Significance tests and measures of association [v/xvi]
In the social sciences, by convention, a statistically significant
relationship is allowed a 5% risk of falling into one of the two
errors
0.05 (5%) is the significance threshold
The probability of NOT committing one of the two errors is at
least 95%
If p < .05, the relationship is significant
If p > .05, the relationship is NOT significant. End of the analysis
p-value                  Outcome of test          Statement
greater than 0.05        Fail to reject H0        No evidence to reject H0
between 0.01 and 0.05    Reject H0 (Accept H1)    Some evidence to reject H0 (therefore accept H1)
between 0.001 and 0.01   Reject H0 (Accept H1)    Strong evidence to reject H0 (therefore accept H1)
less than 0.001          Reject H0 (Accept H1)    Very strong evidence to reject H0 (therefore accept H1)
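The decision rule in the table can be sketched as a small Python function. The function name is hypothetical, and the handling of values falling exactly on a boundary (for example p = 0.05) is an assumption, since the table does not specify it.

```python
def interpret_p_value(p):
    """Map a p-value to (outcome of test, statement), following the table.

    Assumption: a value exactly on a boundary is placed in the lower band.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.05:
        return "Fail to reject H0", "No evidence to reject H0"
    if p > 0.01:
        return "Reject H0", "Some evidence to reject H0"
    if p > 0.001:
        return "Reject H0", "Strong evidence to reject H0"
    return "Reject H0", "Very strong evidence to reject H0"

for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, interpret_p_value(p))
```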
Significance tests and measures of association [vi /xvi]
If (and only if!) the relationship is statistically significant, the researcher
may calculate its strength
The strength of a relationship is measured through a measure of
association
Standardized coefficient, varying between
-1 (negative relationship) ≤ r (or rs) ≤ +1 (positive relationship)
The closer to ±1.0, the stronger the relationship
Control variable [i/v]
It may be interesting to investigate the relationship between
two variables within different subgroups of the sample
Example: is the relationship between level of education and
interest in politics the same among women and men?
Gender assumes the role of control variable
(second independent variable)
Control variable [ii /v]
SPSS procedure to add a control variable
Control variable [iii /v]
Control variable [iv /v]
What does it show?
Control variable [v/v]
The relationship is stronger amongst women than men
The level of education is more important for the explanation
of the interest in politics for women than for men
Analysis of variance [i]
Analysis of variance
Main logic: look at the total variation in a set of scores, by
taking into account the mean for each group (subpopulation)
Close look at the variation within groups and
the variation between groups
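The within-group and between-group logic can be sketched as follows. This is a minimal one-way analysis of variance on hypothetical trust scores (not survey data). The function name is an assumption. The F statistic is the ratio of between-group to within-group variation, each divided by its degrees of freedom.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_statistic) for a list of score lists."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Variation of the group mean around the grand mean
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Variation of individual scores around their own group mean
        ss_within += sum((x - group_mean) ** 2 for x in group)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic

# Hypothetical trust scores, one list per level of interest in politics
trust_by_interest = [
    [6, 7, 8],  # very interested
    [4, 5, 6],  # quite interested
    [2, 3, 4],  # hardly interested
]
ss_b, ss_w, f = one_way_anova(trust_by_interest)
print(ss_b, ss_w, f)  # 24.0 6.0 12.0
```

A large F means the group means differ by more than the spread inside the groups would suggest. Its significance is then read from the F distribution.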
Analysis of variance [ii]
Example:
The relationship between interest in politics and the level of
trust in political institutions
Null hypothesis: there is no causal effect of interest
in politics on level of trust in institutions (the two variables are
independent)
Working hypothesis: the higher the interest in politics, the
higher the trust
Analysis of variance [iii]
Independent variable (qualitative!)
polintr: How interested in politics
                                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 Very interested            320       15.7        15.7              15.7
          2 Quite interested           941       46.1        46.2              61.9
          3 Hardly interested          594       29.1        29.1              91.0
          4 Not at all interested      183        9.0         9.0             100.0
          Total                       2038       99.9       100.0
Missing   8 Don't know                   2         .1
Total                                 2040      100.0
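The three percentage columns of this frequency table can be recomputed from the counts alone. The sketch below uses the frequencies shown above. The variable names are assumptions. "Percent" divides by all cases, "Valid Percent" by non-missing cases, and "Cumulative Percent" accumulates the valid percentages.

```python
valid_counts = {
    "1 Very interested": 320,
    "2 Quite interested": 941,
    "3 Hardly interested": 594,
    "4 Not at all interested": 183,
}
missing_counts = {"8 Don't know": 2}

n_valid = sum(valid_counts.values())              # 2038
n_total = n_valid + sum(missing_counts.values())  # 2040

rows = []
cumulative = 0.0
for label, count in valid_counts.items():
    percent = 100 * count / n_total        # share of all cases
    valid_percent = 100 * count / n_valid  # share of non-missing cases
    cumulative += valid_percent
    rows.append((label, count, round(percent, 1),
                 round(valid_percent, 1), round(cumulative, 1)))

for row in rows:
    print(row)
```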
Introduction on computing scales [i / xvii]
Starting point: survey data
Variables are measured through precise and clear questions
Ideal for knowing and exploring behaviour, decisions, ideas
(socio-demographic variables, decisions, political positioning,
…)
Problematic for knowing and exploring structural
phenomena (e.g. attitudes)
Introduction on computing scales [ii / xvii]
Attitude
Characteristic strongly anchored in individuals
Stable predisposition
Examples:
- social alienation
- materialism
- intelligence
Very difficult to grasp directly through questions!
Introduction on computing scales [iii / xvii]
Solution:
Measure attitudes through different indicators that are put
into perspective
Starting indicators have to be simple to measure
The process that leads from simple indicators to the measurement of a
structural phenomenon is called scale computing
Introduction on computing scales [iv / xvii]
Scale computing
Indicators i1, i2, i3, ..., in → attitude
Introduction on computing scales [v / xvii]
Examples of scales computing
Through the positioning on the left-right scale and the party
voted for, we may measure…
… ideology?
Through the results obtained in a series of aptitude tests,
factual knowledge of some phenomena and the capacity
to draw conclusions about situations, we may measure…
… intelligence?
Introduction on computing scales [vi / xvii]
Scale computing is done in 4 steps:
i. Theoretical reasoning on the attitude to be measured
ii. Choice of appropriate indicators
iii. Scale computing, which creates a new variable
iv. Scale diagnostic (does the new variable efficiently grasp
the theoretical phenomenon investigated?)
Introduction on computing scales [vii / xvii]
Example: opinion leaders
Theoretical foundations: Katz 1944, Katz and Lazarsfeld 1955
In information processes, an elite of citizens
(opinion leaders) receives the information, processes it, and
passes the most salient components on to their entourage
Two-step flow of communication
Introduction on computing scales [viii / xvii]
How to measure opinion leadership?
Via a direct question (“are you an opinion leader”)?
Nonsense
Through different indicators put into perspective?
Yes, but which ones?
Introduction on computing scales [ix / xvii]
Katz and Lazarsfeld propose the two following indicators:
- Does the individual discuss politics?
- If yes, is he or she able to convince the people in his or her entourage
of his or her opinions?
Main idea: those who discuss often and are able to convince
their entourage are opinion leaders.
Additive scales [iii / xvi]
Example: political activism
10 variables ask about political activities during the last 12 months (yes/no):
- Contacted a politician
- Worked in a political party or action group
- Worked in another organisation or association
- Worn or displayed campaign badge or sticker
- Signed petition
- Taken part in lawful public demonstration
- Boycotted certain products
- Bought products for political, ethical or environmental reasons
- Donated money to political organisation or group
- Participated in illegal protest activities
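An additive scale of political activism can be sketched by counting the "yes" answers across the ten items. The item keys and the function name below are hypothetical labels chosen for illustration. Treating missing answers as 0 is an assumption, not a rule stated in the slides.

```python
# Hypothetical short names for the ten yes/no items listed above
ACTIVITIES = [
    "contacted_politician",
    "worked_in_party",
    "worked_in_other_organisation",
    "wore_badge",
    "signed_petition",
    "lawful_demonstration",
    "boycotted_products",
    "bought_products",
    "donated_money",
    "illegal_protest",
]

def activism_score(respondent):
    """Additive scale: number of activities answered 'yes' (True).

    Assumption: items that are missing or answered 'no' contribute 0.
    """
    return sum(1 for item in ACTIVITIES if respondent.get(item) is True)

respondent = {
    "signed_petition": True,
    "boycotted_products": True,
    "lawful_demonstration": False,
}
print(activism_score(respondent))  # 2
```

The resulting new variable ranges from 0 (no activity) to 10 (all activities) and can then be checked in the scale diagnostic step.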
Effect of one (or more) variable(s) on a dependent variable
Correlation (covariation)
• statistical relationship between
two scale variables
• the correlation coefficient, r,
quantifies the direction and
magnitude of correlation
• r values range from -1 to +1 (rs, the
Spearman coefficient, has the same range)
• R² – coefficient of determination
• 0 ≤ r² ≤ 1
suspicious if r² = 1 (two variables varying
perfectly together)
Regression
• modelling the effect of one
or more independent scale
variables on a dependent
scale variable
• causality
• linear / non-linear
! No causation without
correlation
Value of r (or rs)   Interpretation
r = 0                The two variables do not vary together at all.
0 < r < 1            The two variables tend to increase or decrease together.
r = 1.0              Perfect correlation.
-1 < r < 0           One variable increases as the other decreases.
r = -1.0             Perfect negative or inverse correlation.
Scatter plots: r = -1; r = 0 (no linear relationship); r = +1
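As an illustration of how r and r² are obtained, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The data are hypothetical and the function name is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y around their means
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(variation_x * variation_y)

# Hypothetical scores: level of education and interest in politics
education = [1, 2, 3, 4, 5]
interest = [2, 4, 5, 4, 5]

r = pearson_r(education, interest)
print(f"r = {r:.3f}, r squared = {r * r:.2f}")  # r = 0.775, r squared = 0.60
```

Here r is positive (the two variables tend to increase together), and r² = 0.60 is read as the share of the variation in one variable associated with the other.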
Linear regression
the dependent variable is assumed to be a linear function of one or
more independent/explanatory variables plus an error term introduced
to account for all other factors:
Dep Var = Constant + β1 × Variable1 + β2 × Variable2 + β3 × Variable3 +
β4 × Variable4 + ... + error
Goal of a regression analysis
to obtain estimates of the unknown parameters β1, ..., βK,
which indicate how a change in one of the independent variables
affects the values taken by the dependent variable.
! the dependent variable is a quantitative measure of some condition or
behavior. When the dependent variable is qualitative or categorical,
other methods are appropriate.
Estimating the regression model
• the usual method of estimation for the regression model:
ordinary least squares estimation (OLS)
• a set of assumptions (Gauss-Markov assumptions) is sufficient to guarantee that
ordinary regression estimates will have good properties:
– errors u_i have an expected value of zero: E(u_i) = 0
– independent variables are non-random
– independent variables are linearly independent
– the disturbances u_i are homoscedastic (= the variance of the disturbance is
the same for each observation)
– the disturbances are not autocorrelated (disturbances associated with
different observations are uncorrelated)
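To show what OLS does, the sketch below estimates a regression with a single independent variable, for which the least-squares solution has a simple closed form. This is a simplification of the multi-variable model above. The data and names are hypothetical.

```python
def ols_simple(x, y):
    """Return (constant, beta) minimising the sum of squared errors
    for the model y = constant + beta * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
    constant = mean_y - beta * mean_x
    return constant, beta

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

constant, beta = ols_simple(x, y)
residuals = [b - (constant + beta * a) for a, b in zip(x, y)]
print(f"constant = {constant:.2f}, beta = {beta:.2f}")  # constant = 2.20, beta = 0.60
print(f"sum of residuals = {sum(residuals):.6f}")
```

When a constant is included, the estimated residuals sum to zero by construction. This is the sample counterpart of the assumption E(u_i) = 0.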
Research Project:
Determinants of Support for European Integration
Hypotheses :
H1: The lower the trust in national government the higher the likelihood of support for
European integration.
H2: The higher the level of information, the lower the support for EU membership.
H3: The stronger the belief in democratic norms, the greater the likelihood of
supporting EU membership.
H4: An individual’s attitude is highly influenced by his or her friends.
Research Project:
Determinants of Support for European Integration
Data Collection
• survey conducted June 2006, Bulgaria
• construction of sample:
– nation-wide
– stratified random sample
– representative of the country's adult population (18+)
• the survey instrument
– the survey principles: hypotheses, coding, cost efficiency
The Survey Instrument
Question No. / Question
B1. Please tell us what is your opinion about the European Union and the accession of your country to the
European Union. Point out what, in your opinion, is good and what is bad about European integration.
ANSWER:
B2. On a scale of 1 to 100 (1=“I strongly disagree”, 100=“I strongly agree”) assign a score to each of the
following statements. (Write your score in the brackets.)
[ ] Our country's accession to the European Union is a very good thing. [B2.1]
[ ] Our country should join the European Union as soon as possible. [B2.2]
[ ] European integration should be our government’s highest priority. [B2.3]
[ ] The European Union itself is a good idea. [B2.4]
[ ] The European Union will last for a long time. [B2.5]
B3. In the following list, mark with an “X” those institutions that are, in your opinion, institutions of the
European Union.
[ ] Council of Europe
[ ] European Parliament
[ ] European Court of Human Rights
[ ] NATO
[ ] European Commission
[ ] European Court of Justice
[ ] Council of Ministers
[ ] CEFTA
[ ] Commonwealth of Independent States
[ ] WTO
[ ] IMF
B4. [ ] How many minutes a day, on average, do you read (watch, listen to) news?
B5. [ ] How much of the time that you spend with friends do you discuss politics? (Use a 1 to 100 scale:
1=“I never discuss politics with my friends”, 100=“I discuss politics with my friends all the time, we
never discuss anything else.”)
B6. We would like to understand your employment status. Please mark with an “X” the answer that
applies to you.
Do you have a job or your own business? [ ] Yes [ ] No [B6.1]
Are you trying to find a job? [ ] Yes [ ] No [B6.2]
Are you still in school? [ ] Yes [ ] No [B6.3]
B7. [ ] On a scale of 1 to 100, rank yourself in terms of wealth, compared to the average person in your
country (1 is very poor, 100 is very rich)
B8. [ ] On a scale of 1 to 100, rank your confidence in our country's politicians. (1=no confidence,
100=full confidence)
B9. [ ] On a scale of 1 to 100 rank your confidence in EU officials. (1=no confidence at all, and 100=full
confidence)
B10. The following statements concern the state of the economy in our country. On a scale of 1 to 100
(1=“I strongly disagree”, 100=“I strongly agree”) assign a score to each of them. Write your score in
the brackets.
[ ] Overall, our country’s economy is doing very well. [B10.1]
[ ] Most of the people who wish to work can find work. [B10.2]
[ ] Most people’s well-being increases at a fast pace. [B10.3]
[ ] Most people earn incomes that allow a very good standard of living. [B10.4]
[ ] Prices do not increase at a too high rate. [B10.5]
[ ] Standard of living in our country will approach fast other EU countries. [B10.6]
[ ] Income inequality is at an acceptable level (it is not too high.) [B10.7]
B11. On a scale of 1 to 100 (1=“I strongly disagree”, 100=“I strongly agree”) assign a score to each of the
following statements. (Write your score in the brackets.)
[ ] Though elections are costly, we should pay these costs. [B11.1]
[ ] It is better to have too many political parties than to have only one party. [B11.2]
[ ] Elected constituencies (such as Parliament) should be stronger than
non-elected ones (such as governments). [B11.3]
B12. [ ] What percentage of your friends do you think would support your country's EU membership?
B13. We would like to know more about your political views. Please assign a score between 1 and 100 to
each of the following statements (1=“I strongly disagree,” and 100=“I strongly agree.”)
[ ] Privatization should be extended to as many activities as possible,
including the production of electricity, water supply, rail and air
transportation, and postal services. [B13.1]
[ ] Taxes should be extremely low, because the government spends
money inefficiently. [B13.2]
[ ] Labor unionism should not be permitted, because it only increases
costs of production. [B13.3]
B14. To what extent EU accession will have the following positive effects in our country, in your opinion?
On a scale of 1 to 100 (1=“I strongly disagree”, 100=“I strongly agree”) assign a score to each of the
following 12 possible effects. Write your score in the brackets. Please make sure that you score all 12
items, some of which may be on the next page.
[ ] Higher wages at home [B14.1]
[ ] Lower prices, or prices will increase less than wages [B14.2]
[ ] Better jobs at home [B14.3]
[ ] Freedom to travel abroad [B14.5]
[ ] Additional money from the European Union [B14.6]
[ ] Foreign investment [B14.7]
[ ] Better public administration (customs, passport services, judiciary) [B14.8]
[ ] Better public services (e.g. education, health, and pension systems) [B14.9]
[ ] Less corruption [B14.10]
[ ] More democracy [B14.11]
[ ] Other advantages of EU membership [B14.12]
B15. To what extent EU accession will have the following negative effects in our country, in your opinion?
On a scale of 1 to 100 (1=“I strongly disagree”, 100=“I strongly agree”) assign a score to each of the
following 12 possible effects. Write your score in the brackets. Please make sure that you score all 12
items, some of which may be on the next page.
[ ] Loss of national power (EU will tell us what to do) [B15.1]
[ ] Loss of national identity [B15.2]
[ ] Less democracy [B15.3]
[ ] Having to pay taxes to the EU [B15.4]
[ ] Loss of jobs in our country [B15.5]
[ ] Losing our workforce towards the EU [B15.6]
[ ] Foreigners will buy our land [B15.7]
[ ] Multinational companies will take our businesses [B15.8]
[ ] Prices will increase [B15.9]
[ ] Domestic firms will be driven out of business by foreign competition [B15.10]
[ ] More bureaucracy [B15.11]
[ ] Other disadvantages of EU membership [B15.12]
Data Analysis - Variables
• EU support (dependent variable; sum of percentage points B2.1 to B2.5)
• KnowEU (sum of scores for B3.1 – B3.11)
• Info
• Unemployed
• HomEc (sum of scores for B10 excluding B10.5); Prices (B10.5)
• DemValues (B11)
• ConfidHome
• ConfidEU
• Group (B12)
• Priv (B13.1)
• Support2 (B14, B15) used as alternative measure of EU support
6.2 Variables: Construction of Index Variable
KnowEU
Question                                       Score if answer = “Yes”   Score if no answer
Council of Europ e                  [B3.1]               -1                     0
European Parliament                 [B3.2]                1                     0
European Court of Human Rights      [B3.3]               -1                     0
NATO                                [B3.4]               -2                     0
European Commission                 [B3.5]                1                     0
European Court of Justice           [B3.6]                1                     0
Council of Ministers                [B3.7]                3                     0
CEFTA                               [B3.8]               -2                     0
Commonwealth of Independent States  [B3.9]               -3                     0
WTO                                 [B3.10]              -4                     0
IMF                                 [B3.11]              -1                     0
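The KnowEU index can be sketched as a lookup-and-sum over the scoring table above. The dictionary and function names are assumptions. The scores are copied from the table: each marked item contributes its score and each unmarked item contributes 0.

```python
# Score for each item marked "Yes", copied from the table above
KNOW_EU_SCORES = {
    "B3.1": -1,   # Council of Europe
    "B3.2": 1,    # European Parliament
    "B3.3": -1,   # European Court of Human Rights
    "B3.4": -2,   # NATO
    "B3.5": 1,    # European Commission
    "B3.6": 1,    # European Court of Justice
    "B3.7": 3,    # Council of Ministers
    "B3.8": -2,   # CEFTA
    "B3.9": -3,   # Commonwealth of Independent States
    "B3.10": -4,  # WTO
    "B3.11": -1,  # IMF
}

def know_eu(marked_items):
    """Sum the scores of the items a respondent marked; unmarked items score 0."""
    return sum(KNOW_EU_SCORES[item] for item in marked_items)

# A respondent marking only the four positively scored institutions
print(know_eu({"B3.2", "B3.5", "B3.6", "B3.7"}))  # 6
# A respondent marking the European Parliament and NATO
print(know_eu({"B3.2", "B3.4"}))                  # -1
```

With these weights the index ranges from -14 (only the negatively scored items marked) to 6 (only the positively scored items marked).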
7. Data Analysis - The Model
• STATA
• EUSupport = Constant + A × Variable1 + B × Variable2 +
C × Variable3 + ...
• variables used:
• EUsupport (dependent variable), DemSq, Priv, PrivSq, Priv3,
KnowEU, HomEc, ConfidHome, ConfidEU, Group
• DemSq = DemValues²
• PrivSq = Priv²
• Priv3 = Priv³
Regression Results
Number of observations = 696
F(9, 686) = 117.98
Prob. > F = 0.0000
R-squared = 0.6075
Adj. R-squared = 0.6024
EUsupport       Coef.       Std. Err.      t      P>|t|    [95% Conf. Interval]
DemSq          .0003374    .0001516      2.23    0.026     .0000397    .0006351
Priv          -4.213113    .8598857     -4.90    0.000    -5.901437   -2.52479
PrivSq         .1089781    .0228785      4.76    0.000     .0640578    .1538984
Priv3         -.0006617    .0001523     -4.34    0.000    -.0009607   -.0003627
KnowEU        -6.309359    1.555746     -4.06    0.000    -9.363955   -3.254762
HomEc          .1784438    .0257499      6.93    0.000     .1278857    .2290019
ConfidHome    -.8639055    .1495632     -5.78    0.000    -1.157562   -.5702489
ConfidEU       2.164689    .1158671     18.68    0.000     1.937193    2.392186
Group          1.566942    .133408      11.75    0.000     1.305005    1.828879
_cons          194.7167    15.70405     12.40    0.000     163.8829    225.5505
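Several figures in this output follow from the others, which gives a quick way to read the table. The sketch below uses the standard formulas for adjusted R², the overall F statistic and the t ratio (coefficient divided by standard error) and compares them with the reported values. Small differences are expected because the inputs are rounded. The variable names are assumptions.

```python
n_obs = 696        # number of observations
k = 9              # number of regressors, constant excluded
r_squared = 0.6075

adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)
f_statistic = (r_squared / k) / ((1 - r_squared) / (n_obs - k - 1))

# (coefficient, standard error, reported t) copied from the table above
rows = {
    "DemSq": (0.0003374, 0.0001516, 2.23),
    "Priv": (-4.213113, 0.8598857, -4.90),
    "PrivSq": (0.1089781, 0.0228785, 4.76),
    "Priv3": (-0.0006617, 0.0001523, -4.34),
    "KnowEU": (-6.309359, 1.555746, -4.06),
    "HomEc": (0.1784438, 0.0257499, 6.93),
    "ConfidHome": (-0.8639055, 0.1495632, -5.78),
    "ConfidEU": (2.164689, 0.1158671, 18.68),
    "Group": (1.566942, 0.133408, 11.75),
    "_cons": (194.7167, 15.70405, 12.40),
}
# t ratio = coefficient / standard error
t_values = {name: coef / se for name, (coef, se, _) in rows.items()}

print(f"adjusted R-squared = {adj_r_squared:.4f}")
print(f"F({k}, {n_obs - k - 1}) = {f_statistic:.2f}")
for name, t in t_values.items():
    print(f"{name:<11} t = {t:7.2f}   reported = {rows[name][2]:7.2f}")
```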
Non-linear Regression
When the relationship is non-linear, we use other types of regression: logit,
probit, etc.
Dependent variable (two values; binary) → logistic (logit) regression
Dependent variable (binary) → probit regression
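The difference between the two models lies in the function that turns the linear combination of the independent variables into a probability between 0 and 1. A minimal sketch follows. The function names are assumptions, and z stands for Constant + β1 × Variable1 + ... evaluated for one respondent.

```python
import math

def logit_probability(z):
    """Logistic regression: probability from the logistic function."""
    return 1 / (1 + math.exp(-z))

def probit_probability(z):
    """Probit regression: probability from the standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-2.0, 0.0, 2.0):
    print(f"z = {z:+.1f}  logit = {logit_probability(z):.3f}  "
          f"probit = {probit_probability(z):.3f}")
```

Both curves are S-shaped and give 0.5 at z = 0. They differ mainly in the tails, which is why the two models usually lead to similar conclusions.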
Useful databases for quants
• http://hsc.uwe.ac.uk/dataanalysis/quantInfAssPear.asp
• http://elsa.berkeley.edu/sst/regression.html
• http://www.graphpad.com/articles/interpret/corl_n_linear_reg/correlation.htm