POL 51: Scientific Study of Politics

Prof. B. Jones
Dept. of Political Science
UC-Davis
Plots and Z-scores
How to do some of the “stuff” in HW 4
• Multiple plots on a single page
• Creating z-scores and finding p-values
• Visualizing political data
• Data: Obama vote share by county

Dot Chart: Obama Vote
dotchart(obamapercent, labels=row.names, cex=.7, xlim=c(0, 100),
         main="Support for Obama", xlab="Percent Obama")
abline(v=50)   # vertical reference line at 50 percent
Returns:
[Figure: dot chart "Support for Obama". x-axis: "Percent Obama" (0 to 100). y-axis: the 58 California counties, from San Francisco at the top to Modoc at the bottom.]
Interpretation?

Geographical Patterns?
• Central Valley
• Coastal
• SoCal, NorCal?
Why might you observe these patterns?
• Z-scores
• NB: we’re doing this for learning purposes
Z-scores
• Easy: create mean, standard deviation
• Then derive z-score using formula from last slide set
• R code on next slide
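Assuming the formula referred to is the usual sample z-score (this matches the R code that follows), it can be written as:

```latex
z_i = \frac{x_i - \bar{x}}{s}
```

Here x_i is one county's Obama percent, \bar{x} is the mean across counties, and s is the standard deviation across counties.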

Z-scores and R
#Z scores for Obama
meanobama<-mean(obamapercent)
sdobama<-sd(obamapercent)
zobama<-(obamapercent-meanobama)/sdobama
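As a quick check of the same three steps, here is a minimal sketch on made-up numbers. The vector x is hypothetical and is not the county data; it is chosen so the answer can be verified by hand.

```r
# Hypothetical values for illustration only; not the Obama vote data.
x <- c(40, 50, 60)

meanx <- mean(x)             # 50
sdx   <- sd(x)               # 10 (sample standard deviation, n - 1 denominator)
z     <- (x - meanx) / sdx   # -1, 0, 1

# Base R's scale() centers and rescales the same way by default.
z_scale <- as.numeric(scale(x))
all.equal(z, z_scale)        # TRUE
```

Computing the mean and standard deviation by hand, as on the slide, makes each step visible; scale() is a shortcut once the logic is clear.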
Interpretation
Z-scores are in the metric of standard deviations
• A large z (in absolute value) implies the observation is farther from the mean than observations with a small z.
• Z=0 means the observation is exactly at the mean.
• Dotchart (code):
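par(mfcol=c(1,1))
dotchart(zobama, labels=row.names, cex=.7, xlim=c(-3, 3),
         main="Obama Vote Z-scores", xlab="Z-score")
abline(v=0)                   # the mean
abline(v=1, col="red")        # one standard deviation above the mean
abline(v=-1, col="red")       # one standard deviation below the mean
abline(v=2, col="darkred")    # two standard deviations above the mean
abline(v=-2, col="darkred")   # two standard deviations below the mean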

[Figure: dot chart "Obama Vote Z-scores". x-axis: "Z-score" (-3 to 3). y-axis: the 58 California counties, from San Francisco at the top to Modoc at the bottom.]
Probability Values
• Z-scores that are large in absolute value are less likely to be observed than z-scores near zero.
• Consult a z-distribution table
• Probability area is given
• Can think about probabilities in the “tails”
• One-tail (upper or lower)
• Two-tail (upper + lower)
R code
twotailp <- 2*pnorm(-abs(zobama))  # Area in the upper and lower tails beyond |z|
onetailp <- pnorm(-abs(zobama))    # One-tail area beyond |z|. Subtracting it from 1
                                   # gives the area below this z-score (if z is
                                   # positive) or above it (if z is negative).
zp <- cbind(county, onetailp, twotailp, zobama); zp
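To see what the two pnorm() lines return, here is a small sketch on a few hand-picked z values. The vector z is hypothetical, chosen because the answers are familiar from a z-table; it is not the county z-scores.

```r
# Hand-picked z values for illustration; not the county z-scores.
z <- c(-1.96, 0, 1, 1.96)

onetail <- pnorm(-abs(z))       # area in one tail beyond |z|
twotail <- 2 * pnorm(-abs(z))   # area in both tails beyond |z|

round(cbind(z, onetail, twotail), 3)
# z = 0 gives a one-tail area of 0.5 and a two-tail area of 1;
# |z| = 1.96 gives a two-tail area of about 0.05.
```

Taking -abs(z) first means the same line works for positive and negative z-scores, since the normal distribution is symmetric around zero.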
Plots

4 plots on one page:
par(mfcol=c(2,2))   # 2 x 2 grid, filled column by column
boxplot(obamapercent, ylab="Vote Percent", main="Obama Vote: Box Plot", col="blue")
hist(zobama, xlab="Obama Vote as Z-Scores", ylab="Frequency",
     main="Histogram of Standardized Obama Vote", col="blue")
hist(obamapercent, ylab="Frequency", xlab="Vote Percent",
     main="Obama Vote: Histogram", col="blue")
plot(zobama, onetailp, ylab="One-Tail p", xlab="Z-score",
     main="Z-scores and p-values", col="blue")
[Figure: four panels on one page. "Obama Vote: Box Plot" (y-axis "Vote Percent"); "Histogram of Standardized Obama Vote" (x-axis "Obama Vote as Z-Scores", y-axis "Frequency"); "Obama Vote: Histogram" (x-axis "Vote Percent", y-axis "Frequency"); "Z-scores and p-values" (x-axis "Z-score", y-axis "One-Tail p").]
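A four-panel page like this one can be tried without the county data. The sketch below uses simulated values (x is random, not the Obama vote) and also saves and restores the plotting layout, which the slide code does not do; treat it as one possible way to organize the calls.

```r
# Simulated data for illustration only; not the county vote shares.
set.seed(1)
x <- rnorm(58, mean = 50, sd = 10)
z <- (x - mean(x)) / sd(x)

old <- par(mfcol = c(2, 2))        # 2 x 2 grid, filled column by column
boxplot(x, main = "Box plot")      # top-left
hist(z, main = "Histogram of z")   # bottom-left
pos_second <- par("mfg")[1:2]      # row and column of the panel just drawn
hist(x, main = "Histogram of x")   # top-right
plot(z, pnorm(-abs(z)), main = "z vs. one-tail p")   # bottom-right
par(old)                           # restore the previous layout
```

With mfcol the panels fill down each column first; using mfrow=c(2,2) instead would fill across each row first, so the second plot would land top-right.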