Transcript Slide 1

SIMPLE LINEAR REGRESSION
AND CORRELATION
Prepared by:
Jackie Zerrle
David Fried
Chun-Hui Chung
Weilai Zhou
Shiyhan Zhang
Alex Fields
Yu-Hsun Cheng
Roosevelt Moreno
AMS 572.1 DATA ANALYSIS, FALL 2007.
What is Regression Analysis?
- A statistical methodology to estimate the relationship of a response variable to a set of predictor variables.
- It is a tool for the investigation of relationships between variables.
- Often used in economics (supply and demand): how does one aspect of the economy affect other parts?
- It was proposed by the German mathematician Gauss.
Linear Regression
- The simplest relationship between x (the predictor variable) and Y (the response variable) is linear:
  $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad (i = 1, 2, \ldots, n)$
- $\epsilon_i$ is a random error with $E(\epsilon_i) = 0$ and $Var(\epsilon_i) = \sigma^2$.
- $\mu_i = E(Y_i) = \beta_0 + \beta_1 x_i$ represents the true but unknown mean of $Y_i$. This relationship is the true regression line.
Simple Linear Regression Model
4 Basic Assumptions:
1. The mean of $Y_i$ is a linear function of $x_i$.
2. The $Y_i$ have a common variance $\sigma^2$, which is the same for all values of x.
3. The errors $\epsilon_i$ are normally distributed.
4. The errors $\epsilon_i$ are independent.
Example -- Sales vs. Advertising- Information was given such as the cost of
advertising and the sales that occurred as a
result.
 Make a scatter plot
 To get a good fit, however, we will use the Least
Squares (LS) method.
Example -- Sales vs. Advertising -- Data

  Sales ($000,000s) y_i    Advertising ($000s) x_i
  28                       71
  14                       31
  19                       50
  21                       60
  16                       35
Example -- Sales vs. Advertising- Try to fit a straight line :
y   0   1x
 Where o = 2.5 and
28  14
1 
 3.5
71  31
 Look at the deviations between the observed
values and the points from the line:
yi  ( 0   1xi ) (i  1, 2,..., n)
Example -- Sales vs. Advertising—
Scatter Plot with a Trial Straight Line fit
http://learning.mazoo.net/archives/000899.html
Least Squares (Cont…)
- Deviations should be as small as possible.
- Sum of the squared deviations:
  $Q = \sum_{i=1}^{n} \left[y_i - (\beta_0 + \beta_1 x_i)\right]^2$
- In our example, Q = 7.87.
- Least Squares estimates: the values of $\beta_0$ and $\beta_1$ that minimize Q, denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$.
Least Squares Estimates
- To find $\hat{\beta}_0$ and $\hat{\beta}_1$, take the first partial derivatives of Q:
  $\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_i)\right]$
  $\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i\left[y_i - (\beta_0 + \beta_1 x_i)\right]$
Normal Equations
- We then set these partial derivatives equal to zero and simplify.
- These are our normal equations:
  $\hat{\beta}_0\, n + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$
  $\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
Normal Equations
- Solve for $\hat{\beta}_0$ and $\hat{\beta}_1$:
  $\hat{\beta}_0 = \frac{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i\right) - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
  $\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
- These formulas can be simplified with:
  $S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$
  $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$
  $S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$
- $S_{xy}$ gives the sum of cross-products of the x's and y's around their respective means.
- $S_{xx}$ and $S_{yy}$ give the sums of squares of the differences between the $x_i$ and $\bar{x}$, and between the $y_i$ and $\bar{y}$, respectively.
- With these, the estimates can be written as:
  $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
- The least squares (LS) line, which is an estimate of the true regression line, is:
  $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Example -- Sales vs. Advertising
- Find the equation of the line for the number of sales due to increased advertising.
- $\sum x_i = 247$, $\sum y_i = 98$, $\sum x_i^2 = 13327$, $\sum y_i^2 = 2038$, $\sum x_i y_i = 5192$, and n = 5, which gives
  $\bar{x} = 49.4, \qquad \bar{y} = 19.6$
  $S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right) = 5192 - \frac{1}{5}(247)(98) = 350.80$
  $S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2 = 13327 - \frac{1}{5}(247)^2 = 1125.20$
Example -- Sales vs. Advertising
- The slope and intercept estimates are:
  $\hat{\beta}_1 = \frac{350.80}{1125.20} = 0.31 \qquad \text{and} \qquad \hat{\beta}_0 = 19.6 - 0.31 \times 49.4 = 4.29$
- The equation of the LS line is:
  $\hat{y} = 4.29 + 0.31x$
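The fit above can be reproduced in SAS; the following is a minimal sketch of ours (the dataset and variable names sales_adv, sales, and adv are illustrative, not from the slides):

* Hypothetical SAS sketch reproducing the LS fit for the sales vs. advertising data;
data sales_adv;
   input sales adv;   * sales in $000,000s, advertising in $000s;
   datalines;
28 71
14 31
19 50
21 60
16 35
;
run;

proc reg data=sales_adv;
   model sales = adv;   * prints the LS estimates (intercept near 4.29, slope near 0.31), their SEs, and the ANOVA table;
run;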
Coefficient of Determination and Coefficient of Correlation
- Fitted values: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \quad i = 1, 2, \ldots, n$
- Residuals are used to evaluate the goodness of fit of the LS line:
  $e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), \quad i = 1, 2, \ldots, n$
- Error sum of squares (SSE):
  $Q_{\min} = \sum e_i^2$
- If x is ignored and each $y_i$ is predicted by $\bar{y}$, the corresponding minimum of Q is
  $\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 = S_{yy}$
  This is the total sum of squares (SST).
- Total Sum of Squares:
  $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \underbrace{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}_{SSE} + 2\sum_{i=1}^{n}(y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$
- The cross-product term equals zero, so with SSR the regression sum of squares and SSE the error sum of squares:
  $SST = SSR + SSE$
- Coefficient of determination:
  $r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, where $0 \le r^2 \le 1$ is the ratio of the variation accounted for by the regression to the total variation.
Sales vs. Advertising
Coefficient of Determination and Correlation
- Calculate r^2 and r using our data:
  $SST = S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 = 2038 - \frac{1}{5}(98)^2 = 117.2$
- Next calculate SSR:
  $SSR = SST - SSE = 117.2 - 7.87 = 109.33$
- Then,
  $r^2 = \frac{109.33}{117.2} = 0.933 \qquad \text{and} \qquad r = \sqrt{0.933} = 0.966$
- Since 93.3% of the variation in sales is accounted for by linear regression on advertising, the relationship between the two is strongly linear with a positive slope.
Estimation of σ²
- The variance σ² measures the scatter of the $Y_i$ around their means $\mu_i$.
- The unbiased estimate of the variance is given by:
  $s^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} e_i^2}{n-2}$
Sales vs. Advertising
Estimation of σ²
- Find the estimate of σ² using our past results. SSE = 7.87 and n - 2 = 3; so,
  $s^2 = \frac{7.87}{3} = 2.62$
- The estimate of σ is:
  $s = \sqrt{2.62} \approx 1.62$, i.e., about $1.62 million (since sales are measured in $000,000s).
Statistical Inference on β0 and β1
ⅰ. Point Estimator
ⅱ. Confidence Interval
ⅲ. Test

Distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$:
  $\hat{\beta}_0 \sim N\!\left(\beta_0,\ \frac{\sigma^2 \sum x_i^2}{n\,S_{xx}}\right), \qquad \hat{\beta}_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$

Estimated standard errors of the point estimators:
  $SE(\hat{\beta}_0) = s\sqrt{\frac{\sum x_i^2}{n\,S_{xx}}}, \qquad SE(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}}$

100(1-α)% CIs:
  $\hat{\beta}_0 \pm t_{n-2,\alpha/2}\,SE(\hat{\beta}_0), \qquad \hat{\beta}_1 \pm t_{n-2,\alpha/2}\,SE(\hat{\beta}_1)$
Hypothesis test
- Pivotal quantities:
  $\frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} \sim t_{n-2}, \qquad \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t_{n-2}$
- Hypothesis:
  $H_0: \beta_1 = \beta_1^0 \quad \text{vs.} \quad H_a: \beta_1 \ne \beta_1^0$
- Test statistic:
  $t_0 = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)}$; for $H_0: \beta_1 = 0$ this is $t_0 = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}$
- Rejection region: reject H0 at level α if $|t_0| > t_{n-2,\alpha/2}$.
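For the sales vs. advertising data these intervals and t-tests can be read off PROC REG output; a hypothetical sketch (reusing the sales_adv dataset sketched earlier):

* Hypothetical sketch: parameter t-tests and 95% confidence limits for beta0 and beta1;
proc reg data=sales_adv;
   model sales = adv / clb alpha=0.05;   * CLB adds confidence limits for the parameter estimates;
run;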
Analysis of Variance for Simple Linear Regression
- The analysis of variance (ANOVA) is a statistical technique to decompose the total variability in the yi's into separate variance components associated with specific sources.
- A mean square is a sum of squares divided by its d.f.
- Mean Square Regression: MSR = SSR/1
- Mean Square Error: MSE = SSE/(n-2)
- The ratio of MSR to MSE provides a test, equivalent to the t-test, of the significance of the linear relationship between x and y:
  $F = \frac{MSR}{MSE} = \frac{SSR}{s^2} = \frac{\hat{\beta}_1^2 S_{xx}}{s^2} = \left(\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}}\right)^2 = \left(\frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\right)^2 = t^2 \ \sim\ F_{1,\,n-2} \ \text{under } H_0$
ANOVA table

  Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square        F
  (Source)              (SS)             (d.f.)               (MS)
  Regression            SSR              1                    MSR = SSR/1        F = MSR/MSE
  Error                 SSE              n - 2                MSE = SSE/(n - 2)
  Total                 SST              n - 1
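As an illustration (our arithmetic, using the quantities computed earlier for the sales vs. advertising example):

$SSR = 109.33, \quad MSR = \frac{109.33}{1} = 109.33, \quad MSE = \frac{7.87}{3} \approx 2.62$
$F = \frac{MSR}{MSE} = \frac{109.33}{2.62} \approx 41.7 > F_{1,3,0.05} \approx 10.13,$
so the linear relationship is significant at α = 0.05, in agreement with the t-test (since $t^2 = F$).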
Prediction of Future Observations
- Suppose we fix x at a specified value x*.
- How do we predict the value of the r.v. Y*?
- Point estimator:
  $\hat{Y}^* = \hat{\mu}^* = \hat{\beta}_0 + \hat{\beta}_1 x^*$
Prediction Intervals (PI)
- The confidence intervals for Y* and E(Y*) are called prediction intervals.
- Formulas for a 100(1-α)% PI, with $s = \sqrt{MSE}$:
  For μ* = E(Y*):
  $\hat{\mu}^* - t_{n-2,\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \ \le\ \mu^* \ \le\ \hat{\mu}^* + t_{n-2,\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$
  For Y*:
  $\hat{Y}^* - t_{n-2,\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \ \le\ Y^* \ \le\ \hat{Y}^* + t_{n-2,\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$
Cautions about making predictions
- Note that the PI will be shortest when x* is equal to the sample mean.
- The farther away x* is from the sample mean, the longer the PI will be.
- Extrapolation beyond the range of the data is highly imprecise and should be avoided.
Example 10.8
- Calculate a 95% PI for the mean groove depth of the population of all tires and for the groove depth of a single tire with a mileage of 25,000 (based on the data from earlier sections).
- In previous examples, we already computed the following quantities:
  $x^* = 25, \quad \hat{\mu}^* = \hat{\beta}_0 + \hat{\beta}_1 x^* = 178.62, \quad s = 19.02, \quad \bar{x} = 16, \quad n = 9, \quad S_{xx} = 960, \quad t_{7,0.025} = 2.365$
Example 10.8 (continued)
- Now we simply plug these numbers into our formulas.
- 95% PI for E(Y*):
  $178.62 \pm 2.365 \times 19.02\sqrt{\frac{1}{9} + \frac{(25-16)^2}{960}} = [158.73,\ 198.51]$
- 95% PI for Y*:
  $178.62 \pm 2.365 \times 19.02\sqrt{1 + \frac{1}{9} + \frac{(25-16)^2}{960}} = [129.44,\ 227.80]$
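The same intervals can be obtained in SAS by appending the new x value with a missing response; a hypothetical sketch (the nine (x, y) pairs are the tire data listed in Example 10.10 below, and the dataset name tire is ours):

* Hypothetical sketch: 95% CI for E(Y*) and 95% PI for Y* at x = 25, tire data of Example 10.10;
data tire;
   input x y;
   datalines;
0 394.33
4 329.50
8 291.00
12 255.17
16 229.33
20 204.83
24 179.00
28 163.83
32 150.33
25 .
;
run;

proc reg data=tire;
   model y = x / clm cli;   * CLM: limits for the mean response E(Y*), CLI: limits for an individual Y*;
run;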
Calibration (Inverse Regression)
- Suppose we are given μ* = E(Y*), and we want an estimate of x*.
- We simply solve the linear regression formula for x* to obtain our point estimator:
  $\hat{x}^* = \frac{\mu^* - \hat{\beta}_0}{\hat{\beta}_1}$
- Calculating the CI is more complicated and is not covered in this course.
Example 10.9
- Estimate the mean life of a tire at wearout (62.5 mils remaining).
- We want to estimate x* when μ* = 62.5.
- From previous examples, we have calculated:
  $\hat{\beta}_0 = 360.64, \qquad \hat{\beta}_1 = -7.281$
- Plugging these values into our equation we get:
  $\hat{x}^* = \frac{62.5 - 360.64}{-7.281} \approx 40.948 \text{ (thousand miles)}, \quad \text{i.e., } 40.948 \times 1000 \approx 40{,}947.67 \text{ miles}$
REGRESSION DIAGNOSTICS
- The four basic assumptions of linear regression need to be verified from the data to assure that the analysis is valid.
1. The mean of $Y_i$ is a linear function of $x_i$.
2. The $Y_i$ have a common variance $\sigma^2$, which is the same for all values of x.
3. The errors $\epsilon_i$ are normally distributed.
4. The errors $\epsilon_i$ are independent.
Checking The Model Assumptions
- Checking for Linearity
- Checking for Constant Variance
- Checking for Normality
- Checking for Independence
How to do this?
- If the model is correct, then the residuals
  $e_i = y_i - \hat{y}_i$
  can be viewed as the "estimates" of the random errors $\epsilon_i$.
- Residual plots are the primary tool.
Checking for linearity
- If the regression of y on x is linear, then the plot of $e_i$ vs. $x_i$ should exhibit random scatter around zero.
Example 10.10

  i    x_i    y_i       ŷ_i       e_i
  1     0     394.33    360.64    33.69
  2     4     329.50    331.51    -2.01
  3     8     291.00    302.39   -11.39
  4    12     255.17    273.27   -18.10
  5    16     229.33    244.15   -14.82
  6    20     204.83    215.02   -10.19
  7    24     179.00    185.90    -6.90
  8    28     163.83    156.78     7.05
  9    32     150.33    127.66    22.67

The plot is clearly parabolic. The linear regression does not fit the data adequately. Maybe we can try a second degree model:
  $y = \beta_0 + \beta_1 x + \beta_2 x^2$
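A possible SAS sketch of this check (ours; it reuses the hypothetical tire dataset sketched earlier and introduces a quadratic term x2):

* Hypothetical sketch: residual-vs-x plot for the tire data and a trial second-degree model;
data tire2;
   set tire;      * reuses the hypothetical tire dataset sketched earlier;
   x2 = x*x;      * quadratic term;
run;

proc reg data=tire2;
   model y = x;
   output out=resids r=e;   * saves the residuals e = y - yhat;
run;

proc gplot data=resids;
   plot e*x;                * should show the parabolic pattern noted above;
run;

proc reg data=tire2;
   model y = x x2;          * second-degree model y = b0 + b1*x + b2*x^2;
run;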
Checking for Constant Variance
- Plot $e_i$ vs. $\hat{y}_i$. Since the $\hat{y}_i$ are linear functions of the $x_i$, we can also plot $e_i$ vs. $x_i$.
- If the constant variance assumption is correct, $Var(e_i) \approx \sigma^2$, and the plot of $e_i$ vs. $\hat{y}_i$ should show random scatter in a horizontal band of roughly constant width.
Checking for normality
- Making a normal plot:
1. The normal plot requires that the observations form a random sample with a common mean and variance.
2. The $y_i$ do not form such a random sample, because the means $E(Y_i) = \mu_i$ depend on $x_i$ and hence are not equal.
3. The residuals are used to make the normal plot (they have a zero mean and an approximately constant variance).
Checking for normality
Example 10.10
Checking for Independence
- A well-known statistical test is the Durbin-Watson test:
  $d = \frac{\sum_{u=2}^{n}(e_u - e_{u-1})^2}{\sum_{u=1}^{n} e_u^2}$
1. The closer d is to 2, the more nearly independent the residuals are.
2. The closer d is to 0, the more positively correlated the residuals are.
3. The closer d is to 4, the more negatively correlated the residuals are.
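In SAS, the statistic can be requested with the DW option of PROC REG; a hypothetical sketch for the tire fit:

* Hypothetical sketch: Durbin-Watson statistic for the residuals of the tire fit;
proc reg data=tire;
   model y = x / dw;   * DW prints the Durbin-Watson d statistic;
run;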
CHECKING FOR OUTLIERS AND INFLUENTIAL OBSERVATIONS
Checking for Outliers
- Standardized residuals:
  $e_i^* = \frac{e_i}{SE(e_i)}, \quad i = 1, 2, \ldots, n, \qquad \text{where } SE(e_i) = s\sqrt{1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{xx}}}$
- If $|e_i^*| > 2$, then the corresponding observation may be regarded as an outlier.
Checking for Influential Observations
- An influential observation is not necessarily an outlier.
- An observation can be influential because it has an extreme x-value, an extreme y-value, or both.
- How can we identify influential observations?

Leverage
- $\hat{y}_i$ can be expressed as a linear combination of all the $y_j$ as follows:
  $\hat{y}_i = \sum_{j=1}^{n} h_{ij} y_j, \qquad \sum_{i=1}^{n} h_{ii} = k + 1$
  where the $h_{ij}$ are some functions of the x's. We call $h_{ii}$ the leverage of observation i.
- A rule of thumb is to regard any $h_{ii} > 2(k+1)/n$ as high leverage. An observation with high leverage is an influential observation.
- In this chapter, k = 1, and so $h_{ii} > 4/n$ is regarded as high leverage.
- The formula for $h_{ii}$ for k = 1 is given by
  $h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$
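A hypothetical SAS sketch that produces these diagnostics for the tire fit (the variable names estar and lev are ours):

* Hypothetical sketch: standardized residuals and leverages from PROC REG;
proc reg data=tire;
   model y = x;
   output out=diag student=estar h=lev;   * STUDENT= standardized residuals e*, H= leverages h_ii;
run;

proc print data=diag;
   where abs(estar) > 2 or lev > 4/9;     * flag possible outliers and high-leverage points (n = 9);
run;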
How to Deal with Outliers and Influential Observations?
- Detect outliers and influential observations.
- Decide whether or not they are erroneous observations.
- If they are erroneous, discard these observations; if not, include them in the analysis.
- Do the analysis.
- Two separate analyses may be done, one with and one without the outliers and influential observations.
Example 10.12

  No.    X     Y        e_i*      h_ii
   1     8     6.28    -0.341     0.1
   2     8     5.76    -1.067     0.1
   3     8     7.71     0.582     0.1
   4     8     8.84     1.735     0.1
   5     8     8.47     1.300     0.1
   6     8     7.04     0.031     0.1
   7     8     5.25    -1.624     0.1
   8    19    12.50     0         1
   9     8     5.56    -1.271     0.1
  10     8     7.91     0.757     0.1
  11     8     6.89    -0.089     0.1
DATA TRANSFORMATIONS
- Linearizing Transformations
- Simple functional relationship, e.g. the power form:
  $y = \alpha x^{\beta}$
  $\ln y = \ln(\alpha x^{\beta}) = \ln\alpha + \beta\ln x$
  then $y' = \ln y$ and $x' = \ln x$ produce $\beta_0 = \ln\alpha$ and $\beta_1 = \beta$.
DATA TRANSFORMATIONS
- Linearizing Transformations
- Simple functional relationship, e.g. the exponential form:
  $y = \alpha e^{\beta x}$
  $\ln y = \ln(\alpha e^{\beta x}) = \ln\alpha + \beta x$
  then $y' = \ln y$ and $x' = x$ produce $\beta_0 = \ln\alpha$ and $\beta_1 = \beta$.
DATA TRANSFORMATIONS
Linearizing Transformations
[Figure: chart of typical curve shapes of y vs. x with candidate linearizing transformations, e.g. √x, log x, x², x³, 1/x for the predictor and √y, log y, y², y³, 1/y for the response.]
DATA TRANSFORMATIONS
- Linearizing Transformations
- Ex. 10.13 (Tire tread wear vs. Mileage: Exponential Model)
[Figures: plots of y vs. x for the tire tread wear data under the exponential model.]
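One way to carry out the exponential fit of Ex. 10.13 is to regress ln y on x; a hypothetical SAS sketch using the tire data sketched earlier:

* Hypothetical sketch: exponential model y = alpha*exp(beta*x), fitted as ln y = ln(alpha) + beta*x;
data tire_log;
   set tire;
   if y ne . then lny = log(y);   * log() is the natural logarithm in SAS;
run;

proc reg data=tire_log;
   model lny = x;   * the intercept estimates ln(alpha) and the slope estimates beta;
run;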
DATA TRANSFORMATIONS
- Variance Stabilizing Transformations
- Based on a two-term Taylor-series approximation, $h(Y) \approx h(\mu) + h'(\mu)(Y - \mu)$.
- Given a relationship between the mean and the variance, $Var(Y) = g^2(\mu)$, we look for a transformation h that makes the variances approximately equal, even if the means differ.
DATA TRANSFORMATIONS
- Variance Stabilizing Transformations
- Delta Method: with $E(Y) = \mu$ and $Var(Y) = g^2(\mu)$,
  $Var[h(Y)] \approx [h'(\mu)]^2\, g^2(\mu)$
- Set
  $[h'(\mu)]^2\, g^2(\mu) = 1$, so that $h'(\mu) = \frac{1}{g(\mu)}$;
- consequently
  $h(y) = \int \frac{dy}{g(y)}$
DATA TRANSFORMATIONS
- Variance Stabilizing Transformation
- Example 1: $Var(Y) = c^2\mu^2$, $c > 0$. Here $g(\mu) = c\mu$, then
  $h(y) = \int \frac{dy}{cy} = \frac{1}{c}\int \frac{dy}{y} = \frac{1}{c}\ln y$
- Example 2: $Var(Y) = c^2\mu$, $c > 0$. Here $g(\mu) = c\sqrt{\mu}$, then
  $h(y) = \int \frac{dy}{c\sqrt{y}} = \frac{1}{c}\int \frac{dy}{\sqrt{y}} = \frac{2}{c}\sqrt{y}$
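As a quick check of Example 1 (our derivation): with $h(y) = \frac{1}{c}\ln y$,

$Var[h(Y)] \approx [h'(\mu)]^2\, g^2(\mu) = \left(\frac{1}{c\mu}\right)^2 c^2\mu^2 = 1,$

so the log transform stabilizes the variance when the standard deviation is proportional to the mean.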
CORRELATION ANALYSIS
Background on correlation
- A number of different correlation coefficients are used for different situations. The best known is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Despite its name, it was first introduced by Francis Galton.
CORRELATION ANALYSIS
- When it is not clear which is the predictor variable and which is the response variable?
- When both variables are random?

Bivariate Normal Distribution
- Correlation: a measure of how closely two variables share a linear relationship (loosely, a measure of dependence).
  $\rho = corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}$
- If ρ = 0, X and Y are uncorrelated; this is implied by independence but does not by itself guarantee independence.
- If ρ = -1 or +1, it represents perfect association.
- Useful when it is not possible to determine which variable is the predictor and which is the response.
- Health vs. Wealth: which is the predictor? Which is the response?
Bivariate Normal Distribution
- p.d.f. of (X, Y):
  $f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}$
- Properties:
  - The p.d.f. is defined for -1 < ρ < 1.
  - It is undefined if ρ = ±1 and is then called degenerate.
  - The marginal p.d.f. of X is $N(\mu_X, \sigma_X^2)$.
  - The marginal p.d.f. of Y is $N(\mu_Y, \sigma_Y^2)$.
Bivariate Normal Distribution
How to calculate
- Let f(X, Y) have the covariance matrix
  $A = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}, \qquad \det A = \sigma_X^2\sigma_Y^2 - \rho^2\sigma_X^2\sigma_Y^2 = \sigma_X^2\sigma_Y^2(1-\rho^2)$
- where the marginals are $f(x) = N(\mu_X, \sigma_X^2)$ and $f(y) = N(\mu_Y, \sigma_Y^2)$.
Calculation
  $f(X, Y) = \frac{1}{(2\pi)^{N/2}\sqrt{\det A}} \exp\left\{ -\frac{1}{2} \begin{pmatrix} x-\mu_X & y-\mu_Y \end{pmatrix} A^{-1} \begin{pmatrix} x-\mu_X \\ y-\mu_Y \end{pmatrix} \right\}$
where N = 2 since the distribution is bivariate; thus:
  $f(X, Y) = \frac{1}{2\pi\sqrt{\sigma_X^2\sigma_Y^2(1-\rho^2)}} \exp\left\{ -\frac{1}{2} \begin{pmatrix} x-\mu_X & y-\mu_Y \end{pmatrix} A^{-1} \begin{pmatrix} x-\mu_X \\ y-\mu_Y \end{pmatrix} \right\}$
where
  $A^{-1} = \frac{1}{\sigma_X^2\sigma_Y^2(1-\rho^2)} \begin{pmatrix} \sigma_Y^2 & -\rho\sigma_X\sigma_Y \\ -\rho\sigma_X\sigma_Y & \sigma_X^2 \end{pmatrix}$
Calculation (cont…)
  $f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_X^2\sigma_Y^2}\left[ \sigma_Y^2(x-\mu_X)^2 - 2\rho\sigma_X\sigma_Y(x-\mu_X)(y-\mu_Y) + \sigma_X^2(y-\mu_Y)^2 \right] \right\}$
  $f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right] \right\}$
Statistical Inference on the Correlation Coefficient ρ
- We can derive a test on the correlation coefficient in the same way that we have been doing in class.
- Assumptions:
  - (X, Y) are from the bivariate normal distribution.
  - R: sample estimate of the population correlation coefficient ρ.
- Start with the point estimator:
  $R = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$
- Get the pivotal quantity:
  - The distribution of R is quite complicated.
  - T: transform the point estimator into a p.q.:
    $T = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}}$
- Do we know everything about the p.q.?
  - Yes: T ~ t_{n-2} under H0: ρ = 0.
Derivation of T
- Are these equivalent?
  $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \ \stackrel{?}{=}\ \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}$
- Substitute:
  $r = \hat{\beta}_1\frac{s_x}{s_y} = \hat{\beta}_1\sqrt{\frac{S_{xx}}{S_{yy}}} = \hat{\beta}_1\sqrt{\frac{S_{xx}}{SST}}, \qquad 1 - r^2 = \frac{SSE}{SST} = \frac{(n-2)s^2}{SST}$
- Then:
  $t = \hat{\beta}_1\sqrt{\frac{S_{xx}}{SST}}\cdot\sqrt{\frac{(n-2)\,SST}{(n-2)\,s^2}} = \frac{\hat{\beta}_1\sqrt{S_{xx}}}{s} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}$
- Yes, they are equivalent. Therefore, we can use t as a statistic for testing the null hypothesis H0: β1 = 0, or equivalently H0: ρ = 0.
Exact Statistical Inference on ρ
- Test:
  H0: ρ = 0 vs. Ha: ρ ≠ 0, equivalently H0: β1 = 0 vs. Ha: β1 ≠ 0,
  since for the bivariate normal
  $E(Y \mid x) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$
- Test statistic:
  $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$
- Using
  $r = \hat{\beta}_1\frac{S_x}{S_y} = \hat{\beta}_1\sqrt{\frac{S_{xx}}{S_{yy}}} = \hat{\beta}_1\sqrt{\frac{S_{xx}}{SST}}, \qquad 1 - r^2 = \frac{(n-2)s^2}{SST}$
  we get
  $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \hat{\beta}_1\sqrt{\frac{S_{xx}}{SST}}\cdot\sqrt{\frac{(n-2)\,SST}{(n-2)\,s^2}} = \frac{\hat{\beta}_1\sqrt{S_{xx}}}{s} \ \sim\ t_{n-2}$
Exact Statistical Inference on ρ (Cont.)
- Rejection region: reject H0 if $|t_0| > t_{n-2,\alpha/2}$.
- Example:
  The times for 25 soft drink deliveries (y) monitored as a function of delivery volume (x) are shown in the table on the next page.
  Test the null hypothesis that the correlation coefficient is equal to 0.
Exact Statistical Inference on ρ
- Data

  Y    X        Y    X        Y    X        Y    X        Y    X
  7   16.68     7   18.11    16   40.33    10   29.00    10   17.90
  3   11.50     2    8.00    10   21.00     6   15.35    26   52.32
  3   12.03     7   17.83     4   13.50     7   19.00     9   18.75
  4   14.88    30   79.24     6   19.75     3    9.50     8   19.83
  6   13.75     5   21.50     9   24.00    17   35.10     4   10.75
Exact Statistical Inference on ρ
- Solution
  The sample correlation coefficient is
  $r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}} = \frac{2473.34}{\sqrt{1136.57 \times 5784.54}} = 0.96$
  $t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.96\sqrt{25-2}}{\sqrt{1-0.96^2}} = 17.56$
  For α = 0.01, $t_{23,0.005} = 2.807$. Since $t_0 = 17.56 > 2.807$, reject H0.
Approximate Statistical Inference on ρ
- There is no exact method of testing ρ against an arbitrary ρ0:
  - The distribution of R is very complicated.
  - T ~ t_{n-2} only when ρ = 0.
- To test ρ against an arbitrary ρ0, use Fisher's normal approximation:
  $\tanh^{-1} R = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) \ \approx\ N\left(\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right),\ \frac{1}{n-3}\right)$
- Transform the sample estimate:
  $\hat{\psi} = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right); \quad \text{under } H_0: \rho = \rho_0, \quad \hat{\psi} \ \approx\ N\left(\psi_0,\ \frac{1}{n-3}\right), \quad \text{where } \psi_0 = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right)$
Approximate Statistical Inference on ρ
- Test:
  $H_0: \rho = \rho_0 \ \text{vs.}\ H_1: \rho \ne \rho_0$, equivalently
  $H_0: \psi = \psi_0 = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right) \ \text{vs.}\ H_1: \psi \ne \psi_0, \quad \text{where } \hat{\psi} = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$
- Z statistic:
  $z_0 = \sqrt{n-3}\,(\hat{\psi} - \psi_0)$; reject H0 if $|z_0| > z_{\alpha/2}$.
- Confidence interval: from
  $\hat{\psi} - \frac{z_{\alpha/2}}{\sqrt{n-3}} \ \le\ \psi \ \le\ \hat{\psi} + \frac{z_{\alpha/2}}{\sqrt{n-3}}$
  the limits l and u for ψ transform back to a CI for ρ:
  $\frac{e^{2l}-1}{e^{2l}+1} \ \le\ \rho \ \le\ \frac{e^{2u}-1}{e^{2u}+1}$
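For instance (our arithmetic, using the rounded r = 0.96 and n = 25 from the soft drink example), an approximate 95% CI for ρ is:

$\hat{\psi} = \frac{1}{2}\ln\frac{1.96}{0.04} \approx 1.946, \qquad 1.946 \pm \frac{1.96}{\sqrt{22}} = [1.528,\ 2.364],$
$\frac{e^{2(1.528)}-1}{e^{2(1.528)}+1} \approx 0.91 \ \le\ \rho \ \le\ \frac{e^{2(2.364)}-1}{e^{2(2.364)}+1} \approx 0.98$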
Approximate Statistical Inference on ρ
- Code:
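A possible SAS sketch for Fisher's approximate inference on the delivery data (ours, not the original program; the FISHER option of PROC CORR requests the z-transformation limits and test, and corr_bev is the dataset created in the SAS code on the following slides):

* Hypothetical sketch: Fisher z-transformation CI and test for rho (here against rho0 = 0);
proc corr data=corr_bev fisher(biasadj=no rho0=0);
   var x y;
run;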
Approximate Statistical Inference on ρ
- Output:
Approximate Statistical Inference on ρ
- Retaking the previous example:
  The times for 25 soft drink deliveries (y) monitored as a function of delivery volume (x) are shown in the table given earlier.
  Test the null hypothesis that the correlation coefficient is equal to 0.
SAS coding for last example
data corr_bev;
input y x;
datalines;
7 16.68
3 11.5
3 12.03
4 14.88
6 13.75
7 18.11
2 8.00
7 17.83
30 79.24
5 21.5
16 40.33
10 21.00
4 13.5
6 19.75
9 24.00
10 29.00
6 15.35
7 19.00
3 9.50
17 35.1
10 17.90
26 52.32
9 18.75
8 19.83
4 10.75
;
run;
proc gplot data=corr_bev;   * scatter plot of y versus x;
plot y*x;
run;
proc corr data=corr_bev outp=corr;   * Pearson correlation of x and y, saved to the dataset corr;
var x y;
run;
SAS analysis for last example
SAS analysis for last example
Pitfalls of Regression and Correlation Analysis
- Correlation and causation
  - Coincidental data
    - Baldness and lawyers
  - Lurking variables (a third, unobserved variable)
    - "A good mood causes good health"
    - Relationship between eating and weight, with the unobserved variable of heredity (metabolism and illness).
- Restricted range
  - IQ and school performance (elementary school to college): in college, lower IQs are less common, so there is clearly a decrease in the range.
Pitfalls of Regression and Correlation Analysis
- Correlation and linearity
  - The correlation value may not be enough to evaluate a relationship, especially in the case where the assumption of normality is incorrect.
  - This image, created by Francis Anscombe, shows four data sets with a common mean (7.5), variance (4.12), correlation (0.81), and regression line y = 3 + 0.5x.
SIMPLE LINEAR REGRESSION
AND CORRELATION
Prepared by:
Jackie Zerle
David Fried
Chun-Hui Chung
Weilai Zhou
Shiyhan Zhang
Alex Fields
Yu-Hsun Cheng
Roosevelt Moreno
AMS 572.1 DATA ANALYSIS, FALL 2007.