HSM - Fundamentals


Transcript HSM - Fundamentals

Spring 2013
CRASHES AS THE BASIS OF SAFETY ANALYSIS
Crash frequency is used as a fundamental indicator of
“safety” in the evaluation and estimation methods
presented in the HSM. Where the term “safety” is used in
the HSM, it refers to the crash frequency or crash
severity, or both, and collision type for a specific time
period, a given location, and a given set of geometric and
operational conditions.
Objective and Subjective Safety
The HSM focuses on how to estimate and evaluate the
crash frequency and crash severity for a particular
roadway network, facility, or site, in a given period, and
hence the focus is on “objective” safety. Objective safety
refers to use of a quantitative measure that is
independent of the observer.
In contrast, “subjective” safety concerns the perception of
how safe a person feels on the transportation system.
Assessment of subjective safety for the same site will
vary between observers.
Definition of a Crash
In the HSM, a crash is defined as a set of events that
result in injury or property damage due to the collision of
at least one motorized vehicle and may involve collision
with another motorized vehicle, a bicyclist, a pedestrian,
or an object. The term as used in the HSM does not include
crashes between cyclists and pedestrians, or crashes involving
vehicles on rails.
Definition of Crash Frequency
In the HSM, “crash frequency” is defined as the number of
crashes occurring at a particular site, facility, or network
in a one-year period. Crash frequency is calculated
according to Equation 3-1 and is measured in number of
crashes per year.
$$\text{Crash Frequency} = \frac{\text{Number of Crashes}}{\text{Period in Years}}$$
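As a quick illustration of Equation 3-1, the short sketch below (not from the HSM; the numbers are hypothetical) computes the crash frequency for a site with 15 recorded crashes over a 3-year study period.

```python
# Hypothetical example of Equation 3-1 (values are illustrative only).
number_of_crashes = 15   # crashes recorded at the site
period_in_years = 3      # length of the study period

crash_frequency = number_of_crashes / period_in_years
print(f"Crash frequency = {crash_frequency:.1f} crashes per year")  # 5.0
```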
Definition of Predictive Method
The term “predictive method” refers to the methodology in
Part C of the HSM that is used to estimate the “expected
average crash frequency” of a site, facility, or roadway
under given geometric design and traffic volumes for a
specific period of time.
Definition of Expected Average Crash Frequency
The term “expected average crash frequency” is used in
the HSM to describe the estimate of long-term average
crash frequency of a site, facility, or network under a
given set of geometric design and traffic volumes in a
given time period (in years).
Definition of Crash Severity
Crashes vary in the level of injury or property damage.
The American National Standard ANSI D16.1-1996
defines injury as “bodily harm to a person” (7). The level
of injury or property damage due to a crash is referred to
in the HSM as “crash severity.” While a crash may cause
a number of injuries of varying severity, the term crash
severity refers to the most severe injury caused by a
crash.
Definition of Crash Severity
Crash severity is often divided into categories according
to the KABCO scale, which provides five levels of injury
severity. Even if the KABCO scale is used, the definition
of an injury may vary between jurisdictions.
Definition of Crash Severity
The five KABCO crash severity levels are:
K—Fatal injury: an injury that results in death;
A—Incapacitating injury: any injury, other than a fatal injury, that
prevents the injured person from walking, driving, or normally
continuing the activities the person was capable of performing before
the injury occurred;
B—Non-incapacitating evident injury: any injury, other than a fatal
injury or an incapacitating injury, that is evident to observers at the
scene of the crash in which the injury occurred;
C—Possible injury: any injury reported or claimed that is not a fatal
injury, incapacitating injury, or non-incapacitating
evident injury and includes claim of injuries not evident;
O—No Injury/Property Damage Only (PDO).
Crashes Are Rare and Random Events
Crashes are rare and random events. By rare, it is implied
that crashes represent only a very small proportion of
the total number of events that occur on the transportation
system. Random means that crashes occur as a function
of a set of events influenced by several factors, which are
partly deterministic (they can be controlled) and partly
stochastic (random and unpredictable). An event refers to
the movement of one or more vehicles and/or pedestrians
and cyclists on the transportation network.
Crashes Are Rare and Random Events
A crash is one possible outcome of a continuum of events
on the transportation network during which the probability
of a crash occurring may change from low risk to high
risk. Crashes represent a very small proportion of the total
events that occur on the transportation network. For
example, for a crash to occur, two vehicles must arrive at
the same point in space at the same time. However,
arrival at the same time does not necessarily mean that a
crash will occur. The drivers and vehicles have different
properties (reaction times, braking efficiencies, visual
capabilities, attentiveness, speed choice) that will
determine whether or not a crash occurs.
Natural Variability in Crash Frequency
Because crashes are random events, crash frequencies naturally
fluctuate over time at any given site. The randomness of crash
occurrence indicates that short-term crash frequencies alone are not
a reliable estimator of long-term crash frequency. If a three-year
period of crashes were used as the sample to estimate crash
frequency, it would be difficult to know if this three-year period
represents a typically high, average, or low crash frequency at the
site.
This year-to-year variability in crash frequencies adversely affects
crash estimation based on crash data collected over short periods.
The short-term average crash frequency may vary significantly from
the long-term average crash frequency. This effect is magnified at
study locations with low crash frequencies where changes due to
variability in crash frequencies represent an even larger fluctuation
relative to the expected average crash frequency.
Theoretical Process of Motor Vehicle Crashes
Each time a vehicle enters an intersection, a highway segment, or any
other type of entity (a trial) on a given transportation network, it will
either crash or not crash. For purposes of consistency, a crash is
termed a “success” while the absence of a crash is termed a “failure.”
For the Bernoulli trial, a random variable X can be defined with the
following probability model: if the outcome ω is a crash, then X(ω) = 1,
whereas if the outcome is a failure (no crash), then X(ω) = 0. Thus, the
probability model becomes:

P(X = 1) = p and P(X = 0) = q
where p is the probability of success (a crash) and q=(1-p) is the
probability of failure (no crash).
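The Bernoulli model above can be illustrated with a minimal simulation sketch; the crash probability p used here is an assumed, purely illustrative value.

```python
import random

def bernoulli_trial(p: float) -> int:
    """Return 1 (a crash, 'success') with probability p, else 0 ('failure')."""
    return 1 if random.random() < p else 0

# Illustrative probability of a crash for a single trial (assumed value).
p = 1e-5
q = 1 - p  # probability of no crash

# Simulate a large number of independent trials and count the crashes.
N = 1_000_000
crashes = sum(bernoulli_trial(p) for _ in range(N))
print(f"Observed crashes: {crashes}, expected: {N * p:.1f}")
```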
Binomial distribution
It can be shown that if there are N independent trials (vehicles passing
through an intersection, road segment, etc.), the count of successes
over the N trials gives rise to a binomial distribution. We’ll define the
term Z as the number of successes over the N trials. Under the
assumption that all trials are characterized by the same failure
process (this assumption is revisited later), the appropriate probability
model that accounts for a series of Bernoulli trials is known as the
binomial distribution, and is given as:
$$P(Z = n) = \binom{N}{n}\, p^{n}\, (1-p)^{N-n} \qquad \text{Equation 1}$$
Where,
n = 0, 1, 2, … , N (the number of successes or crashes)
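A small sketch of Equation 1: it evaluates the binomial probability of observing n crashes in N independent trials, with N and p chosen as illustrative (assumed) values.

```python
from math import comb

def binomial_pmf(n: int, N: int, p: float) -> float:
    """Equation 1: P(Z = n) = C(N, n) * p**n * (1 - p)**(N - n)."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

# Illustrative values: 100,000 trials with a small crash probability.
N, p = 100_000, 3e-5
for n in range(6):
    print(f"P(Z = {n}) = {binomial_pmf(n, N, p):.4f}")
```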
Poisson Approximation
For typical motor vehicle crashes, where the event has a very low
probability of occurrence and a large number of trials exist (e.g.,
millions of entering vehicles, vehicle-miles traveled, etc.), it can be
shown that the binomial distribution is approximated by a Poisson
distribution. Under the binomial distribution with parameters N and p,
let p = λ/N, so that a large sample size N is offset by the diminution of
p to keep the mean number of events λ constant. Then, as N → ∞, it
can be shown that:
$$P(Z = n) = \lim_{N \to \infty} \binom{N}{n}\left(\frac{\lambda}{N}\right)^{n}\left(1 - \frac{\lambda}{N}\right)^{N-n} = \frac{\lambda^{n} e^{-\lambda}}{n!} \qquad \text{Equation 2}$$
Where,
n = 0, 1, 2, … , N (the number of successes or crashes)
λ = the mean of a Poisson distribution
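The approximation in Equation 2 can be checked numerically. The sketch below compares the binomial pmf of Equation 1 with the Poisson pmf of Equation 2 for a large N and small p (assumed values), with λ = Np.

```python
from math import comb, exp, factorial

def binomial_pmf(n, N, p):
    """Equation 1: exact binomial probability of n successes in N trials."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

def poisson_pmf(n, lam):
    """Equation 2: P(Z = n) = lam**n * exp(-lam) / n!"""
    return lam**n * exp(-lam) / factorial(n)

N, p = 1_000_000, 3e-6   # many trials, very low crash probability (assumed)
lam = N * p              # constant mean number of events

for n in range(6):
    print(f"n={n}  binomial={binomial_pmf(n, N, p):.6f}  "
          f"poisson={poisson_pmf(n, lam):.6f}")
```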
Poisson Approximation
The approximation illustrated in Equation (2) works well when the
mean λ and p are assumed to be constant. In practice, however, it is
not reasonable to assume that crash probabilities across drivers and
across road segments (intersections, etc.) are constant. Specifically,
each driver-vehicle combination is likely to have a probability that is a
function of driving experience, attentiveness, mental workload, risk
aversion, vision, sobriety, reaction times, vehicle characteristics, etc.
Furthermore, crash probabilities are likely to vary as a function of the
complexity and traffic conditions of the transportation network (road
segment, intersection, etc.). All these factors and others will affect, to
various degrees, the individual risk of a crash.
These and other characteristics affecting the crash process create
inconsistencies with the approximation illustrated in Equation (2).
Trials whose outcome probabilities vary from trial to trial are known as
Poisson trials (note: Poisson trials are not the summation of
independent Poisson distributions; the term designates Bernoulli trials
with unequal probabilities of events).
Poisson Approximation
The relation below is used to determine whether a count of independent
events with unequal probabilities can be approximated by a Poisson process:

$$d_{TV}\left(\mathcal{L}(Z),\, Po(\lambda)\right) \leq \min\{1, \lambda^{-1}\} \sum_{i=1}^{N} p_i^{2} \leq \max_{1 \leq i \leq N} p_i \qquad \text{Equation 3}$$

Where,
dTV = the total variation distance between the two probability measures
L(Z) and Po(λ);
L(Z) = the distribution of the count data Z generated by Poisson trials
(unequal probabilities of events);
Po(λ) = the Poisson distribution with λ = E(Z).
See Barbour et al. (1992) Poisson Approximation. Clarendon Press, New York,
NY for additional information.
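A minimal sketch of the bound in Equation 3, using an assumed set of unequal per-trial probabilities p_i: it computes λ = Σ p_i, the upper bound min{1, 1/λ} Σ p_i², and max p_i.

```python
# Illustrative, unequal per-trial crash probabilities (Poisson trials).
p = [1e-5, 2e-5, 5e-6, 4e-5, 1e-6] * 1000   # 5,000 trials (assumed values)

lam = sum(p)                                        # lambda = E(Z) = sum of p_i
bound = min(1.0, 1.0 / lam) * sum(pi**2 for pi in p)  # Equation 3 upper bound

print(f"lambda = {lam:.4f}")
print(f"d_TV upper bound = {bound:.2e}")
print(f"max p_i = {max(p):.2e}")                    # the bound is <= max p_i
```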
Poisson Approximation
The relations below are used to determine whether independent events
with unequal probabilities lead to over-dispersion, VAR(Z) > E(Z).
If

$$\theta = \min\{1, \lambda^{-1}\}\left(\frac{VAR(W)}{E(W)} - 1\right) > 0$$

then

$$d_{TV}\left(\mathcal{L}(W),\, Po(\lambda)\right) \geq \left(\frac{\theta}{2\,\varepsilon_r}\right)^{r/(r-2)}$$

for any r > 2, where

$$\varepsilon_r = \min\{1, \lambda^{-1/2}\}\left\{E\left(|W - \lambda|^{r}\right)\right\}^{1/r}$$
Crash Data as Poisson Process
Given the characteristics described in the previous overheads, it is often
assumed that crash data on a given site (or entity) follow a Poisson
distribution. In other words, if one were to count crashes over time for
one site, the data are assumed to be Poisson distributed.
Example: observed crash counts at one site over successive time periods
i, i+1, … (crash count plotted against time t), e.g., 3, 7, 0, 1, 2, 3, 3, 4, 1, 4.
Poisson assumption:

$$Po(y \mid \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}$$

Where,
λ = mean of the Poisson distribution
y = crash count (0, 1, 2, …)
Crash Data as Poisson Process
If we have counts = 3, 7, 0, and 3 on an entity, what is λ̂?

$$Po(3, 7, 0, 3 \mid \lambda) = \left(\frac{\lambda^{3} e^{-\lambda}}{3!}\right)\left(\frac{\lambda^{7} e^{-\lambda}}{7!}\right)\left(\frac{\lambda^{0} e^{-\lambda}}{0!}\right)\left(\frac{\lambda^{3} e^{-\lambda}}{3!}\right)$$
Crash Data as Poisson Process
We can plot P(3, 7, 0, and 3) as a function of λ (the likelihood function):

$$L(\lambda) = \frac{\lambda^{\sum_{i=1}^{n} y_i}\; e^{-n\lambda}}{\prod_{i=1}^{n} y_i!}$$

$$\ln L(\lambda) = \left(\sum_{i=1}^{n} y_i\right)\ln\lambda \;-\; n\lambda \;+\; \text{constant}$$

$$\frac{1}{L}\frac{\partial L}{\partial \lambda} = \frac{\sum_{i=1}^{n} y_i}{\lambda} - n$$
Crash Data as Poisson Process
Setting the derivative to zero shows that L(λ) is maximum at

$$\lambda^{*} = \frac{\sum_{i=1}^{n} y_i}{n}$$
Crash Data as Poisson Process
When the likelihood L(λ) is plotted against λ, it reaches its maximum at

$$\lambda^{*} = \frac{\sum_{i=1}^{n} y_i}{n}$$
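The MLE can be checked numerically. The sketch below (not part of the original slides) evaluates the Poisson likelihood of the observed counts 3, 7, 0, 3 over a grid of λ values and confirms the maximum at λ* = 13/4 = 3.25.

```python
from math import exp, factorial

counts = [3, 7, 0, 3]

def likelihood(lam, ys):
    """Poisson likelihood L(lambda) for independent counts ys."""
    L = 1.0
    for y in ys:
        L *= lam**y * exp(-lam) / factorial(y)
    return L

# Evaluate L(lambda) on a grid and locate its maximum.
grid = [i / 100 for i in range(1, 1001)]             # lambda from 0.01 to 10.00
lam_star = max(grid, key=lambda lam: likelihood(lam, counts))

print(f"Grid maximum at lambda = {lam_star:.2f}")            # 3.25
print(f"Closed-form MLE = {sum(counts) / len(counts):.2f}")  # 3.25
```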
Crash Data as Poisson Process
Accuracy of estimation (λ̂):

$$VAR\left(\hat{\lambda}\right) = E\left[\left(\lambda - \hat{\lambda}\right)^{2}\right] = \frac{\sum_{i=1}^{n} VAR(y_i)}{n^{2}} = \frac{n\lambda}{n^{2}} = \frac{\lambda}{n}$$
Counts and Predicted Values
Here λ̂ is the mean of the first i observed counts and VAR(λ̂) = λ̂/i:

i    y_i    λ̂       VAR(λ̂)
1    3      3        3
2    7      5        2.5
3    0      3.3      1.1
4    3      3.25     0.8
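The running estimates in the table above can be reproduced with the short sketch below, where λ̂ after i observations is the mean of the first i counts and VAR(λ̂) = λ̂/i.

```python
counts = [3, 7, 0, 3]   # observed crash counts from the example above

running_sum = 0
for i, y in enumerate(counts, start=1):
    running_sum += y
    lam_hat = running_sum / i      # MLE after i observations
    var_hat = lam_hat / i          # VAR(lambda_hat) = lambda_hat / i
    print(f"i={i}  y_i={y}  lam_hat={lam_hat:.2f}  VAR={var_hat:.2f}")
```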
Crash Data as Poisson Process
Method to calculate the mean and variance observed in crash data. (The
observed data can be displayed as a histogram of the number of sites
versus crash count.)

Finding the mean:

$$\hat{E}(\lambda) = \bar{y}$$

Finding the variance:

$$\widehat{VAR}(\lambda) = s^{2}$$

Where:

$$\bar{y} = \frac{\sum_{i} y_i\, n(y_i)}{n} \qquad s^{2} = \frac{\sum_{i} \left(y_i - \bar{y}\right)^{2} n(y_i)}{n}$$

with n(y_i) the number of sites observed with crash count y_i and n the
total number of sites.
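A minimal sketch of this calculation, assuming a purely hypothetical frequency table of sites by crash count: it computes the weighted ȳ and s² and checks for over-dispersion.

```python
# Hypothetical frequency table: crash count -> number of sites with that count.
sites_by_count = {0: 18, 1: 14, 2: 9, 3: 5, 4: 2}

n = sum(sites_by_count.values())                    # total number of sites

# Weighted sample mean: y_bar = sum(y * n(y)) / n
y_bar = sum(y * n_y for y, n_y in sites_by_count.items()) / n

# Weighted sample variance: s^2 = sum((y - y_bar)^2 * n(y)) / n
s2 = sum((y - y_bar) ** 2 * n_y for y, n_y in sites_by_count.items()) / n

print(f"E_hat(lambda)   = {y_bar:.3f}")
print(f"VAR_hat(lambda) = {s2:.3f}")
print("Over-dispersed" if s2 > y_bar else "Not over-dispersed")
```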
Overdispersion (aka heterogeneity)
Crash data can rarely be characterized by a pure Poisson distribution.
Usually, the data display a variance that is greater than the
mean, VAR(Y) > E(Y). This is known as over-dispersion.
Sometimes the data can show under-dispersion, but this is very
rare.
The principal cause of over-dispersion was explained in the
previous overheads (a Bernoulli process with unequal probabilities of
events). Over-dispersion can also be caused by numerous other factors.
For other types of processes (not based on a Bernoulli trial),
over-dispersion can be explained by the clustering of data
(neighborhoods, regions, wiring boards, etc.), unaccounted
temporal correlation, and model mis-specification. These factors
also influence the heterogeneity found in crash data.
Overdispersion (aka heterogeneity)
In order to account for over-dispersion commonly found in crash
data, it has been hypothesized that the mean (λ) found in a
population of sites follows a gamma probability density function.
In other words, if we have a population of entities (say, 100
intersections), their mean λs (if everything else remains constant)
would follow a gamma distribution. The gamma probability
density function is defined by:
$$f(\lambda) = \frac{\beta^{\phi}\,\lambda^{\phi-1}\,e^{-\beta\lambda}}{\Gamma(\phi)} \qquad \text{for } \lambda > 0$$

Where,
λ = the mean of the selected site
φ, β = parameters of the gamma distribution [gamma(φ, β)]
Γ(φ) = the gamma function ($\int_0^{\infty} e^{-u}\, u^{\phi-1}\, du$)
Overdispersion (aka heterogeneity)
The distribution f(λ) of the Poisson means across the population of sites
follows the gamma distribution.
As discussed in the previous slide, 100 intersections with exactly the
same characteristics (traffic flow, geometric design, etc.), and even the
same number of crashes per year, will (or are expected to) have
different Poisson mean λ values. The distribution of these means is
assumed to be gamma distributed.
Overdispersion (aka heterogeneity)
There are three reasons why the gamma probability function has
been a popular assumption:
1. The mean λ is allowed only to take a positive value;
2. The gamma PDF is very flexible; it can be shifted and stretched to
fit a variety of shapes; and
3. It makes the algebra simple and often yields “closed-form”
results.
Note: Nobody has proved so far that the mean varies according
to a gamma probability function. People use it because it is easy
to manipulate. Some researchers have used a lognormal function
to characterize the distribution of the mean, which is a little more
complex. You can also use more complicated distributions (e.g.,
the Conway-Maxwell-Poisson).
Overdispersion (aka heterogeneity)
The mean and variance of gamma probability density function
can be estimated as follows:
$$E(\lambda) = \frac{\phi}{\beta} \qquad VAR(\lambda) = \frac{\phi}{\beta^{2}}$$

or, equivalently,

$$\phi = \frac{E(\lambda)^{2}}{VAR(\lambda)} \qquad \beta = \frac{E(\lambda)}{VAR(\lambda)}$$

To estimate φ and β from data, you can use the following equations:

$$\hat{\phi} = \frac{\bar{y}^{2}}{s^{2} - \bar{y}} \qquad \hat{\beta} = \frac{\bar{y}}{s^{2} - \bar{y}}$$

ȳ and s² are estimated using the equations shown above for the Poisson.
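A short sketch of the moment estimators above, applied to an assumed (illustrative) set of site counts; the estimators are only valid when s² > ȳ, i.e., when the data are over-dispersed.

```python
# Illustrative observed crash counts at a sample of sites (assumed values).
counts = [3, 7, 0, 3, 1, 5, 2, 0, 4, 6]

n = len(counts)
y_bar = sum(counts) / n
s2 = sum((y - y_bar) ** 2 for y in counts) / n

# Method-of-moments estimators of the gamma parameters (require s2 > y_bar).
phi_hat = y_bar**2 / (s2 - y_bar)
beta_hat = y_bar / (s2 - y_bar)

print(f"y_bar = {y_bar:.2f}, s2 = {s2:.2f}")
print(f"phi_hat = {phi_hat:.2f}, beta_hat = {beta_hat:.2f}")
```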
Negative Binomial (or Poisson-gamma)
It can be shown that if the mean of a Poisson distribution is
gamma distributed, the joint mixed distribution gives rise to the
Negative Binomial distribution. The derivation is shown as
follows:
$$p(y) = \int_{0}^{\infty} \frac{\lambda^{y} e^{-\lambda}}{y!}\cdot\frac{\beta^{\phi}\,\lambda^{\phi-1}\,e^{-\beta\lambda}}{\Gamma(\phi)}\; d\lambda$$

$$p(y) = \frac{\beta^{\phi}}{y!\,\Gamma(\phi)} \int_{0}^{\infty} \lambda^{\phi+y-1}\, e^{-(1+\beta)\lambda}\; d\lambda$$

$$p(y) = \frac{\Gamma(\phi+y)}{\Gamma(\phi)\,\Gamma(y+1)} \left(\frac{\beta}{1+\beta}\right)^{\phi} \left(\frac{1}{1+\beta}\right)^{y}$$
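The derivation can be verified numerically. Assuming scipy is available and using illustrative values for φ and β, the sketch below integrates the Poisson pmf against the gamma density and compares the result with the closed-form Negative Binomial pmf obtained above.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import gammaln

phi, beta = 2.0, 0.8   # illustrative gamma parameters (assumed values)

def mixture_pmf(y):
    """p(y) obtained by numerically integrating Poisson(y | lam) * gamma(lam)."""
    # beta is a rate parameter, so scipy's scale is 1/beta.
    integrand = lambda lam: (stats.poisson.pmf(y, lam)
                             * stats.gamma.pdf(lam, a=phi, scale=1.0 / beta))
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

def nb_pmf(y):
    """Closed-form Negative Binomial pmf from the derivation above."""
    log_p = (gammaln(phi + y) - gammaln(phi) - gammaln(y + 1)
             + phi * np.log(beta / (1 + beta)) + y * np.log(1 / (1 + beta)))
    return np.exp(log_p)

for y in range(5):
    print(f"y={y}  mixture={mixture_pmf(y):.6f}  negative binomial={nb_pmf(y):.6f}")
```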
Overdispersion (aka heterogeneity)
Note: the Poisson-gamma distribution is often characterized by the
setting φ = β:

$$E(\lambda) = \frac{\phi}{\beta} = 1 \qquad VAR(\lambda) = \frac{\phi}{\beta^{2}} = \frac{1}{\phi}$$
This is known as the one parameter gamma distribution.
The relationships shown above will become useful for describing
the mean and variance of the negative binomial regression
model.
Negative Binomial (or Poisson-gamma)
The mean and variance of the Negative Binomial distribution (one
parameter) are estimated using the following equations:
$$E(y) = E(\lambda) \qquad VAR(y) = E(\lambda) + \frac{E(\lambda)^{2}}{\phi}$$

Note: if φ → ∞, the second part of the variance function tends
towards 0. This means that the Negative Binomial becomes a Poisson
distribution, since the mean and variance are then equal.
Note: For modeling purposes, the term φ is usually estimated directly
from the data. This will be addressed later in the course.
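A brief simulation sketch of the mean-variance relationship above, using assumed values of E(λ) and φ: crash counts are generated by drawing a gamma-distributed mean for each site and then a Poisson count, and the empirical variance is compared with E(λ) + E(λ)²/φ.

```python
import numpy as np

rng = np.random.default_rng(42)

mu, phi = 3.0, 2.5          # assumed mean E(lambda) and dispersion parameter
n_sites = 200_000

# Poisson-gamma (Negative Binomial) simulation:
# lambda_i ~ gamma(shape=phi, scale=mu/phi), so E(lambda) = mu, VAR(lambda) = mu^2/phi.
lam = rng.gamma(shape=phi, scale=mu / phi, size=n_sites)
y = rng.poisson(lam)

print(f"Empirical mean     = {y.mean():.3f}  (theory: {mu:.3f})")
print(f"Empirical variance = {y.var():.3f}  (theory: {mu + mu**2 / phi:.3f})")
```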
Negative Binomial (or Poisson-gamma)
Alternative forms of the PDF:
$$P(y) = \frac{\Gamma(\phi + y)}{\Gamma(\phi)\,\Gamma(y+1)}\left(\frac{\phi}{\phi + \mu}\right)^{\phi}\left(\frac{\mu}{\phi + \mu}\right)^{y}$$

$$P(y) = \frac{\Gamma(1/\alpha + y)}{\Gamma(1/\alpha)\,\Gamma(y+1)}\left(\frac{1}{1 + \alpha\mu}\right)^{1/\alpha}\left(\frac{\alpha\mu}{1 + \alpha\mu}\right)^{y}$$

where μ = E(y) and α = 1/φ is the dispersion parameter.
An analysis method known as “in-depth” analysis or clinical study does
not rely on the statistical nature of the crash process. It seeks to
determine the deterministic mechanisms that lead to an accident or a
crash. This method is very common for extremely rare events, such as
aircraft or space shuttle crashes.
The data needed to carry out such investigations and reconstructions
are very extensive. Often, they are not available in typical computerized
crash records.
Causal chain example: Site Characteristics → Speed Limit Policy →
Speed Distribution → Outcome.
An example: analysis of de-icing roads and a traffic club for children using
the causal chain approach. This study aims at estimating the reduction in
risk factors for each chain (de-icing roads and traffic club).
See Elvik (2003) in Accident Analysis & Prevention, Vol. 35, pp. 741-748.