Assigning Individuals to Populations

Lab 8: Individual Identity and Population Assignment

Goals

1. Use the likelihood ratio (LR) method to assess the probability that a person left DNA at a crime scene.

2. Assign individuals to a predefined population using the LR method.

3. Understand the role of human population differentiation in reliable population assignment.

Individual Identity

• Scenario: Skin cells under the fingernails of a murder victim match the DNA profile of a suspected Sicilian hitman who was seen exiting the apartment. Genotyping of the sample gives a single DNA profile that matches the suspect.

Let E = Genotype of the skin cells and the suspect match (Evidence)

Let H1 = Suspect left DNA at the crime scene (Prosecutor's hypothesis)

Let H2 = Random unrelated male left DNA at the crime scene (Defendant's hypothesis)

Ideally…

$$
P(H_1 \mid E) = \frac{P(H_1, E)}{P(E)} = \frac{P(E \mid H_1)\,P(H_1)}{\sum_j P(E, H_j)} = \frac{P(E \mid H_1)\,P(H_1)}{\sum_j P(E \mid H_j)\,P(H_j)}
$$

For the courtroom…

$$
L(H_1, H_2 \mid E) = LR = \frac{P(E \mid H_1)}{P(E \mid H_2)}
$$

Here P(E|H1) = 1, and P(E|H2) is the expected frequency of the profile based on Hardy-Weinberg equilibrium (HWE).

Example

Locus   Allele 1 (frequency)   Allele 2 (frequency)
A       A1 (0.3)               A2 (0.7)
B       B1 (0.4)               B1 (0.4)

$$
P(E \mid H_1) = 1, \qquad P(E \mid H_2) = 2(0.3)(0.7) \times (0.4)^2 = 0.0672
$$

$$
L(H_1, H_2 \mid E) = LR = \frac{P(E \mid H_1)}{P(E \mid H_2)} = \frac{1}{0.0672} = 14.88
$$
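The arithmetic in this example can be checked with a short script. This is a minimal sketch, not part of the lab handout: the function names and the tuple layout for a profile are illustrative assumptions, and it assumes P(E|H1) = 1, HWE within loci, and independence between loci.

```python
from math import prod

def hwe_genotype_freq(p1, p2, homozygous):
    """Expected genotype frequency at one locus under HWE."""
    # Homozygote: p^2; heterozygote: 2pq
    return p1 * p1 if homozygous else 2.0 * p1 * p2

def likelihood_ratio(profile):
    """LR = P(E|H1) / P(E|H2), taking P(E|H1) = 1 (assumption stated above)."""
    # P(E|H2) is the product of per-locus genotype frequencies
    p_e_given_h2 = prod(hwe_genotype_freq(p, q, hom) for p, q, hom in profile)
    return 1.0 / p_e_given_h2

# Example from the slide: locus A is A1/A2 (0.3, 0.7), locus B is B1/B1 (0.4)
example_profile = [(0.3, 0.7, False), (0.4, 0.4, True)]
lr = likelihood_ratio(example_profile)
print(f"LR = {lr:.2f}")
```

Each tuple holds the two allele frequencies at a locus and a flag for homozygosity, so extending the profile to more loci only means adding tuples.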

Interpretation

• “It is about 15 times more likely that the sample came from the suspect than from a random person unrelated to the suspect.”

• Prosecutor’s Fallacy: “It is 15 times more likely that the sample is from the suspect than from someone else.”

• Defense Attorney’s Fallacy: “Because the odds that the sample came from the suspect rather than someone else are only 15:1, there are hundreds of thousands of people who are just as likely to be the sources of the sample found at the scene.”

• Prosecutor’s Fallacy (part 2): “Given the DNA evidence, the probability that the sample came from somebody other than the suspect is 0.0672.”

$$
P(H_2 \mid E) \neq P(E \mid H_2)
$$

Individual identity

$$
P(H_1 \mid E) = \frac{P(E \mid H_1)\,P(H_1)}{P(E \mid H_1)\,P(H_1) + P(E \mid H_2)\,P(H_2)}
$$

Dividing the numerator and the denominator by P(E|H2), with LR = L(H1, H2|E) = P(E|H1)/P(E|H2) and P(H2) = 1 − P(H1):

$$
P(H_1 \mid E) = \frac{LR \cdot P(H_1)}{LR \cdot P(H_1) + P(H_2)} = \frac{LR \cdot P(H_1)}{LR \cdot P(H_1) + 1 - P(H_1)}
$$
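The final expression turns a likelihood ratio and a prior into a posterior probability. The sketch below is illustrative only: the function name is an assumption, the LR is the one from the two-locus example above (1/0.0672), and the priors in the loop are hypothetical values chosen to show the trend. It assumes H1 and H2 are the only two hypotheses, so that P(H2) = 1 − P(H1).

```python
def posterior_h1(lr, prior_h1):
    """P(H1|E) = LR*P(H1) / (LR*P(H1) + 1 - P(H1)) for two exhaustive hypotheses."""
    if not 0.0 <= prior_h1 <= 1.0:
        raise ValueError("prior_h1 must be a probability between 0 and 1")
    weighted = lr * prior_h1
    return weighted / (weighted + 1.0 - prior_h1)

example_lr = 1.0 / 0.0672  # LR from the two-locus example

# Hypothetical priors, for illustration only
for prior in (0.5, 0.01, 1e-6):
    print(f"P(H1) = {prior:g}  ->  P(H1|E) = {posterior_h1(example_lr, prior):.6g}")
```

With the same LR, the posterior falls sharply as the prior shrinks, which is why the LR alone should not be read as the probability that the suspect is the source.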

Problem 1.

The profile of a crime suspect genotyped for three of the Combined DNA Index System (CODIS) loci used by the U.S. Federal Bureau of Investigation is shown below. This profile matches perfectly the profile from a hair sample found at the crime scene.

Suspect Profile

CODIS Locus   Allele 1 (frequency)   Allele 2 (frequency)
D8S1179       12 (0.1119)            14 (0.2238)
D21S11        28 (0.1049)            31 (0.0664)
D7S820        11 (0.2797)            11 (0.2797)

a) Calculate the likelihood ratio if H1 is the hypothesis that the sample found at the crime scene is from the suspect and H2 is the hypothesis that the sample found at the crime scene is from a person unrelated to the suspect. Be sure to provide a strictly correct interpretation of this likelihood ratio.

b) Calculate the posterior probability P(H1|E) for each of the following scenarios:

Scenario   Prior Prob. P(H1)   Rationale for P(H1)
1          1/(6.6 × 10^9)      All ~6.6 billion people on the planet are considered equally likely to be the perpetrator.
2          1/(3.03 × 10^8)     All ~303 million U.S. citizens are considered equally likely to be the perpetrator.
3          1/56,000            All ~56,000 people (including students) currently living in Morgantown are considered equally likely to be the perpetrator.
4          1/24                All 24 students currently enrolled in BIOL 464 are considered equally likely to be the perpetrator.

Discuss the extent to which different prior probabilities affect posterior probabilities in b), and why this is relevant to forensic identification of samples at crime scenes.

Accounting for Pop. Structure

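$$
P(G) = \prod_k \left[ p_i^2 + p_i (1 - p_i) F_{ST} \right] \prod_l 2 p_i p_j
$$

Here k runs over the homozygous loci in the profile and l runs over the heterozygous loci.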

• F_ST is typically much less than 0.01, but a conservative value of 0.01 is generally used across ethnic groups; a value of 0.03 is used for Native Americans.

• Typically, the ethnic group with the lowest LR (most conservative) is used for the courtroom calculation.
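To see how the F_ST correction changes a likelihood ratio, the formula for P(G) can be sketched as follows. The function names and the tuple convention (one frequency for a homozygous locus, two for a heterozygous locus) are assumptions made for this illustration. The data are the two-locus example from earlier in the lab, with F_ST = 0.01, and P(E|H1) is again taken to be 1.

```python
from math import prod

def locus_probability(p_i, p_j=None, fst=0.0):
    """Expected genotype frequency at one locus.

    Pass only p_i for a homozygote; pass p_i and p_j for a heterozygote.
    """
    if p_j is None:
        # Homozygote term with the F_ST correction: p_i^2 + p_i(1 - p_i)F_ST
        return p_i ** 2 + p_i * (1.0 - p_i) * fst
    # Heterozygote term, left uncorrected as in the formula above
    return 2.0 * p_i * p_j

def profile_lr(loci, fst=0.0):
    """LR = 1 / P(G), assuming P(E|H1) = 1 and independent loci."""
    return 1.0 / prod(locus_probability(*alleles, fst=fst) for alleles in loci)

# Locus A heterozygous (0.3, 0.7); locus B homozygous (0.4)
example_loci = [(0.3, 0.7), (0.4,)]
lr_plain = profile_lr(example_loci)
lr_corrected = profile_lr(example_loci, fst=0.01)
print(f"no substructure: {lr_plain:.2f}, F_ST = 0.01: {lr_corrected:.2f}")
```

Because the correction raises the expected frequency of homozygous genotypes, the corrected LR is smaller, which makes it the more conservative figure.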

Problem 2.

There are two hypotheses about the ethnicity of the crime perpetrator from Problem 1 (group 1 and group 2), and the frequencies of the alleles from the profile of the suspect in these groups are as follows:

Locus     Allele   Frequency in group 1   Frequency in group 2
D8S1179   12       0.1119                 0.1622
D8S1179   14       0.2238                 0.1982
D21S11    28       0.1049                 0.0495
D21S11    31       0.0664                 0.0495
D7S820    11       0.2797                 0.3378

Based on the allele frequencies in the two ethnic groups, what would be the estimated likelihood ratio in each group:

a) In the absence of substructure.

b) If F_ST = 0.01 in group 1, and F_ST = 0.03 in group 2.

c) Which estimates should be used in court for each case, and how should the evidence be presented (i.e., please provide a correct interpretation of the likelihood ratios)?

Assigning individuals to predefined populations based on LR

Let's say H1 is the hypothesis that an individual is from population 1, H2 is the hypothesis that the individual is from population 2, and G is the multilocus genotype of the individual.

$$
L(H_1, H_2 \mid G) = LR = \frac{P(G \mid H_1)}{P(G \mid H_2)}
$$

Assumptions:

Both populations are in HWE.

Selected loci are in linkage equilibrium.

Example: LR = 230 means “the individual is 230 times more likely to originate from population 1 than from population 2”

Assignment success depends on:

1. # and variability of molecular markers.

2. # of potential source populations.

3. Accuracy of allele frequency estimations.

4. F_ST.
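The assignment LR can be sketched in code under the two assumptions listed above (HWE within each population, linkage equilibrium between loci). Everything named here is hypothetical: the locus and allele labels, the allele frequencies, and the function names are made up for illustration and are not taken from any dataset in this lab. The optional `floor` argument is also an assumption; it is one simple way to avoid a zero probability when an allele was not detected in a population sample, and the handout does not prescribe it.

```python
from math import log10

def log10_genotype_prob(genotype, freqs, floor=None):
    """log10 P(G|H) for a multilocus genotype, assuming HWE and linkage equilibrium."""
    total = 0.0
    for locus, (a1, a2) in genotype.items():
        # Alleles missing from the frequency table are treated as frequency 0
        p = freqs[locus].get(a1, 0.0)
        q = freqs[locus].get(a2, 0.0)
        if floor is not None:
            # Hypothetical safeguard: replace very small or zero estimates by a minimum value
            p, q = max(p, floor), max(q, floor)
        g = p * p if a1 == a2 else 2.0 * p * q
        if g == 0.0:
            return float("-inf")  # genotype impossible under these estimates
        total += log10(g)
    return total

def log10_lr(genotype, freqs_pop1, freqs_pop2, floor=None):
    """log10 of LR = P(G|H1) / P(G|H2); nan if G is impossible in both populations."""
    return (log10_genotype_prob(genotype, freqs_pop1, floor)
            - log10_genotype_prob(genotype, freqs_pop2, floor))

# Hypothetical genotype and allele frequencies
genotype = {"L1": ("a", "b"), "L2": ("c", "c")}
pop1 = {"L1": {"a": 0.5, "b": 0.4}, "L2": {"c": 0.6}}
pop2 = {"L1": {"a": 0.1, "b": 0.2}, "L2": {"c": 0.3}}

lr = 10 ** log10_lr(genotype, pop1, pop2)
print(f"LR = {lr:.1f}")
```

Working in log10 keeps the product of many small genotype frequencies from underflowing when more loci are added. An LR that is infinite or zero signals that at least one allele has an estimated frequency of zero in one population, which ties back to item 3 in the list above.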

Problem 3.

During a camping trip in Glacier National Park (British Columbia), you discover what appear to be bear feces not far from where you had pitched your tent with the intention to spend the following several weeks. Knowing that both brown bears (Ursus arctos) and black bears (Ursus americanus) inhabit this area, you decide to determine whether your neighbor is a brown or a black bear. You mail a sample to a friend at Stanford and, within a day, you receive the genotype of the bear for three microsatellite loci and allele frequency distributions for brown and black bears in the area (see tables below).

a) Is it more likely that the feces are from a black bear or from a brown bear? Provide a correct interpretation of the likelihood ratio as part of your answer.

b) Discuss the practical significance of your findings. If you are lacking ideas, consider the fact that black bears rarely attack humans, whereas brown bear attacks result in an average of two human casualties per year in the U.S.

Bear Profile

Locus   Allele 1   Allele 2
G1A     192        196
G1D     178        178
G10B    156        158

Locus   Allele   Frequency in brown bears   Frequency in black bears
G1A     192      0.025                      0.254
G1A     196      Not detected               0.034
G1D     178      0.063                      0.112
G10B    156      0.038                      Not detected
G10B    158      0.263                      0.190

Problem 4.

Use GenAlEx to analyze the human_struc.xls and human_by_region.xls data to see the success of population assignment in these two datasets.

a) Discuss the success of population assignment in both analyses. If the success was different in the two analyses, explain why. If it was not, explain why not.

b) Compare these results to the Structure results from Lab #7. Which approach gave the clearest answer? Which do you think is most appropriate for this dataset? Which do you think would be most appropriate for assigning a randomly selected human to a population of origin?

c) Discuss the practical significance of your findings. For example, can population assignment be used reliably in criminal investigations? If yes, explain under what circumstances. If not, explain why not.

Problem 5.

GRADUATE STUDENTS ONLY: Find an application of population assignment in the literature. Describe:

• the question or hypothesis that was addressed,

• the test(s) applied, and

• the general conclusions.

Be sure to critique the method.

Two points of extra credit will be awarded if you discover an improper application of population assignment in a peer-reviewed publication.

Be sure to send the paper to Rose with your report (.pdf please).