Transcript ISTA Workshop on Statistical Aspects of GMO Detection
Generalities & Qualitative Testing Plans
May 8-10, 2006 Iowa State University, Ames – USA Jean-Louis Laffont Kirk Remund
Objectives
• Introduce Acceptance Sampling
  – review assumptions
  – definitions
  – understand strengths & limitations
• Use with a qualitative assay
  – zero tolerance plans
  – plans that allow deviants
  – purity testing
ISTA Statistics Committee 2
Challenges: random sampling variability
[Figure: repeated samples drawn from the same seed lot give different percentages: 0.09%, 0.07%, 0.12%, 0.11%, 0.05%]
Challenges: Sampling & Assay Variability
[Figure: seed lot → sample → sample prep → assay (PCR); percentages shown along the way: 0.15%, 0.12%, 0.09%, < 0.10%]
Benefits of acceptance sampling approach
1. Manage sampling variability & assay errors
2. Maintain flexibility: seed pooling schemes, single or double stage testing
3. Maintain confidence in decisions
  – “We are 95% confident that the GMO presence in this lot is < 0.1%”
Assumption: “Representative” Sample
• Definition 1
  – “Obtain sample so that each seed has an equal and independent chance of being selected [called a simple random sample (SRS)]”
  – Index every seed, pick random numbers, obtain indexed seeds ...
  – Good idea?
  [Figure: seeds indexed 1, 2, 3, 4, 5, ..., 1,000,000,000]
• Definition 2: mimic SRS sample
  – bag sampling (ISTA rules)
  – probe sampling (uniform grid)
  – systematic sampling
Probe sampling
• Sampling bulk containers (e.g., trucks or bins)
• Often a reasonable approach if heterogeneity occurs as horizontal or inverted cone layers
• Sampling collection point: probe the depth of the container
Systematic sampling
• Sample a flow of seed at a regular time interval
  – flow from hopper bottom truck
  – flow from a silo
• Take more samples as heterogeneity increases
• Collect each sample by cutting through the entire stream of flowing seed
• Caution: make sure there is no cyclic behavior in the flow that correlates with the sampling interval
Obtaining Pools to Evaluate Bulk Characteristics
[Diagram, "Obtain sample": seed lot → primary samples … → composite sample → submitted sample → seed pools (bulks) for testing]
Assumption: Seed lot is large
• Sample size should be no larger than 10% of population
• This condition must hold to use Seedcalc or Qalstat
• If this assumption is not met, we must use methods based on the hypergeometric distribution
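The effect of the large-lot assumption can be illustrated with a minimal sketch that compares the binomial acceptance probability with the hypergeometric one. The lot sizes, sample size, and acceptance number below are hypothetical values chosen for illustration; they do not come from the workshop, and the code is not taken from Seedcalc or Qalstat.

```python
from math import comb

def accept_prob_binomial(n, c, p):
    """P(X <= c) for X ~ Binomial(n, p): the large-lot approximation."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

def accept_prob_hypergeometric(lot_size, deviants_in_lot, n, c):
    """P(X <= c) when n seeds are drawn without replacement from a finite lot."""
    total = comb(lot_size, n)
    favorable = sum(
        comb(deviants_in_lot, x) * comb(lot_size - deviants_in_lot, n - x)
        for x in range(min(c, deviants_in_lot, n) + 1)
    )
    return favorable / total

# Hypothetical plan: sample 400 seeds, accept with at most 1 deviant, lot impurity 1%.
binom = accept_prob_binomial(400, 1, 0.01)
large = accept_prob_hypergeometric(1_000_000, 10_000, 400, 1)  # sample is 0.04% of lot
small = accept_prob_hypergeometric(2_000, 20, 400, 1)          # sample is 20% of lot
print(f"binomial {binom:.4f}, large lot {large:.4f}, small lot {small:.4f}")
```

When the sample is a tiny fraction of the lot the two calculations agree closely; when the sample is 20% of the lot they diverge, which is the situation the slide warns about.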
Acceptance sampling for qualitative assays
[Diagram: seed lot → sample of seeds → X deviant seeds found. If X > C, reject lot; if X ≤ C, accept lot.]
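The decision rule in the diagram, and the chance that a lot passes it, can be sketched as follows. This is a minimal illustration assuming a large lot (binomial sampling) and a perfect assay; the function names are hypothetical.

```python
from math import comb

def decide(x, c):
    """Lot decision for x deviant seeds found and acceptance number c."""
    return "ACCEPT LOT" if x <= c else "REJECT LOT"

def accept_probability(n, c, p):
    """Chance of finding at most c deviants in a sample of n seeds
    when the true lot impurity is p (binomial model)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

print(decide(0, 1), decide(2, 1))
print(accept_probability(400, 1, 0.005))
```

Evaluating `accept_probability` over a range of `p` values traces out the operating characteristic curve of the plan.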
Definitions
• LQL = lower quality limit
  – highest level of impurity that is acceptable to consumer
  – “95% confident that seed impurity is below 1%” (LQL=1%)
• AQL = acceptable quality level
  – level of impurity that is acceptable to producer and consumer
  – Some definitions
    • Conservative: producer can produce seed at this impurity level or below
    • Practical: process average
    • Set in relation to threshold
Definitions, cont.
[Figure: % impurity axis with AQL and LQL marked; axis labels 0%, 0.15%, 0.2%, 0.5%]
Definitions, cont.
• Consumer Risk = chance of accepting “bad” lot (lot impurity = LQL)
  • also called beta (β)
• Producer Risk = chance of rejecting “good” lot (lot impurity = AQL)
  • also called alpha (α)
Operating characteristic (OC) curve
[Figure: ideal OC curve. Vertical axis 0% to 100%; horizontal axis: true impurity in lot, with AQL and LQL marked. Region labels, left to right: "want these", "whatever", "don't want these". Annotations: high chance of accepting lot at AQL (alpha); high chance of rejecting lot at LQL (beta).]
OC curves, cont.
[Figure: OC curves with AQL=0.5% and LQL=1.0% marked. Horizontal axis: true impurity level (%), 0.00% to 2.00%; vertical axis 0% to 100%. Curves:
  – n=400, c=1: poor testing plan (high producer risk, low consumer risk)
  – n=400, c=4: poor testing plan (low producer risk, high consumer risk)
  – large n: good testing plan (low producer risk, low consumer risk)]
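The producer and consumer risks of the two n=400 plans can be checked with a short calculation. This is a sketch under a binomial model with a perfect assay, using the AQL=0.5% and LQL=1.0% values from the slide; the function names are hypothetical.

```python
from math import comb

AQL = 0.005  # from the slide
LQL = 0.010  # from the slide

def accept_probability(n, c, p):
    """Chance of at most c deviants in n seeds at true impurity p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

def risks(n, c, aql=AQL, lql=LQL):
    """Return (producer risk, consumer risk) for a single-stage plan."""
    producer_risk = 1 - accept_probability(n, c, aql)  # rejecting a lot at AQL
    consumer_risk = accept_probability(n, c, lql)      # accepting a lot at LQL
    return producer_risk, consumer_risk

for c in (1, 4):
    producer, consumer = risks(400, c)
    print(f"n=400, c={c}: producer risk {producer:.1%}, consumer risk {consumer:.1%}")
```

With n fixed at 400, moving c only trades one risk for the other; lowering both at once requires a larger sample, as the "large n" curve suggests.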
LQL & AQL in relation to threshold
• LQL = threshold, AQL = what producer can deliver
• LQL = 2 x threshold, AQL = ½ x threshold (similar to tolerance approach)
[Figure: two panels, one per option, each showing Retest and Acceptance regions against actual % impurity in lot (0 to 2.5)]
Reducing Costs: Testing Seed Pools Rather than Individuals
[Figure: 5 seed pools, 300 seeds per pool]
• Works well in testing for adventitious presence
• Assay must be able to detect one GM seed in pool of all conventional seed with high confidence
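The arithmetic behind pooled testing can be sketched as follows, assuming seeds are independent and the assay always detects a single GM seed in a pool (the condition stated above). The function names are hypothetical, and this is not the Seedcalc implementation.

```python
def pool_positive_probability(p, pool_size):
    """Chance that a pool of pool_size seeds contains at least one GM seed
    when the seed-level impurity is p."""
    return 1 - (1 - p) ** pool_size

def estimate_impurity(positive_pools, n_pools, pool_size):
    """Point estimate of seed-level impurity from the fraction of positive pools."""
    if positive_pools == n_pools:
        raise ValueError("all pools positive: impurity cannot be estimated this way")
    return 1 - (1 - positive_pools / n_pools) ** (1 / pool_size)

# 5 pools of 300 seeds, as on the slide; 0.1% impurity is an illustrative value.
print(pool_positive_probability(0.001, 300))
print(estimate_impurity(1, 5, 300))
```

The estimate breaks down when every pool is positive, which is why pool size has to be chosen in relation to the impurity levels of interest.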
Challenge: setting the threshold
Option 1: require true zero threshold
  – result: test all seed in entire lot…
Option 2: “zero tolerance” in sample
  – result 1: hidden non-zero threshold
    Example: USDA recommendation for Starlink (Cry9c), test 2400 seeds and allow zero positives, yields a 0.19% threshold rather than zero.
  – result 2: high cost to producer
    Throw away a lot of good seed due to false positives and sampling variability
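The hidden threshold of a zero tolerance plan can be worked out directly. This is a sketch assuming a large lot, a perfect assay, and individually tested seeds; the function name is hypothetical. Under those assumptions, zero positives in 2400 seeds gives an upper bound of roughly 0.19% at 99% confidence, which appears to be the level behind the figure quoted above, and roughly 0.12% at 95% confidence.

```python
def upper_bound_zero_positives(n, confidence):
    """Largest impurity p still consistent with 0 positives in n seeds,
    found by solving (1 - p)**n = 1 - confidence for p."""
    return 1 - (1 - confidence) ** (1 / n)

for confidence in (0.95, 0.99):
    bound = upper_bound_zero_positives(2400, confidence)
    print(f"{confidence:.0%} confidence: impurity below {bound:.3%}")
```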
Challenge: setting the threshold, cont.
Option 3: set reasonable non-zero threshold, allow for some positives
  – result 1: manage consumer and producer risks to acceptable levels
  – result 2: better manage impact of assay errors on results
  – result 3: most seed approved for sale will be much lower than threshold (e.g., 3 or 10 times lower)
Zero Tolerance Plans
[Figure: OC curve of a zero tolerance plan with AQL=0.5% and LQL=1.0% marked. Horizontal axis: true impurity level (%), 0.00% to 2.00%; vertical axis 0% to 100%.]
The Perfect Plan
Reject 0% of “Good” Lots
Accept 0% of “Bad” Lots
[Figure: chart over true lot impurity (0.01%, 0.10%, 0.25%, 0.50%, 1.50%, 2.00%), vertical axis 0% to 50%, with Accept and Reject regions marked]
Zero Tolerance Plan - Test one pool of 300
Reject ~20% of “Good” Lots
Accept <1% of “Bad” Lots
[Figure: chart over true lot impurity (0.01%, 0.10%, 0.25%, 0.50%, 1.50%, 2.00%), vertical axis 0% to 50%, with Accept and Reject regions marked]
Almost Perfect Plan: Test 6 pools of 300, accept 4 deviant pools or fewer
Reject 5% of “Good” Lots
Accept <1% of “Bad” Lots
[Figure: chart over true lot impurity (0.01%, 0.10%, 0.25%, 0.50%, 1.50%, 2.00%), vertical axis 0% to 50%, with Accept and Reject regions marked]
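The two pooled plans can be compared by computing their acceptance probabilities at the impurity levels on the chart axis. This is a sketch assuming a large lot and a perfect assay, with a hypothetical function name. The percentages on the slides presumably also depend on how lot impurities are distributed, which the transcript does not give, so the sketch does not try to reproduce them.

```python
from math import comb

def accept_probability_pooled(n_pools, pool_size, max_positive_pools, p):
    """Chance of seeing at most max_positive_pools positive pools
    when the seed-level impurity is p."""
    q = 1 - (1 - p) ** pool_size  # chance that a single pool tests positive
    return sum(
        comb(n_pools, k) * q**k * (1 - q) ** (n_pools - k)
        for k in range(max_positive_pools + 1)
    )

for p in (0.0001, 0.001, 0.0025, 0.005, 0.015, 0.02):
    one_pool = accept_probability_pooled(1, 300, 0, p)
    six_pools = accept_probability_pooled(6, 300, 4, p)
    print(f"impurity {p:.2%}: 1 pool, accept 0: {one_pool:.3f}   6 pools, accept <= 4: {six_pools:.3f}")
```

The six-pool plan accepts low-impurity lots far more often than the single-pool plan while still rejecting nearly all lots at 1.5% or above.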
OC curves for two testing plans
[Figure: OC curves for 1 pool of 300 and 6 pools of 300. Horizontal axis: actual % impurity in lot, 0 to 2.5; vertical axis 0% to 100%.]
Hypothetical situation: “Ten seed pools of 300 seeds each are tested from a conventional seed lot and 5 pools test positive for adventitious presence. The lot is labeled as having less than 1% adventitious presence and it is shipped.”
Should they have shipped the lot?
Yes.
• 10 pools of 300 seeds each: can see up to 7 positive pools and still have 95% confidence the true lot impurity is below the 1% threshold
• 60 pools of 50 seeds each: can see up to 17 positive pools and still have 95% confidence the true lot impurity is below the 1% threshold
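One way to check figures like these is to search for the largest number of positive pools that still keeps the chance of accepting a lot at the 1% threshold at or below 5%. The sketch below assumes a large lot and a perfect assay, and reads "95% confidence" as a consumer risk of at most 5% at LQL = 1%; that reading and the function names are assumptions, not the Seedcalc method.

```python
from math import comb

def accept_probability_pooled(n_pools, pool_size, max_positive_pools, p):
    """Chance of at most max_positive_pools positive pools at seed-level impurity p."""
    q = 1 - (1 - p) ** pool_size
    return sum(
        comb(n_pools, k) * q**k * (1 - q) ** (n_pools - k)
        for k in range(max_positive_pools + 1)
    )

def max_positive_pools(n_pools, pool_size, lql, consumer_risk=0.05):
    """Largest c with P(accept | impurity = lql) <= consumer_risk,
    or None if even c = 0 exceeds the allowed risk."""
    best = None
    for c in range(n_pools + 1):
        if accept_probability_pooled(n_pools, pool_size, c, lql) <= consumer_risk:
            best = c
        else:
            break
    return best

print(max_positive_pools(10, 300, 0.01))
print(max_positive_pools(60, 50, 0.01))
```

For the 10-pool plan this criterion gives 7, so the 5 positive pools in the hypothetical situation are within the acceptance limit.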
OC Curves for two testing plans
[Figure: OC curves for 60 pools of 50 seeds and 10 pools of 300 seeds. Horizontal axis: actual % impurity in lot, 0 to 2.5; vertical axis 0% to 100%.]
More definitions
• False negative rate (FNR)
  – probability that a positive sample tests negative
  – PCR failures, DNA problems, …
• False positive rate (FPR)
  – probability that a negative sample tests positive
  – DNA contamination, …
Assay Error Impact (pool size = 1)
[Figure: curves plotted against % deviants in lot (0 to 8), vertical axis 0 to 100, for: no errors, 10% false negative rate, 20% false negative rate, 1% false positive rate, 2% false positive rate]
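The effect of assay errors on a plan with individually tested seeds (pool size = 1) can be sketched by adjusting the chance that a seed is reported positive. The plan used here (n=100, c=2) is a hypothetical choice, since the transcript does not give the plan behind the figure, and the function names are likewise assumptions.

```python
from math import comb

def observed_positive_rate(p, fnr=0.0, fpr=0.0):
    """Chance that a single tested seed is reported positive at true impurity p:
    true deviants that are detected plus non-deviants that are misreported."""
    return p * (1 - fnr) + (1 - p) * fpr

def accept_probability(n, c, p, fnr=0.0, fpr=0.0):
    """Chance of at most c reported positives among n individually tested seeds."""
    r = observed_positive_rate(p, fnr, fpr)
    return sum(comb(n, x) * r**x * (1 - r) ** (n - x) for x in range(c + 1))

# Hypothetical plan: n = 100 seeds, accept with at most c = 2 positives.
for label, kwargs in [
    ("no errors", {}),
    ("10% FNR", {"fnr": 0.10}),
    ("20% FNR", {"fnr": 0.20}),
    ("1% FPR", {"fpr": 0.01}),
    ("2% FPR", {"fpr": 0.02}),
]:
    print(label, round(accept_probability(100, 2, 0.03, **kwargs), 3))
```

False negatives raise the chance of accepting a lot, while false positives lower it.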
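Double Stage Testing Plan
[Diagram: stage 1, test N1 seeds and find X1 deviants. If X1 ≤ a, accept lot; if X1 > b, reject lot; if a < X1 ≤ b, go to stage 2. Stage 2, test N2 more seeds and find X2 deviants. If X1 + X2 ≤ c, accept lot; if X1 + X2 > c, reject lot.]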
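The acceptance probability of a double stage plan can be sketched as follows. The decision rule coded here is an assumption based on the usual form of such plans: accept at stage 1 if X1 ≤ a, reject if X1 > b, otherwise test N2 more seeds and accept if X1 + X2 ≤ c. The plan numbers are hypothetical, and the model assumes a large lot and a perfect assay.

```python
from math import comb

def binom_pmf(n, x, p):
    """Chance of exactly x deviants in n seeds at true impurity p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def accept_probability_double(n1, n2, a, b, c, p):
    """Acceptance probability under the assumed double stage rule (requires a <= b)."""
    # Stage 1: accept outright when X1 <= a.
    prob = sum(binom_pmf(n1, x1, p) for x1 in range(a + 1))
    # Stage 2: reached only when a < X1 <= b; accept when X1 + X2 <= c.
    for x1 in range(a + 1, min(b, n1) + 1):
        stage2 = sum(binom_pmf(n2, x2, p) for x2 in range(min(c - x1, n2) + 1))
        prob += binom_pmf(n1, x1, p) * stage2
    return prob

# Hypothetical plan: N1 = 200, N2 = 200, a = 0, b = 2, c = 3.
for p in (0.0025, 0.005, 0.01, 0.02):
    print(f"impurity {p:.2%}: accept with probability {accept_probability_double(200, 200, 0, 2, 3, p):.3f}")
```

Setting a equal to b removes the second stage, and the plan then behaves like a single stage plan with N1 seeds and acceptance number a.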
No Pooling Allowed!!
Trait Purity Testing
• Example: testing that RR soybeans are above 98% trait purity
• Must test individual seeds
• DNA or protein assay detects the intended trait, rather than the unintended trait as in AP testing
• FNR has larger effect on testing plan than FPR
• Roles of FNR & FPR reverse in Seedcalc6 and Qalstat programs
Introduction to Seedcalc