BASICS OF WET STATISTICS

SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
GRAPH THE DATA
[Figure: raw data and treatment means; Response (% Effect) vs. Concentration (% Effluent)]
ANALYZE DATA FOLLOWING EPA
WET STATISTICAL FLOWCHARTS
• Hypothesis Tests
–NOAEC (Acute)
–NOEC (Chronic)
• Point Estimation
–LC50 (Acute)
–EC25 or IC25 (Chronic)
PURPOSE OF HYPOTHESIS
TESTS AND BASIC
CONSIDERATIONS
• Purpose - Determine if two things
(responses) are different
• Relevance of initial (control) condition(s)
• Power of statistical test
EFFECTS ASSOCIATED WITH
THE NOEC IN FATHEAD MINNOW
GROWTH DATA
[Figure: Effect at NOEC vs. Test #]
EPA HYPOTHESIS TEST
FLOWCHART (MULTI-CONC)
• Test assumptions of ANOVA
– Transform data if necessary
– Normally distributed data
• Shapiro-Wilk test
– Variance is equal
• Bartlett’s test
• Select appropriate test
– Parametric Tests
• Assumptions met
– Non-Parametric Tests
• Assumptions NOT met
MULTIPLE CONCENTRATION
PARAMETRIC TESTS
• Dunnett’s Test
–Equal number of replicates in
each treatment
• Multiple t-tests with Bonferroni
adjustment
–Unequal number of replicates in
each treatment
MULTIPLE CONCENTRATION
NON-PARAMETRIC TESTS
• Steel’s Many-one Rank Test
–Equal number of replicates in
each treatment
• Wilcoxon Rank Sum
–Unequal number of
replicates in each treatment
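The test selection on the two preceding slides can be sketched as a small lookup. This is an illustration only: the function name and arguments are hypothetical, and the returned names are simply the tests listed on the slides, not a substitute for the EPA flowcharts.

```python
def select_multi_conc_test(assumptions_met: bool, equal_replicates: bool) -> str:
    """Return the multi-concentration hypothesis test named on the slides.

    assumptions_met:  True if normality and equal-variance checks both pass.
    equal_replicates: True if every treatment has the same number of replicates.
    """
    if assumptions_met:
        # Parametric branch
        if equal_replicates:
            return "Dunnett's test"
        return "Multiple t-tests with Bonferroni adjustment"
    # Non-parametric branch
    if equal_replicates:
        return "Steel's many-one rank test"
    return "Wilcoxon rank sum test"


print(select_multi_conc_test(assumptions_met=True, equal_replicates=False))
```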
PASS/FAIL TESTS
• Control and critical concentration (IWC)
• Test assumptions
– Transformations - Arc sine square root
– Normality - Shapiro-Wilk’s test
– Homogeneity - F-test
• Test for statistical difference
– Normal/homogeneous - t-test
– Non-normal - Wilcoxon rank sum test
– Normal/heterogeneous - Modified t-test
PURPOSE OF POINT-ESTIMATION
AND BASIC CONSIDERATIONS
• Describe relationship between two parameters
• Selection of a significant response
• Elucidation of relationship
• Confidence in relationship
[Figure: example plot of the relationship between two parameters]
EPA POINT-ESTIMATE
METHOD SELECTION
• Binomial Data
–Probit
–Spearman-Karber
• Untrimmed or trimmed
–Graphical
• Continuous Data
–ICp / Linear Interpolation
PROBIT ANALYSIS
• Binomial data only (two choices)
– Dead or alive, normal/abnormal, etc.
• Normally distributed
• Adjusted for control mortality
– Abbott’s correction
• At least two partial mortalities
• Sufficient fit
– Chi-square test for heterogeneity
• Designed for LC50/EC50 and confidence
intervals
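Abbott's correction, mentioned above for control mortality, can be sketched as follows. The function name is hypothetical, and clamping negative results to zero is an assumption made for this illustration, not something stated on the slide.

```python
def abbott_corrected(observed_mortality: float, control_mortality: float) -> float:
    """Adjust a treatment's mortality proportion for mortality seen in the control.

    Both arguments are proportions between 0 and 1.
    """
    if not 0.0 <= control_mortality < 1.0:
        raise ValueError("control mortality must be in [0, 1)")
    corrected = (observed_mortality - control_mortality) / (1.0 - control_mortality)
    # Assumption for this sketch: a treatment that did better than the
    # control is reported as zero corrected mortality.
    return max(0.0, corrected)


# 55 % mortality in a treatment, 10 % in the control
print(abbott_corrected(0.55, 0.10))
```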
SPEARMAN-KARBER
• Nonparametric model
• Monotonic concentration response
– Smoothing
• Adjusted for control mortality
• Zero response in the lowest concentration
• 100% response in the highest
concentration
• Calculates LC50/EC50
• Confidence interval calculation requires at
least one partial response
TRIMMED SPEARMAN-KARBER
• Same basic procedure as Spearman-Karber
• Requires at least 50% mortality in one
concentration
• The trimming procedure is employed
when the zero and/or 100% response
requirements of Spearman-Karber
method are not met.
GRAPHICAL METHOD
• Specifics
–Nonparametric procedure
–Adjusted for control mortality
–Monotonic concentration response
• Smoothing
–Linear interpolation of “all or
nothing” response
–Calculates LC50/EC50 - No CI’s
INHIBITION CONCENTRATION
(ICp)
• Specifics
– Nonparametric procedure
– Calculates any effect level
– Monotonic concentration response
• Smoothing
– Random, independent, and representative
data
– Piecewise linear interpolation
– Bootstrapped confidence intervals
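The smoothing and piecewise linear interpolation steps listed above can be sketched as below. This is a minimal illustration that assumes equal replication in every treatment (so adjacent means are pooled with equal weight) and leaves out the bootstrapped confidence intervals. The function names and the example data are hypothetical.

```python
def smooth_means(means):
    """Pool adjacent treatment means until the sequence never increases."""
    blocks = [[m, 1] for m in means]  # [pooled mean, treatments in the pool]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i + 1][0] > blocks[i][0]:
            total = blocks[i][0] * blocks[i][1] + blocks[i + 1][0] * blocks[i + 1][1]
            count = blocks[i][1] + blocks[i + 1][1]
            blocks[i:i + 2] = [[total / count, count]]
            i = max(i - 1, 0)  # the new pool may now exceed the previous one
        else:
            i += 1
    return [mean for mean, count in blocks for _ in range(count)]


def icp(concs, means, p):
    """Concentration giving a p % reduction from the smoothed control mean.

    Returns None if the reduction is not reached within the tested range.
    """
    smoothed = smooth_means(means)
    target = smoothed[0] * (1 - p / 100)
    for j in range(len(concs) - 1):
        upper, lower = smoothed[j], smoothed[j + 1]
        if upper > target >= lower:
            # Linear interpolation between the two bracketing concentrations
            return concs[j] + (target - upper) * (concs[j + 1] - concs[j]) / (lower - upper)
    return None


concs = [0, 6.25, 12.5, 25, 50, 100]   # % effluent (made-up example)
means = [10, 12, 9, 8, 4, 0]           # mean response per treatment (made-up)
print(smooth_means(means))
print(icp(concs, means, 25))
```

In this example the control and the lowest concentration are pooled, so the smoothed control mean is higher than the observed one.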
SOFTWARE PROGRAMS
• Many software packages/programs
are available
• DO NOT assume they follow the EPA
recommended analysis
• DO verify the software by running
example datasets from the methods
manuals
DO THE RESULTS MAKE
SENSE?
[Figure: Percent Effect vs. Concentration (% Effluent), showing raw data, means, probit fit, % MSD, and EC25]
TOXIC UNITS IN WET TESTS
• Goals
1) Standardize the results of toxicity
tests to simulate chemical
specific criteria.
2) Create a reporting value which
increases with sample toxicity.
DEFINITIONS OF TU VALUES
• Acute
– TUa = 100/LC50 OR
• Chronic
– TUc = 100/NOEC
• where the NOEC is defined by
hypothesis testing or the IC25
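The two TU definitions above reduce to one line each. The sketch below assumes the LC50 and NOEC are expressed in percent effluent; the function names are hypothetical.

```python
def tu_acute(lc50_percent: float) -> float:
    """Acute toxic units: TUa = 100 / LC50, with LC50 in % effluent."""
    if lc50_percent <= 0:
        raise ValueError("LC50 must be positive")
    return 100.0 / lc50_percent


def tu_chronic(noec_percent: float) -> float:
    """Chronic toxic units: TUc = 100 / NOEC, with NOEC in % effluent."""
    if noec_percent <= 0:
        raise ValueError("NOEC must be positive")
    return 100.0 / noec_percent


# A more toxic sample has a lower LC50 and therefore a higher TUa.
print(tu_acute(50.0), tu_acute(25.0))
```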
SUMMARY OF THE ANALYSIS OF
WET DATA
• STEP 1: Graph The Data
• STEP 2: Analyze The Data By EPA Methods
• STEP 3: Do The Results Make Sense?
ANALYSIS OF
MULTIPLE CONTROL
TOXICITY TESTS
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
WHAT IS A CONTROL SAMPLE ?
• A treatment in a toxicity test that
duplicates all the conditions of the
exposure treatments but contains no
test material. The control is used to
determine the absence of toxicity of
basic test conditions (e.g. health of
test organisms, quality of dilution
water). Rand and Petrocelli, 1985.
WHAT IS A REFERENCE
SAMPLE?
• “A reference sample is the ‘control’
by which to gauge the instream
effects of a discharge at a particular
site.” Grothe et al., 1996.
- site-specific
- ecoregional
WHEN ARE MULTIPLE
CONTROLS USED?
• When manipulations are made to SOME of the test
concentrations or treatments.
• To compare “standard” and “alternative” methods.
• When testing control and/or reference samples in
which the quality is unknown.
• When a sample used for toxicity testing possesses
physico-chemical properties significantly different
from water in which surrogate test organisms were
cultured.
• TIEs - Toxicity Identification Evaluations.
WHEN ARE MULTIPLE
CONTROLS USED?
Example #1
• When manipulations are made to
SOME of the test concentrations or
treatments.
BRINE ADDITION IN MARINE
TESTS
Concentration     Effluent Volume  Brine Volume     Seawater Volume  Salinity
                  (0 ppt)          (68 ppt)         (34 ppt)
Seawater Control  0 ml             0 ml             1000 ml          34 ppt
1.25 %            12.5 ml          0 ml             987.5 ml         34 ppt
2.5 %             25 ml            0 ml             975 ml           33 ppt
5 % (IWC)         50 ml            0 ml             950 ml           32 ppt
10 %              100 ml           100 ml           800 ml           34 ppt
20 %              200 ml           200 ml           600 ml           34 ppt
Brine Control     0 ml             200 ml + 200 ml  600 ml           34 ppt
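The salinities in the brine addition table can be checked with a volume-weighted average. A minimal sketch, assuming simple additive mixing; the function name is hypothetical and the component salinities (0, 68, and 34 ppt) are the ones given in the table headings.

```python
def mix_salinity(parts):
    """Volume-weighted salinity (ppt) of a mixture.

    parts: list of (volume_ml, salinity_ppt) pairs.
    """
    total_volume = sum(volume for volume, _ in parts)
    return sum(volume * salinity for volume, salinity in parts) / total_volume


# 5 % (IWC): 50 ml effluent (0 ppt) + 950 ml seawater (34 ppt), no brine
print(mix_salinity([(50, 0), (950, 34)]))

# 20 %: 200 ml effluent (0 ppt) + 200 ml brine (68 ppt) + 600 ml seawater (34 ppt)
print(mix_salinity([(200, 0), (200, 68), (600, 34)]))
```

Without brine, salinity falls as the effluent fraction rises. Adding brine in the same volume as the effluent brings the 10 % and 20 % treatments back to seawater salinity, which is why those treatments need their own brine control.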
ANALYSIS OF TWO-CONTROL TOXICITY
TESTS WHEN SOME CONCENTRATIONS
WERE MANIPULATED
• Both Controls Valid?
– No: IWC Treated Control Valid?
• No: Repeat Test
• Yes: Analyze IWC and Like Treated Concs. and Control Using EPA Flowcharts
– Yes: Control t-Test Non-Significant?
• Yes: Pool Controls and Analyze All Data Using EPA Flowcharts
• No: Analyze IWC and Like Treated Concs. and Control Using EPA Flowcharts
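One reading of the two-control flowchart above, written as a decision function. The function and argument names are hypothetical. The branch order assumes that an invalid IWC-treated control leads to a repeat test and that a significant control t-test leads to analysis against the like-treated control.

```python
def two_control_plan(both_controls_valid: bool,
                     iwc_treated_control_valid: bool,
                     controls_differ: bool) -> str:
    """Pick the analysis path for a test with two controls.

    controls_differ: True if the t-test between the two controls is significant.
    """
    like_treated = "analyze IWC and like-treated concentrations with their control"
    if both_controls_valid:
        if controls_differ:
            return like_treated
        return "pool controls and analyze all data"
    if iwc_treated_control_valid:
        return like_treated
    return "repeat test"


print(two_control_plan(both_controls_valid=True,
                       iwc_treated_control_valid=True,
                       controls_differ=False))
```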
WHEN ARE MULTIPLE
CONTROLS USED ?
Example #2
• To compare “standard” and “alternative”
methods.
• To determine treatment effects.
EFFECT OF KELP STORAGE ON
SENSITIVITY TO COPPER
[Figure: panels for EC1, EC5, EC10, EC15, and EC25 comparing Copper Concentration (ppb) for Fresh vs. Stored kelp, plus a panel of Change in EC Values (Stored - Fresh; ppb Cu) vs. Effect Level, with asterisks on some bars]
WHEN ARE MULTIPLE
CONTROLS USED?
Example #3
• When testing control and/or reference
samples in which the quality is unknown.
- Use of a reference not previously
tested (ambient).
- Quality of reference may vary from
season to season (ambient).
- When the potential exists for a
sample to be impacted or impaired.
EFFECT OF A NON-POINT
DISCHARGE ON AN INSTREAM
DILUTION WATER
[Figure: C. dubia Control Survival; Percent Survival for Lab Control vs. Upstream water by Test Date (Apr-96, May-96, Jun-96, Aug-96, Dec-96)]
WHEN ARE MULTIPLE
CONTROLS USED ?
Example #4
• When a sample used for toxicity
testing possesses physico-chemical
properties significantly different from
water in which surrogate test
organisms were cultured
- As a natural phenomenon
- Due to sample manipulation
WHEN ARE MULTIPLE
CONTROLS USED ?
Example #5
• TIEs - Toxicity Identification Evaluations.
- Methods require the use of multiple
controls called “blanks” which are exact
manipulations on the dilution water.
TAKE HOME POINTS
• Multiple negative controls are a good idea if:
- New reference or control sample.
- Performing any sample manipulations.
- Comparing “standard” vs. “alternative”
methods. Multiple Positive Controls (e.g. Ref
Tox tests) should be used in this situation
- Using multiple organisms with different
sensitivities.
REFERENCES:
• Short-Term Methods For Estimating The Chronic Toxicity Of
Effluents And Receiving Water To Freshwater Organisms.
EPA-600-4-91-002. July, 1994.
• Methods for Measuring the Acute Toxicity of Effluents and
Receiving Waters to Freshwater and Marine Organisms.
EPA/600/4-90/027F. August, 1993.
- Have recommendations for multiple controls under certain
conditions.
• Methods for Aquatic Toxicity Identification Evaluations. Phase I
Toxicity Characterization Procedures. EPA/600/6-91/003.
February, 1991.
- Has recommendations for multiple controls “blanks”.
• Whole Effluent Toxicity Testing: An Evaluation of Methods and
Prediction of Receiving Water System Impacts. Grothe et al.,
1996.
SUSPICIOUS DATA
AND OUTLIER
DETECTION
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
CONCERNS
• Outliers make interpretation of WET
data difficult by
– Increasing the variability in test
responses
– Biasing mean responses
IDENTIFYING OUTLIERS
• Graph raw data, means and residuals
[Figure: two panels vs. Copper Concentration (ppb). "Raw Data and Means": Proportion Alive. "Residuals": Residual (predicted - observed)]
IDENTIFYING OUTLIERS
• Formal statistical test - Chauvenet’s Criterion
– Using the previous mysid data, the critical values are:
• Mean = .80, Std. Dev. = 0.302, n = 8
– Chauvenet’s Criterion Value = n/2 = 4
– Z-score = 2.054 (two-tailed probability of
n/2 = 4 %)
– The calculations are:
• Equation 1) (Z-score)(Std. Dev.) = (2.054)(0.302) = 0.620
• Mean ± Equation 1 = 0.80 ± 0.620 = 0.18 to 1.42
• Outlier Range is >1.42 or <0.18
– A value of 0.2 is not an outlier.
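The arithmetic on this slide can be reproduced with the standard library. This sketch takes the two-tailed probability (4 % here) as an input, as the slide does, and looks up the matching Z-score. The function name is hypothetical.

```python
from statistics import NormalDist


def outlier_limits(mean: float, sd: float, two_tailed_p: float):
    """Return (lower, upper, z): values outside lower..upper are flagged."""
    z = NormalDist().inv_cdf(1 - two_tailed_p / 2)  # Z-score for the chosen tail area
    half_width = z * sd                             # "Equation 1" on the slide
    return mean - half_width, mean + half_width, z


lower, upper, z = outlier_limits(mean=0.80, sd=0.302, two_tailed_p=0.04)
print(round(z, 3), round(lower, 2), round(upper, 2))

# The suspect value from the slide lies inside the limits
print(lower <= 0.2 <= upper)
```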
CAN A CAUSE BE ASSIGNED TO
THE OUTLIER(S) ?
• Review analyst’s daily observations
• Check water chemistry data
• Check data entry
• Check calculations
• If cause can be assigned to outlier,
then reanalyze data without outlier
DETERMINE EFFECT ON
TEST INTERPRETATION
• Keep all data unless cause is found
• Analyze data with and without suspect
data
• Determine effect of suspect data on test
interpretation
• Results reported will depend on effect of
outlier(s) on test interpretation, best
professional judgement, and discussions
with regulatory agency
REPORTING OF RESULTS
• Insignificant Effect
– With Outlier
• IC25 = 131 (96.9-158) ppb
• NOEC = 100 ppb
• % MSD = 28.1 %
– Without Outlier
• IC25 = 124 (93.6-152) ppb
• NOEC = 100 ppb
• % MSD = 20.9 %
• Report results with
suspect data
included
• Significant Effect
– With Outlier
• IC25 = 131 (96.9-158) ppb
• NOEC = 100 ppb
• % MSD = 28.1 %
– Without Outlier
• IC25 = 106 (83.8-126) ppb
• NOEC = 50 ppb
• % MSD = 12.2 %
• Report results from
both analyses
CONCENTRATION RESPONSE CURVES
IN WET TESTS
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
NON-MONOTONICITY
vs. HORMESIS
• Hormesis is a toxicological response to a
single toxicant that is stimulatory at low
concentrations but inhibitory at higher doses.
• Non-monotonicity is a relationship where
a smaller response (e.g. mortality) is
observed at the higher of two
consecutive concentrations.
TYPICAL TRAITS OF
HORMESIS
• Calabrese and Baldwin, 1998
• Hormetic concentration range
• Magnitude of hormetic stimulation
• Range from maximum stimulation to NOEL (NOEC)
[Figure: Response vs. Concentration, annotated with Max. Stimulation (30-60%), Hormetic Range (10 x), Max. Stimulation to NOEL Range (4-5 x), and NOEL]
WHY IS HORMESIS DIFFICULT
TO DETECT IN TOXICITY TESTS?
• Inadequate concentration series
• Inadequate description of concentration response
• Inadequate statistical power
• Hormesis is not the cause
[Figure: two panels of Response vs. Concentration, one titled Well Defined Hormetic Response and one titled Poorly Defined "Hormetic" Response]
EFFECTS OF NON-MONOTONIC
DATA
NOEC > LOEC
• Limited replicates (4)
• High control & low concentration variability
• High Statistical Power
• NOEC > LOEC
[Figure: Sea Urchin Fertilization Data; Percent Fertilized vs. Percent Effluent, with a statistically significant reduction marked. NOEC = 6.0 %, LOEC = 0.36 %, % MSD = 5.82 %, IC25 > 6.0 %]
EFFECTS OF NON-MONOTONIC DATA
HETEROGENEITY IN PROBIT ANALYSIS
• Limited replicates (5)
• High control & low concentration variability
• Significant chi-square
• Inflated confidence intervals
• Reanalyze with non-parametric models
[Figure: Significant Chi-Square for Heterogeneity; Response (0.0 to 1.0) vs. Dose ppb (log scale, 1 to 10000)]
EFFECTS OF NON-MONOTONIC
DATA SMOOTHING IN ICp
ANALYSIS
• Smoothing is used in all non-parametric models.
• Smoothing procedure averages treatment responses
• Increases estimated toxicity
[Figure: Selenastrum Cell Growth Data; Response (% of Control) vs. Percent Effluent, showing Actual Response and Smoothed Response]
REMEDIES FOR PROBLEMS
ASSOCIATED WITH NON-MONOTONIC DATA
• Better concentration series selection
• Increase number of replicates
• % MSD limits (NOEC’s)
• Use of more robust parametric models
– Bailer and Oris, 1997
– Kerr and Meador, 1996
– Baird et al., 1996
• Concentration-response curve criterion
CONFIRMATION OF A
CONCENTRATION-RESPONSE
CURVE
• Graphical
• Linear regression Analysis
• Correlation Analysis
GRAPHIC ANALYSIS OF
CONCENTRATION-RESPONSE
CURVES
[Figure: two panels, Concentration-Response Curve Absent and Concentration-Response Curve Present; Response (% Effect) vs. Concentration (% Effluent), showing raw data, means, and % MSD]
GRAPHIC ANALYSIS OF
CONCENTRATION-RESPONSE
CURVES
[Figure: Concentration-Response Curve Present ???; Response (% Effect) vs. Concentration (% Effluent), showing raw data, means, and % MSD]
LINEAR REGRESSION ANALYSIS
OF CONCENTRATION-RESPONSE CURVES
[Figure: two panels of Response (% Effect) vs. Concentration (% Effluent), showing raw data, means, probit fit, and % MSD. Concentration-Response Curve Absent: Negative Slope Not Sig. Dif. from Zero. Concentration-Response Curve Present: Positive Slope and Sig. Dif. than Zero]
LINEAR REGRESSION ANALYSIS
OF CONCENTRATION-RESPONSE CURVES
[Figure: Concentration-Response Curve Present ???; Positive Slope Not Sig. Dif. from Zero. Response (% Effect) vs. Concentration (% Effluent), showing raw data, means, probit fit, and % MSD]
CORRELATION ANALYSIS OF
CONCENTRATION-RESPONSE
CURVES
[Figure: two panels of Response (% Effect) vs. Concentration (% Effluent), showing raw data, means, and % MSD. Concentration-Response Curve Present: Significant Negative Correlation (r = -0.965, P = 0.000). Concentration-Response Curve Absent: Insignificant Correlation (r = -0.0931, P = 0.593)]
CORRELATION ANALYSIS OF
CONCENTRATION-RESPONSE
CURVES
[Figure: Concentration-Response Curve Present ???; Significant Negative Correlation (r = -0.389, P = 0.021). Response (% Effect) vs. Concentration (% Effluent), showing raw data, means, and % MSD]
SUMMARY
• Identification of a significant C-R curve is
an important QA check.
• Graphical analysis is simple but subjective
• Linear regression analysis is objective and
conservative but requires parametric
analysis.
• Correlation analysis is objective and liberal
and non-parametric methods are available.
BIOLOGICAL
INTERFERENCE IN
FATHEAD CHRONIC
TESTS
TOXICITY CHARACTERISTICS
• Seasonal (cold months)
• Affects only fathead minnows
• High variability
• Poor dose response
• Fungus-like growth
[Images: Normal Gills and Pharynx; Bacterial Clogging]
% Survival on Day of Test
        Day of Test
Rep     3      4      7
1       100    13     0
2       100    25     0
3       100    100    100
4       100    88     88
5       100    50     13
STERILIZATION
[Figure: four panels (Autoclaved, UV Light, Pasteurize, Antibiotic) showing % Survival for treated samples at 25%, 50%, and 100% alongside untreated (Untrt) samples]
ANTIBIOTIC ADDITION
[Figure: % Survival, Diluent Only; x-axis label: Baseline; series: Rec control, 32%, 42%, 56%, 75%, 100%]
ANTIBIOTIC ADDITION
[Figure: % Survival, Diluent + Effluent; x-axis label: Baseline; series: Rec cont, 32%, 42%, 56%, 75%, 100%]
EFFECT OF ISOLATION
[Figure: % Alive Since Previous Day vs. Day of Test (1 to 6), comparing Sick Fish Removed and Dead Fish Removed]
CONCLUSION
• “Toxicity” due to a naturally occurring
pathogen
• Best viewed as a kind of interference
CONTROLLING BIOLOGICAL
INTERFERENCE
• Heat
• Filtration (0.2 µm)
• UV light
• Antibiotics
HEAT
Advantages:
• Simple, no specialized equipment
Disadvantages:
• May be more “intrusive” (e.g. removal
of volatile components)
• Must re-aerate sample
FILTRATION (0.2 µm)
Advantages:
• Usually very effective
Disadvantages:
• Impractical with high suspended solids
• Requires specialized equipment for
filtering large volumes
• May remove particulate bound
contaminants
UV LIGHT
Advantages:
• Usually very effective.
• Uses common equipment
Disadvantages:
• Less effective with high suspended
solids or stained water
• May degrade organic contaminants or
enhance organic toxicity (e.g. PAHs)
ANTIBIOTICS
Advantages:
• Usually very effective.
• Chemicals inexpensive and widely
available
• Easy to treat large volumes
Disadvantages:
• May require determination of proper
dose
SUMMARY
• Chronic WET tests using fathead
minnows may show evidence of
interference due to pathogens.
• Interference = high variability, poorly
defined dose response
• Most common with surface waters
• Control measures = sample treatment
to kill or remove pathogens.
STATISTICAL
AND BIOLOGICAL
SIGNIFICANCE
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
TOXIC VS. NON-TOXIC
• WET Tests Developed to Identify
Toxic Samples
• Two Methods Used
–Hypothesis testing - Statistical
difference
–Point-estimation - Standard level
of effect
TOXICITY ASSUMPTIONS
OF HYPOTHESIS TESTING
• Non-Toxic = No statistical difference
between control and critical
concentration response
• Toxic = Statistical difference
between control and critical
concentration response
TOXICITY ASSUMPTIONS OF
POINT-ESTIMATION
• A preselected level of effect is considered
toxic
– Acute test:
50 % effect
– Chronic test:
25 % effect
• Toxic = ECx/ICx is less than the critical
concentration (IWC)
• Non-Toxic = ECx/ICx is equal or greater
than the critical concentration (IWC)
BOTH APPROACHES HAVE
STRENGTHS AND LIMITATIONS
• Complete Discussion in:
–Grothe et al. Eds. 1996. Whole
Effluent Toxicity Testing: An
Evaluation of Methods and
prediction of Receiving System
Impacts, SETAC Press, Pensacola,
FL, USA.
STRENGTHS AND LIMITATIONS
OF HYPOTHESIS TESTS
• Strengths
– Suited for
comparison of
treatments
– Simple to calculate
(no modeling)
– Not model
dependent
• Limitations
– NOEC is concentration
dependent
– Variability reduces
statistical power and
increases significant effect
– No confidence intervals
– Results are independent
of concentration-response
curve
STRENGTHS AND LIMITATIONS
OF POINT ESTIMATES
• Strengths
– Uses concentrationresponse curve
– Not limited to tested
concentrations
– Confidence intervals
• Limitations
– Selection of effect
level
– Partial responses
increase accuracy
– Model dependent
– More difficult
computations
WHICH METHOD IS BEST?
• Both Approaches Are Supported By
The TSD And The Methods Manuals
• Depends On The Purpose Of The
WET Test
–Hypothesis test - Identify statistical
difference from control response
–Point-estimate - Concentration
which shows a standard effect
TOXIC MAY NOT =
ECOLOGICAL IMPACT
• Hypersensitive Hypothesis Tests
• Relatively Sensitive Test Species
• Inconsistent Exposure Parameters Between
the Toxicity Test and Receiving Water
– Magnitude, duration, frequency of exposure
– Water chemistry
• Population/Community Structure Dynamics
NONTOXIC MAY NOT =
NO ECOLOGICAL IMPACT
• Hyposensitive Hypothesis Tests
• High Effect Level In Point-Estimates
• Relatively Insensitive Test Species
• Inconsistent Exposure Parameters Between
the Toxicity Test and Receiving Water
– Magnitude, duration, frequency of exposure
– Water chemistry
• Undetected Biological Effects
• Population/Community Structure Dynamics
WHAT CONCLUSIONS CAN BE
MADE?
• The Sample Is Toxic/Non-Toxic As
Defined By The WET Program
• The Biological Impact Was
Significant/Insignificant In The Beaker
• The Receiving Water May or May not
Become Impacted
WAYS TO INCREASE THE
ECOLOGICAL RELEVANCE
• Identification of Toxic Agent(s)
• Consider the Use Of Indigenous Species In Toxicity Tests
• Consider Exposure Parameters Found In Receiving Water
– Magnitude, duration, frequency of exposure
– Water chemistry
– Ambient water tests
• In Situ Bioassays
• Detection and Study Of Other Biological Effects
• Comprehensive Study Of Population/Community Structure
Dynamics In Receiving Water
• Further Studies In A Variety Of Ecosystems Which Examine
The Relationship Between WET Tests And Ecological Impact.
COST OF “ECOLOGICALLY
RELEVANT” WET TESTS
• Very Expensive
–Methods Research and Development
–Receiving water characterization
–Field bioassessments
• Loss Of Comparability
• Increase In Complexity Of Water Quality
Standards and Interpretation
SUMMARY
• WET Tests Were Developed To
Identify Toxic and Nontoxic Samples
• WET Tests Are Useful In Conjunction
With Chemical And Field Assessment
Data To Protect Aquatic Ecosystems
• Adaptation Of WET Tests To Be
Ecologically Relevant Can Be Helpful
But Comes At A Cost
FALSE POSITIVES
FALSE NEGATIVES
GUIDING PRINCIPLE
= REPEATABILITY
Repeatable test results are taken
as “true” or “real” or “correct”.
FALSE POSITIVES/NEGATIVES IN
CONTEXT OF WET TESTS
Depends on presumed function of WET tests:
• WET Test as “predictor” of instream effects.
• WET Test as “detector” of toxic amounts of
toxic chemicals
WET TEST AS “PREDICTOR”
OF INSTREAM EFFECTS.
• False Positive = false indication of
instream effects
• False Negative = failure to indicate
instream effects
WET TEST AS “DETECTOR” OF
TOXIC AMOUNTS OF TOXIC
CHEMICALS
• False Positive = false indication of
presence of toxic amounts of toxic
chemicals
• False Negative = failure to indicate
presence of toxic amounts of toxic
chemicals
WHAT IS “TOXICITY”?
• Statistically significant difference
between effluent concentration and
control
• An LC50 or other point estimate that
is less than some predetermined
value
The operational definition of toxicity is
often statistical
TOXICITY AS A STATISTICAL
CONCEPT
• False Positive = Statistically
significant effect that is not “Real”
(spurious, artifactual).
• False Negative = Effect that should
be observed but is not.
THERE ARE REASONS WHY
STATISTICALLY SIGNIFICANT
RESULTS HAPPEN
At most, 4 things are present in a test
beaker:
Diluent
Sample
Organism(s)
Food
TOXICITY NOT DUE TO SAMPLE
• Technician error
• Bias in test chamber location or in
assigning organisms to treatments.
• Statistical sampling error (Type 1 error)
• Other
TECHNICIAN ERROR
• Expertise
• Experience
BIAS IN ORGANISM/CHAMBER
ASSIGNMENT
• Bias in organism assignment is a tendency to
assign healthier or less healthy organisms to
certain test concentrations
• Systematic arrangement of test chambers can
result in systematic bias in organism response
(e.g. Selenastrum algal growth test)
• Can be eliminated through proper
randomization. (See Davis et al., 1998)
STATISTICAL OUTCOMES
Types of Errors in Hypothesis
Testing
                        If Ho is True   If Ho is False
If Ho is rejected       Type I error    No error
If Ho is not rejected   No error        Type II error
HYPOTHESIS TESTING FACTS
• NOECs are not point estimates
• Cannot calculate coefficients of variation or
confidence intervals
• NOEC is a lower concentration level than
the LOEC when the dose response curve
is smooth
• LOEC may represent a different amount of
effect from test to test
[Figure: two distribution plots with axis marks o, msd, and a. Null Hypothesis is TRUE: α = 0.05 = Type 1 Error. Null Hypothesis is FALSE: α = 0.05, Power = 0.8, β = 0.2 = Type 2 Error]
STATISTICAL SAMPLING ERROR
• Type 1 error.
• Should be rare (P < alpha)
• Not repeatable
• Can be reduced by decreasing
alpha but at cost of increasing
Type 2 error (False Negatives)
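The trade-off in the last bullet can be illustrated with a one-sided z-test. This is a simplified sketch under a normal approximation, not the Dunnett's or t-test calculations used in WET analysis. The function name and the effect size are assumptions chosen for illustration.

```python
from statistics import NormalDist


def one_sided_power(true_effect: float, std_error: float, alpha: float) -> float:
    """Power of a one-sided z-test to detect `true_effect` at level `alpha`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold in SE units
    return 1 - std_normal.cdf(z_crit - true_effect / std_error)


# Same true effect, progressively stricter alpha
for alpha in (0.10, 0.05, 0.01):
    power = one_sided_power(true_effect=2.5, std_error=1.0, alpha=alpha)
    print(f"alpha={alpha:.2f}  power={power:.2f}  Type 2 error rate={1 - power:.2f}")
```

Each reduction in alpha lowers the power, so the Type 2 error rate rises for the same true effect.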
“UNINTERESTING” TOXICITY
Toxic response due to a sample that
deviates from culture conditions but is
still within standard test conditions.
E.g. The toxic response is due to a
slight difference in pH (0.2 units).
FALSE NEGATIVE: FAILURE OF
THE TEST SYSTEM TO INDICATE
TOXICITY
• Operator error
• Bias
• Type 2 error
• Intrinsically variable data
• Interference
CONCLUSIONS
False +/- are “wrong” answers.
• In the absence of technician error, biased test
design and biased sampling, the False +/- rates
equal the Type I and Type II error rates, respectively.
• Repeatable results, in the absence of
technician error and biased sampling, cannot
be False +/-’s.
• An estimate of the False + rate could be
obtained through testing of blanks.
INTRA- AND INTERTEST VARIABILITY
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
TYPES OF VARIABILITY
• Variability inherent in any analytical
procedure
• Intra-test : among and between
concentrations
• Intra-lab: within one lab, same method
• Inter-lab: between labs, same method
• Method specific: within limits of method
–organism age, length of test, dilution
water, food type, etc.
INTRA-TEST VARIABILITY
Group     N    Mean Survival   s.d.     CV (%)
control   4    0.975           0.050    5.1
2         4    0.975           0.050    5.1
3         4    0.975           0.050    5.1
4         4    0.950           0.058    6.1
5*        4    0.675           0.150    22.2
6*        4    0.275           0.222    80.6
MSE = 0.033
MSD = 13.9 %
INTRA-TEST VARIABILITY AND
ENDPOINT UNCERTAINTY
EC    Conc.   Lower 95% CL   Upper 95% CL   Conf. Int/EC
1     220     95             310            0.98
10    332     196            422            0.68
50    553     440            670            0.41
90    919     744            1416           0.73
99    1392    1024           2906           1.35
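The CV (%) column of the intra-test table and the Conf. Int/EC column here can both be recomputed from the other columns. In this sketch the ratio is taken as (upper limit - lower limit) / EC. That definition is inferred from the tabulated numbers rather than stated on the slide, and the function names are hypothetical.

```python
def cv_percent(mean: float, sd: float) -> float:
    """Coefficient of variation as a percentage of the mean."""
    return 100.0 * sd / mean


def relative_ci_width(estimate: float, lower: float, upper: float) -> float:
    """Width of the confidence interval relative to the point estimate."""
    return (upper - lower) / estimate


# Control group from the intra-test table
print(round(cv_percent(0.975, 0.050), 1))

# EC10 and EC90 rows from the endpoint table
print(round(relative_ci_width(332, 196, 422), 2))
print(round(relative_ci_width(919, 744, 1416), 2))
```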
POINT ESTIMATE INTRA-LAB VARIABILITY
[Figure: LC50 (mg/l SDS) across Tests 1 to 19, showing LC50, 95% UCI, 95% LCI, and mean LC50]
HYPOTHESIS TESTS INTRA-LAB
VARIABILITY
[Figure: NOEC (ppb Cu) vs. Test # (1 to 10). Horizontal lines = acceptance limits for two dilution series (red dotted = 0.5; blue dashed = 0.75)]
SOURCES OF INTRA-TEST
VARIABILITY
• Genetic variability
• Organism handling and feeding
• Toxicity among and between
treatments
• Non-homogeneous sample source
• Sample toxicity
SOURCES OF INTRA-TEST
VARIABILITY
• Abiotic conditions
• Dilution scheme
• Number of organisms/treatment
• Dilution water pathogens
• Randomization important!
SOURCES OF INTRA-LAB
VARIABILITY
• Intra-test sources
• Analyst experience and practice
• Organism age and health
• Acclimation
• Dilution water
• Type of sample
SOURCES OF INTRA-LAB
VARIABILITY
• Sample quality
• Test chamber characteristics
• Organisms/source
• Food type/rate/source
SOURCES OF INTRA-LAB
VARIABILITY
• Replicate volume
• Test duration
• Procedures
SOURCES OF INTER-LAB
VARIABILITY
• All of previous are important
• Differences allowed in methods - Could
be significant between labs
• Differences in protocols - State, federal,
local, etc. Use promulgated standard
• ANALYST EXPERIENCE
VARIABILITY AND POINT
ESTIMATE UNCERTAINTY
              Test #1      Test #2
Mean CV (%)   9.9          33.8
IC25 (%)      27.2         26.0
MSE           34.5         290.6
95% CI        25.7-28.5    17.2-31.3
HYPOTHESIS TESTS
HIGH VARIABILITY - LOW
STATISTICAL POWER
Group     n    Mean wt (ug/ind)   s.d.   CV%
Control   4    632                552    87.4
2         4    727                674    92.7
3         4    1080               408    37.7
4         4    564                493    87.5
5         4    748                235    31.4
MSD = 131 %
HYPOTHESIS TESTS LOW
VARIABILITY - HIGH
STATISTICAL POWER
Group     n    Mean wt (mg/survivor)   s.d.    CV
Control   4    0.30                    0.012   4.0%
10%       4    0.30                    0.013   4.3%
18%       4    0.31                    0.008   2.6%
32%       4    0.30                    0.010   3.3%
56%       4    0.27*                   0.013   4.8%
100%      4    0.27*                   0.013   4.8%
MSD = 6.5 %
ACTIONS TO REDUCE
VARIABILITY
• Establish performance criteria
• QA program
• Establish and follow strict procedures
• MAXIMIZE ANALYST SKILL
• Contract lab selection
• Additional QA/QC criteria
WHY DETERMINE METHOD
VARIABILITY AND WHY
CONTROL VARIABILITY?
• If inherent variability of each method is
known there will be less chance of
making errors concerning toxicity.
• Variability too high - not detect toxicity
when present. Variability too low - might
detect toxicity when it is not there.
• At present there is little incentive to
reduce variability.
EXAMPLES OF ADDITIONAL QC
TEST CRITERIA
• EPA Region IX: upper MSD limits
• Washington: upper MSD limits, change in α
• N. Carolina: limit control CVs, C. dubia
“Practical Sensitivity Criteria”
• EPA Region VI: limit control CV,
increase number replicates,biological
significance
THE CHRONIC TEST
GROWTH ENDPOINT
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
CHANGE IN GROWTH
ENDPOINT CALCULATION
Pre-Nov., 1995 Approach
Growth = (D.W. surviving organisms) / (# surviving organisms)
Post-Nov., 1995 Approach
Growth = (D.W. surviving organisms) / (# initial organisms)
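The two growth calculations differ only in the denominator. A minimal sketch for a single replicate, with hypothetical function names and made-up numbers:

```python
def growth_pre_1995(dry_weight_survivors: float, n_surviving: int) -> float:
    """Mean dry weight per surviving organism (pre-Nov. 1995 approach)."""
    return dry_weight_survivors / n_surviving


def growth_post_1995(dry_weight_survivors: float, n_initial: int) -> float:
    """Dry weight of survivors per organism initially exposed (post-Nov. 1995)."""
    return dry_weight_survivors / n_initial


# One replicate: 10 fish at the start, 8 survivors weighing 2.4 mg in total
print(growth_pre_1995(2.4, 8))
print(growth_post_1995(2.4, 10))
```

With no mortality the two values are identical. As mortality rises the post-1995 value falls further below the pre-1995 value, so the newer endpoint combines survival and growth.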
EFFECT ON MEAN
TREATMENT RESPONSES
Treatment   % Mortality   Before Promulgation   After Promulgation
Control     5.1           325                   308
2           2.6           353                   341
3           5.0           345                   329
4           17.9          387                   306
5           47.5          319                   167
INTRA-TREATMENT VARIABILITY
AND WEIGHT CALCULATIONS
[Figure: CV (%) across Observations 1 to 23, Before vs. After]
OLD MSE/NEW MSE RATIO
[Figure: old MSE/new MSE ratio across Tests 1 to 10 for Ref. Tox. and Effluent tests]
EFFECTS ON HYPOTHESIS
TEST ENDPOINTS
         Before Promulgation   After Promulgation
Test #   %MSD    NOEC          %MSD    NOEC
1        16.4    50            16.7    50
2        10.8    10            29.1    10
3        11.9    5             39.0    5
4        19.7    25            18.5    25
EFFECTS ON HYPOTHESIS
TEST ENDPOINTS
         Before Promulgation                 After Promulgation
Test #   %MSD   NOEC   Avg. wgt. at NOEC     %MSD   NOEC   Avg. wgt. at NOEC
1        20.9   100    296                   23.4   100    296
2        19.5   100    268                   25.1   100    233
3        22.1   100    254                   24.1   100    227
4        21.4   100    387                   22.8   100    313
EFFECTS ON POINT
ESTIMATE ENDPOINTS
         Before Promulgation   After Promulgation
Test #   IC25    95%CI         IC25    95%CI
1        56.2    45.4-79.3     48.3    43.3-61.9
2        NC      NC            12.4    6.4-13.8
3        NC      NC            4.2     1.5-7.3
4        33.7    28.2-40.6     30.0    19.4-35.0
EFFECTS ON POINT
ESTIMATE ENDPOINTS
         Before Promulgation   After Promulgation
Test #   IC25    95%CI         IC25    95%CI
1        291     NC            234     191-262
2        386     NC            176     140-256
3        227     179-258       138     111-155
4        >400    NC            144     104-162
NOEC/IC25 RELATIONSHIP
Test #   Test Type   NOEC      IC25 Before   IC25 After
1        Effluent    50%       56.2          48.3
2        Effluent    25%       33.7          30.0
3        Ref. Tox.   100 ppb   291           234
4        Ref. Tox.   100 ppb   386           176
5        Ref. Tox.   100 ppb   227           138
6        Ref. Tox.   100 ppb   >400          144
IMPACT ON TEST
INTERPRETATION
• Hypothesis Test Results - most cases
show little change, but not always
• Point Estimate Results - usually
increases predicted toxicity
ISSUES RELATED TO
CHANGE IN APPROACH
• Test growth or biomass?
• Accurate representation of growth?
• Correlation between new results
and instream responses?
ISSUES RELATED TO
CHANGE IN APPROACH
• Conflict between new results and
unchanged effluent quality?
• Effect on reference toxicant control
charts
• Relationship between NOEC and
IC25
AGE-RELATED
SENSITIVITY OF FISH
IN ACUTE WET TESTS
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
REVISIONS TO FISH AGES IN
EPA ACUTE TEST MANUALS
• From: 1-90 days old in the 3rd
edition of the acute manual (1985;
EPA/600/4-85/013)
• To: 1-14 days old (or 9-14 days old
for silversides) in the 4th edition of
the acute manual (1993;
EPA/600/4-90/027F)
COMMONLY USED TEST
SPECIES
• Fathead minnows
• Sheepshead minnows
• Silversides (inland, Atlantic, and
tidewater)
RATIONALE
• Younger life stage is generally
more sensitive than older life stage
• Reduction in range of acceptable
ages from 1-90 to 1-14 days will
reduce variability
CONCERN
• Use of younger fish in NPDES
testing may show an increase in
apparent toxicity, without any
changes in effluent conditions
COMMON QUESTIONS
• Are <14-day-old fish more sensitive to
toxicants than <90-day-old fish?
• Does the use of <14-day-old fish reduce
inter-test variability compared to
<90-day-old fish?
• How do sensitivity and precision
vary within the 1 to 14 day-old age
range?
SENSITIVITY OF 14, 30, AND 90
DAY-OLD FATHEAD MINNOWS
[Figure: two panels, Copper and Unionized Ammonia. Y-axes: Mean 96 hr LC50 (ppb, 0-1200, in one panel; ppm, 0.00-1.50, in the other). X-axis: Age (days): 14, 30, 90. Bars carry letter labels A, B, C.]
INTER-TEST PRECISION OF 14,
30, AND 90-DAY OLD
FATHEAD MINNOWS
[Figure: two panels, Copper and Unionized Ammonia. Y-axis: Coefficient of Variation (0.0-0.7 in one panel; 0.00-0.25 in the other). X-axis: Age (days): 14, 30, 90.]
SENSITIVITY OF 1-14 DAY-OLD
FATHEAD MINNOWS
[Figure: four panels, SDS, Unionized Ammonia, Hexavalent Chromium, and Sodium Pentachlorophenol. Y-axis: Mean 48 hr LC50 (ppm or ppb, depending on panel). X-axis: Age (days): 1, 4, 7, 10, 14. Bars carry letter labels A and B.]
INTER-TEST PRECISION OF 1-14
DAY-OLD FATHEAD MINNOWS
[Figure: Coefficient of Variation (0.0-0.6) for NaPCP, Cr+6, SDS, and NH3. X-axis: Age Range (days): 1-14, 4-14, 7-14, 10-14.]
SUMMARY
• 14-day-old fathead minnow larvae
are more sensitive to copper and
ammonia than 90-day-old fish.
• The inter-test precision of 90-day-old
fish is equal to or better than that of
14-day-old fish for copper and ammonia.
SUMMARY - Cont.
• Within the 1-14 day age range, 1-day-old
larvae are less sensitive to several
toxicants.
• Sensitivity to these toxicants
becomes constant after 4-7 days of age.
• Maximum inter-test precision for these
toxicants is observed when the age
range is limited to 7-14 day-old larvae.
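The inter-test precision figures in this section are expressed as a coefficient of variation (CV), the standard deviation of repeated LC50s divided by their mean. A minimal sketch of that calculation follows; the LC50 values are hypothetical placeholders, not data from these slides, and the function name is illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(lc50s):
    """Inter-test CV: sample standard deviation divided by the mean."""
    return stdev(lc50s) / mean(lc50s)


# Hypothetical LC50s (ppb) from repeated tests at two organism ages.
repeated_lc50s = {
    "14-day-old": [210.0, 340.0, 150.0, 280.0],
    "90-day-old": [900.0, 1050.0, 980.0, 1100.0],
}

for age, values in repeated_lc50s.items():
    print(f"{age}: CV = {coefficient_of_variation(values):.2f}")
```

A lower CV means better inter-test precision, so comparing CVs across age groups is one way to check whether narrowing the age range actually reduces variability.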
REASONABLE
POTENTIAL
AND
TOXICITY TEST
DESIGN
RP DETERMINATION
DEFINITION
• “to determine whether the
discharge causes, has the
reasonable potential to cause, or
contributes to an excursion of
numeric or narrative water
quality criteria” (TSD, 1991)
REASONABLE
POTENTIAL
• 40 CFR 122.44(d)(1) requires that the RP
procedure address the following:
– effluent variability
– existing controls on all pollution sources
– available dilution
– species sensitivity
• WERF POTW survey found that RP is not
consistent among regulatory agencies
REASONABLE POTENTIAL
EXAMPLES
• Virginia definition is that 75% of
tests must meet decision criterion
• Region IX uses a statistical
approach adopted from the TSD
• Some states do not issue limits
• Some states issue limits to all
major dischargers
VARIABILITY AND RP
• Primarily an inter-test issue
– effluent variability
– method variability
• How is it determined?
– Assumptions
• TSD
• Similar facilities
– Collecting sufficient data
• Monthly?, Quarterly?, Annually?
VARIABILITY
ASSUMPTION ISSUES
• TSD assumption (CV=0.6) may not be
accurate
• May take advantage of data for similar
facilities, reduces some uncertainty
• Actual data always best - greater certainty
in decision to issue limit
• Reduce potential for erroneous
conclusions based on a few data points
[Figure: two 95th percentiles (labeled 1 and 2) shown relative to the WLA.]
HOW TO ADDRESS VARIABILITY
THROUGH TEST DESIGN
• Consistency between tests:
– dilution schemes
– dilution water type and characteristics
– test vessel dimensions and material
– test replicate volume
– increase sample size per rep. or conc.
– test organism age (acute tests)
– species sensitivity affects variability
SPECIES SENSITIVITY AND RP
• Two Components:
– Representative of condition to be
protected?
– Magnitude of toxicity
• Both components affected by:
– species
– age of life stage
– dilution water quality
– test type (static, renewal, flow-through)
– culturing/handling of organisms
SPECIES SENSITIVITY AND
REPRESENTATION OF TOXICITY
• Important that tests be reliable
indicators of toxicity, dependent
on some test design parameters:
–pH
–hardness
–alkalinity
–treatment renewals
TEST AND INSTREAM HARDNESS
• C. dubia sensitive to hardness
• C. dubia acclimated and tested at 120 ppm
hardness
• Instream and effluent hardness is 300 ppm
• Test result due to effluent or sensitivity to
hardness?
• Solution: test different organism or C.
dubia cultured at higher hardness
SPECIES SENSITIVITY, TOXICITY,
ORGANISM AGE & RP
• Flexibility in organism age tested
–acute: significant
–chronic: minimal
• Data indicate that age affects
sensitivity
SPECIES SENSITIVITY, TOXICITY,
DILUTION WATER QUALITY & RP
• Example: pH
• If ammonia is present, and pH artificially rises
in test beyond that in real world, ammonia may
contribute to toxicity and affect results used to
determine RP
• Solution: control pH in tests at levels
occurring at the condition of interest (IWC,
100% discharge, etc.) using direct control (CO2
headspace) or flow-through testing
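The pH concern above can be made concrete by estimating how the un-ionized fraction of total ammonia shifts with pH. The sketch below assumes the commonly cited freshwater relationship pKa = 0.09018 + 2729.92/T (T in kelvin), attributed to Emerson et al. (1975); that relationship and the function name are assumptions brought in for illustration, not part of the slides, and salinity effects are ignored.

```python
def unionized_ammonia_fraction(ph, temp_c=25.0):
    """Fraction of total ammonia present as un-ionized NH3.

    Assumes the freshwater pKa relationship pKa = 0.09018 + 2729.92 / T(K).
    """
    pka = 0.09018 + 2729.92 / (temp_c + 273.15)
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Compare a realistic instream pH with a pH that drifted upward during a test.
for ph in (7.0, 7.5, 8.0, 8.5):
    fraction = unionized_ammonia_fraction(ph)
    print(f"pH {ph}: un-ionized fraction = {fraction:.4f}")
```

Under this relationship, a rise from pH 7 to pH 8 at 25 C increases the un-ionized fraction by nearly an order of magnitude, which is why uncontrolled pH drift can change the apparent toxicity of an effluent that contains ammonia.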
DILUTION & RP
• EPA’s RP approach compares data
distribution to WLA
• If WLA predicted to be exceeded by a
specific percentile of the distribution,
then RP exists
• WLA consists of numeric standard
and dilution
[Figure: Relative Frequency versus Chronic Toxic Units for Ceriodaphnia sp. (CV = 1.06), marking the Long-Term Average, the 95th percentile, WLA1, and WLA2.]
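The comparison of a distribution percentile to the WLA can be sketched as follows. This is a simplified illustration that assumes toxic units are lognormally distributed and are described by a long-term average and a CV; it does not reproduce the full TSD procedure, and the function names and example numbers are hypothetical.

```python
import math


def lognormal_percentile(long_term_average, cv, z=1.645):
    """Upper percentile of a lognormal distribution (z = 1.645 gives the 95th)."""
    sigma_sq = math.log(cv ** 2 + 1.0)
    sigma = math.sqrt(sigma_sq)
    mu = math.log(long_term_average) - 0.5 * sigma_sq
    return math.exp(mu + z * sigma)


def has_reasonable_potential(long_term_average, cv, wla, z=1.645):
    """RP exists in this sketch if the chosen percentile exceeds the WLA."""
    return lognormal_percentile(long_term_average, cv, z) > wla


# Hypothetical example: long-term average of 1.0 chronic toxic units.
for cv in (0.6, 1.06):
    p95 = lognormal_percentile(1.0, cv)
    print(f"CV = {cv}: 95th percentile = {p95:.2f} TUc, "
          f"RP vs WLA of 2.5 TUc: {has_reasonable_potential(1.0, cv, 2.5)}")
```

With the same long-term average, a larger CV pushes the 95th percentile higher, so the assumed CV can decide whether a limit is issued. That is the reason the slides favor actual data over a default CV.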
ADDRESSING DILUTION & RP
IN TEST DESIGN
• Center test dilutions on respective effluent
concentrations of concern
• Test dilutions below and above
• Avoid testing concentrations/conditions
which are unlikely to naturally occur
• Maximize dilution factor with intra-test and
inter-test uncertainty in mind
CHOOSING TEST
DILUTIONS
• Example:
– Chronic IWC = 25%
– Dilutions of 23%, 24%, 25%, 26% and 27% may
miss toxicity at 28% which is well within
uncertainty of most chronic endpoints and may
result in a false negative indication of toxicity
– If dilutions are 6.25%, 12.5%, 25%, 50% and
100%, results at 4x the IWC have little
environmental relevance
– Choose something in between, like 12%, 17%,
25%, 35% and 50% (dilution factor 0.7)
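The in-between series in the example above can be generated by stepping a constant dilution factor above and below the IWC. A minimal sketch, with an illustrative function name and parameters that are not from the slides:

```python
def dilution_series(iwc, factor=0.7, below=2, above=2):
    """Concentrations (% effluent) centered on the IWC.

    Each concentration is the next higher one multiplied by `factor`.
    Values are capped at 100% effluent.
    """
    series = []
    for step in range(below, -above - 1, -1):
        concentration = iwc * factor ** step
        series.append(round(min(concentration, 100.0), 1))
    return series


# Chronic IWC of 25% with a dilution factor of 0.7.
print(dilution_series(25.0))
```

For an IWC of 25% and a factor of 0.7 this yields values close to the 12%, 17%, 25%, 35% and 50% series in the example. If the cap at 100% produces repeated values, the factor or the number of steps above the IWC needs adjusting.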
RP TEST DESIGN
SUMMARY
• Minimize inter-test method
variability
• Ensure representative test results
through control of parameters not
limited by methods
• Account for dilution in tests
• Balance maximum dilution factors
in tests with endpoint uncertainty
MOST SENSITIVE
SPECIES SELECTION
SETAC Expert Advisory Panel
Performance Evaluation and
Data Interpretation
MOST SENSITIVE SPECIES
(MSS) DETERMINATION
• Purpose
–To determine which test species is
“most sensitive” to an effluent source
or ambient water
• Desired Toxicity Information from MSS
–Variability/Seasonality
–Magnitude or frequency of “sensitive”
response
COMMON CONSIDERATIONS
• Test Frequency
• Species Selection
• Dilution Water Type
• Sample Type
• Concentration Series
• Statistical Analysis
FREQUENCY AND TIMING OF
MSS SCREENS
• Balance of Cost and Adequate
Information
• Initial or Reevaluation
• Seasonal or Summary Information
Desired
SELECTION OF TEST SPECIES
• Diversity of Organism Types
– Plant, vertebrate, invertebrate
• Nature of Receiving Water
– Salinity, resident species
• Non-promulgated, Resident Species
• Suspected Toxicant(s)
– USEPA Region 9 & 10 Guidance
Document
SELECTION OF DILUTION
WATER
• Method Defined Synthetic Dilution
Water
• Natural Receiving Waters
• Receiving Water Defined Synthetic
Dilution Water
SELECTION OF SAMPLE
TYPE
• Whole Effluents
• Receiving Water
• Composite or Grab Samples
CONCENTRATION SERIES
SELECTION
• Multiple Concentration Tests
– Preferred experimental design for MSS
screens
– Select concentrations based upon IWC and
elucidation of concentration-response (CR) relationship.
• Single Concentration Tests (Pass/Fail)
– Effective if cost is prohibitive
– Control and IWC
STATISTICAL ANALYSIS AND
INTERPRETATION
• Multiple Biological Endpoints
• Combining Multiple Screen Results
• Statistical Analysis Method
MULTIPLE BIOLOGICAL
ENDPOINT ANALYSIS
• Evaluate each biological endpoint
• Use most “toxic” endpoint
[Figure: Kelp Germination and Germ Tube Length. Y-axis: Effluent Concentration (%), 0-100. X-axis: Statistical Endpoint (NOEC, EC/IC25). Series: Germination, Tube Length.]
METHODS OF COMBINING
MSS RESULTS
• Averaging
• Proportion (X times out of Y screens)
[Figure: Multiple MSS Data Using FW Chronic Tests. Y-axis: EC25/IC25 (% Effluent), 0-100. X-axis: Screen Number (1, 2, 3). Series: CD, SC, FH. Three points are marked with asterisks.]

Species               Proportion (X/Y)   Average
Fathead Minnow (FH)   67% (2/3)*         87%
Ceriodaphnia (CD)     33% (1/3)          70%*
Selenastrum (SC)      0% (0/3)           97%
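The two ways of combining screen results can be sketched as below. The EC25/IC25 values are hypothetical, constructed only so that the two methods point to different species as on the slide above; they are not the data behind the slide, and the function names are illustrative.

```python
from statistics import mean

# Hypothetical EC25/IC25 values (% effluent) from three screens.
screens = [
    {"FH": 80, "CD": 90, "SC": 100},
    {"FH": 85, "CD": 100, "SC": 95},
    {"FH": 96, "CD": 20, "SC": 96},
]


def proportion_most_sensitive(screens):
    """Fraction of screens in which each species had the lowest endpoint.

    Ties go to the species listed first in the screen.
    """
    wins = {species: 0 for species in screens[0]}
    for screen in screens:
        wins[min(screen, key=screen.get)] += 1
    return {species: count / len(screens) for species, count in wins.items()}


def average_endpoint(screens):
    """Mean endpoint for each species across all screens."""
    return {species: mean(s[species] for s in screens) for species in screens[0]}


proportions = proportion_most_sensitive(screens)
averages = average_endpoint(screens)
print("Most sensitive by proportion:", max(proportions, key=proportions.get))
print("Most sensitive by average:", min(averages, key=averages.get))
```

A single screen with a very low endpoint can dominate the average while counting only once in the proportion, so the choice of method should be agreed on before the screens are run.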
STATISTICAL ANALYSIS
METHODS FOR MSS SCREENS
• NOEC’s
• Point-estimates
• Probability of effect at critical
concentration (pECC)
NOEC’S
• Experimental Question
Which method/species is most
likely to identify a change from
control response?
ADVANTAGES OF NOEC’S
• Common method
• Integrates effect and intratest variability
[Figure: NOEC (% Effluent), 0-100, by Species (FH, CD, SC). One value is marked with an asterisk.]
DISADVANTAGES OF NOEC’S
• Cannot separate biological effect and statistical sensitivity
• Cannot average
• NOEC’s may not be environmentally relevant
[Figure: Effluent Concentration (%), 0-100, by Species (FH, CD, SC), comparing EC/IC25 and NOEC. Two values are labeled >100 and the IWC is marked.]
POINT ESTIMATES
• Experimental Question
Which method/species shows
the specified effect at the lowest
concentration?
ADVANTAGES OF POINT
ESTIMATES
• Evaluates a common effect level
• Utilizes the entire concentration-response curve (parametric models)
• Can use proportion or average analysis
[Figure: Effect (%) versus Concentration (%), both 0-100. FH - EC25/IC25 = 70%; CD - EC25/IC25 = 90%; SC - EC25/IC25 = >100%. One value is marked with an asterisk.]
DISADVANTAGES OF POINT
ESTIMATES
• Effect level selection
• Concentration-response required
• Smoothing
• No consideration of endpoint precision
• EC values may not be environmentally relevant
[Figure: Effect (%) versus Concentration (%), both 0-100, with the IWC marked. FH - EC25/IC25 = 70%; CD - EC25/IC25 = 90%; SC - EC25/IC25 = >100%. One value is marked with an asterisk.]
PROBABILITY OF EFFECT AT
THE CRITICAL CONCENTRATION
(pECC)
• Experimental Question
At the concentration of
environmental concern, which
method/species had the greatest
effect at the lower 95 % confidence
limit?
ADVANTAGES OF pECC
• Considers precision of response estimate
• Can use proportion or average analysis
• Environmental relevance
• No concentration-response required
[Figure: Effect (%), -10 to 30, by Species (FH, CD, SC), showing ECC and pECC. One value is marked with an asterisk.]
DISADVANTAGES OF pECC
• Zero replicate variance
• Boot-strapping
• Obtaining 95% confidence intervals at IWC
[Figure: Effect (%), -15 to 10, by Species (FH, CD, SC), showing ECC and pECC. One value is marked with an asterisk.]
SUMMARY
• Discuss the MSS procedure in detail during
permit development
• Select variety of organism types
• Initially test for trends in toxicity
• Continue periodic screening
• Select type of statistical analysis carefully
• Make sure that statistical analysis and the
raw results “make sense”
WHOLE
EFFLUENT
TOXICITY TEST
DESIGN
WET TESTING DESIGN
• Important factors
– discharge concentration of concern
– type of statistical analysis
– typical toxicant(s)
– dilution/control water
– receiving water quality
– number of concentrations tested
– stage in testing program (initial, advanced)
DISCHARGE
CONCENTRATION
OF CONCERN (COC)
• Acute
– initial dilution, if allowed, at edge of
acute mixing zone multiplied by 3.3
(TSD, 1991) to convert concentration
at LC1 to concentration at LC50
• Chronic
– dilution available at edge of chronic
mixing zone
TYPES OF WET TESTS
• COC and control
• Multiple concentrations and
control
WET TESTS WITH
MULTIPLE CONCENTRATIONS
• Recommended design for discharge
monitoring
• Usually includes small number of replicates
• Focus more on concentration-response
relationship
• Dilutions center on COC
• EPA recommends dilution factor > 0.5
• Maximize dilution factor with endpoint
uncertainty and inter-test variability in mind
WET TESTING ONLY THE
COC
• Design for ambient and some discharge
monitoring
• Little flexibility in test design
• Increase number of replicates and/or
organisms to increase confidence in results
• Information on concentration/response
relationship not available and not considered
WET TESTS & WATER
QUALITY PARAMETERS
• Important that parameters match
goals of testing, either:
–instream condition of discharge
upon dilution, or
–inherent toxicity of discharge
independent of instream
condition
WET TEST WATER QUALITY
PARAMETERS
• Most common parameters of concern
– hardness
– salinity
– pH
– temperature
– conductivity
• Test design solution: extra controls
EXAMPLE OF ADDITIONAL
CONTROL
TO ADDRESS HARDNESS
• Example goal: test instream condition of
discharge after dilution
• Daphnids cultured at 120 ppm
• Discharge and receiving water are at 300
ppm
• Prepare extra controls at 300 ppm
hardness and compare results with
dilutions tested
WET TEST DESIGN AND
TYPICAL TOXICANTS
• The toxicant(s) suspected determine if and
which test conditions are important
• Good example is ammonia:
– pH affects ammonia toxicity
– pH is not strictly limited by the methods
– pH drift beyond realistic levels may bring
unionized ammonia to unrealistic levels
• Test design solution: use pH control in WET
tests
WET TEST DESIGN &
DILUTION/CONTROL WATER
• Depends on test goals
• Instream mixed discharge condition
– use of water upstream from discharge
preferred
– second choice is water similar to upstream
– the more culture and dilution water differ,
the more important acclimation prior to
testing becomes
WET TESTING FREQUENCY
• Dependent on variability in
condition (instream or discharge)
• As variability increases, frequency
should increase
• Balance variability and frequency
of testing with cost
• Goal is to accurately represent the
condition in question
WET TEST DESIGN & STAGE
OF TESTING
• Species sensitivity varies with
biological endpoints and test
conditions
• Frequency of testing and number
of endpoints tested can decrease
as data set increases
WET TEST DESIGN &
STATISTICS
• Statistical approach used to analyze
results affects test design and
usually is permit-defined
• Point estimates benefit from fewer
replicates but more treatments
• Hypothesis testing benefits from
greater numbers of replicates but
the number of treatments minimally
affects results
WET TEST DESIGN
SUMMARY
• Focus on condition to be tested and question
being asked
• Ensure test parameters are representative of
condition being tested
• Testing frequency is driven by temporal
variability in condition
• Design tests to meet requirements of
statistical approaches to be used
Ambient Water
Testing:
Experimental Design
and Data Analysis
SETAC Expert Advisory Panel
Performance Evaluation and Data
Interpretation
AMBIENT TOXICITY TESTING
OBJECTIVES OF AMBIENT
TOXICITY TESTING
• Objectives vary
– General assessment of water quality in
streams, rivers, bays, ocean
• Determine whether water body should
receive more focused assessment
• Assess whether water body or segment
thereof should be placed on or taken off the
CWA 303(d) list of impaired waterways
• Ascertain source of water contamination
OBJECTIVES OF AMBIENT
TOXICITY TESTING - Cont.
• Compare results of effluent toxicity tests
with receiving water tests
• In conjunction with TIEs, and associated
chemical analysis, identify the cause(s) of
contamination
• Assess the success of remediation efforts
• Determine compliance with water quality
standard for toxicity
INFORMATION PROVIDED BY
AMBIENT TOXICITY TESTING
• Toxicity testing procedures with TIEs and chemical
analyses have been used effectively to identify the
chemical causes and sources of water quality
contamination.
• When applied in conjunction with carefully
designed sampling regimes (e.g., site selection
and timing of collection) these procedures can
describe:
– Magnitude of toxicity
– Temporal extent (duration and frequency)
– Spatial/geographic distribution
– Land use practices responsible for toxicity
STRENGTHS OF SINGLE
SPECIES TESTS
• An integrative measure of aggregate,
additive toxicity
• Provide a direct measure of toxicity and
bioavailability
• In combination with TIEs, they can identify
chemical cause(s) of toxicity
• Measure toxicological responses to
chemicals for which there are no chemical
specific water quality standards
STRENGTHS OF SINGLE
SPECIES TESTS - Cont.
• Reliable predictors of instream impacts
• Afford reliable, repeatable, and comparable
results compared to other types of biological
and chemical tests
• Furnish an early warning signal so that actions
can be taken to minimize ecosystem impacts
from toxic chemicals
• Can be performed quickly and inexpensively
compared to other biological monitoring
procedures
LIMITATIONS OF SINGLE
SPECIES TESTS
• Do not characterize the persistence/duration
or frequency of exposures in ambient waters
without repeated sampling and testing
• Do not directly measure biotic community
responses
• Do not encompass the range of species,
sensitivities, or functions (endpoints)
responsive to toxic chemicals which occur in
biological communities
LIMITATIONS OF SINGLE
SPECIES TESTS - Cont.
• Do not measure delayed impacts nor effects due to
bioaccumulation or bioconcentration, mutagenicity,
carcinogenicity, teratogenicity, and enrichment.
• Laboratory tests do not reflect the multivariate and
complex exposure conditions which exist in many
aquatic ecosystems
• Results may underestimate biotic community
responses to chemicals because of multiple
stressors acting on aquatic ecosystems
LIMITATIONS OF SINGLE
SPECIES TESTS - Cont.
• Use of surrogate species may not
represent toxicological sensitivities
in some aquatic ecosystems
AMBIENT TESTING METHODS
• Usually U.S. EPA marine or freshwater
methods
• Other (e.g., ASTM) protocols or
indigenous species tests are sometimes
used
DEVIATIONS FROM U.S. EPA
EFFLUENT TESTING
PROCEDURES
• Ambient water testing follows U.S. EPA protocols
for testing effluents with a few exceptions
• A dilution series usually is not included in testing
until TIEs are performed on toxic samples
• Water renewals may be from a single sample
• Number of control replicates may be increased
• Tests are conducted in glass or teflon containers
“TIERED” APPROACH TO
AMBIENT TESTING
• Initial surveys intended to characterize
watershed or waterbody sites over several
years or hydrologic cycles - sampling may be
monthly
• Focused follow-up studies may include:
– Increased number of sites and frequency of
sampling
– TIEs conducted
– Evaluation monitoring to assess toxicity
reduction/remediation efforts
EXPERIMENTAL DESIGN
• Centers around selection of:
– Surface waterbody or segment(s)
thereof to be monitored
– Number and location of sampling sites
– Sample type
– Timing/period and frequency of
sampling
FACTORS TO CONSIDER WHEN
SELECTING SAMPLING SITES
• Significant source of flow or loads into the
watershed?
• Representative type of drainage (agriculture,
urban, mining, etc.)?
• Receives runoff from particular land use?
• Predicted or suspected toxicity?
• “Integrator” site indicative of inputs and/or of
waterway (e.g., near mouth of river)
• Previously identified toxicity?
• Critical or sensitive habitat?
TYPE OF SAMPLE
• Composite collected over various
time periods
• Sub-surface grab sample
SELECTING PERIOD AND
FREQUENCY OF SAMPLING
• Selecting sampling period depends on
objectives of investigation
• Selecting sampling frequency relates
to defining duration and frequency of
toxic events
DATA ANALYSIS
• EPA recommends t-tests to compare
laboratory control to single ambient water
sample
• ANOVA and Dunnett’s multiple
comparison are appropriate for multiple
sites/samples
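The control-versus-sample comparison above can be illustrated with a pooled-variance two-sample t statistic. This is a sketch only: it assumes normality and equal variances have already been checked, uses hypothetical replicate values, and leaves the critical value to a t table. The function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(control, sample):
    """Return (t, degrees of freedom) for control mean minus sample mean."""
    n1, n2 = len(control), len(sample)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(sample)) / df
    t = (mean(control) - mean(sample)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical replicate responses (e.g., number of young per female).
control_reps = [10, 12, 14, 16]
ambient_reps = [6, 8, 10, 12]

t, df = pooled_t_statistic(control_reps, ambient_reps)
# One-tailed critical value for alpha = 0.05 and 6 degrees of freedom,
# taken from a standard t table.
critical_t = 1.943
print(f"t = {t:.3f}, df = {df}, reduced vs control: {t > critical_t}")
```

The test is one-tailed because the question is whether the ambient sample response is lower than the control. For multiple sites, the ANOVA and Dunnett's comparison named above replace this pairwise calculation.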
ECOLOGICAL RELEVANCE
QUESTION
• Are the results of the U.S. EPA
tests, or other single species tests,
reliable predictors of biotic
community responses/impacts?
TWO REVIEWS OF ECOLOGICAL
RELEVANCE ISSUE
• Waller W.T., et al. 1996.
• de Vlaming V, Norberg-King T.J.
1999.
ENCAPSULATED CONCLUSIONS
OF REVIEWS
• SETAC Panel - “It is unmistakable and clear
that when U.S. EPA toxicity test procedures
are used properly, they are reliable
predictors of environmental impact provided
that the duration and magnitude of exposure
are sufficient to resident biota.” and “a
strong predictive relationship exists between
ambient toxicity and ecological impact.”
ENCAPSULATED CONCLUSIONS
OF REVIEWS - Cont.
• de Vlaming and Norberg-King - The
U.S. EPA, and other single species
toxicity test results are, in a majority
of cases, reliable qualitative
predictors of responses in aquatic
ecosystem populations.
DE VLAMING AND NORBERG-KING
SUMMARY
• Available literature yields a weight of
evidence demonstration that WET, and
other indicator species, toxicity test results
are reliable qualitative predictors of biotic
responses.
• There are no empirical data which
demonstrate that the indicator species
results consistently fail to provide reliable
predictions of instream biological
responses.
DE VLAMING AND NORBERG-KING
SUMMARY - Cont.
• When toxicity test results fail to provide a
reliable prediction, they more frequently
underestimate instream biological responses.
• Lab toxicity test results do not tend to
overestimate bioavailability of chemicals.
• Reliability with which toxicity test results predict
instream biological responses increases when
tests are performed on ambient waters and with
magnitude of toxicity.
DE VLAMING AND NORBERG-KING
SUMMARY - Cont.
• Reliability with which toxicity test results predict
instream biological responses increases with
characterization of persistence and frequency
of toxicity.
• Reliability with which toxicity test results predict
instream biological responses increases with
effective matching (or accounting for) of lab and
field exposures.
TIE/TRE TEST
DESIGN
TIE/TRE GOAL
• To identify, confirm and remove
toxicant(s) in order to bring effluent into
compliance with water quality standards
• Test design is dependent on the phase
of the TIE and the magnitude/variability
of toxicity present
• As toxicity decreases, number of
replicates and identification/confirmation
trials may need to increase
TEST DESIGN AND
PHASE I TIE
• Use the species used in the testing
that suggested toxicity
• Many sample manipulations
• Minimum number of replicates/treatment
• Primarily analyze with hypothesis testing
and BPJ
• Test at 100% concentration or
concentration providing significant
response compared to controls
TEST DESIGN AND PHASE III
TIE
• May use more than one species to compare
sensitivities in supporting hypothesis
• Few sample manipulations
• Number of replicates and treatments similar to
normal tests
• May use hypothesis or point estimate statistical
approaches - depends on permit
• Usually test at multiple concentrations to support
point estimates and to capture concentration-response
relationships
• Standard QA/QC
OTHER TIE/TRE TEST
DESIGN ISSUES
• Flexibility
• Temporal variability within and
between samples
• Screening
• Dilution water
• Controls for manipulations
• QA/QC
FLEXIBILITY
• Be creative
• Do not be constrained by required methods
• Consider toxicology in test design and
interpretation
– rate of action
– changes with organism age or development
• Consider magnitude of toxicity for chronic
TIEs - can you use acute tests?
REFERENCE TEST APPROACH
FLUORIDE LC50S FOR EFFLUENT
AND LAB WATER
Age (days)   Series #1    Series #2    Series #3
2            7.8, 4.7     7.1, 4.4     8.0, 5.0
4            11.0, 6.8    11.7, 6.8    9.5, 7.3
6            16.3, 9.3    17.6, 8.0    18.6, 9.2
TEST DESIGN & TEMPORAL
VARIABILITY
• Variability can occur within and between
samples, as well as between toxicant(s),
over time
• As persistence of toxicity within samples
decreases, the need for renewals may
increase
• As temporal variability in toxicant identity
and magnitude of toxicity increases, the
number of trials increases
TIE/TRE TEST DESIGN AND
SCREENING
• Only possible if screen can be a reliable
predictor of toxicity in definitive test
• Utility of screens impacted when toxicity is not
persistent
• Good idea when toxicity is unpredictable
between samples - saves resources
• Difficult for chronic TIE/TREs
TIE/TRE TEST DESIGN AND
DILUTION WATER
• Should use same dilution water as that in tests
which originally suggested toxicity
• Advisable to test another dilution water to see
if it impacts test results
• Dilution water may influence toxicity and TIE
interpretation
• Differences may be biological, chemical or
physical
TIE/TRE TEST DESIGN AND
ADDITIONAL CONTROLS
• Phase I includes numerous manipulations of
tested sample
• Manipulations may cause toxicity independent
of samples
• Be wary of chemical additions which oxidize
or reduce (examples will be provided)
• Solution: treat control water in same fashion
as sample and add to test as another control
TIE/TRE TEST DESIGN
SUMMARY
• Design changes with stage of
study
• Focus resources on issues
specific to each stage of study
• Maintain flexibility and creativity
• Avoid false conclusions with
multiple controls and checks
• Expertise