Transcript Stats 1

BioStatistics

Why Statistics?

You want to make the strongest conclusions based on limited data. Differences in biological systems sometimes cannot be easily observed. Random variation? Real difference?

Statistics sometimes are unnecessary

Large differences in observed events and small scatter within groups. In most instances, though, the use of statistics can provide you with mathematically-based conclusions: clinical research, field research.

Statistics extrapolate from sample to population

The only way to draw absolute conclusions about a population is to measure the trait(s) of interest of every individual in that population. The reality is, this is almost always impossible to do. Thus, randomly sampling some of the individuals can provide information about the entire population. Sometimes random sampling can be difficult to define. If your sample is not random, then conclusions drawn from it are not reliable.

Samples and Populations

Quality control. A company manufactures 20,000 vials (population) of a vaccine from a single production run. About 50 vials (sample) are taken from this production run and analyzed for a variety of characteristics. The results on the 50 vials are then extrapolated to the remaining vials.
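The vial example can be simulated in a few lines. This Python sketch is only an illustration: the potency values, their spread, and the seed are invented assumptions, not data from a real production run.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: one potency value per vial for 20,000 vials.
# The mean of 100 and spread of 5 are invented for illustration only.
population = [rng.gauss(100.0, 5.0) for _ in range(20_000)]

# Randomly draw 50 vials without replacement; every vial is equally likely.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)  # normally unknown in practice
sample_mean = statistics.mean(sample)          # the estimate we can afford

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is the gap between sample and population that statistics tries to quantify.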

Samples and Populations

Political polls. The number of eligible U.S. voters is about 125,000,000 (population). A few hundred or a few thousand (sample) are asked to respond to political questions.

Samples and Populations

Clinical studies. Patients in a clinical study (sample) have a clinical condition (e.g., disease). They rarely reflect the entire population. However, they often reflect the population with the condition. Sampling humans can be particularly difficult.

Samples and Populations

Field experiments: local variations, impact of weather, environmental conditions/changes, human impact, sampling bias.

Samples and Populations

Laboratory experiments. Usually not necessary: highly-controlled experiments, a single variable, genetically-defined organisms, very little variation.

What statistical calculations can do

Statistical estimation. A mean calculated from a sample is a precise number. However, that number is only an estimate of the mean of the whole population.

Statistical hypothesis testing. Helps determine if an observed difference is due simply to random chance. Provides a P value; if P is small, the difference is unlikely to be due to random chance and the conclusion is statistically significant.

Statistical modeling. Tests how well experimental data fit a mathematical model. The most common form of statistical modeling is linear regression (LR). LR usually determines the best straight line through a set of data points.
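To make the "best straight line" idea concrete, here is a minimal least-squares sketch in plain Python. The function name `fit_line` and the dose/response numbers are made up for illustration; a statistics package would normally do this for you.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented example data: dose (x) versus response (y).
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(doses, responses)
print(f"best-fit line: y = {slope:.2f} * x + {intercept:.2f}")
```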

What statistical calculations cannot do

Analysis of a simple experiment. Define a population you are interested in. Randomly select a sample of subjects to study. Randomly split the sample subjects into two groups. One group gets one treatment; the other group gets another treatment. Measure a single variable trait in each subject. Use statistical tests to determine if there’s a difference between the groups.
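The "randomly split the sample into two groups" step can be sketched as follows. The subject IDs, the group size of 20, and the seed are hypothetical; the point is only that a shuffle, not the experimenter, decides who gets which treatment.

```python
import random

rng = random.Random(7)  # fixed seed for a repeatable illustration

# Hypothetical subject IDs already drawn at random from the population.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

# Shuffle a copy, then cut it in half: each subject is equally likely
# to land in either group.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment_a = shuffled[:half]
treatment_b = shuffled[half:]

print("treatment A:", treatment_a)
print("treatment B:", treatment_b)
```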

What statistical calculations cannot do

The problems with real experiments. Populations can be more diverse than your samples. Samples are collected on convenience, rather than randomly. The measured value is a proxy for what you’re really interested in. Errors occur in data collection: data may be recorded incorrectly, and assays may not report what you think they report. You need to combine different types of measurements to reach an overall conclusion (multiple variables).

Why statistics are difficult to learn

Deceptive terminology (significant, error, hypothesis). Statistical conclusions are never absolute (statistically significant). Statistics uses abstract concepts (populations, probabilities). Statistics are at the interface of math and science. Many statistical calculations require complex math.

Variables

Independent variable - The variable scientists manipulate to evaluate a response.

Dependent variable - The variable (i.e., trait) resulting from a treatment with an independent variable.

Variables

Types of variables in biology: measurement variables (continuous or discontinuous), ranked variables, and attributes.

Variables

Measurement variables - Those whose differing states can be expressed in a numerically-ordered fashion.

Continuous - Can assume any value between two distinct points. For example, there are infinitely many values between 1.5 and 1.6. Include: lengths, areas, volumes, weights, angles, temperatures, periods of time, percentages, rates.

Discontinuous - Discrete variables that can only have fixed numerical values. The number of segments in an insect’s appendage may be 4, 5, or 6, but not 4.3.

Variables

Ranked variables - Variables that cannot be measured, but can be placed in order. For example, order of emergence of pupae without regard to time.

Attribute variables - Variables that cannot be measured, but must be expressed qualitatively. For example: black/white; pregnant/nonpregnant; male/female; live/dead.

Appropriate tests

Design: 1 variable, 1 sample
Measurement variables: computing median and frequencies; computing means; computing standard deviations
Attribute variables: confidence limits for percentages; runs test for randomness

Design: 1 variable, 2 samples
Measurement variables: t-tests; test of equality; paired comparisons test
Ranked variables: Mann-Whitney U test; Kolmogorov-Smirnov two-sample test
Attribute variables: testing differences between two percentages

Design: 1 variable, 2+ samples
Measurement variables: ANOVA; Tukey-Kramer test
Ranked variables: Kruskal-Wallis test; Friedman’s randomized block test
Attribute variables: G-test for percentages

Design: 2 variables, 1 sample
Measurement variables: regression analysis; polynomial regression; Olmstead and Tukey’s corner test
Ranked variables: ordering test; Spearman’s rank test
Attribute variables: chi-square test; Fisher’s exact test

Means and Standard Deviations

The mean is the average of a measured trait from a population. In biology, we usually compare two or more populations, which we call groups. The standard deviation measures the variation around the mean (it is the square root of the variance). Many statistical tests use means and standard deviations to determine if there are significant differences between groups.
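As a quick illustration, Python's built-in `statistics` module computes these quantities directly. The measurements below are invented, and the sample (n - 1) versions of variance and standard deviation are used on the assumption that each group is a sample rather than a whole population.

```python
import statistics

# Invented measurements of one trait in two groups.
group_1 = [4.8, 5.1, 5.0, 4.9, 5.2]
group_2 = [5.9, 6.3, 6.1, 5.8, 6.4]

for name, values in (("group 1", group_1), ("group 2", group_2)):
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance (divides by n - 1)
    sd = statistics.stdev(values)           # square root of the sample variance
    print(f"{name}: mean={mean:.2f} variance={variance:.3f} sd={sd:.3f}")
```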

Null hypothesis

The null hypothesis is the assumption taken to be true at the start, typically that there is no real difference between groups. Statistics can be used to disprove (reject) the null hypothesis. This lends support to an alternative hypothesis. Nearly every experiment that uses statistics should define null and alternative hypotheses.
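One way to see what "disproving the null hypothesis" means is a permutation test. This is a different technique from the t-test and is used here only because it needs no formulas. The sketch assumes that, if the null hypothesis is true, the group labels are interchangeable, so shuffling them should often produce a difference as large as the observed one. The function name, data, and number of shuffles are all illustrative choices.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=2_000, seed=0):
    """Estimate a two-sided P value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Invented measurements for a control group and a treated group.
control = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
treated = [5.9, 6.3, 6.1, 5.8, 6.4, 6.0]
p = permutation_p_value(control, treated)
print(f"estimated P value: {p:.4f}")
```

A small estimated P value means shuffled labels rarely reproduce the observed difference, which is evidence against the null hypothesis.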

Student’s T-test

Determines if there is a significant difference between the means of two groups of measured data.

Paired - compares matched values between members of a group. Unpaired - assumes values between members are not related.

Test the values for fit to a normal (aka Gaussian) distribution (“bell curve”). If they do not fit, then use nonparametric testing.

One-tailed vs. two-tailed. One-tailed: You must specify which group will have a larger mean in advance of data collection. Two-tailed: You do not know which group will have a larger mean in advance of data collection.
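For the paired case, the calculation works on the difference within each matched pair. The sketch below uses invented before/after values for the same five subjects; the helper name `paired_t` is hypothetical, and it returns only the t statistic and degrees of freedom, not a P value.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees_of_freedom) for matched before/after measurements."""
    if len(before) != len(after):
        raise ValueError("paired data need one 'after' value per 'before' value")
    # A paired test works on the per-subject differences, not the raw values.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Invented example: the same five subjects measured before and after treatment.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 12.0, 12.0, 16.0]
t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```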

Student’s T-test

P value: Is there a significant difference between the means of the two groups? Generally, if the P value is less than or equal to 0.05, then the difference is considered significant.

t-value: Positive if the first mean is larger than the second and negative if it is smaller.
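The sign convention for the t-value is easy to check with a small unpaired example. This sketch assumes the classic equal-variance (pooled) form of Student's t-test and uses invented data. Converting t to a P value requires the t distribution, which is not in Python's standard library; a package such as SciPy (`scipy.stats.ttest_ind`) reports both.

```python
import math
import statistics

def unpaired_t(group_1, group_2):
    """Student's t for two independent groups, assuming equal variances."""
    n1, n2 = len(group_1), len(group_2)
    mean_1, mean_2 = statistics.mean(group_1), statistics.mean(group_2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(group_1)
                  + (n2 - 1) * statistics.variance(group_2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / standard_error, n1 + n2 - 2

# Invented measurements; group_1 has the larger mean.
group_1 = [5.9, 6.3, 6.1, 5.8, 6.4]
group_2 = [4.8, 5.1, 5.0, 4.9, 5.2]
t, df = unpaired_t(group_1, group_2)
print(f"t = {t:.2f}, df = {df}")  # t is positive because group_1 has the larger mean
```

Swapping the order of the two groups flips the sign of t but not its size.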

Student’s T-test

Confidence interval (CI). The calculated mean is unlikely to be exactly the same as the mean of the entire population. The CI assumes your samples are randomly collected and fit a normal distribution. If your sample is large with a small standard deviation, then your calculated mean likely is close to the actual mean. The CI is a calculation based upon sample size and standard deviation. If the CI is 95%, then the range calculated around your sample mean probably (95%) includes the actual mean of the population under study.
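A rough version of the CI calculation can be sketched as below. As a simplification it uses the normal distribution for the critical value (about 1.96 for 95%); for small samples a t-based critical value gives a somewhat wider interval. The function name and data are invented for illustration.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Approximate CI for the mean using the normal distribution."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error shrinks as the sample grows or the scatter gets smaller.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Invented measurements.
values = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.9, 5.0, 5.2]
low, high = mean_confidence_interval(values)
print(f"mean = {statistics.mean(values):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Note how both inputs named on the slide appear in the code: sample size (`n`) and standard deviation together set the width of the interval.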