Canadian Bioinformatics
Workshops
www.bioinformatics.ca
Lecture 8
Microarrays II: Data Analysis
MBP1010
Dr. Paul C. Boutros
Winter 2014
DEPARTMENT OF
MEDICAL BIOPHYSICS
Aegeus, King of Athens, consulting the Delphic Oracle. High Classical (~430 BCE)
This workshop includes material
originally developed by Drs. Raphael Gottardo,
Sohrab Shah, Boris Steipe and others
Course Overview
• Lecture 1: What is Statistics? Introduction to R
• Lecture 2: Univariate Analyses I: continuous
• Lecture 3: Univariate Analyses II: discrete
• Lecture 4: Multivariate Analyses I: specialized models
• Lecture 5: Multivariate Analyses II: general models
• Lecture 6: Sequence Analysis
• Lecture 7: Microarray Analysis I: Pre-Processing
• Lecture 8: Microarray Analysis II: Multiple-Testing
• Lecture 9: Machine-Learning
• Final Exam (written)
Lecture 8: Microarrays Part II
bioinformatics.ca
House Rules
• Cell phones to silent
• No side conversations
• Hands up for questions
Topics For This Week
• Examples
• Attendance
• Pre-Processing
• QA/QC
• Microarray-Specific Statistics
• ProbeSet remapping
• Organizing –omics studies
Example #1
You are conducting a study of osteosarcomas using mouse models. You
are using a strain of mice that is naturally susceptible to these tumours at
a frequency of ~20%. You are studying two transgenic lines, one of which
has a deletion of a putative tumour suppressor (TS), the other of which
has an amplification of a putative oncogene (OG). Tumour penetrance in
these two lines is 100%. Your hypothesis: tumours in mice lacking TS will
be smaller than those in mice with amplification of OG, as assessed by
post-mortem volume measurements of the primary tumour. Your data:
TS (cm³): 3.9, 7.1, 3.1, 4.4, 5.0
OG (cm³): 5.2, 1.9, 5.0, 6.1, 4.5, 4.8
Example #2
You are conducting a study of osteosarcomas using mouse models. You
are studying transgenic animals with deletion of a tumour suppressor
(TS), or with amplification of an oncogene (OG). You consider the
penetrance of tumours in a set of 8 different mouse strains.
Your hypothesis: some mouse strains lead to bigger tumours than
others when OG is amplified, considering only animals in which
tumours form. You measure tumour volume in mm³ using calipers.
Strain 1 (mm³): 91, 69, 83
Strain 2 (mm³): 201, 70, 71
Strain 3 (mm³): 15, 36, 20
Strain 4 (mm³): 52, 52, 53
Strain 5 (weeks): 11, 538, 59
Strain 6 (mm³): 6, 60, 63
Strain 7 (mm³): 85, 79, 70
Strain 8 (mm³): 100, 105, 121
Example #3
You are conducting a study of osteosarcomas using mouse models. You
are using a strain of mice that is naturally susceptible to these tumours at
a frequency of ~20%. You are studying two transgenic lines, one of which
has a deletion of a putative tumour suppressor (TS), the other of which
has an amplification of a putative oncogene (OG). Tumour penetrance in
these two lines is 100%. Your hypothesis: mice lacking TS are less likely
to respond to a novel targeted therapeutic (DX) than wildtype animals, as
assessed by molecular imaging:
TS (imaging response): Yes, No, Yes, Yes, No
WT (imaging response): Yes, Yes, Yes, Yes, No, Yes
Example #4
You are conducting a study of osteosarcomas using mouse models. You
are using a strain of mice that is naturally susceptible to these tumours at
a frequency of ~20%. You are studying two transgenic lines, one of which
has a deletion of a putative tumour suppressor (TS), the other of which
has an amplification of a putative oncogene (OG). Based on your
previous data, you now hypothesize that mice lacking TS will show a
similar molecular response to DX as those with amplification of OG. You
use microarrays to study 20,000 genes in each line, and identify the
following genes as changed between drug-treated and vehicle-treated:
TS (DX-responsive genes): MYC, KRAS, CD53, CDH1, FBW1, SEPT7, MUC1, MUC3, MUC9, RNF3
OG (DX-responsive genes): MYC, KRAS, CD53, CDH1, MUC1, MARCH1, PTEN, IDH3, ESR2, RHEB, CTCF, STK11, MLL3, KEAP1, NFE2L2, ARID1A
Example #5
You are conducting a study of osteosarcomas using mouse models. You
are using a strain of mice naturally susceptible to these tumours at ~20%
penetrance. You are studying two transgenic lines, one with deletion of a
tumour suppressor (TS), the other with amplification of an oncogene
(OG). Tumour penetrance in these is 100%.
Your hypothesis: tumour size varies with the age of the animal. You
suspect that tumour size differs between lines, but that the difference
is confounded by age. Your data:
TS (cm³): 3.9 (17 weeks), 7.1 (15 weeks), 3.1 (15 weeks), 4.4 (22 weeks), 5.0 (22 weeks)
OG (cm³): 5.2 (17 weeks), 1.9 (9 weeks), 5.0 (15 weeks), 6.1 (15 weeks), 4.5 (21 weeks), 4.8 (20 weeks)
Wildtype (cm³): 1.1 (9 weeks), 1.5 (10 weeks), 2.1 (15 weeks), 2.5 (15 weeks), 0.3 (17 weeks), 2.2 (21 weeks)
Example #6
You are conducting a study of osteosarcomas using mouse models. You
are using a strain of mice that is naturally susceptible to these tumours at
a frequency of ~20%. You are studying two transgenic lines, one of which
has a deletion of a putative tumour suppressor (TS), the other of which
has an amplification of a putative oncogene (OG). Tumour penetrance in
these two lines is 100%. Your hypothesis: mice lacking TS will acquire
tumours sooner than wildtype mice. You test the mice weekly using
ultrasound imaging. Your data:
TS (week of tumour): 4, 7, 7, 6, 5
OG (week of tumour): 3, 9, 3, 2, 4, 3
Topics For This Week
• Examples
• Attendance
• Pre-Processing
• QA/QC
• Microarray-Specific Statistics
• ProbeSet remapping
• Organizing –omics studies
Summary Point #1:
Microarray data is analyzed with a pipeline
of sequential algorithms.
This pipeline defines the standard
workflow for microarray experiments.
[Figure: general microarray workflow, with stages labelled Quantitation (Spot, Cy3, Cy5), Background, Spot Quality, Intra-Array, Inter-array, Significance Testing, Spot List, Clustering, Integration, ?]
Summary Point #2:
This is an active research area.
Summary Point #3:
These basic steps hold true for all
microarray platforms and types.
What Is BioConductor?
“Bioconductor is an open source, open development software
project to provide tools for the analysis and comprehension of
high-throughput genomic data.”
- BioConductor website
The vast majority of our analyses will use BioConductor
code, but there are clearly non-BioConductor approaches.
I’ve outlined the general workflow.
Each technology and application has its
own unique characteristics to consider.
Module 1
Let’s Define an Affymetrix-Specific Workflow
[Figure: the general workflow (Quantitation, Background, Spot Quality, Intra-Array, Inter-array, Significance Testing, Spot List, Clustering, Integration, ?) annotated for Affymetrix arrays. Annotations: “Quantitation is done according to Affymetrix defaults with minimal user intervention.”; “One-Channel array”; “Single-Channel array, so one simultaneous normalization procedure”; “Typically ignored”]
Let’s Collapse This a Bit And
Re-Phrase Things
[Figure: collapsed workflow, with stages labelled .CEL Files, Background, Normalization, ProbeSet Annotation, Statistics, Spot List, Clustering, Integration, ?]
First let’s go Back to Pre-Processing
What exactly is pre-processing (aka
normalization)?
Why do we do it?
Sources of Technical Noise
Where does technical noise come from?
More Sources of Technical Noise
Any step in the experimental pipeline can
introduce artifactual noise
• Array design
• Array manufacturing
• Sample quality
• Sample identity → sequence effects?
• Sample processing
• Hybridization conditions → ozone?
• Scanner settings
Pre-Processing tries to remove these systematic effects
Important Note
Pre-processing is never a substitute for
good experimental design. This is not a
course on statistical design, but a few
basic principles should be mentioned.
Biological replicates are
preferable to technical
replicates.
Always try to balance
experimental groups.
If processing samples identically is not possible, include
controls for processing-effects.
Affymetrix Pre-Processing Steps
1. Background Correction
2. Normalization
3. Probe-Specific Adjustment
4. Summarizing multiple Probes into a single ProbeSet
Let’s look at two common approaches
Introducing Two Major Affymetrix Pre-Processing Methods
• The two most commonly used methods are:
  • RMA = Robust Multi-array Average
  • MAS5 = Microarray Analysis Suite version 5
• MAS5 has strengths & weaknesses
  • Sacrifices precision for accuracy
  • Can easily be used in clinical settings (each array is processed on its own)
• RMA has strengths & weaknesses
  • Sacrifices accuracy for precision
  • Challenging to integrate multiple studies (arrays are normalized together as a set)
  • Reduces variance (critical for small-n studies)
• Both are well accepted by journals and reviewers, perhaps RMA a bit more so.
We’ll talk about some of the mathematics later on in this course.
Approach #1: MAS5
• Affymetrix put significant effort into developing good
data pre-processing approaches
• MAS5 was an attempt to develop a “standard”
technique for 3’ expression arrays
• The flaws of MAS5 led to an influx of research in this
area.
• The algorithm is best-described in an Affymetrix
white-paper, and is actually quite challenging to
reproduce exactly in R.
MAS5 Model
Observations = True Signal + Random Noise + Probe Effects
Assumptions?
MAS5: Background & Noise
Background
• Divide chip into zones
• Select the lowest 2% of intensity values in each zone
• The stdev of those values is the zone variability
• Background at any location is the sum of all zone backgrounds, weighted by 1/(distance^2 + fudge factor)
Noise
• Using the same zones as above
• Select the lowest 2% of background values
• The stdev of those values is the zone noise
• Noise at any location is the sum of all zone noise values, weighted as above
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
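The zone scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the zone background is taken to be the mean of the lowest 2% of intensities (the slide names only their standard deviation), the weights are normalized so they sum to one, and the fudge factor `smooth=100.0` is an assumed value rather than something stated here. The function names are hypothetical.

```python
import statistics


def zone_background(intensities, fraction=0.02):
    """Return (background, noise) for one zone from its lowest intensities.

    Assumption: background = mean and noise = stdev of the lowest `fraction`.
    """
    ordered = sorted(intensities)
    # Keep at least two values so a standard deviation can be computed.
    k = min(len(ordered), max(2, int(len(ordered) * fraction)))
    lowest = ordered[:k]
    return statistics.mean(lowest), statistics.stdev(lowest)


def smoothed_value(x, y, zone_centres, zone_values, smooth=100.0):
    """Distance-weighted combination of per-zone values at location (x, y).

    Each zone gets weight 1 / (squared distance to its centre + smooth);
    the weights are normalized to sum to one (an assumption of this sketch).
    """
    weights = [
        1.0 / ((x - cx) ** 2 + (y - cy) ** 2 + smooth)
        for cx, cy in zone_centres
    ]
    weighted = sum(w * v for w, v in zip(weights, zone_values))
    return weighted / sum(weights)
```

The same `smoothed_value` call would serve for both background and noise, since the slide applies identical weighting to each.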
MAS5: Adjusted Intensity
A = intensity minus background; the final value
should be > noise.
A: adjusted intensity
I: measured intensity
b: background
NoiseFrac: default 0.5 (another fudge factor)
The value should always be >= 0.5 (a fudge factor that avoids log issues)
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
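A minimal sketch of the adjustment described above, in Python. The slide's formula is not preserved in this transcript, so the exact form is an assumption: the measured intensity is floored at 0.5, the background is subtracted, and the result is not allowed to fall below NoiseFrac times the local noise. The function and parameter names are hypothetical.

```python
def adjusted_intensity(intensity, background, noise, noise_frac=0.5, floor=0.5):
    """Background-adjusted intensity A for one cell.

    Assumed form: A = max(max(I, floor) - b, noise_frac * noise).
    """
    floored = max(intensity, floor)          # avoids log problems later on
    return max(floored - background, noise_frac * noise)
```

For a bright cell the result is simply intensity minus background; for a cell close to background the noise term takes over and keeps the value positive.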
MAS5: Ideal Mismatch
Because Sometimes MM > PM
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
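The idea behind the ideal mismatch can be sketched as follows. The slide's formula is not preserved here, so this is a reconstruction under assumptions: the three-case rule and the constants `contrast_tau=0.03` and `scale_tau=10.0` follow my reading of the white paper cited above and should be checked against it. `specific_background` stands for a robust average of log2(PM) - log2(MM) across the ProbeSet, and is passed in rather than computed.

```python
def ideal_mismatch(pm, mm, specific_background, contrast_tau=0.03, scale_tau=10.0):
    """Return a mismatch value guaranteed to be smaller than the perfect match.

    Assumed rule (to be verified against the white paper):
      - MM < PM: use MM as it is.
      - MM >= PM and the ProbeSet's specific background is clearly positive:
        shrink PM by that background.
      - otherwise: shrink PM by a small, smoothly varying amount.
    """
    if mm < pm:
        return mm
    if specific_background > contrast_tau:
        return pm / 2 ** specific_background
    exponent = contrast_tau / (1.0 + (contrast_tau - specific_background) / scale_tau)
    return pm / 2 ** exponent
```

Whatever the exact constants, the design goal is the one in the slide title: PM minus the ideal mismatch is always positive, so its logarithm can be taken.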
MAS5: Signal
Value for each probe:
Modified mean of probe values:
Scaling Factor sf (Sc default 500)
Signal: ReportedValue(i) = nf * sf * 2^(SignalLogValue_i)   (nf = 1)
Tbi = Tukey Biweight (mean estimate, resistant to outliers)
TrimMean = Mean less top and bottom 2%
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
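The two robust summaries named above can be sketched in Python. This is an illustration, not a reproduction of MAS5: the tuning constants `c=5.0` and `epsilon=1e-4` are assumed, the trimmed mean uses simple whole-element trimming, and the scaling factor is assumed to be Sc divided by the trimmed mean of the unlogged signals, since the slide's formula is not preserved here.

```python
import math
from statistics import mean, median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a mean that down-weights values far from the median."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    weights = []
    for v in values:
        u = (v - centre) / (c * mad + epsilon)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def trimmed_mean(values, lower=0.02, upper=0.98):
    """Mean after dropping the bottom and top 2% (whole elements only)."""
    ordered = sorted(values)
    n = len(ordered)
    return mean(ordered[int(n * lower):math.ceil(n * upper)])


def reported_values(signal_log_values, sc=500.0, nf=1.0):
    """ReportedValue(i) = nf * sf * 2^(SignalLogValue_i), with sf assumed as below."""
    sf = sc / trimmed_mean([2 ** s for s in signal_log_values])
    return [nf * sf * 2 ** s for s in signal_log_values]
```

For example, `tukey_biweight([1, 2, 3, 4, 100])` gives the outlier zero weight and returns a value between 2 and 3, whereas the ordinary mean is 22.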
What is RMA?
RMA = Robust Multi-array Average
Why do we use a “robust” method?
Robust summaries improve on the standard ones by
down-weighting outliers and leaving their effects visible in
the residuals.
Why do we use “multi-array”?
To put each chip’s values in the context of a set of similar
values.
What is RMA?
It is a log scale linear additive model
Assumes all the chips have the same
background distribution
Does not use the mismatch probe (MM) data from
the microarray experiments
Why?
What is RMA?
Mismatch probes (MM) definitely have
information - about both signal and noise - but
using it without adding more noise is a challenge
We should be able to improve the background
correction using MM, without having the noise
level blow up: topic of current research (GCRMA)
Ignoring MM decreases accuracy but increases
precision
Methodology
Quantile Normalization – the goal of this method
is to make the distribution of probe intensities for
each array in a set of arrays the same. This
method is motivated by the idea that a Q-Q plot
shows that the distribution of two data vectors is
the same if the plot is a straight diagonal line and
not the same if it is anything else.
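A minimal Python sketch of quantile normalization as described above: every array is sorted, the values at each rank are averaged across arrays, and each original value is replaced by the average for its rank. Ties are broken by position, which is a simplification, and the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank, taken across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(column) / len(arrays) for column in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])   # probe indices by rank
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this step a Q-Q plot of any two normalized arrays falls exactly on the diagonal, because their sorted values are identical.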
Methodology
Methodology
Summarization: combining multiple probe
intensities of each probeset to produce
expression values
An additive linear model is fit to the normalized
data to obtain an expression measure for each
probeset on each GeneChip
Y_ij = a_j + β_i + ε_ij
Methodology
Y_ij = a_j + β_i + ε_ij
Y_ij denotes the background-corrected, normalized
probe value for the ith GeneChip and
the jth probe within the probeset [log2(PM − BG)*_ij]
a_j is the probe affinity of the jth probe
β_i is the chip effect for the ith GeneChip (log-scale
expression level)
ε_ij is the random error term
Methodology
Y_ij = a_j + β_i + ε_ij
Estimate a_j (probe affinity) and β_i (chip effect)
using a robust method:
• Tukey’s median polish (quick): fits iteratively,
successively removing row and column medians
and accumulating the terms, until the process
stabilizes. The residuals are what is left at the end.
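Median polish is short enough to sketch directly. In this Python illustration the rows are GeneChips (i) and the columns are the probes of one probeset (j); the iteration limit and tolerance are arbitrary choices, and the function name is hypothetical. The log-scale expression estimate for chip i would then be `overall + row_eff[i]`.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Fit table[i][j] ~ overall + row_eff[i] + col_eff[j] using medians."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Remove row medians and accumulate them in the row effects.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift

        # Remove column medians and accumulate them in the column effects.
        col_med = [median([resid[i][j] for i in range(n_rows)]) for j in range(n_cols)]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift

        # Stop once a full sweep no longer changes anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break

    return overall, row_eff, col_eff, resid
```

Because only medians are used, a single aberrant probe ends up with a large residual instead of distorting the chip effect.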
RMA vs. MAS5
• RMA sacrifices accuracy for precision
• RMA is generally not appropriate for clinical settings
• RMA provides higher sensitivity/specificity in some
tests
• RMA reduces variance (critical for small-n studies)
• RMA is better accepted by journals and reviewers
Topics For This Week
• Examples
• Attendance
• Pre-Processing
• QA/QC
• Microarray-Specific Statistics
• ProbeSet remapping
• Organizing –omics studies
One key detail has been omitted so far:
How do we know if our pre-processing
actually worked?
Can we determine how well our pre-processing worked?
Or if our data looks good?
Let’s See Some “Bad” Data
Those Three Were From A Spike-In
Experiment Done by Affymetrix
Those Last Three Were From An
Experiment We Did On Rat Liver Samples
Were Those Bad Samples?
• Lots of evident spatial artifacts
• But in practice all samples were carried forward into
analysis
• And validation (RT-PCR) confirmed the overall study
results for many genes
Eye-ball Assessments Are Hard
• A couple of useful tricks:
  • Look at the distributions
    • Did quantile normalization work (for RMA)?
  • Look at the inter-sample correlations
    • Is one sample a strong outlier?
  • Look at the 3’ → 5’ trend across a ProbeSet
I know of no accepted, systematic QA/QC methods
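The inter-sample correlation check can be sketched as below. This Python illustration computes each sample's mean Pearson correlation with every other sample; a sample whose mean is far below the rest is a candidate outlier. The function names are hypothetical, and where to draw the line is a judgment call that this sketch does not make.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_correlations(samples):
    """Map each sample name to its mean correlation with all other samples.

    `samples` maps a sample name to its list of normalized intensities.
    """
    result = {}
    for name, values in samples.items():
        others = [pearson(values, v) for other, v in samples.items() if other != name]
        result[name] = sum(others) / len(others)
    return result
```

Sorting the result by value puts the least typical sample first, which is usually where to start looking.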
Distributions (Raw)
Distributions (normalized)
Inter-Sample Correlations
3’ → 5’ Signal Trend
What Do You Do If You Find a Bad Array?
• Repeat it?
• Drop the sample?
• Include it but account for the “noise” in another way?
In This Case
• We excluded a series of outlier samples
• We believed these samples had been badly degraded
because they were derived from FFPE blocks
Final Distribution
Final Heatmap
Topics For This Week
• Examples
• Attendance
• Pre-Processing
• QA/QC
• Microarray-Specific Statistics
• ProbeSet remapping
• Organizing –omics studies
T-tests
• What are the assumptions of the t-test?
• When would you feel comfortable using a t-test?
T-Test Alternative: Wilcoxon Rank-Sum
• Also called:
  • U-test
  • Mann-Whitney (U) test
• Some argue that for continuous microarray data there is
rarely a good reason to use this test:
  • Low n: tests of normality are not very powerful
  • High n: the central limit theorem provides support
  • If the data are normal, its asymptotic efficiency relative to the t-test is ~0.95
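For reference, the rank-sum (Mann-Whitney U) statistic itself is simple to compute. The Python sketch below pools the two groups, ranks the pooled values (tied values share their average rank) and derives U for each group. It stops at the statistic; turning U into a p-value needs tables or a normal approximation, which are left out. The function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1       # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two groups of observations."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y
```

Only the ordering of the values enters the calculation, which is why the test is insensitive to outliers and to the shape of the distribution.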
T-Test Alternative: Moderated Statistics
• A series of highly complex methods based on Bayesian
statistical methodologies
• Gordon Smyth’s limma R package is by far the most
widely used implementation of this technique
The variance term of the t-statistic is “shrunk” by
borrowing power across all
genes. This increases effective
power.
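The shrinkage idea can be sketched for a two-group comparison of a single gene. In this Python illustration the gene's pooled variance is pulled towards a prior variance with a given prior weight (degrees of freedom). In limma those two prior quantities are estimated from all genes on the array; here they are simply passed in as hypothetical constants, so this is a sketch of the principle and not of the limma implementation.

```python
import math
from statistics import mean, variance


def two_group_t(x, y, prior_df=0.0, prior_var=0.0):
    """t-statistic for x versus y, optionally with a shrunken variance.

    With prior_df = 0 this is the ordinary equal-variance t-statistic.
    With prior_df > 0 the gene's variance is averaged with prior_var,
    weighted by the respective degrees of freedom.
    """
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    shrunk_var = (prior_df * prior_var + df * pooled_var) / (prior_df + df)
    return (mean(x) - mean(y)) / math.sqrt(shrunk_var * (1 / n_x + 1 / n_y))
```

A gene with an accidentally tiny variance, such as `[5.0, 5.01, 4.99]` versus `[5.1, 5.11, 5.09]`, gets an ordinary t of about -12 but a far more modest moderated value once its variance is shrunk towards a larger prior.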
T-Test Alternative: Permutation Tests
• SAM is the classic method
• Most people suggest not using SAM today
• Empirically estimate the null distribution
[Figure: permutation scheme. Start with many samples, randomly sample, and iterate.]
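Estimating a null distribution empirically can be sketched with a plain label-permutation test for one gene. This Python illustration is a generic permutation test on the difference in group means; it is not SAM, and the number of permutations and the seed are arbitrary choices. The function name is hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                       # randomly reassign group labels
        perm_x, perm_y = pooled[:len(x)], pooled[len(x):]
        if abs(mean(perm_x) - mean(perm_y)) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

With small groups the number of distinct relabellings limits how small the p-value can get, which matters once thousands of genes are tested.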
Problems with Significance Testing
• What happens if there are NO changes?
• Imagine:
  • You analyzed 1,000 clinical samples
  • 20,000 genes in the genome
  • P < 0.05
• What if… somebody comes and randomizes all your data?
You had a lot of Data
20,000 genes / array
1,000 patients
20,000,000 data points, all randomized
Genes are mixed up together
Patients are mixed together
What happens if you analyze this data?
There should be NO real hits anymore!
What will you actually find?
Array: 20,000 genes
Threshold: p < 0.05
20,000 x 0.05 = 1000 False Positives
This is called “multiple testing”.
There is a solution
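The arithmetic above is easy to confirm by simulation. Under the null hypothesis p-values are uniformly distributed between 0 and 1, so this Python sketch draws one uniform number per gene and counts how many fall below the threshold. The seed is arbitrary and the function name is hypothetical.

```python
import random


def count_null_hits(n_genes=20000, alpha=0.05, seed=1):
    """Count 'significant' genes when no gene is truly changed."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_genes) if rng.random() < alpha)


expected = 20000 * 0.05          # 1000 false positives expected on average
simulated = count_null_hits()    # one random draw; lands close to `expected`
```

Any single run will land close to 1,000 rather than exactly on it, because the count itself is a random quantity.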
A “false-discovery rate adjustment” (FDR) for multiple
testing considers all 20,000 p-values simultaneously.
In this experiment there are lots of low
p-values, so we can use this
to “adjust” the p-values so
we can find the true hits.
[Figure: histogram of p-values (x-axis: P-Value; y-axis: 0% to 20%), with the “Expected Value” marked]
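One widely used FDR adjustment is the Benjamini-Hochberg procedure; the slide does not name a specific method, so treating it as the example here is an assumption. This Python sketch ranks all p-values together, scales each by (number of tests / rank), and then enforces monotonicity from the largest p-value downwards. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, so that an adjusted
    # value never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Because every adjusted value depends on the rank of its p-value among all tests, the same raw p-value can survive in an experiment with many low p-values and fail in a randomized one.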
This is what you get from randomized
data…
In this experiment, NO
enrichment for low p-values,
so no more hits than
expected randomly.
Topics For This Week
• Examples
• Attendance
• Pre-Processing
• QA/QC
• Microarray-Specific Statistics
• ProbeSet remapping
• Organizing –omics studies
The Mask Production Makes Affymetrix Designs
Expensive To Change
Photolithographic mask
But… there are multiple probes per gene
We Can Change Those Mappings!
Hybridized
Chip
CDF File
• Chip Definition File
• This file maps Probes (positions) into ProbeSets
• We can update those mappings
  • Ignore deprecated or cross-hybridizing probes
  • Merge multiple probes that recognize the same gene
  • Account for entirely new genes that were not known at the time of array-design
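Conceptually, remapping is just regrouping probes under updated gene assignments. The Python sketch below assumes an updated probe-to-gene alignment is already available as a dictionary; the probe and gene identifiers are made up for illustration, and real CDF files are binary or structured text that this sketch does not parse.

```python
def remap_probesets(probe_to_gene, excluded_probes):
    """Group probes into gene-level ProbeSets from an updated alignment.

    probe_to_gene:   probe ID -> gene ID, or None if the probe no longer
                     matches any known transcript (hypothetical input).
    excluded_probes: probe IDs to ignore, e.g. cross-hybridizing probes.
    """
    probesets = {}
    for probe, gene in probe_to_gene.items():
        if gene is None or probe in excluded_probes:
            continue                                  # probe is dropped
        probesets.setdefault(gene, []).append(probe)  # merge probes per gene
    return probesets
```

Called with `{"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B", "probe_4": None}` and `{"probe_2"}` as the exclusion set, it keeps one probe for each of the two genes and drops the other two probes.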
Sequence Mappings Are Slow
• Requires aligning millions of 25 bp probes against the
transcriptome and identifying the best match for each
• Fortunately, other groups have done this for us, and
regularly update their mappings
Many Probes Are Lost
But There Is Also A Major Benefit
Increased validation rates using RT-PCR (~10%)
Sandberg et al., BMC Bioinformatics 2007
Topics For This Week
• Examples
• Attendance
• Pre-Processing
• QA/QC
• Microarray-Specific Statistics
• ProbeSet remapping
• Organizing –omics studies
What Are The Outputs of A Microarray Study?
• Primary Data (these files can be 10s of GB for a typical Affy study)
  • Raw image (.DAT file)
  • Quantitation (.CEL file)
• Secondary Data
  • Normalized data (usually an ASCII text file)
  • QA/QC plots
• Tertiary Data
  • Statistical analyses
  • Global visualization (e.g. heatmaps)
  • Downstream analyses (e.g. pathway, dataset-integration)
How Do You Organize These Data?
I recommend you put things on a fast, backed-up network drive
/data/
Organize data by project
/data/Project
Create separate directories for each analysis
/data/Project/raw
/data/Project/QAQC
/data/Project/pre-processing
/data/Project/statistical
/data/Project/pathway
How Do You Organize The Scripts?
I recommend you write a separate script for each analysis, and put those in
a standardized (backed-up!) location, mirroring the directory structure and
naming of your dataset directories.
Some sub-structure here is often useful:
/scripts/Project/pre-processing.R
/scripts/Project/statistical-univariate.R
/scripts/Project/statistical-multivariate.R
/scripts/Project/pathway/GOMiner.R
/scripts/Project/pathway/Reactome.R
/scripts/Project/integration/mRNA+CNV.R
/scripts/Project/integration/public-data.R
Why Many Small Scripts?
• Monolithic scripts are hard to maintain
  • Easier to make errors
  • Accidentally re-using the same variable name
  • Harder to debug
  • Harder for somebody else to learn
• Small scripts are more flexible
  • Quicker to modify/re-run a small part of your analysis
  • Easier to re-use the same code on another dataset
• This is akin to the “unix” mindset of systems design
What To Save?
• Everything!!
• All QA/QC plots (common reviewer request)
• All pre-processed data (needed for GEO uploads)
• Gene-wise statistical analyses
• Not just the statistically-significant genes
• Collapse all analyses into one file, though
• All plots/etc
• Using clear filenames is critical
• Disk-space is not usually a critical concern here
• Your raw data will be much larger than your output!
Most Important Points
• Do not delete things:
  • Keep all old versions of your scripts by including the date in the filename (or using source-control)
  • Version output files by date
  • I have needed to go back to analyses done 7 years prior!
• Make regular (weekly) backups:
  • Try to pass this work off to professional sysadmins
  • External hard-drives/USBs are okay if you cannot get access to network drives, but try to automate
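Versioning output files by date is simple to automate. This Python sketch prefixes a filename with an ISO date so that files sort chronologically; the naming pattern and the function name are illustrative choices, not a convention stated above.

```python
from datetime import date


def dated_filename(stem, extension, when=None):
    """Build a filename such as 2014-02-27_Project_statistical.txt."""
    when = when or date.today()          # default to today's date
    return f"{when.isoformat()}_{stem}.{extension}"
```

Putting the date first means a plain directory listing shows the history of an analysis in order, with nothing overwritten.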
Course Overview
• Lecture 1: What is Statistics? Introduction to R
• Lecture 2: Univariate Analyses I: continuous
• Lecture 3: Univariate Analyses II: discrete
• Lecture 4: Multivariate Analyses I: specialized models
• Lecture 5: Multivariate Analyses II: general models
• Lecture 6: Sequence Analysis
• Lecture 7: Microarray Analysis I: Pre-Processing
• Lecture 8: Microarray Analysis II: Multiple-Testing
• Lecture 9: Machine-Learning
• Final Exam (written)