PPS - CellProfiler


Getting Started with CellProfiler
Mark-Anthony Bray, Ph.D.
Imaging Platform, Broad Institute
Cambridge, Massachusetts, USA
Software Overview
Image Analysis & Quantification
Image-centric Data Analysis
• Available from www.cellprofiler.org
• Free, open source (Python)
• Software available for Windows, Mac and Linux
CellProfiler: Overview
• Process large sets of images
• Identifies and measures objects
• Export data for further analysis
• Goal: Provide powerful image analysis methods with a
user-friendly interface
• Philosophy: Measure everything, ask questions later...
• Support data analysis based on individual cells
Typical CellProfiler Pipeline Workflow
• For image-based assays, the basic
objective is always to
– Identify cells/organisms
– Measure feature(s) of interest
• The uniqueness of each assay comes in
– Deciding what compartments to identify and
how to identify them
– Determining which measure(s) are most useful
to identify interesting samples
The CellProfiler Interface
Module help
Add or remove modules
Change module position
• Pipeline panel: Displays modules in pipeline
– Modules executed in order from top to bottom
The CellProfiler Interface
Load pipeline by double-clicking on it
View images by double-clicking on the filename
• File panel: Displays files in default image folder
The CellProfiler Interface
• The figure window has additional menu options
• Toolbar menu: Pan, zoom in/out
• CellProfiler Image Tools
– Image Tool (also displayed by clicking on image)
– Interactive zoom
– Show pixel data (location, intensity)
The CellProfiler Interface
Input folder: Contains images to be analyzed
Output folder: Contains the output file plus exported data and images
• Folder panel: Change default input and output directories
– Usually these should be separate folders
The CellProfiler Interface
• Settings panel: View and change settings for each module
– Clicking on a different module updates the settings view
Module Categories
• File processing: Image input, file output
• Image processing: Often used for pre-processing prior to object identification
• Object processing: Identification, modification of objects of interest
• Measurement: Collection of measurements from objects of interest
• Data Tools: Measurement exploration, measurement output
The First Module: LoadImages
• Loads an “image set”, which is a group of related images, in preparation for further processing
(Figure: example image set with DNA and GFP images)
• Related how? Depending on the imaging device, one file may represent
– One channel at one imaging location
– Multiple channels at one imaging location
– Multiple channels at multiple locations
– Etc…
The First Module: LoadImages
• Can use text matching to define the difference between images in a set
– All images stained for GFP have the text Channel1- in the name
– Assign each image a meaningful name for downstream reference
– Same for DNA images (Channel2-)
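The text-matching idea can be sketched as follows. This is a minimal illustration, not CellProfiler code: the file names are hypothetical (only the Channel1- and Channel2- text comes from the slide), and `group_image_sets` is an assumed helper name.

```python
from collections import defaultdict

# Hypothetical file names; only the Channel1-/Channel2- text comes from the slide.
filenames = [
    "Channel1-01-A-01.tif",
    "Channel2-01-A-01.tif",
    "Channel1-02-A-02.tif",
    "Channel2-02-A-02.tif",
]

# Text that identifies each channel, mapped to a meaningful image name.
channel_names = {"Channel1-": "GFP", "Channel2-": "DNA"}


def group_image_sets(names):
    """Group files that differ only in their channel text into image sets."""
    image_sets = defaultdict(dict)
    for name in names:
        for text, channel in channel_names.items():
            if text in name:
                # What remains after removing the channel text identifies the site.
                site = name.replace(text, "", 1)
                image_sets[site][channel] = name
    return dict(image_sets)


image_sets = group_image_sets(filenames)
for site, images in sorted(image_sets.items()):
    print(site, images)
```

Each resulting image set holds one GFP file and one DNA file for the same imaging location.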
What Is An “Image”?
• Images from Carolina Wählby
Object Identification
• Once the images are loaded, how do you find objects of interest?
• Step 1: Distinguish the foreground from the background by picking a good threshold
• Step 2: Identify objects as regions brighter than the threshold
• Step 3: Cut and join objects to “improve” their shape
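Steps 1 and 2 can be sketched in plain Python as below. The tiny image, the threshold of 0.5, and the choice of 4-connected neighbors are all assumptions made for illustration. Step 3 (cutting and joining) is not covered here.

```python
def label_objects(image, threshold):
    """Label 4-connected regions of pixels brighter than the threshold."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already belong to an object.
            if image[r][c] <= threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            stack = [(r, c)]
            # Flood fill: spread the new label to all connected foreground pixels.
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        stack.append((ny, nx))
    return labels, count


# Hypothetical image with two bright regions on a dim background.
image = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.9, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.8, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.7, 0.1],
]
labels, count = label_objects(image, threshold=0.5)
print(count)
```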
Primary Object Identification
• Many options for thresholding, cut and join methods, etc.
Thresholding
• Definition: Division of the image into background and foreground
• What is the best threshold value for dividing the intensity histogram into foreground and background pixels?
(Figure: intensity histogram, frequency vs. pixel values, with two candidate thresholds marked “Here?” and “Or here?”)
• Method: Pick the method that provides the best results
– Otsu: Default - Good for readily identifiable foreground / background
– Background, RobustBackground: Good for images in which most of the image is composed of background
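As an illustration of automatic threshold selection, here is a minimal pure-Python sketch of Otsu's method, which picks the histogram split that maximizes the between-class variance. It is not CellProfiler's implementation; the bin count and the pixel values are assumptions.

```python
def otsu_threshold(pixels, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for p in pixels:
        hist[min(int((p - lo) / width), bins - 1)] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0.0
    for i, h in enumerate(hist):
        w_bg += h                  # pixels at or below bin i (background class)
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # pixels above bin i (foreground class)
        if w_fg == 0:
            break
        sum_bg += i * h
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    # Use the upper edge of the best background bin as the threshold.
    return lo + (best_bin + 1) * width


# Hypothetical pixel values: mostly dim background, some bright foreground.
pixels = [0.1] * 50 + [0.12] * 30 + [0.8] * 15 + [0.9] * 5
threshold = otsu_threshold(pixels)
print(threshold, sum(p > threshold for p in pixels))
```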
Thresholding
• Correction factor
– Multiplication factor applied to threshold
– Adjusts threshold stringency/leniency
– Setting this factor is empirical
• Upper/lower bounds
– Set safety limits on automatic threshold to guard against false positives
– Helpful for unexpected images: Empty wells, images with dramatic artifacts, etc.
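The effect of the correction factor and the bounds can be sketched as below. The function name, the default values, and the order of operations (multiply first, then clamp) are assumptions for illustration.

```python
def adjust_threshold(auto_threshold, correction_factor=1.0, lower=0.0, upper=1.0):
    """Scale an automatic threshold, then clamp it to the [lower, upper] range."""
    corrected = auto_threshold * correction_factor
    return min(max(corrected, lower), upper)


# A factor above 1 makes the threshold more stringent.
print(adjust_threshold(0.2, correction_factor=1.5))
# An empty well yields a very low automatic threshold; the lower bound catches it.
print(adjust_threshold(0.001, lower=0.05))
```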
Object Separation
• Once the foreground objects have been identified, we need to distinguish multiple objects contained in the same “clump”
Images from Carolina Wählby
Object Separation
Adjust settings to “de-clump” objects
• Two step process in “de-clumping”
1. Identification of the objects in a clump
2. Drawing boundaries between the clumped objects
Object Separation
• Clump identification: Two options
– Intensity: Works best if objects are brighter at center, dimmer at edges
– Shape: Works best if objects have indentations where clumps touch (esp. if objects are round)
(Figure: clumped objects, with intensity peaks marked for the Intensity option and indentations marked for the Shape option)
Object Separation
• Drawing boundaries: Two options
– Distance: Draws boundary lines midway between object centers
– Intensity: Draws boundary lines at dimmest line between objects
• Test mode allows users to view results of all setting combinations
Object Separation
• Additional separation settings: Adjust these settings if objects are being incorrectly split into pieces or merged together
(Figure: original image compared with smoothing filter size = 4 and smoothing filter size = 8)
• Smoothing: Increase to reduce intensity irregularities which produce over-segmentation of objects
Object Separation
(Figure: maxima in the original image compared with maxima distance = 4 and maxima distance = 8)
• Suppress Local Maxima
– Smallest distance allowed between object intensity peaks; peaks closer than this are considered one object rather than a clump
– Decrease to reduce improper merging of objects in clumps
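A one-dimensional sketch of maxima suppression is shown below. It assumes a hypothetical intensity profile taken along a line through a clump. The function name and the rule "the brightest peak wins" are illustrative choices, not CellProfiler's algorithm.

```python
def find_peaks(profile, min_distance):
    """Return indices of local maxima, keeping only the brightest within min_distance."""
    candidates = [
        i for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
    ]
    kept = []
    # Visit peaks from brightest to dimmest; drop any peak too close to a kept one.
    for i in sorted(candidates, key=lambda i: profile[i], reverse=True):
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return sorted(kept)


# Hypothetical intensity profile with three local maxima (indices 2, 4 and 8).
profile = [0, 2, 5, 3, 4, 1, 0, 3, 6, 2, 0]
print(find_peaks(profile, min_distance=2))
print(find_peaks(profile, min_distance=4))
```

With the smaller distance all three peaks survive, so the clump would be split into three objects. With the larger distance the peak at index 4 is absorbed by its brighter neighbor.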
Object Separation
• However…
(Figure: original image compared with smoothing filter size = 4 and smoothing filter size = 8)
• Adjusting these parameters can produce more improper segmentation than it solves
• The proper settings are usually a matter of trial and error
– The automatic settings are a good starting point, though
Filtering Invalid Objects
Discard objects that fail the size criterion or touch the image border
• See FilterObjects module for more advanced filtering options
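The two discard rules can be sketched on a label matrix as follows. Using pixel area as the size criterion is an assumption made for simplicity, and the label matrix is hypothetical.

```python
def filter_objects(labels, min_area, max_area):
    """Return labels of objects within the size range that do not touch the border."""
    rows, cols = len(labels), len(labels[0])
    area = {}
    touches_border = set()
    for r in range(rows):
        for c in range(cols):
            lab = labels[r][c]
            if lab == 0:
                continue  # background
            area[lab] = area.get(lab, 0) + 1
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border.add(lab)
    return sorted(
        lab for lab, a in area.items()
        if min_area <= a <= max_area and lab not in touches_border
    )


# Hypothetical label matrix: object 1 touches the border, object 3 is too small.
labels = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(filter_objects(labels, min_area=2, max_area=10))
```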
Primary Object Identification
• Colors used to label each segmented object
– Shows if each object has been identified and separated properly
• Outlines highlight valid objects
– Green: Valid
– Yellow: Invalid – Touching border
– Red: Invalid – Size criterion
• Gives object count as a measurement
Secondary Object Identification
• Goal: Identify individual cell boundaries by “growing” primary objects
using a staining channel
– Nuclei typically more uniform in shape, more easily separated than cells
• Segment nuclei first, then use segmented nuclei to start cell
segmentation
Secondary Object Identification
• Methods
– Distance-N: Ignores image information
• Useful in cases where no cell stain is present
– Watershed, Propagation, Distance-B: Use image information
• Finds dividing lines between objects and background / neighbors
• Test mode allows user to view results of all methods
(Figure: results for Distance-N and Propagation)
Secondary Object Identification
• Regularization: Controls the precise dividing line
between cells that touch each other
– Performed by balancing between intensity and distance
– Usually not adjusted
(Figure: segmentation with Regularization = 0 compared with Regularization = ∞)
• Correction factor, lower/upper bounds on threshold:
Same purpose as in IdentifyPrimaryObjects
Tertiary Object Identification
• Goal: Identify tertiary objects by removing
the primary objects from secondary objects
– “Subtract” the nuclei objects from cell objects
to obtain cytoplasm
Cells - Nuclei = Cytoplasm
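The subtraction can be sketched on label matrices as below, assuming hypothetical cell and nucleus masks of the same size in which 0 means background.

```python
def subtract_objects(cells, nuclei):
    """Keep each cell label only where no nucleus is present."""
    return [
        [cell if nucleus == 0 else 0 for cell, nucleus in zip(cell_row, nucleus_row)]
        for cell_row, nucleus_row in zip(cells, nuclei)
    ]


# Hypothetical 3x5 label matrices: one cell with one nucleus pixel at its center.
cells = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
nuclei = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cytoplasm = subtract_objects(cells, nuclei)
for row in cytoplasm:
    print(row)
```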
Measurement Modules: Object Morphology
Select the objects to measure
Module: MeasureObjectAreaShape
• Goal: Measure morphological features such as
– Area
– Perimeter
– Eccentricity
– MajorAxisLength
– MinorAxisLength
– Orientation
– FormFactor: Compactness measure, circle = 1, line = 0
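FormFactor is commonly defined as 4π × Area / Perimeter². The sketch below checks the two limits named above using idealized shapes (a perfect circle and a very thin rectangle), not measurements from a real image.

```python
import math


def form_factor(area, perimeter):
    """Compactness measure: 4 * pi * area / perimeter squared."""
    return 4 * math.pi * area / perimeter ** 2


# A circle of radius 3 gives exactly the maximum value of 1.
radius = 3.0
circle = form_factor(math.pi * radius ** 2, 2 * math.pi * radius)

# A thin 100 x 1 rectangle approaches the "line" limit of 0.
thin_rectangle = form_factor(100 * 1, 2 * (100 + 1))

print(circle, thin_rectangle)
```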
Measurement Modules: Object Intensity
Select the image to measure from
Select the objects to measure
Module: MeasureObjectIntensity
• Goal: Measure object intensity features such as
– Integrated intensity: Sum of the pixel intensities within
an object
– Mean, median, standard deviation intensities
– Maximal and minimal pixel intensities
– Lower/Upper quartile
• The object intensity may be obtained from any
image, not just the image used to identify the
object
– Example: Phospho-H3 intensity may be measured using the nuclei objects
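A minimal sketch of per-object intensity measurement follows. It assumes a hypothetical label matrix (for example, nuclei) and a separate intensity image of the same size. Quartiles are omitted, and the population standard deviation is an arbitrary choice here.

```python
import statistics


def measure_intensity(image, labels, target):
    """Collect the pixels of one labeled object and summarize their intensities."""
    pixels = [
        value
        for image_row, label_row in zip(image, labels)
        for value, label in zip(image_row, label_row)
        if label == target
    ]
    return {
        "integrated": sum(pixels),
        "mean": statistics.mean(pixels),
        "median": statistics.median(pixels),
        "std": statistics.pstdev(pixels),
        "min": min(pixels),
        "max": max(pixels),
    }


# Hypothetical object mask and an intensity image from a different channel.
labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
image = [
    [0, 0, 0, 0],
    [0, 2, 4, 0],
    [0, 4, 6, 0],
    [0, 0, 0, 0],
]
measurements = measure_intensity(image, labels, target=1)
print(measurements)
```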
Measurement Modules: Object Texture
Select the image to measure from
Select the objects to measure
Select the spatial scale
MeasureObjectTexture
• Goal: Determine whether the staining pattern is
smooth on a particular scale
• Selection of the appropriate texture scale is
essentially empirical
– A higher number measures larger patterns of texture
– Smaller numbers measure more localized (finer)
patterns of texture
• Can also add several texture modules to the
pipeline, each measuring a different texture scale
Other Measurement Modules
• CalculateMath: Arithmetic operations for measurements
• CalculateStatistics: Assay quality (V and Z' factors) and
dose response data (EC50) for all measurements
• Image-based measures
– MeasureImageAreaOccupied
– MeasureImageGranularity
– MeasureImageIntensity
• Object-based measures
– MeasureCorrelation
– MeasureObjectNeighbors
– MeasureRadialDistribution
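For the assay-quality statistics mentioned under CalculateStatistics, the Z' factor is usually computed from positive and negative controls as 1 - 3(σp + σn) / |μp - μn|. The sketch below uses hypothetical control values and does not reproduce the module's own code.

```python
import statistics


def z_prime(positive, negative):
    """Z' factor: separation of positive and negative controls."""
    mean_p, mean_n = statistics.mean(positive), statistics.mean(negative)
    sd_p, sd_n = statistics.stdev(positive), statistics.stdev(negative)
    return 1 - 3 * (sd_p + sd_n) / abs(mean_p - mean_n)


# Hypothetical per-well measurements for the two control groups.
positive_controls = [9, 10, 11]
negative_controls = [1, 2, 3]
z = z_prime(positive_controls, negative_controls)
print(z)
```

Values closer to 1 indicate better separation between the controls.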
Data Export Modules
Select the objects to export
• User may output images or image measurements
Measurement Display
• The average measurements for all objects in the image are displayed in the figure window
• However, the individual measurements for each object are stored in the output file
Data Export Modules
• Goal: Retain images of intermediate image processing
steps for quality control or save measurements for later
analysis and exploration
• SaveImages: Writes an image to a file
– Intermediate images in the pipeline are not saved unless
requested
– Choice of many image formats to write → module can be used as
an image format converter
• ExportToSpreadsheet: Export measurements as a
comma-separated file readable by spreadsheet programs
• ExportToDatabase: Export measurements as per-object and per-image tables plus a configuration file for upload to a MySQL database
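To make the comma-separated output concrete, here is a minimal sketch that writes per-object measurements with Python's csv module. The column names and values are hypothetical and do not reproduce the exact layout written by ExportToSpreadsheet.

```python
import csv
import io

# Hypothetical per-object measurements; the column names are illustrative only.
measurements = [
    {"ImageNumber": 1, "ObjectNumber": 1, "Area": 120, "MeanIntensity": 0.42},
    {"ImageNumber": 1, "ObjectNumber": 2, "Area": 95, "MeanIntensity": 0.37},
]


def to_csv(rows):
    """Serialize a list of measurement dictionaries as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(measurements)
print(csv_text)
```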
Illumination Correction
• The physical limitations of any microscope produce nonuniformities in the optical path of the sample, microscope, and/or camera
(Figure: tiled raw images, with locations (a) and (b) marked)
• Example: Tiling raw images shows that there is
uneven illumination from left to right in each image
– This heterogeneity can lead to inaccurate intensity
measurements
– A cell located at (a) is brighter than one at (b) even if the
cells have the same amount of fluorescent material
Carpenter et al, Genome Biology 2006, 7:R100
Illumination Correction
• Illumination correction ensures that object
segmentation and measurements (e.g. DNA
content) are more accurate
Carpenter et al, Genome Biology 2006, 7:R100
Illumination Correction
• Two modules
– CorrectIlluminationCalculate: Creates an illumination correction function
– CorrectIlluminationApply: Applies the function to your images
• Available options
– Correct each image individually, or all images together as an ensemble?
– Calculate the illumination function by using foreground pixels or
background pixels?
– Apply the function using division or subtraction?
• Additional considerations
– Create a new illumination correction function if you image on a different
microscope or change plates
– Correct each channel since absolute illumination intensities may differ
between channels
– First, create and save the function from image set, then load and apply it
prior to identification
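One combination of these options (all images together, applied by division) can be sketched as follows. The pixel-wise mean, the rescaling so the smallest value is 1, and the tiny two-pixel images are assumptions. Smoothing of the function is omitted, and pixel values are assumed to be positive.

```python
def calculate_illumination(images):
    """Average all images pixel by pixel and rescale so the minimum is 1."""
    rows, cols = len(images[0]), len(images[0][0])
    mean = [
        [sum(image[r][c] for image in images) / len(images) for c in range(cols)]
        for r in range(rows)
    ]
    smallest = min(min(row) for row in mean)
    return [[value / smallest for value in row] for row in mean]


def apply_illumination(image, function):
    """Correct an image by dividing each pixel by the illumination function."""
    return [
        [pixel / factor for pixel, factor in zip(image_row, function_row)]
        for image_row, function_row in zip(image, function)
    ]


# Hypothetical 1x2 images in which the right pixel is lit twice as strongly.
images = [[[1.0, 2.0]], [[3.0, 6.0]]]
illumination = calculate_illumination(images)
corrected = apply_illumination(images[0], illumination)
print(illumination, corrected)
```

After division the two pixels have equal intensity, which matches the idea that equally stained cells should measure the same at any location.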
Cluster Computing
• If processing time is too great on a single
computer, then run the pipeline on a cluster
– Download and install CellProfiler on a computing
cluster
– Add the ExportToDatabase module
– Add the CreateBatchFiles module to the end of the
pipeline and configure it appropriately
– Run the first image cycle locally
– Submit the batches to your cluster for processing
– Check the progress of processing
• For really big screens, it is necessary to process
images in batches on a computing cluster.
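The idea of splitting a run into batches can be sketched as below. The function name and the batch size are hypothetical. The actual batch files and job submission are handled by CreateBatchFiles and by your cluster's scheduler.

```python
def make_batches(n_image_sets, batch_size):
    """Split image sets 1..n into (first, last) ranges of at most batch_size."""
    return [
        (first, min(first + batch_size - 1, n_image_sets))
        for first in range(1, n_image_sets + 1, batch_size)
    ]


# Ten image sets in batches of four: the last batch is shorter.
batches = make_batches(10, 4)
print(batches)
```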
Data Analysis
• At the end of a pipeline, you may have
500+ features per cell
– Size, shape, staining intensity, texture
(smoothness), etc
• Remember our Philosophy: “Measure
everything, ask questions later...”
Data Analysis
• What does this data set look like?
• Cytological profile, or Cytoprofile
(Figure: cytoprofile of cell #6111617, with feature values such as -.2, .7, -.1, 0, .2, -.9 shown on a scale from -1 to +1)
• Shows all the measurements acquired
– For each individual cell
– In every image
– In the entire experiment.
CellProfiler Analyst: Overview
• Explore data from large sets of images
• Identify interesting subpopulations and see
the original images
• Identify interesting phenotypes
automatically
• Goal: Provide the user with a powerful suite of image exploration and machine learning methods
The CellProfiler Analyst Interface
• CellProfiler Analyst (CPA) allows you to explore the data
with a variety of tools
• Upon startup, CPA requests a properties file which contains
– Locations of the measurement tables
– How the images are referenced
– Other assorted information
Plate Viewer
• Displays data in plate layout
– 96- or 384-well format
– Measurements are shown as color-coded wells or mouse tool-tips
– Right-clicking on well reveals list of images to display
Image Viewer
• Displays an image referenced by number
• Color display
– Colors are assigned to each channel of image data
– Shown as a merged color image
– Toggle channel visibility and color scaling
Plotting Tools
• Various plotting tools allow the user to explore and sift through the measurements and make discoveries
Data Analysis
• Why make so many measurements?
– For many screens, only a few measurements are necessary to obtain the phenotype
(Figure: scatter plot with X-axis: DNA content, Y-axis: phospho-H3 staining)
Data Analysis
• Unfortunately, for other phenotypes, the proper features are not so simple to find…
(Figure: example phenotypes: wild-type HT29 cells, long projections, crescent-shaped nuclei, crooked projections, cells on the move, actin dots at junctions, peas in a pod, hyphae-like projections)
Data Analysis
• Concentrating on single cells allows us to avoid problems of heterogeneous populations, and to detect rare events (such as mitosis)
• However, determining which combinations of features and values are appropriate for a phenotype is tedious and impractical
• We have included a machine learning classification tool to automatically choose the features and values required to score a rare or subtle phenotype
Automated Cell Image Processing
Thousands of wells
10^4 images, 10^3 cells in each: Total of 10^7 cells/experiment
Each cell with cytoprofile
• Cytoprofile of 500+ features measured for each cell
Iterative Machine Learning
• System presents ~500 cells to biologists for scoring
(Figure: iteration loop in which biologists score cells Yes or No and the system updates the rule)
• System defines rule based on cytoprofile of scored cells
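The slides do not say which learning algorithm defines the rule. As a deliberately simple stand-in, the sketch below learns a single-feature threshold rule (a decision stump) from a handful of scored cells. The feature values, scores, and function name are all hypothetical.

```python
def learn_rule(cells, scores):
    """Return (feature index, threshold, accuracy) of the best single-feature rule."""
    best = (None, None, 0.0)
    for feature in range(len(cells[0])):
        values = sorted({cell[feature] for cell in cells})
        # Candidate thresholds lie midway between neighboring observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            correct = sum(
                (cell[feature] > threshold) == score
                for cell, score in zip(cells, scores)
            )
            accuracy = correct / len(cells)
            if accuracy > best[2]:
                best = (feature, threshold, accuracy)
    return best


# Hypothetical two-feature cytoprofiles and the biologist's yes/no scores.
cells = [(0.2, 5.0), (0.3, 1.0), (0.8, 4.0), (0.9, 2.0)]
scores = [False, False, True, True]
feature, threshold, accuracy = learn_rule(cells, scores)
print(feature, threshold, accuracy)
```

In an iterative workflow, the cells scored in each round would be added to the training set and the rule learned again.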
Iterative Machine Learning
(Figure: the rule is applied to all 10^7 cells to produce scored cells)
• Scored cells are sorted by well: Identify samples with a
high proportion of positive cells
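Sorting scored cells by well can be sketched as follows, assuming each cell is a hypothetical (well name, positive or not) pair.

```python
from collections import Counter


def positive_fraction_by_well(scored_cells):
    """Compute the fraction of positive cells in each well."""
    totals, positives = Counter(), Counter()
    for well, is_positive in scored_cells:
        totals[well] += 1
        positives[well] += is_positive  # True counts as 1, False as 0
    return {well: positives[well] / totals[well] for well in totals}


# Hypothetical scored cells from two wells.
scored_cells = [
    ("A01", True), ("A01", False), ("A01", False), ("A01", False),
    ("A02", True), ("A02", True),
]
fractions = positive_fraction_by_well(scored_cells)
# Rank wells from the highest to the lowest proportion of positive cells.
ranked_wells = sorted(fractions, key=fractions.get, reverse=True)
print(fractions, ranked_wells)
```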
Final Notes
• Where to get help
– Access help from the CellProfiler main window
– Ask for help on the CellProfiler.org forum
The Team
Director; Image assay development; IT/Administration
Apply image analysis methods to biological questions
Anne Carpenter, David Logan, Mark Bray, Peggy (Margaret) Anthony, Kate Madden
Algorithm development & software engineering
Develop & test new image analysis and data mining methods and create open-source software tools
Ray Jones, Vebjørn Ljoså, Adam Fraser, Lee Kamentsky, Carolina Wählby, Auguste Genovesio (begins 2010)