
SAS Lecture 6 – SAS/GRAPH
Aidan McDermott,
May 3, 2005
SAS/GRAPH
• A small number of graphic types are commonly used in public health presentations and publications.
• These basic types are used either alone or mixed together to form a composite graphic.
• Here we will look at how to build some of these basic types of graph.
• Golden Rule: Everybody is a graph critic.
Two types of graph maker
• If you are using SAS for statistics and data management, it seems natural to use it to produce your graphs as well. Sometimes a statistical procedure will produce the graph you are looking for anyway.
• Do you need a one-off graph for a presentation, or production-line graphs?
• To produce “quick and dirty” graphs you can use Graph-n-go.
  - Very easy to use; not bad for putting multiple graphs on one page; data viewer is a graph type; only a small number of graph types available; not all options available; labor intensive, so not suitable for production-line graphs.
• Use SAS/GRAPH procedures.
  - Very flexible; complete control over graphic elements; less labor intensive in the long run; harder to learn; the same controls can be used for SAS/STAT graphics output.
Some common types of graph
• Charts
• Histograms
• Stem and leaf plots
• Boxplots
• Plots
• Contour plots / 3-dimensional plots
• Maps
• Gantt charts
• Trellis plots
• Trees / pedigrees / dendrograms
• (mathematical) graphs / networks
• Flow charts / entity-relationship diagrams
Graph-n-go
• Solutions → Reporting → Graph-n-go
• The top two icons represent data models.
• The rest are data viewers.
Graph-n-go
• Choose and configure a data model.
• Choose a dataset.
• Right-click the data model and choose Properties.
• Set which columns to use, where clauses, etc.
Graph-n-go
• Choose a viewer and position it on the viewer area (e.g. a bar chart).
• Drag and drop the data model onto the viewer to associate data with the viewer.
• Right-click the viewer and choose Properties.
• Configure (choose variables to plot, etc.).
Graph-n-go
• When finished, the graph can be exported to HTML etc.
• Choose File → Export → Write to file
• You’ll see more in the lab.
Graphic output within SAS
• You have already seen some graphic output
from within SAS.
• proc means, proc univariate, proc genmod,
proc lifetest etc. all produce graphs
• Other procedures in SAS specifically produce
graphs, even some procedures that are not part
of SAS/Graph (proc boxplot is an example)
Here our aim is to produce publication/presentation-quality graphs.
Graph basics
• SAS stores graphs in catalogs (an entity similar to a folder in Windows).
• Graphs are stored in a SAS proprietary format.
• By default graphs are stored in a catalog called Gseg in the work library.
• Graphs can be translated to PostScript, GIF, JPEG, and a number of other commonly used formats for printing or including in other documents (Word, HTML, etc.).
Graphic control
There are three ways to control the look of a SAS/GRAPH graph.
1. Use options within the procedure
2. Use global statements
3. Use goptions
GOPTIONS
• set the environment for a graphics program to run and send output
• independent of the program
• remain in effect for the entire SAS session unless changed or reset
• control appearance of graphic elements by specifying default fonts, colors, text heights etc.
• Useful when you want the same options in multiple procs
PROC GOPTIONS
• used to review current GOPTIONS
• lists all of the current GOPTIONS alphabetically in the LOG window
proc goptions;
run;
• Can also type goptions at the command line
GOPTIONS
GOPTIONS options-list;
• ROTATE=          portrait or landscape (will override the setting in the print dialog box)
• RESET=ALL        resets all options to defaults, including all global statements
• RESET=GOPTIONS   resets only goptions statements
• COLORS=          default color list; device dependent (set by the device driver)
• GUNIT=           unit of measurement for height in global statements, such as TITLE and FOOTNOTE
    cell - character cells
    pct  - percent of graphics area
    in   - inches
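As a minimal sketch of how these options might be combined at the top of a program (the particular values chosen here are illustrative assumptions, not settings from the lecture):

```sas
/* Start from a clean slate, then set session-wide defaults.      */
/* The values below are illustrative choices only.                */
goptions reset=all          /* restore defaults, cancel global statements */
         gunit=pct          /* heights in percent of the graphics area    */
         htext=4            /* default text height: 4 percent             */
         rotate=landscape;  /* page orientation                           */

/* Write the settings now in effect to the log */
proc goptions;
run;
```

Because goptions persist for the whole session, starting with RESET=ALL makes the program less dependent on whatever was run before it.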
Data
• From the SAS samples folder.
• Three Californian pollutant monitoring
stations (AZU, LIV, SFO)
• One monthly measurement (taken on the
15th of the month) for CO, O3, SO4,
temperature etc. for each station.
36 observations in all
• Month is a numeric variable taking the
value 1 for January, 2 for February, etc.
Californian air pollutant data – ca88air
Charts
• Examples
Look for graphic elements in each chart
Look for common data types
Look for similarities among the examples
Charts
• All the examples used a small number of
graphic elements
• Main difference between plots is the
polygon/area type
• Most involved a categorical/discrete
variable and a numeric variable.
A histogram uses a continuous variable to
create categories. The counts of a categorical
variable can be used to create the numeric
variable.
Proc GCHART
• produces charts based on the values of one or more chart variables.
• produces vertical and horizontal bar charts, block charts, pie charts etc.
• graphs based on statistics - counts, percentages, sums, or means
• run-group processing
• numeric and character variables
Proc GCHART example
proc format;
  value seas 1 = 'Win' 2 = 'Spr'
             3 = 'Sum' 4 = 'Fal';
run;
data ca88air;
  set vol1.ca88air(where=(station="SFO"));
  /* assign each month to a season */
  if ( month in (12,1,2) ) then season = 1;
  else if ( month in (3,4,5) ) then season = 2;
  else if ( month in (6,7,8) ) then season = 3;
  else if ( month in (9,10,11)) then season = 4;
  format season seas.;
  format month mth.;  /* mth. is not defined in this step; it must already exist */
run;
Proc GCHART example
title1 h=4
  'Mean seasonal carbon monoxide for station SFO';
footnote j=l h=4 f=simplex
  'Bar Chart - vertical';
proc gchart data=ca88air;
  vbar season / sumvar=co type=mean
                discrete
                ctext=black clm=95 ;
run;
quit;
Proc GCHART syntax
PROC GCHART data=data-set-name;
One of the following:
  VBAR variables / options;
  HBAR variables / options;
  STAR variables / options;
  PIE variables / options;
  BLOCK variables / options;
run;
VBAR
• separate bar chart for each chart variable
• each bar represents the statistic selected for a value of the chart variable
• response axis (vertical) provides a scale for the statistic graphed
• midpoint axis - horizontal axis
VBAR SYNTAX
VBAR chart-variable(s) / options;
chart-variable(s)
  specifies one or more variables that define the categories of data to chart.
options
  specifies appearance, statistics, axes and midpoint options
VBAR
• midpoints are the values of the chart variable that identify categories of data. By default, midpoints are selected or calculated by the procedure. The way the procedure handles the midpoints depends on whether the values of the chart variable are character, discrete numeric, or continuous numeric.
• character chart variables - a separate bar is drawn for each value
VBAR
• numeric chart variables - by default, each bar represents a range of values
  - The DISCRETE option generates a midpoint for each unique value of the chart variable.
  - Without DISCRETE, the procedure generates midpoints that represent ranges of values. By default, it determines the ranges, calculates the median value of each range, and displays the median value at each midpoint on the chart. A value that falls exactly halfway between two midpoints is placed in the higher range.
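The difference between the two treatments of a numeric chart variable can be sketched with the month variable from the ca88air data (a small illustration, assuming the ca88air dataset built earlier is in the work library):

```sas
proc gchart data=ca88air;
   /* month treated as continuous: each bar covers a range of months */
   vbar month;
run;
   /* DISCRETE: one bar for each distinct value of month */
   vbar month / discrete;
run;
quit;
```

With no SUMVAR= given, both charts show frequencies; only the way the midpoints are formed differs.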
VBAR OPTIONS
• For character or discrete numeric values, you can use the MIDPOINTS= option to rearrange the midpoints or to exclude midpoints from the chart.
For character data, list the values in quotes:
MIDPOINTS='Sydney' 'Atlanta' 'Paris'
VBAR OPTIONS
• For continuous numeric variables, use the MIDPOINTS= option to change the number of midpoints, to control the range of values each midpoint represents, or to change the order of the midpoints. To control the range of values each midpoint represents, use the MIDPOINTS= option to specify the median value of each range. For example, to select the ranges 20-29, 30-39, and 40-49, specify
MIDPOINTS=25 35 45
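A short sketch of both uses of MIDPOINTS= follows. It assumes the full three-station dataset vol1.ca88air; the numeric midpoints are hypothetical and would need to be chosen to suit the actual range of the o3 values.

```sas
proc gchart data=vol1.ca88air;
   /* character variable: control the order of the bars */
   vbar station / midpoints='SFO' 'LIV' 'AZU'
                  sumvar=o3 type=mean;
run;
   /* continuous variable: hypothetical midpoints defining three ranges */
   vbar o3 / midpoints=25 35 45;
run;
quit;
```

Leaving a station out of the first MIDPOINTS= list would exclude it from the chart.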
VBAR OPTIONS
Other options:
DISCRETE               separate bar for each value of a numeric variable
TYPE=statistic         specifies the chart statistic:
    FREQ   frequency (the default when SUMVAR= is not used)
    PCT    percentage
    SUM    sum (the default when SUMVAR= is used)
    MEAN   mean
CLM=confidence-level   draws chart confidence intervals (error bars)
VBAR SYNTAX
SUMVAR=variable
  specifies the variable to use for sum or mean calculations for each midpoint. The resulting statistics are represented by the length of the bars along the response axis, and they are displayed at major tick marks. REQUIRED if specifying TYPE=MEAN or TYPE=SUM.
RAXIS=AXISn   response axis
MAXIS=AXISn   midpoint axis
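As an illustration of how SUMVAR=, RAXIS= and MAXIS= fit together, here is a sketch based on the seasonal chart shown earlier. The axis labels are assumptions added for the example.

```sas
/* Axis definitions; the label text is an illustrative choice */
axis1 label=('Season');                        /* midpoint axis */
axis2 label=(angle=90 'Mean CO') minor=none;   /* response axis */

proc gchart data=ca88air;
   vbar season / discrete
                 sumvar=co type=mean   /* bar height = mean of co */
                 maxis=axis1
                 raxis=axis2;
run;
quit;
```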
GLOBAL STATEMENTS
• define titles, footnotes
• used to control axes, symbols, patterns, and legends
• can be defined anywhere: inside a proc or before a proc
• in effect until canceled, replaced, or the end of the SAS session
• cancel by repeating the statement with no options or by using
goptions RESET=ALL;
GLOBAL STATEMENTS
• TITLE      defines titles
• FOOTNOTE   defines footnotes
• AXIS       defines appearance of axes
• PATTERN    defines patterns used in graphs (histograms)
• LEGEND     defines legends
• SYMBOL     defines symbols (plotting)
• NOTE       adds text to graph
TITLE STATEMENT
• creates, changes or cancels a title for all subsequent graphics output in a SAS session
• up to 10 titles allowed
• keyword TITLE can be followed by an unlimited number of text strings and options
• text strings are enclosed in single or double quotes
• the most recently created TITLE number replaces the previous TITLE of the same number
Title syntax
TITLE<1...10> <options | 'text'> ... <options-n | 'text-n'>;
Options:
FONT=font
  specifies the font for the subsequent text.
HEIGHT=n<units>, H=n<units>
  specifies the height of text characters in number of units.
JUSTIFY=R|L|C, J=R|L|C
  specifies the alignment: R=right, L=left, C=center. By default, JUSTIFY=C.
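A brief sketch of these options in use (the title text is made up for illustration; the fonts swissb and simplex are the ones that appear elsewhere in these notes):

```sas
goptions gunit=pct;   /* heights below are percent of the graphics area */

/* Options apply to the text strings that follow them */
title1 font=swissb height=5 justify=center 'Air quality, 1988';
title2 font=simplex height=3
       justify=left  'Station SFO'
       justify=right 'Monthly measurements';

/* A TITLEn statement with no text cancels that title
   and any higher-numbered titles */
title2;
```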
PATTERN STATEMENT
• defines the characteristics of patterns used in charts
  - type of fill pattern - solid, empty, lined
  - color
• An example of a global statement
PATTERN STATEMENT
PATTERN<1...99> options;
OPTIONS
COLOR=   pattern color
VALUE=   fill
    E    empty
    S    solid
    Ln   left slanting lines
    Rn   right slanting lines
    Xn   crosshatched lines
    where n is 1-5, 1 indicating the lightest
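The fill values listed above might be used as follows in a bar chart. This is a sketch assuming the three-station dataset vol1.ca88air; PATTERNID=MIDPOINT is added so that each bar picks up the next PATTERN definition.

```sas
pattern1 color=blue  value=solid;   /* solid fill          */
pattern2 color=red   value=x3;      /* medium crosshatch   */
pattern3 color=green value=empty;   /* outline only        */

proc gchart data=vol1.ca88air;
   vbar station / sumvar=co type=mean
                  patternid=midpoint;   /* new pattern for each bar */
run;
quit;
```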
Proc GCHART example
pattern1 color=blue value=fill;
pattern2 color=red value=fill;
proc gchart data=ca88air;
star month / sumvar=co type=mean
discrete
ctext=black noheading ;
run;
quit;
Exporting graphs
• Make sure the graphics window has focus, by clicking on it.
• File → Export as Image
• select type of image – gif, …
• open other software program – Powerpoint
• insert picture
Saving graphs
• Graphs can also be saved in a SAS catalog.
• They are stored in a SAS proprietary format.
• They can be viewed with proc greplay.
goptions replace;
libname mylib 'c:\Temp\sasclass\myfiles';
proc gchart data=mydat gout=mylib.mygraphs;
…
• proc greplay allows multiple plots on one page.
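A minimal sketch of replaying stored graphs, assuming a catalog named mylib.mygraphs was created with the GOUT= option and that the mylib libref is already assigned:

```sas
/* NOFS runs GREPLAY in line mode rather than in its windows */
proc greplay igout=mylib.mygraphs nofs;
   list igout;      /* write the catalog entries to the log  */
   replay _all_;    /* redisplay every graph in the catalog  */
run;
quit;
```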
PROC GPLOT
• graphs one variable against another, producing presentation quality plots
• coordinates of each point correspond to the values in one or more observations of the input data set.
• run-group processing
  - procedure does not end with a run
  - submit new statements and produce more graphs without another PROC
  - ends with QUIT or PROC or DATA
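Run-group processing can be sketched like this, using the o3 and co variables from the ca88air data:

```sas
proc gplot data=ca88air;
   plot o3 * month;
run;                  /* first graph is drawn; the procedure stays active */
   plot co * month;
run;                  /* second graph, with no new PROC statement         */
quit;                 /* ends the procedure                               */
```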
Proc GPLOT
• produces two-dimensional graphs that plot one variable against another within a set of coordinate axes
• graphs are automatically scaled to the values of your data, although scaling can be controlled with options or with AXIS statements.
• scatterplots, bubble plots, plots with interpolated lines (SYMBOL statement)
[Figure: anatomy of a plot. The vertical axis carries the Y variable and the horizontal axis carries the X variable; each axis has tick marks and tick-mark values.]
GPLOT SYNTAX
PROC GPLOT data=data-set-name <options>;
PLOT request list </options list>;
request list is of the form:
vertical*horizontal e.g.
PLOT y*x;
vertical*horizontal=variable e.g.
PLOT y*x=z;
GPLOT SYNTAX
Graphics options on PLOT statement
• CTEXT=color
• LEGEND=LEGENDn (uses nth global LEGEND statement)
• HAXIS=AXISn (uses nth global AXIS statement)
• VAXIS=AXISn (uses nth global AXIS statement)
Proc GPLOT example
• Suppose we are asked to draw a plot of ozone
by month for the three stations SFO, LIV, AZU.
After consulting the help we might try:
proc gplot data=ca88air;
plot o3 * month;
run;
quit;
which produces:
Proc GPLOT example
• Increase the size of the text
• use a format to print out Month names
• clear the unwanted footnote
GOPTIONS gunit=pct htext=4;
footnote1;
proc gplot data=ca88air;
  plot o3 * month ;
  format month mth.;
  title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
• back to the help
• you can make a stratified plot by station
• x axis too crowded - use a different format
proc gplot data=ca88air;
  plot o3 * month = station;
  format month mthc.;
  title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
• the symbols in the plot are too small
• use symbol global statements!
symbol1 v=dot i=join c=blue h=1.3;
symbol2 v=dot i=join c=green h=1.3;
symbol3 v=dot i=join c=brown h=1.3;
proc gplot data=ca88air;
  plot o3 * month = station;
  format month mthc.;
  title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
The x-axis is not right - use an axis global statement
axis1 minor = none
label = (f=simplex j=c
'Ozone levels at three locations')
major = (h=1.1)
order = (0 to 13 by 1)
value = (f=simplex h=3.0);
proc gplot data=ca88air;
plot o3 * month = station / haxis=axis1;
format month mthc.;
title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
• The x-axis has extra characters - use a new format or use an axis global statement
• y-axis label needs to be rotated and placed in the center of the axis
• legend needs moving - use a legend global statement
axis1 minor = none
      label = (f=centb j=c
               'Ozone levels at three locations')
      major = (h=1.0)
      order = (0 to 13 by 1)
      value = (f=simplex h=3.0
               " " "J" "F" "M" "A" "M" "J"
               "J" "A" "S" "O" "N" "D" " ");
Proc GPLOT example
axis2 label = (f=centb rotate=0 angle=90 j=c 'Ozone')
      value = (f=simplex h=3.0) ;
legend1 across=3
        position=(bottom center inside)
        label=none;
proc gplot data=ca88air;
  plot o3 * month = station / haxis=axis1 vaxis=axis2
                              legend=legend1;
  format month mthc.;
  title1 '1988 Air Quality Data - Ozone';
run;
proc g3d and proc gcontour produce 3-dimensional analogs of gplot
Maps
• You can use proc gmap to make simple presentation maps
• There is another product by SAS called SAS/GIS - i.e. SAS / geographical information system
Data
• taken from the CDC web page
• AIDS prevalence during 1997-1998
• rate is given for each state per 100,000 of
population
• state is given by name and two letter code
• map data is provided by SAS in the library
maps -- the map we will use is maps.us
• if you look in the maps library you will see
data for maps for most countries and world
maps
Data
• this data uses FIPS coding to match geographic boundaries, e.g. the FIPS code for Alaska is 02 and for Maryland is 24
• We need to join the AIDS data and the FIPS codes in order to map the data
proc sort data=aids; by name;
proc sort data=state; by name;
data join;
  merge aids(in=inaids) state(in=instate);
  by name;
  if inaids and instate then output join;
run;
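Since the merge keeps only names found in both datasets, it can be worth checking what was dropped (for example, because a state name is spelled differently in the two sources). A possible check, assuming aids and state are already sorted by name as above; the dataset name unmatched and the variable source are made up for this sketch:

```sas
data unmatched;
   merge aids(in=inaids) state(in=instate);
   by name;
   if not (inaids and instate);        /* keep only the non-matches */
   length source $5;
   if inaids then source = 'aids';     /* name found only in aids   */
   else source = 'state';              /* name found only in state  */
run;

proc print data=unmatched;
   var name source;
run;
```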
Proc GMAP
• proc gmap is used to create a number of different types of map
• the map we will be interested in is a choropleth map -- this is a map in which the rates will be color-coded by state.
• such a map shares many of the properties of a chart, particularly a pie or star chart -- both use areas to represent information, but in the case of the choropleth map the color/shading contains the display information
Proc GMAP
• First we set up some global title and footnote statements:
title1 color=blue font=centb
  "Acquired immunodeficiency syndrome (AIDS) by state" ;
title2 font=cent "(per 100,000 of population)" ;
title3 font=cent "12 months ending June, 1998" ;
footnote1 color=green justify=left
  " Choropleth Map";
Proc GMAP
• the syntax of proc gmap is like other graphic
procedures we have met, but it specifically
requires:
– a map dataset (maps.us in this case)
– an id variable which is present in both the map
dataset and the dataset we wish to map (in this case
the variable state is in both datasets and contains the
fips code)
– the syntax is:
proc gmap map=map data=data;
id idvar;
choro rate / options;
run;
Proc GMAP
title1 color=blue font=centb
  "Acquired immunodeficiency syndrome (AIDS) by state" ;
title2 font=cent "(per 100,000 of population)" ;
title3 font=cent "12 months ending June, 1998" ;
footnote1 color=green justify=left
  " Choropleth Map";
proc gmap map=maps.us data=join;
  id state;
  choro rate / coutline=black
               midpoints=5.0 10.0 15.0
                         20.0 25.0 35.0 ;
run;
Proc GMAP
Instead of a choropleth map, you could also
make a surface map. For example:
proc gmap map=maps.us data=join;
id state;
surface rate / constant=20 cbody=red nlines=100;
footnote1 color=green justify=left " Surface Map";
run;
Axis statement
• defines appearance and location of axes and tick marks
• defines text and appearance of axis label
• defines order of data values on axis
• up to 99 AXIS statements can be active in a SAS session
Syntax: AXIS<1...99> <option(s)>;
Axis statement options
• ORDER=(value list)
specifies the data values in the order they are to appear on the axis. The values specified by ORDER= are the major tick mark values. These values are displayed at the major tick marks unless they are modified by the VALUE= option.
Examples:
ORDER=(10 to 50 by 10)
ORDER=(10,20,30,40,50)
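For instance, ORDER= could be used to place one major tick mark at each month in the ozone plot. This is a sketch assuming the ca88air data, where month takes the values 1 to 12; the label text is an illustrative choice.

```sas
/* One major tick mark per month, no minor tick marks */
axis1 order=(1 to 12 by 1)
      minor=none
      label=('Month');

proc gplot data=ca88air;
   plot o3 * month / haxis=axis1;
run;
quit;
```

Data values that fall outside the range given in ORDER= are not plotted, so the list should cover the full range of the variable.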
Axis statement options
• LABEL=(text-description 'text string');
By default, the text of the axis label is either the variable name or a previously assigned variable label. Enclose each string in quotation marks.
COLOR=text-color
ANGLE=degrees
FONT=font | NONE
HEIGHT=text-height <units>
JUSTIFY=LEFT | CENTER | RIGHT
Example: label=(font=swissb color=blue j=l a=90 'Systolic BP mmHg');
Axis statement options
• VALUE=(text-description-1 'text-1' ... text-description-n 'text-n');
modifies the major tick mark values, that is, the text that labels the major tick marks on the axis. Text-description defines the appearance and 'text' is the text of a major tick mark value.
COLOR=text-color
ANGLE=degrees
FONT=font | NONE
HEIGHT=text-height <units>
JUSTIFY=LEFT | CENTER | RIGHT
Symbol statement
• specifies symbols in GPLOT
• defines appearance of symbols and plot lines, including bars, boxes, confidence limits, and area fills
• interpolation methods
SYMBOL<1...99> options;
COLOR=symbol-color
FONT=font
HEIGHT=n<units>
INTERPOL=R<type>   (regression)
        =STEP      (for KM plots)
        =BOX
VALUE=symbol
WIDTH=n
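A sketch of two SYMBOL definitions used together, assuming the three-station ca88air data: the first draws the data points joined by a line, the second overlays a fitted linear regression line (INTERPOL=RL) without plotting any symbols. The colors and sizes are illustrative choices.

```sas
symbol1 value=dot  color=blue height=1.3 interpol=join width=2;
symbol2 value=none color=red  interpol=rl;   /* regression line only */

proc gplot data=ca88air;
   where station = 'SFO';            /* one station, for clarity   */
   plot o3 * month = 1               /* drawn using SYMBOL1        */
        o3 * month = 2 / overlay;    /* drawn using SYMBOL2        */
run;
quit;
```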