R and R commander: How to get started with the analyses


Vaccination data analyses
with R and R commander
Lars T. Fadnes & Halvor Sommerfelt
Centre for International Health
University of Bergen
Aim for this session
• Introduce a brilliant tool
• Give examples on use of the tool
• Guide to further knowledge
What is R?
• R is a free software environment that
includes a set of base packages for
graphics, math, and statistics.
• You can make use of specialized
packages contributed by R users or write
your own new functions.
Why R?
• Very powerful
• Developing extremely quickly
• Working on different platforms (not only
Microsoft Windows…)
• Free of all costs
Why doesn’t everyone use R?
What is R commander?
Why R commander?
• Powerful
• Free of all costs
• Working on different platforms (not only
Microsoft Windows…)
• Easy to learn and to use…
How to install R?
For Windows:
• http://cran.r-project.org/bin/windows/base/
– Easy
• If you are here at UiB, the information department will install it for you
if you ask them
For Linux (Ubuntu etc.):
- Very easy
- Just search for r-base-core in Synaptic Package Manager and add it
- http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html
How to install R commander?
- install.packages("Rcmdr", dependencies=TRUE)
Some things to note first
• R is case-sensitive
– help, Help, HELP and HELF are different…
– Recommendation:
• Choose one style and stick to it
• Avoid using shortcut keys in R commander!
• If there is something you don’t know:
– There is lots of good information on the web
• Particularly for R
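A minimal sketch of the case-sensitivity point in plain R. The variable names are made up for illustration only.

```r
# R treats names that differ only in case as separate objects.
# (antibody, Antibody and ANTIBODY are hypothetical names.)
antibody <- 1
Antibody <- 2
ANTIBODY <- 3

# Three different objects holding three different values
c(antibody, Antibody, ANTIBODY)

# A spelling that was never assigned in exactly this case does not exist
exists("AntiBody")
```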
How to start?
• Open R
– Load packages
• Rcmdr
or write
• library(Rcmdr)
Menu
• File Menu:
– items for loading and saving script files;
– for saving output and the R workspace;
– and for exiting
• Edit Menu:
– items (Cut, Copy, Paste, etc.) for editing the contents of the script and output windows.
– Right clicking in the script or output window also brings up an edit “context” menu.
• Data
– Submenus containing menu items for reading and manipulating data.
• Statistics
– Submenus containing menu items for a variety of basic statistical analyses.
Menu
• Graphs
– Menu items for creating simple statistical graphs.
• Models
– Menu items and submenus for obtaining numerical summaries, confidence intervals, hypothesis tests, diagnostics, and graphs for a statistical model, and for adding diagnostic quantities, such as residuals, to the data set.
• Distributions
– Probabilities, quantiles, and graphs of standard statistical distributions (to be used, for example, as a substitute for statistical tables) and samples from these distributions.
• Tools
– Menu items for loading R packages unrelated to the Rcmdr package (e.g., to access data saved in another package), and for setting some options.
• Help Menu
– items to obtain information about the R Commander (including this manual). As well, each R Commander dialog box has a Help button (see below).
• Script Window
– R commands generated by the R Commander
– You can also type R commands directly into the script window or the R Console
– The main purpose of the R Commander, however, is to avoid having to type commands.
• Output Window
– Printed output
• Messages Window
– Displays error messages, warnings, and notes
• Graphics Device window
– When you create graphs, these will appear in a separate window outside of the main R Commander window.
Easily available function:
Let’s get started…
• Change directory
– Find the folder where
you placed your data file
• Import data
– Give it the name: Vaccine
• Save workspace as
– Give a name to your file
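The three steps above can also be typed as commands. This is a sketch only: the file name vaccine.csv, the workspace name and the two data rows are made up, and a temporary folder stands in for the folder holding your data file. The command that R Commander writes to the script window when you use the menus may look different.

```r
# Made-up example rows written to a temporary folder,
# standing in for your own data file.
data_dir  <- tempdir()
data_file <- file.path(data_dir, "vaccine.csv")   # hypothetical file name
writeLines(c("IDNO,Adjuvant,Day0,Day30",
             "1,0,2,4",
             "2,1,2,20"), data_file)

# Change directory: point R at the folder holding the data file
old_dir <- setwd(data_dir)

# Import data and give the data set the name Vaccine
Vaccine <- read.csv("vaccine.csv")
str(Vaccine)

# Save workspace as... (hypothetical file name)
workspace_file <- file.path(data_dir, "vaccine-session.RData")
save.image(workspace_file)

# Go back to the folder we started in
setwd(old_dir)
```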
Data Types
• Vectors
– including continuous variables
• Factors
– Nominal/ categorical
• Matrices, arrays and data frames
• Lists
http://www.statmethods.net/input/datatypes.html
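As a rough illustration of the data types listed above, here is one small object of each kind. All names and values are invented for the example.

```r
# Vector holding a continuous variable
day0 <- c(2, 4, 5, 8)

# Factor holding a nominal/categorical variable
group <- factor(c("no", "no", "yes", "yes"))

# Matrix: a two-dimensional structure with a single data type
m <- matrix(1:6, nrow = 2)

# Data frame: columns may have different types (this is what a data set is)
df <- data.frame(day0 = day0, group = group)

# List: a collection of arbitrary objects
lst <- list(values = day0, label = "baseline")

# Inspect the structure of each object
str(day0)
str(group)
str(m)
str(df)
str(lst)
```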
Define the datatypes
Variables in dataset
• IDNO
– ID number: categorical = factor
• Adjuvant
– Categorical = factor:
• 0: Placebo
• 1: Adjuvant
• Day0
– Continuous (vector)
• Day30
– Continuous (vector)
– All variables are now defined as vectors
– Factors (categorical variables) need to be defined
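Defining the factors can be sketched in plain R as follows. The four data rows are invented, and the labels "no" and "yes" for the new variable AdjuvantCat are an assumption based on how the groups are named later in the session.

```r
# Made-up rows with the same variable names as the Vaccine data set
Vaccine <- data.frame(
  IDNO     = 1:4,
  Adjuvant = c(0, 0, 1, 1),
  Day0     = c(2, 4, 2, 4),
  Day30    = c(4, 6, 20, 32)
)

# After import every variable is a numeric vector
str(Vaccine)

# ID number is categorical, so store it as a factor
Vaccine$IDNO <- factor(Vaccine$IDNO)

# Adjuvant: 0 = placebo, 1 = adjuvant (labels "no"/"yes" are an assumption)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant,
                              levels = c(0, 1),
                              labels = c("no", "yes"))

# Check the result
str(Vaccine)
table(Vaccine$AdjuvantCat)
```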
Now we’re ready to answer some
scientific questions…
Vaccination response
• Effect of vaccination is often calculated by measuring
antibodies before and after vaccination
– Response = concentration after vaccination
concentration before vaccination
• Compute new variable: response
– concentration before vaccination: Day0
– concentration after vaccination: Day30
Does the new variable look
reasonable?
• View data set
• Histogram
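A sketch of computing and checking the new variable. The formula above does not show an operator between the two concentrations, so this example assumes the response is the ratio Day30 / Day0; if a difference was intended, replace the division with a subtraction. The data values are invented.

```r
# Made-up data with the variable names used in the session
Vaccine <- data.frame(
  IDNO     = 1:8,
  Adjuvant = c(0, 0, 0, 0, 1, 1, 1, 1),
  Day0     = c(2, 4, 5, 8, 2, 4, 5, 10),
  Day30    = c(4, 6, 15, 20, 20, 32, 60, 90)
)

# ASSUMPTION: response = concentration after / concentration before
Vaccine$response <- Vaccine$Day30 / Vaccine$Day0

# Does the new variable look reasonable?
head(Vaccine)                       # view the data set
hist(Vaccine$response,
     main = "Histogram of response",
     xlab = "response")
```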
Summarize variable
• Calculate mean, median and standard
deviation for
– response
• for each group
• Numerical summaries
• Placebo: adjuvant = no
• Adjuvant: adjuvant = yes
• Mean, standard deviation, the percentiles and
number of cases
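The numerical summaries per group can be approximated in base R as below. This is not necessarily the command R Commander generates; the helper function describe and the data values are made up for the example.

```r
# Made-up response values for a placebo ("no") and an adjuvant ("yes") group
Vaccine <- data.frame(
  AdjuvantCat = factor(rep(c("no", "yes"), each = 4)),
  response    = c(2, 1.5, 3, 2.5, 10, 8, 12, 9)
)

# Hypothetical helper: mean, standard deviation, percentiles, number of cases
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), quantile(x), n = length(x))
}

# One column per group, one row per statistic
by_group <- sapply(split(Vaccine$response, Vaccine$AdjuvantCat), describe)
round(by_group, 2)
```

The row labelled 50% is the median.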
Doing calculations for subset group
• Before making the ”Placebo” subset, remember
to select main dataset (Vaccine)
– You can now easily change between the datasets
• Make a histogram of the response
– First for the adjuvant dataset
– Then for the placebo dataset
– The histograms will be printed in the R window (not
inside R commander)
– Right click on the graph and you can copy it as
metafile to paste it into a document, print it or save it
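A sketch of making the two subsets and their histograms in plain R. The data are invented, and the data set names Placebo and AdjuvantGroup are only suggestions.

```r
# Made-up data: group indicator and response
Vaccine <- data.frame(
  AdjuvantCat = factor(rep(c("no", "yes"), each = 4)),
  response    = c(2, 1.5, 3, 2.5, 10, 8, 12, 9)
)

# Always start from the main data set (Vaccine) when making a subset
Placebo       <- subset(Vaccine, AdjuvantCat == "no")
AdjuvantGroup <- subset(Vaccine, AdjuvantCat == "yes")

# One histogram per subset; each appears in the graphics window
hist(AdjuvantGroup$response, main = "Adjuvant", xlab = "response")
hist(Placebo$response,       main = "Placebo",  xlab = "response")
```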
Box plot
• Box plot for response
– By AdjuvantCat
– (for complete Vaccine dataset)
[Figure: box plot of response (y-axis, 0 to 150) by AdjuvantCat (x-axis: no, yes)]
• Does it look normally distributed?
• We might need to do a logarithmic
transformation of the variable
– Compute new variable:
• logresponse
• Does it look more normally distributed (histogram)?
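The box plot and the logarithmic transformation can be sketched as follows. The data are invented, and the natural logarithm is an assumption, since the slides do not say which base to use.

```r
# Made-up data: group indicator and response
Vaccine <- data.frame(
  AdjuvantCat = factor(rep(c("no", "yes"), each = 4)),
  response    = c(2, 1.5, 3, 2.5, 10, 8, 12, 9)
)

# Box plot of response by AdjuvantCat for the complete data set
boxplot(response ~ AdjuvantCat, data = Vaccine,
        xlab = "AdjuvantCat", ylab = "response")

# Compute new variable: logresponse (ASSUMPTION: natural logarithm)
Vaccine$logresponse <- log(Vaccine$response)

# Does it look more normally distributed?
hist(Vaccine$logresponse, main = "Histogram of logresponse",
     xlab = "logresponse")
```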
Check the logresponse
• Are the means and medians similar for both groups
(placebo and adjuvant)?
– Numerical summary (as earlier, but with logresponse)
• Check with an independent samples t-test
– Adjuvant vs logresponse
– Is the response different among those getting adjuvant?
• Will this be confirmed with a robust test not
assuming normal distribution?
• Check with a
– non-parametric
» two-sample Wilcoxon test (rank-sum test)
» Use the ’Exact’ test
– Will the test be different when using response or logresponse?
» Why or why not?
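Both tests can be run from the script window as well. The sketch below uses invented data without tied values, because the exact Wilcoxon test in R is only computed when there are no ties.

```r
# Made-up data: four placebo and four adjuvant responses, no tied values
Vaccine <- data.frame(
  AdjuvantCat = factor(rep(c("no", "yes"), each = 4)),
  response    = c(2, 1.5, 3, 2.5, 10, 8, 12, 9)
)
Vaccine$logresponse <- log(Vaccine$response)

# Independent samples t-test (R does not assume equal variances by default)
tt <- t.test(logresponse ~ AdjuvantCat, data = Vaccine)
tt

# Two-sample Wilcoxon (rank-sum) test with exact p-value
w_log <- wilcox.test(logresponse ~ AdjuvantCat, data = Vaccine, exact = TRUE)
w_raw <- wilcox.test(response ~ AdjuvantCat, data = Vaccine, exact = TRUE)

# The test uses ranks only, and taking the log does not change the ranks
c(logresponse = w_log$p.value, response = w_raw$p.value)
```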
Recoding and making new
variables
Different types of examples on how to recode in the box:
– Data → Manage variables
• value = ”factor”
– Factor can be either number or word
• value, value, value = ”factor”
– Listed with comma
• value:value = ”factor”
– From lowest to highest values
• else: all other values
• NA: missing
• How was the baseline immunological response (before vaccination)?
– Variable: Day0
• Numerical summary
• Let us take a look into the upper quartile specifically
– Make subsets:
• ”UpperQuartile”
• ”OtherQuartiles”
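One way to build the two subsets in plain R is sketched below, as an alternative to the recode dialog. The data are invented, and the choice to count only values strictly above the 75th percentile as the upper quartile is an assumption.

```r
# Made-up baseline values and group indicator
Vaccine <- data.frame(
  AdjuvantCat = factor(rep(c("no", "yes"), each = 4)),
  Day0        = c(2, 4, 5, 8, 2, 4, 5, 10)
)

# Numerical summary of the baseline response
summary(Vaccine$Day0)

# 75th percentile of Day0
q3 <- quantile(Vaccine$Day0, 0.75)

# New factor: is the baseline value in the upper quartile?
# (ASSUMPTION: strictly above the 75th percentile)
Vaccine$UpperQuartileDay0 <- factor(
  ifelse(Vaccine$Day0 > q3, "yes", "no"),
  levels = c("no", "yes")
)

# Make the two subsets from the main data set
UpperQuartile  <- subset(Vaccine, UpperQuartileDay0 == "yes")
OtherQuartiles <- subset(Vaccine, UpperQuartileDay0 == "no")

table(Vaccine$UpperQuartileDay0)
```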
Is baseline response associated
with response after intervention?
• We can first repeat the t-tests for the
subsets
• ”UpperQuartile”
• ”OtherQuartiles”
– Independent sample t-test
» logresponse
» AdjuvantCat
• What are the results?
– What are the differences between the means?
• What does it mean?
Is intervention associated with
response?
• Linear regression analysis with the full
dataset
– Statistics → Fit model → Linear model
• Dependent variable (left box):
– logresponse
• Independent variable:
– AdjuvantCat
Is baseline response associated
with response after intervention?
• Linear regression analysis with the full
dataset
– Statistics → Fit model → Linear model
• Dependent variable (left box):
– logresponse
• Independent variables (right box with ’+’ between
each):
– AdjuvantCat
– UpperQuartileDay0
Is baseline response associated
with response after intervention?
• Linear regression analysis with the full
dataset
– Statistics → Fit model → Linear model
• Dependent variable (left box):
– logresponse
• Independent variables (right box with ’+’ between
each):
– AdjuvantCat
– Day0 (continuous)
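The three linear models above correspond roughly to the lm() calls below. The data are invented, the response is assumed to be the ratio Day30 / Day0 on a natural-log scale, and the model names m1 to m3 are arbitrary.

```r
# Made-up data with the variable names used in the session
Vaccine <- data.frame(
  Adjuvant = c(0, 0, 0, 0, 1, 1, 1, 1),
  Day0     = c(2, 4, 5, 8, 2, 4, 5, 10),
  Day30    = c(4, 6, 15, 20, 20, 32, 60, 90)
)
Vaccine$AdjuvantCat <- factor(Vaccine$Adjuvant, levels = c(0, 1),
                              labels = c("no", "yes"))
# ASSUMPTION: response is the ratio after/before, analysed on the log scale
Vaccine$logresponse <- log(Vaccine$Day30 / Vaccine$Day0)
Vaccine$UpperQuartileDay0 <- factor(
  ifelse(Vaccine$Day0 > quantile(Vaccine$Day0, 0.75), "yes", "no"),
  levels = c("no", "yes")
)

# Dependent variable on the left of ~, independent variables on the right,
# separated by '+'
m1 <- lm(logresponse ~ AdjuvantCat, data = Vaccine)
m2 <- lm(logresponse ~ AdjuvantCat + UpperQuartileDay0, data = Vaccine)
m3 <- lm(logresponse ~ AdjuvantCat + Day0, data = Vaccine)

summary(m1)
summary(m2)
summary(m3)
```

In the first model the coefficient for AdjuvantCat equals the difference between the two group means of logresponse, which ties the regression back to the t-test.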
What will you conclude?
• Is adjuvant/placebo associated with
response?
• Is initial immune response associated with
response after vaccination?
How to save?
• Save R workspace as…
– This will save your data (in the R format)
• Save output as…
– This will save your output
– Another strategy is to cut and paste what you want to save
• Always save the commands
– essential if you want to re-run the analyses later
– WordPad is a better option than Word etc. (it does not autocorrect, e.g. change letters to upper case)
How to write and run a command?
• Simply write the command in the script
window, mark it and click ’Submit’ or press
Ctrl+R
Nice to know:
• When writing comments in the syntax,
start with the following sign
#
• If you are uncertain about a function, use
google or help(name-of-function)
Are there differences in syntax
between R and R commander?
• A few:
– Commands that extend over more than one line should have the second and
subsequent lines indented by one or more spaces or tabs; all lines of a multiline
command must be submitted simultaneously for execution.
– Commands that include an assignment arrow (<-) will not generate printed
output, even if such output would normally appear had the command been
entered in the R Console [the command print(x <- 10), for example]. On the other
hand, assignments made with the equals sign (=) produce printed output even
when they normally would not (e.g., x = 10).
– Commands that produce normally invisible output will occasionally cause output
to be printed in the output window. This behaviour can be modified by editing the
entries of the log-exceptions.txt file in the R Commander’s etc directory.
– Blocks of commands enclosed by braces, i.e., {}, are not handled properly unless
each command is terminated with a semicolon (;). This is poor R style, and
implies that the script window is of limited use as a programming editor. For
serious R programming, it would be preferable to use the script editor provided
by the Windows version of R itself, or – even better – a programming editor.
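A short sketch of commands written the way the notes above ask for in the script window: continuation lines indented, and commands inside braces ended with a semicolon. Plain R accepts this form too; how the script window treats other forms may depend on the R Commander version.

```r
# A command that extends over two lines: the second line is indented,
# and both lines must be selected and submitted together
total <- sum(1, 2,
    3, 4)

# A block in braces: the command inside is terminated with a semicolon
squares <- numeric(3)
for (i in 1:3) {
    squares[i] <- i^2;
}

# Print the results
total
squares
```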
True or false quiz
• R was the program that gave its shareholders the most
profit last year?
• R commander only works for MS Windows?
• R has even more functions than R commander?
• R is built by a few people with a secret source-code?
• There are a lot of enthusiastic people working with R
providing help to their peers in the R forum?
Further reading:
• The R Commander: A Basic-Statistics Graphical User Interface to R (John Fox, 2005)
– http://www.jstatsoft.org/v14/i09/paper
• Getting started with the R Commander: a basic-statistics graphical
user interface to R
– http://socserv.mcmaster.ca/jfox/Getting-Started-with-the-Rcmdr.pdf
• Quick-R: magnificent guide
– http://www.statmethods.net/
• http://cran.r-project.org/manuals.html
R help forum:
• http://r.789695.n4.nabble.com/
• You have learnt some basic skills and can
now experiment with the program yourself
Questions and comments