Introduction to the Class, and to the program

Introduction to Statistical
Computing in Clinical Research
Biostatistics 212
Today...
• Course overview
– Course objectives
– Course details: grading, homework, etc
– Schedule, lecture overview
• Where does Stata fit in?
• Basic data analysis with Stata
• Stata demos
Course Objectives
• Learn how to use STATA
• Learn practical application of basic epidemiological
and statistical concepts using STATA
• Learn how to turn raw data into presentable tables
and figures
Course details
Introduction to Statistical Computing - 1 unit
Summer schedule – 2 lectures, 1 lab - Parnassus
9/6 lecture 1-2:30
9/13 lecture 1-2:30, lab 2:45-4:45
Fall schedule – Every other Tuesday – China Basin
9/20, 10/4, 10/18, 11/1, 11/15, 11/29
Lecture 3-4, Lab 4-5
Final Project due 12/6/05
Course details
Grading: Satisfactory/Unsatisfactory
Requirements:
-Hand in all four Labs (even if late)
-Satisfactory Final Project
-80% of total points
Reading: Optional
Course details, cont
Faculty
Mark Pletcher, MD, MPH
514-8008
[email protected]
Scott Biggins, MD
502-5259
[email protected]
Lee Zane, MD, MAS
[email protected]
Course email for homework:
[email protected]
Overview of lecture topics
• 1 - Introduction to STATA
• 2 - Do files, log files, and workflow in STATA
• 3 - Generating variables and manipulating data with STATA
• 4 - Basic epidemiology with STATA I
• 5 - Basic epidemiology with STATA II
• 6 - Using Excel
• 7 - Organizing a project, making a table
• 8 - Making a figure with STATA or Excel
First 2 lectures here at Parnassus, the rest in China Basin
Overview of labs
• Lab 1 – Load a dataset and analyze it, learn about do and log
files.
• Lab 2 – Import data from Excel, generate new variables and
manipulate data, document everything with do and log files.
• Lab 3 – Epidemiologic analysis using Stata
• Lab 4 – Using and creating Excel spreadsheets
Labs 2 and 3 will be spread across several lab sessions
Course is front-loaded – last 2 lab sessions dedicated to Final
Project
Overview of labs, cont
• First lab will be at Parnassus next week 2:45-4:45, the rest at
China Basin 4:00-5:00 after lecture
• Scott Biggins will lead 1 section, I will lead the other
• China Basin Computer Lab
– No computers in it!
– Must bring own laptop with Stata loaded
Overview of labs, cont
• Labs are generally due 1 week after the last lab session
dedicated to them
• Labs 2-4 and the Final Project should be emailed to the course
email address – [email protected].
• Answers posted 1 day after Lab is due
• If you don’t turn the lab in on time, you STILL must turn it in
to pass the class, even though you won’t get points credit for it
(per TICR policy). PLEASE CORRECT YOUR COURSE
OVERVIEW FORM
Final Project
• Create a Table and a Figure using your own data, and document
the analysis using Stata.
• Due 1 week after the last lab session; 20 points docked for each
day late.
Getting started with STATA
Session 1
Types of software packages used
in clinical research
• Statistical analysis packages
• Spreadsheets
• Database programs
• Custom applications
– Cost-effectiveness analysis (TreeAge, etc)
– Survey analysis (SUDAAN, etc)
Software packages for analyzing
data
• STATA
• SAS
• S-plus and R
• SPSS
• SUDAAN
• Epi-Info
• JMP
• MATLAB
• StatXact
Why use STATA?
• Quick start, user friendly
• Immediate results, response
• You can look at the data
• Menu-driven option
• Good graphics
• Log and do files
• Good manuals, help menu
Why NOT use STATA?
• SAS is used more often
• SAS does some things STATA does not
• Programming easier with S-plus
• Complicated data structure and manipulation easier with SAS
• Epi-Info is even easier than STATA?
STATA – Basic functionality
• Hold data for you
– Stata holds 1 “flat” file dataset only (.dta file)
• Listen to what you want
– Type a command, press enter
• Do stuff
– Statistics, data manipulation, etc
• Show you the results
– Results window
Demo #1
• Open the program
• Load some data
• Look at it
• Run a command
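The demo steps might look like the sketch below when typed in the Command window. It assumes the example dataset auto that ships with Stata; the slides do not say which dataset was used in class.

```stata
* Load some data (auto is an example dataset installed with Stata)
sysuse auto, clear

* Look at it: show the first five observations for three variables
list make price mpg in 1/5

* Run a command: number of observations, mean, SD, min and max of price
summarize price
```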
STATA - Windows
• Two basic windows
– Command
– Results
• Optional windows
– Variable list
– History of commands
• Other functions
– Data browser/editor
– Do file editor
– Viewer (for log, help
files, etc)
STATA - Buttons
• The usual – open, save, print
• Log-file open/suspend/close
• Do-file editor
• Browse and Edit
• Break
STATA - Menus
• Almost every command can be accessed via
menu
Demo #2
• Enter in some data
• Look at it
• Run a couple of commands
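Data can also be entered from the command line instead of the Data editor. A minimal sketch using the input command; the variable names (id, age, sbp) and the values are made up for illustration.

```stata
* Start with an empty dataset
clear

* Enter in some data: one row per observation, finish with "end"
input id age sbp
1 34 118
2 51 135
3 47 142
4 29 110
end

* Look at it
list

* Run a couple of commands
describe
summarize age sbp
```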
Menu vs. Command line
• Menu advantages
– Look for commands you don’t know about
– See the options for each command
– Complex commands easier – learn syntax
• Command line advantages
– Faster (if you know the command!)
– “Closer” to the program
– Only way to write “do” files
• Document and repeat analyses
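As a rough illustration of why the command line matters, the sketch below shows what a small do file could contain. The log file name example_log is hypothetical, and the dataset is the auto example shipped with Stata, not course data. Do and log files are covered properly in the next lecture.

```stata
* Close any log left open by an earlier run (no error if none is open)
capture log close

* Record everything that follows in a plain-text log file
log using example_log, text replace

* The analysis itself: the same commands you would type interactively
sysuse auto, clear
summarize price mpg
tabulate foreign

* Stop recording
log close
```

Running the file again repeats the whole analysis and overwrites the log, so the record always matches the commands.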
STATA commands
Describing your data
• describe [varlist]
– Displays variable names, types, labels
• list [varlist]
– Displays the values of all observations
• codebook [varlist]
– Displays labels and codes for all variables
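A short sketch of these three commands, assuming the auto example dataset that ships with Stata. Leaving out the variable list applies the command to every variable.

```stata
sysuse auto, clear

* Names, storage types, and labels for all variables
describe

* The same, restricted to a variable list
describe price mpg foreign

* Values for the first ten observations only
list make price foreign in 1/10

* Codes, value labels, and missing counts for one variable
codebook foreign
```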
STATA commands
Descriptive statistics – continuous data
• summarize [varlist] [, detail]
– # obs, mean, SD, range
– “, detail” gets you more detail (median, etc)
• histogram varname
– Simple histogram of your variable
• ci [varlist]
– Mean, standard error, and confidence intervals
– Actually works for dichotomous variables, too.
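A sketch of these commands on the auto example dataset (an assumption; the slides do not name a dataset). Note that the ci lines use the subcommand syntax of Stata 14.1 and later (ci means, ci proportions); older releases use the plain ci [varlist] form shown on the slide.

```stata
sysuse auto, clear

* Number of observations, mean, SD, min and max
summarize price mpg

* Adds percentiles (including the median), skewness, and kurtosis
summarize price, detail
display "Median price: " r(p50)

* Simple histogram of one variable
histogram mpg

* Mean, standard error, and confidence interval (Stata 14.1+ syntax)
ci means price mpg

* For a dichotomous 0/1 variable (Stata 14.1+ syntax)
ci proportions foreign
```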
STATA commands
Descriptive statistics – categorical data
• tabulate [var]
– Counts and percentages
– (see also, table - this is very different!)
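For example, assuming the auto dataset that ships with Stata:

```stata
sysuse auto, clear

* Counts and percentages for one categorical variable
tabulate foreign

* The missing option adds a row for missing values
tabulate rep78, missing
```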
STATA commands
Analytic statistics – 2 categorical variables
• tabulate [var1] [var2]
– “Cross-tab”
– Descriptive options
, row (row percentages)
, col (column percentages)
– Statistics options
, chi2 (chi2 test)
, exact (Fisher’s exact test)
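The options can be combined after a single comma. A sketch using two categorical variables from the auto example dataset (an assumed dataset, not course data):

```stata
sysuse auto, clear

* Cross-tab with row and column percentages
tabulate rep78 foreign, row col

* The same cross-tab with a chi-squared test and Fisher's exact test
tabulate rep78 foreign, chi2 exact

* The p-values are stored after the command runs
display "Chi-squared p-value: " r(p)
display "Fisher's exact p-value: " r(p_exact)
```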
Getting help
• Try to find the command on the pull-down menus
• Help menu
– If you don’t know the command - Search...
– If you know the command - Stata command...
• Try the manuals
– more detail, theoretical underpinnings, etc
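The two Help menu entries have command-line equivalents. A small sketch; the command and keywords chosen here are only examples.

```stata
* If you know the command: open its help file
help summarize

* If you don't know the command: search by keyword
search fisher exact
```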
STATA commands
Analytic statistics – 1 categorical, 1 continuous
• bysort [catvar]: summarize [contvar]
– Mean, SD, and range of the continuous variable within each subgroup
• ttest [contvar], by([catvar])
– t-test
• oneway [contvar] [catvar]
– ANOVA
• table [catvar] [, contents(mean [contvar] …)]
– Table of statistics
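A sketch of the first three commands, assuming the auto example dataset, with price as the continuous variable and foreign or rep78 as the categorical one. The table command is left out here because its syntax differs between Stata releases.

```stata
sysuse auto, clear

* Mean, SD, and range of price within each level of foreign
bysort foreign: summarize price

* Two-sample t-test comparing mean price between the two groups
ttest price, by(foreign)
display "t-test p-value: " r(p)

* One-way ANOVA of price across the repair record categories
oneway price rep78
```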
STATA commands
Analytic statistics – 2 continuous
• scatter [var1] [var2]
– Scatterplot of the two variables
• pwcorr [varlist] [, sig]
– Pairwise correlations between variables
– “sig” option gives p-values
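For example, assuming the auto dataset that ships with Stata. In scatter, the first variable goes on the y-axis and the second on the x-axis.

```stata
sysuse auto, clear

* Scatterplot with mpg on the y-axis and weight on the x-axis
scatter mpg weight

* Pairwise correlations among three variables, with p-values
pwcorr price mpg weight, sig
```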
Demo #3
• Load a STATA dataset
• Explore the data
• Describe the data
• Answer some simple research questions
Next week
• Do files, log files, and workflow in Stata
• In lab next week:
– Familiarize yourself with Stata
– Practice today’s material (loading and
analyzing data)
– Start learning how to use do and log files
• You can leave lab early if you finish!