Transcript Document
SAS Programming II
Matthew A. Lanham Doctoral Student Virginia Polytechnic Institute and State University Pamplin College of Business Department of Business Information Technology
April 30, 2020 2
Outline SAS Programming II
Short Course Outline
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
Load in Data Sets From Yesterday Changing Data From Numeric to Character and Vise-Versa Review of DATA and PROC steps Data Example: National Longitudinal Mortality Survey PROC TTEST, PROC NPAR1WAY, PROC FREQ SQL PROC SQL Basic Plotting Examples SAS Macros References April 30, 2020 21:42 3 of 30
Load Data SAS Programming II
Lets get our Data Sets From Yesterday Loaded
1) Drag your SAS_LISA_2.sas code into the Enhanced Editor Window 2) Highlight lines 1 through 132 3) Select Run April 30, 2020 21:42 4 of 30
Changing Data Types SAS Programming II
Changing Data Types – PUT
You can convert a Numeric variable to a Character variable by using the PUT function: General Form: new_variable = PUT(current_variable, new_format); DROP current_variable; RENAME new_variable = current_variable; April 30, 2020 21:42 5 of 30
Changing Data Types SAS Programming II
Changing Data Types – INPUT
You can convert a Character variable to a Numeric variable by using the INPUT function: General Form: new_variable = INPUT(current_variable, new_format); DROP current_variable; RENAME new_variable = current_variable; April 30, 2020 21:42 6 of 30
NLMS SAS Programming II
National Longitudinal Mortality Survey
• • • • Survey given by the Census Bureau.
Public access data http://www.census.gov/did/www/nlms/publications/public.html
This is the 3 rd release (June 1, 2008) 38 variables; 988,346 observations April 30, 2020 21:42 7 of 30
Using NLMS data SAS Programming II
Lets see what our NLMS data says about this…
Probably a meta-analysis 60 years!! That’s a long time ago http://www.nbcnews.com/id/44122528/ns/health-behavior/t/single-people-may-die-younger-new-study-finds/#.UeUPi20k-Eh April 30, 2020 21:42 8 of 30
Two-sample t-test SAS Programming II
PROC TTEST
A two-sample t-test allows us to test for statistical significance between two groups.
Standard assumptions: (1) IID (“random”) samples, (2) Normally distributed, (3) Equal variances General Form: PROC TTEST data=libref.filename; CLASS < variable that classifies set into two groups> VAR
Two-sample t-test SAS Programming II
PROC TTEST
Assumptions: (1) IID (“random”) samples (2) Normally distributed could try non-parametric test (3) Equal variances Use unequal variance test (Satterthwaite) Since we have unequal variances, we would make any conclusions using the unequal t-test The F-test is what you use to test for equal variances: 𝐻 1 : 𝜎 𝑚𝑚 = 𝜎 𝑢𝑚 𝑣𝑠. 𝐻 1 : 𝜎 𝑚𝑚 ≠ 𝜎 𝑢𝑚 April 30, 2020 21:42 10 of 30
Wilcoxon rank sum test SAS Programming II
PROC FREQ or PROC NPAR1WAY
Parametric methods of statistical inference require you to assume that your data come from some underlying probability distribution (in our case Normal) Nonparametric methods relax underlying assumptions about how the data is generated, because maybe you don’t know or the parametric assumptions are not satisfied.
Nonparametric equivalent to two-sample t-test is: Wilcox rank sum test (Wilcoxon-Mann-Whitney Test) Standard assumptions: (1) I hope this works or my dissertation is shot!
General Forms: PROC FREQ data=libref.filename
Wilcoxon rank sum test
PROC FREQ and PROC NPAR1WAY
𝐻 1 : 𝑑𝑖𝑠𝑡 ′ 𝑛 𝑜𝑓 𝑚𝑚 ≠ 𝑑𝑖𝑠𝑡𝑛 𝑜𝑓 𝑢𝑚 SAS Programming II April 30, 2020 We can conclude there is a difference in life expectancy among married men versus unmarried men.
21:42 12 of 30
Old age..
Me in 20 years…
Pursue Industry Academic Life Looks like Kim has a lot to look forward to either way.
April 30, 2020 21:42 SAS Programming II 13 of 30
SQL SAS Programming II
SQL
• • SQL (pronounced “SEQUEL”) means Structured Query Language The language used by relational database management systems (RDMS) Examples: Microsoft Access, MySQL, Oracle, PostgreSQL, SQL Server,… General Form: SELECT
PROC SQL
PROC SQL
PROC SQL allows you to treat your SAS Data Sets like Database tables.
SAS Programming II General Form: Does not require a RUN; PROC SQL; SELECT
PROC SQL
Creating a New Variable or New Table
Healthy average male weight (5’11, 65+) is approximately 181 pounds.
SAS Programming II Here we can create a new variable called “pct_above_avg” using AS : General Form: PROC SQL; CREATE TABLE libref.libname
SELECT AS
QUIT; April 30, 2020 21:42 16 of 30
PROC SQL SAS Programming II
Querying Multiple Tables
PROC SQL allows you to examine data stored in two tables. Combining tables horizontally is called “joining.” Joins do not alter tables!
General Form: PROC SQL; SELECT libname.variable2, libname.variable2
FROM libref.libnameA, libref.libnameB
WHERE libnameA.variable1 = libnameB.variable1; QUIT;
A lisa.B
April 30, 2020 21:42 17 of 30
PROC SQL SAS Programming II
Querying Multiple Tables
This example also shows how to do a many-to-many match merge which is not available using the standard MERGE in a DATA step.
General Form: PROC SQL; SELECT libname.variable2, libname.variable2
FROM libref.libnameA, libref.libnameB
WHERE libnameA.variable1 = libnameB.variable1; QUIT; April 30, 2020 21:42 18 of 30
PROC SQL SAS Programming II
Group By
Use a Group By when you are performing some aggregation function in the SELECT clause like sum(), count(), avg() April 30, 2020 21:42 19 of 30
Plotting
Basic plotting Examples
SGPLOT makes some nice plots. Here are couple quick examples.
Histogram SAS Programming II Side-by-side Box Plots April 30, 2020 21:42 20 of 30
Macros
SAS Macros
SAS macro variables enable you to substitute text in your SAS programs.
SAS Programming II When you reference a macro variable in a SAS program, SAS replaces the reference with the text value that has been assigned to that macro variable. This in turn will enable your programs to become more reusable and dynamic.
The first thing you want to do is create a LET statement.
General Form: %LET variable =value; When using %LET: 1. All values are stored as character strings 2. Mathematical expressions are not evaluated 3. The case of the value is preserved 4. Quotation marks that enclose literals are stored as part of the value 5. Leading and trailing blanks are removed from the value before assignment is made April 30, 2020 21:42 21 of 30
Macros
SAS Macros
A %PUT statement writes text to the SAS log.
General Form: %PUT text; When using %PUT: 1. Writes only to the SAS log 2. Always writes to a new log line, starting in column one 3. Writes blank line if text is not specified 4. Does not require quotation marks around text 5. Can be used either inside or outside a macro definition SAS Programming II The %INDEX function – within another string.
enables you to determine the position of the first character of a string April 30, 2020 21:42 22 of 30
Macros SAS Programming II
SAS Macros
Like macro variables, macro programs (macros) enable you to substitute text into your programs. Macros are different from macro variables because they use conditional logic to make decisions about the text that you substitute into your programs.
You use a macro program you must first define it with a %MACRO statement, then compile it.
General Form: %MACRO macro_name; text %MEND macro_name; April 30, 2020 21:42 23 of 30
Macros SAS Programming II
SAS Macros
You can also pass in parameters to your macro programs. You would first define and compile it like before.
General Form: %MACRO text %MEND ; macro_name(parameter1=, parameter2=,..); Side note: You will notice that your Results Viewer fills up rather quickly while you are working. If you want to delete everything in there you may find this code helpful.
April 30, 2020 21:42 24 of 30
DATA set manipulation SAS Programming II
Loops
In SAS Programming I we learned IF-THEN-ELSE statement syntax which allowed us to execute one statement. Recall… IF
and END General Form: IF
Macros SAS Programming II
SAS Macros
When you submit SAS programs with macros it goes to macro processor, which generates SAS code from the macro references. SAS then compiles your code and executes your program.
Oddly, not until final execution does SAS finally see any actual data values.
CALL SYMPUT(macro_var_name, value) takes a value from a DATA step and assigns it a macro variable. Note: You will need to convert numeric to character and trim blanks using this approach.
April 30, 2020 21:42 26 of 30
References
References
SAS Programming II April 30, 2020 27 of 30