Transcript Document

SAS Programming II

Matthew A. Lanham Doctoral Student Virginia Polytechnic Institute and State University Pamplin College of Business Department of Business Information Technology

April 30, 2020 2

Outline SAS Programming II

Short Course Outline

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

Load in Data Sets From Yesterday Changing Data From Numeric to Character and Vise-Versa Review of DATA and PROC steps Data Example: National Longitudinal Mortality Survey PROC TTEST, PROC NPAR1WAY, PROC FREQ SQL PROC SQL Basic Plotting Examples SAS Macros References April 30, 2020 21:42 3 of 30

Load Data SAS Programming II

Lets get our Data Sets From Yesterday Loaded

1) Drag your SAS_LISA_2.sas code into the Enhanced Editor Window 2) Highlight lines 1 through 132 3) Select Run April 30, 2020 21:42 4 of 30

Changing Data Types SAS Programming II

Changing Data Types – PUT

You can convert a Numeric variable to a Character variable by using the PUT function: General Form: new_variable = PUT(current_variable, new_format); DROP current_variable; RENAME new_variable = current_variable; April 30, 2020 21:42 5 of 30

Changing Data Types SAS Programming II

Changing Data Types – INPUT

You can convert a Character variable to a Numeric variable by using the INPUT function: General Form: new_variable = INPUT(current_variable, new_format); DROP current_variable; RENAME new_variable = current_variable; April 30, 2020 21:42 6 of 30

NLMS SAS Programming II

National Longitudinal Mortality Survey

• • • • Survey given by the Census Bureau.

Public access data http://www.census.gov/did/www/nlms/publications/public.html

This is the 3 rd release (June 1, 2008) 38 variables; 988,346 observations April 30, 2020 21:42 7 of 30

Using NLMS data SAS Programming II

Lets see what our NLMS data says about this…

Probably a meta-analysis 60 years!! That’s a long time ago http://www.nbcnews.com/id/44122528/ns/health-behavior/t/single-people-may-die-younger-new-study-finds/#.UeUPi20k-Eh April 30, 2020 21:42 8 of 30

Two-sample t-test SAS Programming II

PROC TTEST

A two-sample t-test allows us to test for statistical significance between two groups.

Standard assumptions: (1) IID (“random”) samples, (2) Normally distributed, (3) Equal variances General Form: PROC TTEST data=libref.filename; CLASS < variable that classifies set into two groups> VAR RUN; Two-sided test: 𝐻 1 : 𝜇 𝑚𝑎𝑟𝑟𝑖𝑒𝑑_𝑚𝑎𝑙𝑒 ≠ 𝜇 𝑢𝑛𝑚𝑎𝑟𝑟𝑖𝑒𝑑_𝑚𝑎𝑙𝑒 One-sided test: 𝐻 1 : 𝜇 𝑚𝑎𝑟𝑟𝑖𝑒𝑑_𝑚𝑎𝑙𝑒 > 𝜇 𝑢𝑛𝑚𝑎𝑟𝑟𝑖𝑒𝑑_𝑚𝑎𝑙𝑒 April 30, 2020 21:42 9 of 30

Two-sample t-test SAS Programming II

PROC TTEST

Assumptions: (1) IID (“random”) samples (2) Normally distributed could try non-parametric test (3) Equal variances Use unequal variance test (Satterthwaite) Since we have unequal variances, we would make any conclusions using the unequal t-test The F-test is what you use to test for equal variances: 𝐻 1 : 𝜎 𝑚𝑚 = 𝜎 𝑢𝑚 𝑣𝑠. 𝐻 1 : 𝜎 𝑚𝑚 ≠ 𝜎 𝑢𝑚 April 30, 2020 21:42 10 of 30

Wilcoxon rank sum test SAS Programming II

PROC FREQ or PROC NPAR1WAY

Parametric methods of statistical inference require you to assume that your data come from some underlying probability distribution (in our case Normal) Nonparametric methods relax underlying assumptions about how the data is generated, because maybe you don’t know or the parametric assumptions are not satisfied.

Nonparametric equivalent to two-sample t-test is: Wilcox rank sum test (Wilcoxon-Mann-Whitney Test) Standard assumptions: (1) I hope this works or my dissertation is shot!

General Forms: PROC FREQ data=libref.filename ; TABLES group*measure / ; RUN; PROC NPAR1WAY data=libref.filename ; CLASS < variable that classifies set into two groups> VAR RUN; April 30, 2020 21:42 11 of 30

Wilcoxon rank sum test

PROC FREQ and PROC NPAR1WAY

𝐻 1 : 𝑑𝑖𝑠𝑡 ′ 𝑛 𝑜𝑓 𝑚𝑚 ≠ 𝑑𝑖𝑠𝑡𝑛 𝑜𝑓 𝑢𝑚 SAS Programming II April 30, 2020 We can conclude there is a difference in life expectancy among married men versus unmarried men.

21:42 12 of 30

Old age..

Me in 20 years…

Pursue Industry Academic Life Looks like Kim has a lot to look forward to either way.

April 30, 2020 21:42 SAS Programming II 13 of 30

SQL SAS Programming II

SQL

• • SQL (pronounced “SEQUEL”) means Structured Query Language The language used by relational database management systems (RDMS) Examples: Microsoft Access, MySQL, Oracle, PostgreSQL, SQL Server,… General Form: SELECT FROM < database table or libref.libname to be queried> WHERE GROUP BY HAVING ORDER BY Example: Database Table called Employee Note: You will always have at least a FROM SELECT and a SELECT Name, Salary FROM Employee WHERE ID >=200 AND Salary >48000 April 30, 2020 21:42 14 of 30

PROC SQL

PROC SQL

PROC SQL allows you to treat your SAS Data Sets like Database tables.

SAS Programming II General Form: Does not require a RUN; PROC SQL; SELECT FROM < database table or libref.libname to be queried> WHERE GROUP BY HAVING ORDER BY ; QUIT; • • Select * means select all columns Instead of desc , you could also use asc April 30, 2020 21:42 15 of 30

PROC SQL

Creating a New Variable or New Table

Healthy average male weight (5’11, 65+) is approximately 181 pounds.

SAS Programming II Here we can create a new variable called “pct_above_avg” using AS : General Form: PROC SQL; CREATE TABLE libref.libname

SELECT AS FROM < database table or libref.libname to be queried> ….

QUIT; April 30, 2020 21:42 16 of 30

PROC SQL SAS Programming II

Querying Multiple Tables

PROC SQL allows you to examine data stored in two tables. Combining tables horizontally is called “joining.” Joins do not alter tables!

General Form: PROC SQL; SELECT libname.variable2, libname.variable2

FROM libref.libnameA, libref.libnameB

WHERE libnameA.variable1 = libnameB.variable1; QUIT;

A lisa.B

April 30, 2020 21:42 17 of 30

PROC SQL SAS Programming II

Querying Multiple Tables

This example also shows how to do a many-to-many match merge which is not available using the standard MERGE in a DATA step.

General Form: PROC SQL; SELECT libname.variable2, libname.variable2

FROM libref.libnameA, libref.libnameB

WHERE libnameA.variable1 = libnameB.variable1; QUIT; April 30, 2020 21:42 18 of 30

PROC SQL SAS Programming II

Group By

Use a Group By when you are performing some aggregation function in the SELECT clause like sum(), count(), avg() April 30, 2020 21:42 19 of 30

Plotting

Basic plotting Examples

SGPLOT makes some nice plots. Here are couple quick examples.

Histogram SAS Programming II Side-by-side Box Plots April 30, 2020 21:42 20 of 30

Macros

SAS Macros

SAS macro variables enable you to substitute text in your SAS programs.

SAS Programming II When you reference a macro variable in a SAS program, SAS replaces the reference with the text value that has been assigned to that macro variable. This in turn will enable your programs to become more reusable and dynamic.

The first thing you want to do is create a LET statement.

General Form: %LET variable =value; When using %LET: 1. All values are stored as character strings 2. Mathematical expressions are not evaluated 3. The case of the value is preserved 4. Quotation marks that enclose literals are stored as part of the value 5. Leading and trailing blanks are removed from the value before assignment is made April 30, 2020 21:42 21 of 30

Macros

SAS Macros

A %PUT statement writes text to the SAS log.

General Form: %PUT text; When using %PUT: 1. Writes only to the SAS log 2. Always writes to a new log line, starting in column one 3. Writes blank line if text is not specified 4. Does not require quotation marks around text 5. Can be used either inside or outside a macro definition SAS Programming II The %INDEX function – within another string.

enables you to determine the position of the first character of a string April 30, 2020 21:42 22 of 30

Macros SAS Programming II

SAS Macros

Like macro variables, macro programs (macros) enable you to substitute text into your programs. Macros are different from macro variables because they use conditional logic to make decisions about the text that you substitute into your programs.

You use a macro program you must first define it with a %MACRO statement, then compile it.

General Form: %MACRO macro_name; text %MEND macro_name; April 30, 2020 21:42 23 of 30

Macros SAS Programming II

SAS Macros

You can also pass in parameters to your macro programs. You would first define and compile it like before.

General Form: %MACRO text %MEND ; macro_name(parameter1=, parameter2=,..); Side note: You will notice that your Results Viewer fills up rather quickly while you are working. If you want to delete everything in there you may find this code helpful.

April 30, 2020 21:42 24 of 30

DATA set manipulation SAS Programming II

Loops

In SAS Programming I we learned IF-THEN-ELSE statement syntax which allowed us to execute one statement. Recall… IF THEN ; ELSE IF THEN ; ELSE ; To execute multiple statements, we can implement a loop using DO statements.

and END General Form: IF THEN DO; ; END ; ELSE DO ; ; END ; April 30, 2020 21:42 25 of 30

Macros SAS Programming II

SAS Macros

When you submit SAS programs with macros it goes to macro processor, which generates SAS code from the macro references. SAS then compiles your code and executes your program.

Oddly, not until final execution does SAS finally see any actual data values.

CALL SYMPUT(macro_var_name, value) takes a value from a DATA step and assigns it a macro variable. Note: You will need to convert numeric to character and trim blanks using this approach.

April 30, 2020 21:42 26 of 30

References

References

SAS Programming II April 30, 2020 27 of 30