PROC UNIVARIATE vs. PROC SUMMARY


PROC UNIVARIATE vs. PROC SUMMARY: A Comparison of Performance

Background

• For many of the common things I do, PROCs UNIVARIATE and SUMMARY can accomplish similar results
• Many years ago, someone suggested I use PROC UNIVARIATE because it had more functions
• They claimed that both procedures performed about the same
  – I didn't bother to check that out
• Unless I needed something that could be done only with PROC SUMMARY, I got in the habit of using PROC UNIVARIATE

More Background

• Several months ago, I was becoming frustrated with how long it was taking to run some large PROC UNIVARIATEs for simple functions (like SUM, MEAN, MIN, MAX, etc.)
  – It was also using a lot of CPU
• There had to be a better way

My First Experiment

• Wrote DATA steps to do simple functions
• Benchmarked the DATA steps against PROC UNIVARIATE steps
• Compared output results to ensure integrity
• Ran tests using SAS on both mainframe and PC
• The results were surprising
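The kind of comparison described above can be sketched as follows. This is a minimal illustration, not the author's actual test code: the data set WORK.BIG and the numeric columns X and Y are hypothetical names, and the statistics match the "2 columns with SUM, MIN, MAX" case shown in the charts.

```sas
/* Assumption: WORK.BIG is a large data set with numeric columns X and Y. */
/* These names are placeholders and do not come from the original tests.  */
options fullstimer;   /* write real time and CPU time for each step to the log */

/* PROC UNIVARIATE version: NOPRINT suppresses the report, OUTPUT keeps */
/* only the requested statistics.                                      */
proc univariate data=work.big noprint;
   var x y;
   output out=work.uni_stats
          sum=sum_x sum_y
          min=min_x min_y
          max=max_x max_y;
run;

/* DATA step version: accumulate the same statistics in a single pass. */
data work.ds_stats (keep=sum_x sum_y min_x min_y max_x max_y);
   set work.big end=last;
   retain min_x min_y max_x max_y;
   sum_x + x;                /* sum statement: retained, skips missing values */
   sum_y + y;                /* note: yields 0, not missing, if every value is missing */
   min_x = min(min_x, x);    /* MIN and MAX functions ignore missing arguments */
   max_x = max(max_x, x);
   min_y = min(min_y, y);
   max_y = max(max_y, y);
   if last then output;      /* write one row of totals at end of input */
run;

/* Integrity check: compare the values produced by the two approaches. */
proc compare base=work.uni_stats compare=work.ds_stats;
run;
```

With FULLSTIMER on, the elapsed and CPU figures for each step appear in the SAS log, which is one way to collect numbers like those charted here. PROC COMPARE may report small floating-point differences in the sums or differing variable labels; those are not necessarily integrity failures.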

[Chart: Elapsed Time, PROC UNIVARIATE vs. DATA Step, 2 Columns with SUM, MIN, MAX. Series: PROC UNIVARIATE elapsed, DATA step elapsed. Tests: MF-01, MF-02, MF-03, PC-01, PC-02. Vertical axis: 0 to 5,000.]

[Chart: CPU Time, PROC UNIVARIATE vs. DATA Step, 2 Columns with SUM, MIN, MAX. Series: PROC UNIVARIATE CPU, DATA step CPU. Tests: MF-01, MF-02, MF-03, PC-01, PC-02. Vertical axis: 0 to 3,500.]

Results of First Test

• DATA step showed:
  – 95% reduction in elapsed time
  – 99% reduction in CPU time
• Decided to also run tests comparing PROC SUMMARY

[Chart: Elapsed Time, PROC UNIVARIATE vs. DATA Step and PROC SUMMARY, 2 Columns with SUM, MIN, MAX. Series: PROC UNIVARIATE elapsed, DATA step elapsed, PROC SUMMARY elapsed. Tests: MF-01, MF-02, MF-03, PC-01, PC-02. Vertical axis: 0 to 5,000.]

[Chart: CPU Time, PROC UNIVARIATE vs. DATA Step and PROC SUMMARY, 2 Columns with SUM, MIN, MAX. Series: PROC UNIVARIATE CPU, DATA step CPU, PROC SUMMARY CPU. Tests: MF-01, MF-02, MF-03, PC-01, PC-02. Vertical axis: 0 to 3,500.]

Results of First Test

• Compared to PROC UNIVARIATE, PROC SUMMARY showed:
  – 94% reduction in elapsed time
  – 96% reduction in CPU time
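For reference, a PROC SUMMARY step requesting SUM, MIN, and MAX on two columns might look like the sketch below. The data set WORK.BIG and the columns X and Y are assumed names for illustration, not the ones used in the tests.

```sas
/* Assumption: WORK.BIG with numeric columns X and Y (hypothetical names). */
proc summary data=work.big;
   var x y;
   /* _TYPE_ and _FREQ_ are added automatically to the output data set; */
   /* drop them so only the requested statistics remain.                */
   output out=work.sum_stats (drop=_type_ _freq_)
          sum=sum_x sum_y
          min=min_x min_y
          max=max_x max_y;
run;
```

PROC SUMMARY produces no printed report by default, so it needs no NOPRINT option; the results land in the OUT= data set.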

Overall Test Results

• Ran many tests on several types of data
• DATA step vs. PROC UNIVARIATE
  – Elapsed time was 71% to 95% lower
  – CPU time was 74% to 99% lower
• PROC SUMMARY vs. PROC UNIVARIATE
  – Elapsed time was 72% to 94% lower
  – CPU time was 76% to 96% lower
• In tests where PROC MEANS was also run, results were similar to PROC SUMMARY
  – Sometimes a little less CPU and elapsed time, sometimes a little more
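The similarity between PROC MEANS and PROC SUMMARY is unsurprising, since the two procedures share the same syntax and differ mainly in that PROC MEANS prints a report by default. A minimal sketch of the PROC MEANS form, again assuming a hypothetical data set WORK.BIG with numeric columns X and Y:

```sas
/* Assumption: WORK.BIG with numeric columns X and Y (hypothetical names). */
/* NOPRINT suppresses the default printed report, so the step does the     */
/* same work as the equivalent PROC SUMMARY step.                          */
proc means data=work.big noprint;
   var x y;
   output out=work.means_stats (drop=_type_ _freq_)
          sum=sum_x sum_y
          min=min_x min_y
          max=max_x max_y;
run;
```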

Other Observations

• DATA steps performed slightly better than PROCs SUMMARY and MEANS for simple functions, but not as well on more complex functions
• Most tests were run on both mainframe and PC
  – Elapsed time and CPU improvement percentages (vs. PROC UNIVARIATE) were usually similar on both platforms
• The tests were run on an older, slower mainframe and a new Windows 7 PC
  – For each test, the same data and parameters were run on both the mainframe and PC
    • The PC generally ran 80-95 percent faster than the same tests on the mainframe (for tested functions) and used 85-95 percent less CPU