Transcript Slide 1

Never Lose a SAS Job
Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Not Again!!
▪ Unexpected reboots, system failures
▪ Long-running job didn’t complete
▪ Must manually restart the job from step 1
SAS Grid Gets the Stars Aligned...
SAS checkpoint-restart features
+ LSF requeue capabilities
+ SASGSUB batch submission utility
---------------------------------------------------
Completion of SAS Jobs in Minimal Time
Ideal for critical long-running SAS jobs
SAS Checkpoint/Restart
▪ Checkpoint mode
  • Records information about each DATA/PROC step in the checkpoint library
▪ Restart mode
  • Global statements and macros are re-executed
  • SAS reads the data in the checkpoint library to determine which steps completed
  • Execution resumes with the step that was running when the failure occurred
  • DATA/PROC steps that completed successfully are not re-executed
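The two modes can be pictured with a small Python model. It is a toy, not SAS’s implementation: a Python set stands in for the checkpoint library, a program is a hypothetical list of ("global" | "step", name) pairs, and `fail_at` simulates the crash.

```python
def run_program(program, checkpoint_lib, restart, fail_at=None):
    """Toy model of one SAS batch run under checkpoint (and optionally restart) mode.

    program        -- list of (kind, name); kind is "global" or "step" (DATA/PROC)
    checkpoint_lib -- set of step positions recorded as completed (stands in for
                      the checkpoint library; updated in place)
    restart        -- True models restart mode
    fail_at        -- position of a step that "crashes" during this run
    Returns (names of statements executed, whether the run finished).
    """
    executed = []
    for pos, (kind, name) in enumerate(program):
        if kind == "global":
            executed.append(name)      # global statements always (re-)execute
            continue
        if restart and pos in checkpoint_lib:
            continue                   # completed step: not re-executed
        if pos == fail_at:
            return executed, False     # crash: this step is never recorded
        executed.append(name)
        checkpoint_lib.add(pos)        # checkpoint mode: record the completed step
    return executed, True


program = [("global", "libname"), ("step", "data a"),
           ("step", "proc sort"), ("step", "proc report")]
lib = set()
print(run_program(program, lib, restart=False, fail_at=2))  # (['libname', 'data a'], False)
print(run_program(program, lib, restart=True))              # (['libname', 'proc sort', 'proc report'], True)
```

The second run re-executes the LIBNAME statement but skips `data a`, which the first run recorded before it crashed.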
To Set Up for Checkpoint-Restart
▪ Specify the following options on the batch SAS invocation:
  • STEPCHKPT – enables checkpoint mode
  • STEPRESTART – enables restart mode (SAS uses the checkpoint-restart data)
  • NOWORKINIT – does not initialize the WORK library when SAS starts
  • NOWORKTERM – preserves the WORK library when SAS exits
  • ERRORCHECK STRICT – puts SAS in syntax-check mode when an error occurs in a LIBNAME, FILENAME, %INCLUDE, or LOCK statement
  • ERRORABEND – causes SAS to terminate abnormally for most errors
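As a sketch, these options can be assembled into a UNIX-style batch command line (dash-prefixed system options, with ERRORCHECK taking its value as the next argument). The executable name `sas`, the use of `-sysin`, and the paths are assumptions for illustration; adjust them for your site.

```python
import shlex

# Checkpoint-restart options from the slide, written the way UNIX batch
# invocations spell SAS system options (leading dash; value as next argument).
CHECKPOINT_RESTART_OPTIONS = [
    "-stepchkpt",              # enable checkpoint mode
    "-steprestart",            # enable restart mode
    "-noworkinit",             # do not initialize WORK at startup
    "-noworkterm",             # keep WORK when SAS exits
    "-errorcheck", "strict",   # syntax-check mode after LIBNAME/FILENAME/%INCLUDE/LOCK errors
    "-errorabend",             # terminate abnormally on most errors
]


def batch_sas_command(program, sas_exe="sas", extra=()):
    """Build an argv list for a batch run (sas_exe and -sysin usage are assumptions)."""
    return [sas_exe, "-sysin", program, *CHECKPOINT_RESTART_OPTIONS, *extra]


# Hypothetical program path and shared WORK location.
argv = batch_sas_command("/shared/jobs/nightly.sas", extra=("-work", "/shared/saswork"))
print(shlex.join(argv))
```

Returning an argv list rather than a single string keeps the arguments safe to pass to `subprocess` without shell quoting issues.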
The WORK Directory
▪ WORK is the default location for the checkpoint library
  • Use STEPCHKPTLIB= to point to a permanent library instead
  • The LIBNAME statement for that library must be the first statement in the batch program
▪ The WORK directory must be on shared storage, so that a job restarted on another grid host can still reach it
▪ Example:
  • sas92 -noworkinit -noworkterm -work abc
Using Both STEPCHKPT and STEPRESTART
▪ Initial invocation
  • Runs in checkpoint mode only
  • The checkpoint library contains no data yet
▪ Subsequent invocations
  • Use the data from the checkpoint library
  • Continue in checkpoint mode for the remainder of the program
SAS Grid Manager – Queues
[Diagram: SAS Application, SAS Grid Manager with a Normal Queue, and grid hosts HOST A, HOST B, HOST C]
Automatic Job Requeue
▪ Configure the queue to requeue a job automatically when it exits with specific exit values
  • REQUEUE_EXIT_VALUES=all ~0 ~1
    − A job that exits with any code other than 0 (success) or 1 (warnings) is requeued
  • REQUEUE_EXIT_VALUES=EXCLUDE(all ~0 ~1)
    − Same exit codes, but the requeued job runs on a different host
  • Jobs are requeued 5 times by default
    − MAX_JOB_REQUEUE sets the requeue limit; it can be specified globally for all queues or per queue
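The exit-value rules can be modelled in a few lines of Python to make the `~` (exclude) and `EXCLUDE(...)` semantics concrete. This is a hypothetical helper, not part of LSF: LSF evaluates REQUEUE_EXIT_VALUES itself, and the default limit of 5 requeues simply follows the slide.

```python
import re


def _parse_codes(text):
    """Parse an exit-value list such as "all ~0 ~1" into (use_all, included, excluded)."""
    use_all, included, excluded = False, set(), set()
    for token in text.split():
        if token.lower() == "all":
            use_all = True
        elif token.startswith("~"):
            excluded.add(int(token[1:]))
        else:
            included.add(int(token))
    return use_all, included, excluded


def _matches(rule, code):
    use_all, included, excluded = rule
    return code not in excluded and (use_all or code in included)


def requeue_action(spec, exit_code, times_requeued=0, max_job_requeue=5):
    """Return "requeue", "requeue on a different host", or None (no requeue).

    Hypothetical helper modelling REQUEUE_EXIT_VALUES; max_job_requeue=5
    follows the default stated on the slide.
    """
    if times_requeued >= max_job_requeue:
        return None                                    # requeue limit reached
    excl = re.search(r"EXCLUDE\(([^)]*)\)", spec, re.IGNORECASE)
    exclusive = _parse_codes(excl.group(1)) if excl else (False, set(), set())
    normal = _parse_codes(re.sub(r"EXCLUDE\([^)]*\)", "", spec, flags=re.IGNORECASE))
    if _matches(exclusive, exit_code):
        return "requeue on a different host"
    if _matches(normal, exit_code):
        return "requeue"
    return None


for code in (0, 1, 2):
    print(code, requeue_action("all ~0 ~1", code))
```

With `all ~0 ~1`, exit codes 0 and 1 fall through to `None`, while a SAS error exit such as 2 triggers a requeue.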
Automatic Job Rerun
▪ A job is rerun automatically when RERUNNABLE=yes and either
  • The execution host becomes unavailable while the job is running, or
  • The system fails while the job is running
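A small decision function shows how rerun and requeue complement each other: rerun covers host or system failures for rerunnable jobs, while requeue covers jobs that exit on their own with a requeue exit value. The function name and outcome strings are hypothetical, and the default exit-code rule mirrors `all ~0 ~1`.

```python
def job_outcome(rerunnable, host_or_system_failed, exit_code=None,
                is_requeue_code=lambda code: code not in (0, 1)):
    """Hypothetical sketch of which LSF mechanism applies to a job.

    rerunnable            -- RERUNNABLE=yes on the queue (or on the job)
    host_or_system_failed -- execution host became unavailable / system failed
    exit_code             -- the job's own exit code, if it exited by itself
    is_requeue_code       -- default mirrors REQUEUE_EXIT_VALUES = all ~0 ~1
    """
    if host_or_system_failed:
        # Rerun needs both the failure and RERUNNABLE=yes.
        return "rerun" if rerunnable else "no automatic action"
    if exit_code is not None and is_requeue_code(exit_code):
        return "requeue"
    return "no automatic action"


print(job_outcome(rerunnable=True, host_or_system_failed=True))
print(job_outcome(rerunnable=True, host_or_system_failed=False, exit_code=2))
```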
LSF Queue Definition
Begin Queue
QUEUE_NAME = sas_rerun
PRIORITY = 40
NICE = 10
RERUNNABLE = YES
REQUEUE_EXIT_VALUES = all ~0 ~1
DESCRIPTION = Jobs submitted to this queue are requeued automatically and are also rerunnable.
End Queue
▪ RERUNNABLE = YES: jobs dispatched from this queue are rerun after a system failure
▪ REQUEUE_EXIT_VALUES = all ~0 ~1: jobs that exit with a fatal exit code are requeued
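To sanity-check a queue like this from a script, a toy parser can read the Begin Queue / End Queue sections into dictionaries. It is a sketch under simple assumptions (one `KEY = value` per line, `#` comments); it does not handle backslash line continuation or the rest of the lsb.queues syntax.

```python
def parse_lsb_queues(text):
    """Toy parser: return one dict per Begin Queue ... End Queue section."""
    queues, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "Begin Queue":
            current = {}
        elif line == "End Queue":
            if current is not None:
                queues.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return queues


CONFIG = """
Begin Queue
QUEUE_NAME = sas_rerun
PRIORITY = 40
NICE = 10
RERUNNABLE = YES
REQUEUE_EXIT_VALUES = all ~0 ~1
DESCRIPTION = Jobs submitted to this queue are requeued automatically and are also rerunnable.
End Queue
"""

for q in parse_lsb_queues(CONFIG):
    ok = q.get("RERUNNABLE", "").upper() == "YES" and "REQUEUE_EXIT_VALUES" in q
    print(q["QUEUE_NAME"], "rerun+requeue" if ok else "incomplete")
```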
SASGSUB Capabilities
▪ Standalone utility that allows a user to
  • Submit a SAS program to the grid for processing
  • Display the status of the user’s jobs on the grid
  • Retrieve output from the user’s jobs to a local directory
  • Kill jobs
Using SASGSUB
▪ Advantages
  • Submit and forget
  • View job output while the job is running
  • No need for a full SAS installation on the client
  • Take advantage of the SAS checkpoint/restart capability
▪ NOTE: requires a shared file system between the client and the grid
Submitting a Job
▪ Command-line interface
  • sasgsub -gridsubmitpgm <sas_pgm>
▪ Example output
Job ID: 6772
Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"
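When scripting submissions, the job ID and paths can be pulled out of this output. The sketch assumes the exact `Job ID:` / `Job directory:` / `Job log file:` layout shown in the example; the format may differ between SAS releases, and `parse_submit_output` is a hypothetical name.

```python
import re


def parse_submit_output(text):
    """Pull the job ID, job directory and log path out of sasgsub submit output."""
    def grab(pattern):
        m = re.search(pattern, text)
        return m.group(1) if m else None

    job_id = grab(r"Job ID:\s*(\d+)")
    return {
        "job_id": int(job_id) if job_id else None,
        "job_dir": grab(r'Job directory:\s*"([^"]+)"'),
        "log_file": grab(r'Job log file:\s*"([^"]+)"'),
    }


SAMPLE = '''Job ID: 6772
Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"
'''
info = parse_submit_output(SAMPLE)
print(info["job_id"])   # 6772
```

The `\s*` after `Job ID:` also tolerates the ID appearing on the following line.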
Submitting a Job for Checkpoint-Restart
▪ GRIDRESTARTOK
  • Automatically adds the following options to the batch SAS invocation:
    − STEPCHKPT, STEPRESTART, ERRORCHECK STRICT, ERRORABEND, NOWORKINIT, NOWORKTERM
  • Marks the job as rerunnable (sets the RERUNNABLE parameter on the job)
▪ Command-line interface
  • sasgsub -gridsubmitpgm <sas_pgm> -gridrestartok
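A thin Python wrapper can build the submit command with or without `-gridrestartok`. It assumes `sasgsub` is on the client’s PATH; the function names are hypothetical, and the `runner` parameter (defaulting to `subprocess.run`) exists only so the wrapper can be exercised without a grid.

```python
import subprocess


def sasgsub_submit_argv(program, restart_ok=True):
    """Build the sasgsub command line; -gridrestartok requests checkpoint-restart."""
    argv = ["sasgsub", "-gridsubmitpgm", program]
    if restart_ok:
        argv.append("-gridrestartok")
    return argv


def submit(program, restart_ok=True, runner=subprocess.run):
    """Run sasgsub (assumed to be on PATH) and return its standard output."""
    result = runner(sasgsub_submit_argv(program, restart_ok),
                    capture_output=True, text=True, check=True)
    return result.stdout


print(sasgsub_submit_argv("testPgm.sas"))
# submit("testPgm.sas") would run the real command on a configured grid client.
```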
Getting Job Status
▪ Command-line interface
  • sasgsub -gridgetstatus <job_id | _ALL_>
▪ Example output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57
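Status lines in the example follow a regular `Job <id> (<name>) is <state>:` pattern, so a script can turn them into a lookup table. This is a sketch based only on that sample output; `parse_status` is a hypothetical name, and states other than `Finished` and `Submitted` do not appear in the example.

```python
import re

STATUS_LINE = re.compile(r"^Job (\d+) \((.*?)\) is (\w+):", re.MULTILINE)


def parse_status(text):
    """Map job ID -> (program name, state) from sasgsub -gridgetstatus output."""
    return {int(jid): (name, state) for jid, name, state in STATUS_LINE.findall(text)}


SAMPLE = """Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57
"""
jobs = parse_status(SAMPLE)
pending = [jid for jid, (_, state) in jobs.items() if state != "Finished"]
print(pending)  # [1925]
```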
Retrieving Results
▪ Command-line interface
  • sasgsub -gridgetresults <job_id | _ALL_>
▪ Example output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-21_21.52.57.130_testPgm
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-24_13.13.39.167_testPgm
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:53:34
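Combining the status and results commands gives a simple wait-then-retrieve loop. It is a sketch: it assumes `Finished` is the state that means the job is done (as in the example output), gives up after a hypothetical `max_polls` limit, and takes injectable `runner` and `sleep` functions so it can be tested without a grid.

```python
import re
import subprocess
import time


def run_sasgsub(*args, runner=subprocess.run):
    """Run sasgsub (assumed to be on PATH) with the given options; return stdout."""
    result = runner(["sasgsub", *args], capture_output=True, text=True, check=True)
    return result.stdout


def wait_and_get_results(job_id, poll_seconds=60, max_polls=1440,
                         runner=subprocess.run, sleep=time.sleep):
    """Poll -gridgetstatus until the job reports Finished, then run -gridgetresults."""
    pattern = re.compile(rf"^Job {job_id} \(.*?\) is (\w+):", re.MULTILINE)
    for _ in range(max_polls):
        status = run_sasgsub("-gridgetstatus", str(job_id), runner=runner)
        match = pattern.search(status)
        if match and match.group(1) == "Finished":
            return run_sasgsub("-gridgetresults", str(job_id), runner=runner)
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not Finished after {max_polls} polls")
```

Bounding the loop with `max_polls` matters because a job that ends in some state other than `Finished` would otherwise be polled forever.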
Putting It All Together
[Diagram: SAS Application, SAS Grid Manager with the normal queue and the sas_rerun queue, and grid hosts HOST A, HOST B, HOST C]
A simple solution
▪ Record a checkpoint number and save it in WORK
▪ If restarting, skip PROC/DATA steps up to that checkpoint
▪ Tokenize everything
▪ Execute all global statements
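The outline above can be sketched as a Python model in which a hypothetical `checkpoint.txt` file in a directory plays the role of WORK. It only illustrates the idea (count completed DATA/PROC steps, skip that many on restart, always run global statements); it is not how SAS stores its checkpoint data.

```python
import tempfile
from pathlib import Path

CHECKPOINT_FILE = "checkpoint.txt"   # hypothetical file name inside the WORK directory


def read_checkpoint(work_dir):
    """Number of DATA/PROC steps recorded as completed (0 if none yet)."""
    path = Path(work_dir) / CHECKPOINT_FILE
    return int(path.read_text()) if path.exists() else 0


def run(program, work_dir, fail_at=None):
    """program: list of (kind, name) with kind "global" or "step".

    fail_at is the 1-based number of a DATA/PROC step that crashes in this run.
    Returns (names of statements executed, whether the run finished).
    """
    done = read_checkpoint(work_dir)
    executed, step_no = [], 0
    for kind, name in program:               # every statement is still read in order
        if kind == "global":
            executed.append(name)            # global statements always execute
            continue
        step_no += 1
        if step_no <= done:
            continue                         # skip steps completed by an earlier run
        if step_no == fail_at:
            return executed, False           # crash before the checkpoint is updated
        executed.append(name)
        (Path(work_dir) / CHECKPOINT_FILE).write_text(str(step_no))  # record checkpoint
    return executed, True


program = [("global", "options"), ("step", "data a"), ("global", "%let x=1"),
           ("step", "proc sort"), ("step", "proc print")]
with tempfile.TemporaryDirectory() as work:
    first = run(program, work, fail_at=2)    # crashes in proc sort
    second = run(program, work)              # restarts after data a
    saved = read_checkpoint(work)
print(first)   # (['options', 'data a', '%let x=1'], False)
print(second)  # (['options', '%let x=1', 'proc sort', 'proc print'], True)
```

The checkpoint number is written only after a step succeeds, so a crash mid-step leaves the previous checkpoint in place and that step runs again on restart.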