Transcript Slide 1
Never Lose a SAS Job
Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Not Again!!
Unexpected reboots and system failures
Long-running job didn’t complete
Job must be restarted manually from step 1
SAS Grid Gets the Stars Aligned...
SAS checkpoint-restart features
+ LSF requeue capabilities
+ SASGSUB batch submission utility
---------------------------------------------------
Completion of SAS Jobs in Minimal Time
Ideal for critical long-running SAS jobs
SAS Checkpoint/Restart
Checkpoint mode
• Records information about each DATA/PROC step in the checkpoint library
Restart mode
• Global statements and macros are re-executed
• SAS reads the checkpoint library to determine which steps completed
• Execution resumes with the step that was running when the failure occurred
• DATA/PROC steps that completed successfully are not re-executed
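The restart-mode rules above can be modeled in a few lines. The following is a minimal Python sketch, not how SAS implements it. It assumes a hypothetical program representation (a list of ("global", text) and ("step", name) items) and treats the checkpoint library as nothing more than the set of step names that completed.

```python
from typing import List, Set, Tuple

# Hypothetical model: each item is ("global", statement text) or ("step", step name).
Item = Tuple[str, str]

def restart_plan(program: List[Item], completed: Set[str]) -> List[Item]:
    """Return the items that would execute in restart mode.

    Global statements are always re-executed; DATA/PROC steps recorded as
    completed are skipped, so execution resumes at the step that was running
    when the failure occurred.
    """
    plan = []
    for kind, text in program:
        if kind == "step" and text in completed:
            continue  # finished in an earlier run: not re-executed
        plan.append((kind, text))
    return plan

program = [
    ("global", "libname out '/shared/out';"),
    ("step", "data_step_1"),
    ("global", "options obs=max;"),
    ("step", "proc_sort_2"),
    ("step", "proc_means_3"),
]

# Suppose data_step_1 finished and the job died while proc_sort_2 was running.
print(restart_plan(program, {"data_step_1"}))
```

Both global statements survive in the plan even though the first one precedes a skipped step. This matches the slide's point that globals and macros are replayed on every restart.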
To Set Up for Checkpoint-Restart
Specify the following options when invoking SAS in batch:
• STEPCHKPT – enables checkpoint mode
• STEPRESTART – causes SAS to use checkpoint-restart data
• NOWORKINIT – does not initialize the WORK library when SAS starts
• NOWORKTERM – keeps the WORK library when SAS exits
• ERRORCHECK STRICT – puts SAS in syntax-check mode when an error occurs in a LIBNAME, FILENAME, %INCLUDE, or LOCK statement
• ERRORABEND – causes SAS to terminate for most errors
The WORK Directory
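WORK is the default location for the checkpoint library
• Use STEPCHKPTLIB to point to a permanent library instead
• The LIBNAME statement for that library must be the first statement in the batch program
WORK directory must be on shared storage, so that a job rerun on another grid host can still reach it
Example:
• sas92 -noworkinit -noworkterm -work abc (abc is the fixed WORK directory)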
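Putting the options from the last two slides on one command line is easy to get wrong, so here is a small Python sketch that assembles it. Several details are assumptions for illustration only: the command name `sas` (the slide uses `sas92`), the use of `-sysin` to name the program file, the example paths, and the function name `sas_restart_command`.

```python
import shlex
from typing import List, Optional

def sas_restart_command(program: str, work_dir: str,
                        sas: str = "sas",
                        chkpt_lib: Optional[str] = None) -> List[str]:
    """Build a batch SAS command line with the checkpoint-restart options."""
    cmd = [
        sas, "-sysin", program,
        "-stepchkpt",             # record completed DATA/PROC steps
        "-steprestart",           # use checkpoint-restart data if present
        "-noworkinit",            # do not wipe WORK at startup
        "-noworkterm",            # keep WORK when SAS exits
        "-work", work_dir,        # fixed WORK directory, on shared storage
        "-errorcheck", "strict",  # syntax-check mode on LIBNAME/FILENAME/%INCLUDE/LOCK errors
        "-errorabend",            # terminate on most errors
    ]
    if chkpt_lib:
        # Optional permanent checkpoint library; the program's first statement
        # must be the LIBNAME statement that assigns this libref.
        cmd += ["-stepchkptlib", chkpt_lib]
    return cmd

print(shlex.join(sas_restart_command("/shared/pgm/nightly.sas",
                                     "/shared/saswork/nightly")))
```

Keeping `-work` pointed at one fixed, shared directory is what lets a second invocation find the checkpoint data left behind by the first.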
Use of Both STEPCHKPT and STEPRESTART
Initial invocation
• Results in checkpoint mode only
• Checkpoint library is still empty
Subsequent invocations
• Restart using data from the checkpoint library
• Checkpoint mode continues for the remainder of the program
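The effect of always passing both options can be summarized as a small decision function. This is a simplified Python model of the slide, not SAS internals. The function name `active_modes` and the "checkpoint library has data" flag are illustrative assumptions.

```python
from typing import FrozenSet

def active_modes(stepchkpt: bool, steprestart: bool,
                 checkpoint_has_data: bool) -> FrozenSet[str]:
    """Modes in effect for one batch invocation (simplified model)."""
    modes = set()
    if steprestart and checkpoint_has_data:
        modes.add("restart")     # skip steps already recorded as complete
    if stepchkpt:
        modes.add("checkpoint")  # keep recording steps for the rest of the program
    return frozenset(modes)

# First invocation: the checkpoint library is still empty.
print(sorted(active_modes(True, True, checkpoint_has_data=False)))
# Subsequent invocation after a failure: checkpoint data exists.
print(sorted(active_modes(True, True, checkpoint_has_data=True)))
```

Because the same two options work for both the first run and every rerun, the submission never has to change between attempts.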
SAS Grid Manager – Queues
[Diagram: SAS Application submits to the Normal Queue of SAS Grid Manager; grid hosts HOST A, HOST B, HOST C]
Automatic Job Requeue
Configure the queue to automatically requeue jobs that end with specific exit values
• REQUEUE_EXIT_VALUES=all ~0 ~1
− Any exit code other than 0 (success) or 1 (warnings) is requeued
• REQUEUE_EXIT_VALUES=EXCLUDE(all ~0 ~1)
− Same exit codes, but the requeued job runs on a different host
• Jobs are requeued up to 5 times by default
− MAX_JOB_REQUEUE sets the requeue limit, either globally for all queues or per queue
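To see which exit codes each form catches, here is a Python sketch that classifies an exit code against the two REQUEUE_EXIT_VALUES strings shown on the slide. It reads only those forms (`all`, `~n` exclusions, plain numbers, and `EXCLUDE(...)`) and is not a full implementation of LSF's syntax. The function names are hypothetical, and the default limit of 5 simply follows the slide.

```python
import re
from typing import Callable, List, Optional

def _matcher(tokens: List[str]) -> Callable[[int], bool]:
    """Predicate for a token list such as ['all', '~0', '~1'] or ['2', '3']."""
    include_all = "all" in tokens
    included = {int(t) for t in tokens if t.isdigit()}
    excluded = {int(t[1:]) for t in tokens if t.startswith("~")}
    return lambda code: (include_all or code in included) and code not in excluded

def requeue_action(spec: str, exit_code: int) -> Optional[str]:
    """Classify an exit code against a REQUEUE_EXIT_VALUES-style string."""
    m = re.search(r"EXCLUDE\(([^)]*)\)", spec)
    exclusive = m.group(1).split() if m else []
    normal = re.sub(r"EXCLUDE\([^)]*\)", " ", spec).split()
    if _matcher(exclusive)(exit_code):
        return "requeue on a different host"
    if _matcher(normal)(exit_code):
        return "requeue"
    return None  # job is not requeued

def should_requeue(spec: str, exit_code: int, times_requeued: int,
                   max_job_requeue: int = 5) -> bool:
    """Apply the requeue limit on top of the exit-code rule."""
    return (times_requeued < max_job_requeue
            and requeue_action(spec, exit_code) is not None)

# SAS exit codes: 0 = success, 1 = warnings, 2 = errors.
for code in (0, 1, 2):
    print(code, requeue_action("all ~0 ~1", code))
```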
Automatic Job Rerun
A job is automatically rerun when
• The execution host becomes unavailable while the job is running
• The system fails while the job is running
Applies only if the job is rerunnable (RERUNNABLE=yes)
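Rerun and requeue cover different failures, and they are easy to confuse. The following Python sketch separates them. The `Failure` categories and the function name are illustrative assumptions, not LSF terms.

```python
from enum import Enum, auto

class Failure(Enum):
    HOST_UNAVAILABLE = auto()  # execution host went away mid-job
    SYSTEM_FAILURE = auto()    # system crashed while the job was running
    NONZERO_EXIT = auto()      # the job itself exited with an error code

def is_rerun(failure: Failure, rerunnable: bool) -> bool:
    """Rerun applies to host/system failures, and only for rerunnable jobs.

    A job that simply exits with an error code is not rerun; that case
    is covered by REQUEUE_EXIT_VALUES instead.
    """
    return rerunnable and failure in (Failure.HOST_UNAVAILABLE,
                                      Failure.SYSTEM_FAILURE)

for f in Failure:
    print(f.name, is_rerun(f, rerunnable=True))
```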
LSF Queue Definition
Begin Queue
QUEUE_NAME = sas_rerun
PRIORITY = 40
NICE = 10
RERUNNABLE = YES
REQUEUE_EXIT_VALUES = all ~0 ~1
DESCRIPTION = Jobs submitted to this queue will be requeued automatically and also rerunnable.
End Queue
• RERUNNABLE = YES: jobs dispatched from this queue are rerun after a system failure
• REQUEUE_EXIT_VALUES = all ~0 ~1: jobs that end with a fatal exit code are requeued
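Before relying on a queue like sas_rerun, you may want a quick check that both settings are actually present. Here is a Python sketch that reads `Begin Queue ... End Queue` sections into dictionaries. It handles only simple `KEY = value` lines and `#` comments, so it is not a complete parser for LSF configuration files. The function names are hypothetical.

```python
from typing import Dict, List, Optional

def parse_queues(text: str) -> List[Dict[str, str]]:
    """Parse Begin Queue ... End Queue sections into KEY -> value dicts."""
    queues: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "Begin Queue":
            current = {}
        elif line == "End Queue" and current is not None:
            queues.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return queues

def is_restart_queue(queue: Dict[str, str]) -> bool:
    """True if the queue both reruns on host failure and requeues on bad exits."""
    return (queue.get("RERUNNABLE", "").upper() == "YES"
            and bool(queue.get("REQUEUE_EXIT_VALUES")))

QUEUE_DEF = """
Begin Queue
QUEUE_NAME = sas_rerun
PRIORITY = 40
NICE = 10
RERUNNABLE = YES
REQUEUE_EXIT_VALUES = all ~0 ~1
DESCRIPTION = Jobs submitted to this queue will be requeued automatically and also rerunnable.
End Queue
"""

for q in parse_queues(QUEUE_DEF):
    print(q["QUEUE_NAME"], is_restart_queue(q))
```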
SASGSUB Capabilities
Standalone utility that allows users to
• Submit a SAS program to the grid for processing
• Display the status of their jobs on the grid
• Retrieve output from their jobs to a local directory
• Kill jobs
Using SASGSUB
Advantages
• Submit and forget
• View job output while the job is running
• No need for a full SAS installation on the client
• Can use SAS checkpoint/restart
NOTE - requires a shared file system between the client and the grid
Submitting a Job
Command line interface
• sasgsub -gridsubmitpgm <sas_pgm>
Example output
Job ID: 6772
Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"
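Scripts that submit jobs usually need the job ID and log path from this output. Here is a Python sketch that extracts them with regular expressions. It assumes the output looks like the sample above; the function name is hypothetical.

```python
import re
from typing import Dict, Union

def parse_submit_output(text: str) -> Dict[str, Union[int, str]]:
    """Extract job ID, job directory, and log path from sasgsub submit output."""
    job_id = re.search(r"Job ID:\s*(\d+)", text)
    job_dir = re.search(r'Job directory:\s*"([^"]+)"', text)
    log_file = re.search(r'Job log file:\s*"([^"]+)"', text)
    if not (job_id and job_dir and log_file):
        raise ValueError("unrecognized sasgsub output")
    return {"job_id": int(job_id.group(1)),
            "job_dir": job_dir.group(1),
            "log_file": log_file.group(1)}

SUBMIT_SAMPLE = '''Job ID: 6772
Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"
'''

info = parse_submit_output(SUBMIT_SAMPLE)
print(info["job_id"], info["log_file"])
```

The log file sits inside the job directory on the shared file system. That is why the previous slide says you can watch job output while the job is still running.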
Submitting a Job for Checkpoint-Restart
GRIDRESTARTOK
• Automatically adds the following options to the batch SAS invocation
− STEPCHKPT, STEPRESTART, ERRORCHECK STRICT, ERRORABEND, NOWORKINIT, NOWORKTERM
• Marks the job as rerunnable (RERUNNABLE)
Command line interface
• sasgsub -gridsubmitpgm <sas_pgm> -gridrestartok
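If submissions are scripted, the checkpoint-restart submit can be wrapped as shown in the following Python sketch. Only the `-gridsubmitpgm` and `-gridrestartok` flags come from the slides. The assumptions are that `sasgsub` is on the PATH and writes its report to standard output; the function names are hypothetical.

```python
import subprocess
from typing import List

def sasgsub_submit_cmd(program: str, restart_ok: bool = True) -> List[str]:
    """Build a sasgsub submit command; -gridrestartok turns on checkpoint-restart."""
    cmd = ["sasgsub", "-gridsubmitpgm", program]
    if restart_ok:
        cmd.append("-gridrestartok")
    return cmd

def submit(program: str) -> str:
    """Run sasgsub and return its console output (raises if sasgsub fails)."""
    result = subprocess.run(sasgsub_submit_cmd(program),
                            capture_output=True, text=True, check=True)
    return result.stdout

print(sasgsub_submit_cmd("/shared/pgm/nightly.sas"))
```

The single flag replaces the whole list of SAS options from the earlier setup slide, plus the RERUNNABLE setting on the job.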
Getting Job Status
Command line interface
• sasgsub -gridgetstatus <job_id | _ALL_>
Example output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57
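A polling script can use this status report to find jobs that are not done yet. Here is a Python sketch that maps each job ID to its program and state. It assumes every status line follows the "Job N (name) is State:" pattern shown above; only the states Finished and Submitted appear in the sample.

```python
import re
from typing import Dict, Tuple

STATUS_RE = re.compile(r"Job (\d+) \(([^)]*)\) is (\w+):")

def parse_status(text: str) -> Dict[int, Tuple[str, str]]:
    """Map job ID -> (program name, state) from -gridgetstatus output."""
    return {int(m.group(1)): (m.group(2), m.group(3))
            for m in STATUS_RE.finditer(text)}

STATUS_SAMPLE = """Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57
"""

status = parse_status(STATUS_SAMPLE)
unfinished = [job for job, (_, state) in status.items() if state != "Finished"]
print(unfinished)  # prints [1925]
```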
Retrieving Results
Command line interface
• sasgsub -gridgetresults <job_id | _ALL_>
Example Output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-21_21.52.57.130_testPgm
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-24_13.13.39.167_testPgm
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:53:34
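When results are retrieved, each finished job's "Moved job information to" line follows its status line. Here is a Python sketch that pairs each job with its local result directory. It assumes the line order shown in the sample; the function name is hypothetical.

```python
import re
from typing import Dict, Optional

JOB_RE = re.compile(r"^Job (\d+) \(")
MOVED_RE = re.compile(r"^Moved job information to (.+)$")

def parse_results(text: str) -> Dict[int, str]:
    """Map each retrieved job ID to the local directory its results were moved to."""
    moved: Dict[int, str] = {}
    current: Optional[int] = None
    for line in text.splitlines():
        line = line.strip()
        job = JOB_RE.match(line)
        if job:
            current = int(job.group(1))  # remember which job the next lines belong to
            continue
        dest = MOVED_RE.match(line)
        if dest and current is not None:
            moved[current] = dest.group(1)
    return moved

RESULTS_SAMPLE = r"""Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-21_21.52.57.130_testPgm
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-24_13.13.39.167_testPgm
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:53:34
"""

print(parse_results(RESULTS_SAMPLE))
```

Job 1925 has no entry because it has not finished, so nothing was retrieved for it.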
Putting It All Together
[Diagram: SAS Application and SAS Grid Manager with a normal queue and a sas_rerun queue; grid hosts HOST A, HOST B, HOST C]
A simple solution
Record a checkpoint number and save it in WORK
If restarting, skip PROC and DATA steps up to that checkpoint
Tokenize the entire program
Execute all global statements
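The "simple solution" above can be sketched in Python as follows. The checkpoint number is kept in a file inside a directory that stands in for a persistent WORK library, and two runs simulate a failure followed by a restart. Everything here is a hypothetical model: the file name `checkpoint.txt`, the item format, and the simulated failure are assumptions, and tokenization is not modeled.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical program model: ("global", action) always runs;
# ("step", action) is a DATA/PROC step that can be skipped on restart.
Item = Tuple[str, Callable[[], None]]

def run_program(items: List[Item], work_dir: Path) -> None:
    ckpt_file = work_dir / "checkpoint.txt"  # checkpoint number kept in WORK
    done = int(ckpt_file.read_text()) if ckpt_file.exists() else 0
    step_no = 0
    for kind, action in items:
        if kind == "global":
            action()                 # global statements always execute
            continue
        step_no += 1
        if step_no <= done:
            continue                 # completed in an earlier run: skip it
        action()
        ckpt_file.write_text(str(step_no))  # record the last completed step

log: List[str] = []
attempts: Dict[str, int] = {"n": 0}

def flaky_step() -> None:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("simulated system failure")
    log.append("step2")

program: List[Item] = [
    ("global", lambda: log.append("global")),
    ("step", lambda: log.append("step1")),
    ("step", flaky_step),
    ("step", lambda: log.append("step3")),
]

with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)  # stands in for a WORK directory that survives restarts
    try:
        run_program(program, work)  # first run dies in step 2
    except RuntimeError:
        pass
    run_program(program, work)      # restart: step 1 skipped, globals rerun

print(log)  # prints ['global', 'step1', 'global', 'step2', 'step3']
```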