Intro to Linux Part 1-4

Transcript: Intro to Linux Part 1-4

Introduction to Shell scripting
Presented by:
Shailender Nagpal, Al Ritacco
Research Computing
UMASS Medical School
AGENDA
Shell basics: Scalars, Arrays, Expressions, Printing
Built-in commands, Blocks, Branching, Loops
String and Array operations
File operations: Text processing utilities SED, AWK
Writing custom functions
Providing input to programs
Shell scripting strategies
Using Linux scripts with the LSF cluster
2
What is Shell scripting?
• A series of Linux commands in a text file that can be executed on a Linux shell in top-down fashion
• The Linux shell provides a high-level, general-purpose, interpreted, interactive programming environment
• Simple iterative, top-down, left-to-right programming style for users to create small and largish programs
– Mainly for automating Linux tasks, but also for writing integrated workflows
3
Features of Shell scripting
• Shell code is for the Linux operating system only
• Easy to use, and lots of resources are available
• Procedural programming, not strongly "typed"
• Similar programming syntax to other languages
– if, for, do, functions, etc.
• Provides limited methods to manipulate data
– scalars, arrays
• Statements don't need to be terminated by a semi-colon (but can be)
4
Advantages of Shell scripting
• Not as fully-featured as C, Java, Perl, or Python, but still very useful for automation, file processing and workflow development, making it advantageous in certain applications like Bioinformatics
– Fewer lines of code than C, Java. Similar to Perl, Python
– No compilation necessary. Prototype and run!
– Run every line of code interactively
– Vast command library
– Save coding time and automate computing tasks
– Code is often even more concise than Perl and Python
5
Types of linux "shells"
• Shells provide a user interface (command prompt) to
the underlying unix operating system
• They give users an environment to execute commands
upon login
• Many shells are available, which are mostly the same,
but with some minor differences
– Bourne shell (sh), C shell (csh), TC shell (tcsh), Korn shell
(ksh), Bourne Again Shell (bash)
• Which "shell" are you using?
echo $SHELL
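• To see which shells are installed on a system (in addition to the one you are using), a couple of quick checks can help. This is a minimal sketch; /etc/shells exists on most Linux systems but its contents vary by machine:
echo $0                  # Name of the shell currently running
cat /etc/shells          # Login shells installed on this system
grep $USER /etc/passwd   # Your account's default login shell (on systems using /etc/passwd)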
6
Shell features
FEATURES                Bourne   C     TC    Korn   BASH
Command history         No       Yes   Yes   Yes    Yes
Command alias           No       Yes   Yes   Yes    Yes
Shell scripts           Yes      Yes   Yes   Yes    Yes
Filename completion     No       Yes   Yes   Yes    Yes
Command line editing    No       No    Yes   Yes    Yes
Job control             No       Yes   Yes   Yes    Yes
7
First Shell program
• The obligatory "Hello World" program
#!/usr/bin/bash
# Comment: 1st program: variable, echo
name="World"
echo "Hello $name"
echo "Hello ${name}"
• Save as ".sh" extension, then at linux shell:
chmod 755 hello.sh    # Make it executable
./hello.sh
Hello World
Hello World
8
Understanding the code
• The first line of a Shell script requires an interpreter
location, which is the path to the "bash" shell
#!/path/to/bash
• 2nd line: A comment, beginning with "#"
• 3rd line: Declaration of a string variable
• 4th, 5th line: echoing some text to the shell with a
variable, whose value is interpolated by $ sign
• The quotes are not echoed, and "name" is replaced by
"World" in the output.
9
Second program
• Report summary statistics of DNA sequence
#!/usr/bin/bash
dna="ATAGCAGATAGCAGACGACGAGA"
dna_length=`echo -n $dna | wc -m`   # -n avoids counting the trailing newline
echo "Length of DNA is $dna_length"
echo "Number of A bases are" `echo $dna | grep -o "A" | wc -l`
echo "Number of C bases are" `echo $dna | grep -o "C" | wc -l`
echo "Number of G bases are" `echo $dna | grep -o "G" | wc -l`
echo "Number of T bases are" `echo $dna | grep -o "T" | wc -l`
echo "Number of GC dinucleotides are ", `echo $dna | grep -o "GC" | wc -l`
gc=$((`echo $dna | grep -o "G" | wc -l`+`echo $dna | grep -o "C" | wc -l`))
gc_per=`echo $gc/$dna_length*100 | bc -l`
printf "G+C percent content is %2.1f" $gc_per
• Quick summary; re-use the code to find motifs, RE sites, etc. (see the sketch below)
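• As a sketch of that re-use, the same grep -o | wc -l pipeline can count any motif; the motif "GATA" below is only an example:
#!/usr/bin/bash
dna="ATAGCAGATAGCAGACGACGAGA"
motif="GATA"
count=`echo $dna | grep -o "$motif" | wc -l`
echo "Number of $motif motifs is $count"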
10
Linux Commands
ls  mkdir  tail  more  date  wc  gzip  scp  file  bg  who  chmod  awk
cp  pwd  clear  history  grep  tar  rsync  cut  fg  df  chown  test  rm
rmdir  vi  export  whoami  man  ftp  tee  wait  du  chgrp  expr  mv  cat
passwd  alias  last  sort  ssh  echo  dos2unix  top  screen  grep/egrep
csplit  cd  head  less  function  exit  uniq  rsh  touch  ps  time  sed  diff
11
Linux Commands (…contd)
find  hostname  mail  nohup  renice  tee  uname  vmstat  zip  pico  source
svn  locate  jobs  make  passwd  rlogin  test  untar  wget  env  nano  exec
free  finger  join  mount  ps  rsh  top  unless  which  su  bzip2  bash  banner
history  kill  umount  pstree  set  tr  unzip  while  sudo  sleep  umask  fgrep
host  ln  nl  nice  setenv  unalias  uptime  xargs  emacs  disown  paste  crontab
12
Linux Commands (…contd)
if  for  esac  then  do  else  done  elif  while  fi  case
13
Application commands
allegro  bowtie  crossbow  fastx  mfold  primer3  snpEff  vcftools
bedtools  bwa  cufflinks  maq  plink  prinseq  sratools  vmd
blast  clustalW  fasta  maqview  polyphen  samtools  tophat  namd
• To run the "blast" command for example, run this:
blastall -p blastn -d nr -i query.fa
14
Shell comments
• Use "#" character at beginning of line for adding
comments into your code
• Helps you and others to understand your thought
process
• Let's say you intend to sum up a series of numbers, e.g. the sum of x for x = 1 to 100
• The code would look like this:
sum=0                                # Initialize variable called "sum" to 0
for i in $(seq 1 100); do            # Use "for" loop to iterate over 1 to 100
    sum=$(( $sum + $i ))             # Add the current value to the running sum
done
echo "The sum of 1..100 is $sum"     # Report the result
15
Shell script: Variables
• Variables
– Provide a location to "store" data we are interested in
• Strings, decimals, integers, characters, arrays, …
– What is a character – a single letter or number
– What is a string – an array of characters
– What is a floating point number – a number like 4.7 (sometimes referred to as a "real" because it has a decimal point)
• Variables can be assigned or changed easily within a
Shell script
16
Variables and built-in keywords
• Variable names should represent or describe the
data they contain
– Do not use meta-characters; stick to letters, digits and underscores. Begin variable names with a letter
• Shell scripting as a language has keywords that
should not be used as variable names. They are
reserved for writing syntax and logical flow of the
program
– Examples include: if, then, fi, for, while, do, done, switch,
function, etc
17
Special shell variables
• Shell built-in variables available:
$# - Number of command line arguments
$* - All of the arguments, expanded as a single word when quoted
$@ - All of the arguments, preserved as separate words when quoted
$$ - Process ID of the currently running script or shell
$! - Process ID of the last program put into the background
$? - Exit code of the last executed command
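• A minimal sketch showing these variables in action (the script name args_demo.sh is illustrative):
#!/usr/bin/bash
# Run as:  bash args_demo.sh one two three
echo "Number of arguments: $#"
echo "All arguments: $@"
echo "Process ID of this script: $$"
ls /nonexistent_directory 2>/dev/null
echo "Exit code of the previous command: $?"
sleep 5 &
echo "Process ID of the backgrounded sleep: $!"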
18
Shell "Environment" variables
• Try out the commands
env
printenv
• Variables that control the behavior of the shell are
called Environment Variables
• An important variable is the "PATH" variable, which lists the directories that are searched, in order, when you type a command
• Try:
which man
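• PATH can be extended so your own scripts are found without typing their full path; a minimal sketch, where the directory ~/bin is only an example:
echo $PATH                 # Show the current search path
export PATH=$PATH:~/bin    # Append ~/bin to the search path for this session
which hello.sh             # Found now, if hello.sh lives in ~/bin and is executable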
19
Variables, Arrays
• Variables that hold single strings are string variables
• Variables that hold single integers are integer
variables
rank=3
score=5.3
dna="ATAGGATAGCGA"
• A collection of variables is called an array… it could be an array of students in a class, scores from a test, etc.
students=("Alan" "Shailender" "Chris")
scores=(89.1 65.9 92.4)
binding_pos=(9439984 114028942)
20
Printing text and variables
• Single quotes do not process escape sequences or variables and are therefore generally not used
• Double quotes process variables prefixed with the "$" sign. Escape sequences (such as \t and \n) are not processed by "echo". Ex:
x=1
echo "This \t is a test\nwith text $x"
Output:
This \t is a test\nwith text 1
• To process escape sequences, use "printf":
printf "This is a \t tab"
printf "This is a \t tab %s %s" $x $x
21
Printing arrays
• Array variables can also be echoed as an array with a default delimiter, but another way to echo arrays is to put them in a loop and echo them as scalars
students=("Alan" "Shailender" "Chris")
echo "students\n"                     # Does not work!
printf "%s %s %s" ${students[@]}      # Method 1
printf "%s %s %s" ${students[0]} ${students[1]} ${students[2]}    # Method 2
• If you run this as a program, you get this output:
Alan Shailender Chris    # Method 1
Alan Shailender Chris    # Method 2
22
Math Operators and Expressions
• Math operators
– Eg: echo $((3 + 2))
– + is the operator
– We read this left to right
– Basic operators such as + - / * ** ( ^ )
– Variables can be used
echo "Sum of 2 and 3 is " $((2+3))
x=3
echo "Sum of 2 and x is " $((2+$x))
• PEMDAS rules are followed to build mathematical expressions. Floating point operations are not allowed inside $(( ))
23
Mathematical operations
• Another way for integer arithmetic
let "x=3"
let "y=5"
let "z=y+x"
echo $z
let "x=x*z"
let "y++"
echo $x
echo $y
• Yet another way
z=`expr $x + 4` # space required between operands
24
Floating point arithmetic
• Many ways to do this. If "bc" is available, print an
expression and send it to built-in calculator
x=1.5
y=2.9
echo "$x/$y" | bc -l
• One can also use the "awk" program
echo `awk 'BEGIN {print 5/3}'`
z=`awk 'BEGIN { x = 1.5; y = 2.9; printf("%2.1f", y/x) }'`
echo $z
25
Creating Arrays
• Integer sequences can be created using the command "seq", which needs a start and end position, along with an increment size
seq 1 2 10
echo $(seq 1 2 10)
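• To actually store the sequence in an array variable, wrap the command substitution in parentheses; a minimal sketch:
odds=($(seq 1 2 10))    # Array containing 1 3 5 7 9
echo ${odds[@]}         # Print all elements
echo ${#odds[@]}        # Number of elements (5)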
26
Array Indexing
• Arrays can be indexed by number to retrieve individual
elements
• Indexes have range 0 to (n-1), where 0 is the index of the
first element and n-1 is the last item's index
nucleotides=("adenine" "cytosine" "guanine" "thymine" "uracil")
echo ${nucleotides[3]}    # equal to "thymine"
echo ${nucleotides[4]}    # equal to what?
• Any element of an array can be re-assigned
nucleotides[4]="Uracil"
echo ${nucleotides[@]}
# @ represents all elements
27
Array Operations
• Consider an array
data=(10 20)
• To get the number of items in array
echo ${#data[@]}
• To add items to the end of the array
data=(${data[@]} 30 40); echo ${data[@]}
• To get the string length of a particular item in array
echo ${#data[3]}
Array Operations (…contd)
• To extract a slice of items in array
echo ${data[@]:2:2}
• To find and replace items in the array
echo ${data[@]/0/5}
• To remove an item at a given position
unset data[3]; echo ${data[@]}
• To remove item based on patterns
echo ${data[@]/2*/}
String Operations: Split
• Shell script provides excellent features for handling strings
contained in variables
• The "split" command allows users to search for patterns in a
string and use them as a delimiter to break the string apart
• For example, to extract the words in a sentence, we use the
space delimiter to capture the words
x="This is a sentence"
echo $x | tr " " "\n"
for word in `echo $x | tr " " "\n"`; do
echo $word;
done
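• Another common way to split a string is the shell's own word splitting with "read -a"; a minimal sketch, temporarily setting IFS to the delimiter:
x="This is a sentence"
IFS=' ' read -r -a words <<< "$x"   # Split on spaces into the array "words"
echo ${#words[@]}                   # 4
echo ${words[2]}                    # "a"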
String Operations
• Two strings placed next to one another with a space
will concatenate automatically in the echo command
echo "Hello "" world"
Hello world
words=`echo "Hello "" world"`
echo $words
Hello world
String Sub-scripting
• Once a string is created, it can be subscripted using its indices
that begin with 0
word="Programming"
echo ${word:0}      # "Programming"
echo ${word:3}      # "gramming"
echo ${word:0:3}    # "Pro"
• Slices of Shell script strings cannot be assigned, eg
${word:0:1}="D"
# This won't work
String Commands
• Some examples
dna="ATAGACGACGACGTCAGAGACGA"
• Length of DNA is
echo "Length is" ${#dna}
• Find the index of a pattern
echo `expr index "$dna" GA`
• Extract a substring
echo `expr substr $dna 1 2`
• Convert to uppercase or lowercase
echo $dna | tr [A-Z] [a-z]
String Functions (…contd)
• Delete a pattern within a string
echo ${dna#A*A}     # Delete shortest match from front
echo ${dna##A*A}    # Delete longest match from front
• Find and replace a string
echo ${dna//AT/GGGGG}    # Replace all occurrences
File Processing operations
• There are many commands in linux that operate
directly on files, without having to open them and
save data in arrays, etc
• This is a big advantage over Perl, Python
sort   head   cut   tail   uniq   awk   sed   tr   split
35
File Processing operations: AWK
• Consider CSV file "gene_data.txt"
awk -F "," '$2>1000' gene_data.txt
awk -F "," '$2>50 && $4<50' gene_data.txt
awk -F "," '$2>50 && $4<50 {print $1}'
gene_data.txt
awk -F "," '$2>50 && $4<50
{printf("%s\t%s\t%s\n", $1,$3,$5)}' gene_data.txt
awk -F "," '$2>50 && $4<50
{printf("%s\t%s\t%s\n", $1,$3,$5)}' gene_data.txt
awk -F "," '$2>50 && $4<50 {printf("%s\t%f\n",
$1,$4-$2)}' gene_data.txt
36
File Processing operations: SED
• SED is a text stream editor that operates on files as
well as standard output. Main function is to find
patterns and act on them – delete or replace text
• Here’s some simple examples of using SED
– Delete lines from a file containing a pattern
sed '/^>/d' sequence.fa       # Result in STDOUT
sed -i '/^>/d' sequence.fa    # In-place
– Replacement of text pattern with another text
sed 's/T/U/g' sequence.fa
37
File Processing operations: CUT
• Dissect the "gene_info.txt" file in a few ways
– Extract the 2nd column from file (each line)
cut -f 2 -d "," gene_info.txt
– Extract the 1st and 4th columns from file (each line)
cut -f 1,4 -d "," --output-delimiter=" " gene_info.txt
– Extract the 10th character in each line
cut -c 10 gene_info.txt
– Extract the 10th to 12th characters in each line
cut -c 10-12 gene_info.txt
– Extract the 3rd and 13th characters in each line
cut -c 3,13 gene_info.txt
38
File Processing operations: SORT
• Sort the "gene_data.txt" file in different ways
– 1st column, dictionary order. Delimiter is ","
sort -k 1 -d -t "," gene_data.txt
– 2nd column, numerical increasing order. Delimiter is ","
sort -k 2 -n -t "," gene_data.txt
– 4th column, numerical decreasing order. Delimiter is ","
sort -k 4 -nr -t "," gene_data.txt
39
File Processing operations: UNIQ
• The "uniq" command finds consecutive lines in files
or STDIN that are the same and merges them for
display
• The best use of the command is with delimited files
where a particular field is "cut" out and sorted
• How many unique chromosomes are represented in
the file "gene_info.txt"?
cat gene_info.txt
cat gene_info.txt | cut -d "," -f 3 | sort | uniq
40
File Processing operations: TR
• "tr" translates, squeezes, and/or deletes characters
from standard input, writing to standard output
– In string, delete all spaces
echo "Sam Smith" | tr -d ' '
– In string, replace spaces with tabs
echo "Sam Smith" | tr -s '[:space:]' '\t'
– In FASTA file, concatenate all DNA into one string
sed '/^>/d' sequence.fa | tr -d '\n'
41
File Processing operations: SPLIT
• Let's say you want to break a FASTQ file into pieces so
you can align each piece separately in parallel – how
would you split the file?
– One approach will be to count the reads and split by "m"
equal reads
– Another would be to divide into "n" pieces of somewhat
equal size – may corrupt FASTQ
– Shown below:
nlines=`wc -l reads.fq | cut -f 1 -d " "`
echo $nlines/100 | bc
split -l 132000 -a 3 -d reads.fq
42
File Processing operations: CAT
• With the "cat" command, many file operations can
be accomplished
– Lines of a file can be loaded into an array
lines=(`cat filename.txt`)
echo ${lines[@]}
– Files can be loaded into STDOUT for string operations
cat filename.txt | wc -l
– Files can be re-directed as output to other files with the redirection operator
cat filename.txt >> filename2.txt
43
Command blocks in Shell script
• A group of statements surrounded by braces {}
– No! There are no curly braces in Shell script!
– Shell script blocks begin with "then", "elif", "else", "do"
and "case" statements and end in "fi", "done" and "esac"
statements
• Creates a new context for statements and commands
• Ex:
if (( $x > 1 )); then
    echo "Test"
    echo "x is greater than 1"
fi
44
Conditional operations with "if-then-else"
• If-then-else syntax allows programmers to introduce
logic in their programs
• Blocks of code can be branched to execute only
when certain conditions are met
if [ condition1 is true ]; then
    <statements if condition1 is true>
else
    <statements if condition1 is false>
fi
• Nested if statements are possible
45
Conditions/Tests
• Linux supports many kinds of "tests" that result in a T/F
value, which can be used in an if-then-else statement
if [ -f file.txt ]; then echo "File exists"; \
else echo "Does not exist"; fi
if [ -d dirname ]; then echo "Directory exists"; \
else echo "Does not exist"; fi
if [ "string" = "$string" ]; then echo \
"Identical strings"; else echo "Not same"; fi
if [ "string" != "$string" ]; then echo \
"Not identical strings"; else echo "Same"; fi
if [ -n "$string" ]; then echo "String not empty"; \
else echo "Empty string"; fi
46
Conditions/Tests (..contd)
if [ INTEGER1 -eq INTEGER2 ]; then echo ""; else echo ""; fi
if [ INTEGER1 -ge INTEGER2 ]; then echo ""; else echo ""; fi
if [ INTEGER1 -gt INTEGER2 ]; then echo ""; else echo ""; fi
if [ INTEGER1 -le INTEGER2 ]; then echo ""; else echo ""; fi
if [ INTEGER1 -lt INTEGER2 ]; then echo ""; else echo ""; fi
if (( $num <= 5 )); then echo "Number is less than or equal to 5"; fi
• Double square bracket syntax is also used. (When?)
47
Rules of conditional statements
• Always keep spaces between the brackets and the
actual check/comparison
• Always terminate the line with ";" before putting a
new keyword like "then", since it is a shell command
• Quote string variables if you use them in conditions
• You can invert a condition by putting an "!" in front
of it
• You can combine conditions by using "-a" for "and"
and "-o" for "or"
48
Flow Control: "For" loop
• "For" loops allow users to repeat a set of statements a pre-set number of
time.
STAGE=$(seq 1 10)
for i in ${STAGE}; do
echo "Stage $i"
done
• The "in" syntax allows for other arrays to be created
for file in `ls`; do
echo $file
done
for line in `cat gene_info.txt`; do
echo $line
done
Iterating over arrays with "while"
• Example:
nucleotides=("adenine" "cytosine" "guanine" "thymine" "uracil")
i=0
while [ $i -lt ${#nucleotides[@]} ]; do
    printf "Nucleotide is: %s\n" ${nucleotides[i]}
    i=$(($i+1))
done
Output:
Nucleotide is: adenine
Nucleotide is: cytosine
Nucleotide is: guanine
Nucleotide is: thymine
Nucleotide is: uracil
50
Switch-case
• Case statements allow for branching to be performed
on code blocks based on different values a variable
takes
• Like an if-then-else statement, except instead of
condition, the syntax checks for values of variable
x=1
case $x in
"1") echo 1 ;;
"2") echo 2 ;;
*) echo "none" ;;
esac
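• Case patterns can also use wildcards, which makes them handy for branching on things like file extensions; a minimal sketch (the filename is illustrative):
file="reads.fq"
case $file in
  *.fa|*.fasta) echo "FASTA file" ;;
  *.fq|*.fastq) echo "FASTQ file" ;;
  *)            echo "Unknown file type" ;;
esac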
51
Shell script File access
• What is file access?
– set of Shell script commands/syntax to work with data files
• Why do we need it?
– Makes reading data from files easy, we can also create new
data files
• What different types are there?
– Read, write, append
52
File I/O
• Low level file I/O is usually not performed in Linux
– Abundance of file manipulation tools/commands
• If needed though, ASCII/text files can be read line by line
using shell script easily.
file="sequence.fa"
while read line; do
# display $line or do something with $line
echo "$line"
done < "$file"
File read and write example
file="mailing_list"
while read line; do
printf "%s %s" "$fields[1]" "$fields[0]"
printf "%s %s" "$fields[3]" "$fields[4]"
printf "%s %s %s" "fields[5]" \
"fields[6]" "fields[7]"
done < "$file"
• Output:
Al Smith
123 Apple St., Apt. #1
Cambridge, MA 02139
Input file:
Last name:First name:Age:Address:Apartment:City:State:ZIP
Smith:Al:18:123 Apple St.:Apt. #1:Cambridge:MA:02139
54
Functions
• What is a function?
– group related statements into a single task
– segment code into logical blocks
– avoid code and variable based collisions
– can be "called" by other segments of code
• Functions return values
– Explicitly with the return command (an exit status)
– Implicitly as the exit status of the last executed statement
• Output values (a scalar or a flat array) can be passed back by echoing them
55
Functions
• A function can be written in any Shell script program; it is identified by the "function" keyword
• Writing a function
function echostars {
    echo "***********************"
}
function exitIfError {
    if [[ $1 -ne 0 ]]; then
        echo "ERROR! - return code $1"
        exit 1
    fi
}
echostars; exitIfError $?
Functions with Inputs and Outputs
• The "echo" statement can be used to return some output from the
function
function fib2 {
    result=(1 1)
    a=1; b=1
    while [ $(($a+$b)) -lt $1 ]; do
        next=$(($a+$b))
        a=$b
        b=$next
        result=(${result[@]} $b)
    done
    echo ${result[@]}
}
• The function can then be called
source fib2.sh; fib2 100
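• The echoed output can then be captured by the caller into a variable or array; a minimal sketch:
source fib2.sh
fibs=($(fib2 100))    # Capture the echoed values into an array
echo "Found ${#fibs[@]} Fibonacci numbers below 100: ${fibs[@]}"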
Providing input to programs
• It is sometimes convenient not to have to edit a
program to change certain data variables
• Shell script allows you to read data from the terminal directly into program variables with the "read" command
• Examples:
echo -n "Enter your name: "
read name
Shailender
echo $name
Shailender
58
Command Line Arguments
• Command line arguments are optional data values
that can be passed as input to the Shell script
program as the program is run
– After the name of the program, place string or numeric values with spaces separating them
– Access them inside the program through the positional variables $1, $2, $3 … (or all at once with $@)
– This avoids entering or replacing data by editing the program
• Examples:
bash arguments.sh arg1 arg2 10 20
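• A minimal sketch of what arguments.sh might contain; the script body is illustrative, only the invocation above comes from the slide:
#!/usr/bin/bash
# Usage: bash arguments.sh arg1 arg2 10 20
echo "Number of arguments: $#"
echo "First argument:  $1"
echo "Second argument: $2"
echo "All arguments:   $@"
echo "Sum of the 3rd and 4th arguments: $(($3 + $4))"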
Creating a bash "Shell script"
• The power of linux can be captured in a script, where
commands can be placed sequentially to be executed
from top to bottom, left to right
– The text file containing these commands is called a "shell
script"
• Scripts are useful because a compilation of
commands executes a task in an automated and
precise manner, repeatedly
60
Shell scripting strategies
• Use "exit" codes
– Shell scripts can be terminated abruptly with the "exit" command; it is desirable to terminate if errors occur, rather than continuing to run
– Example
cd /home/sn34w/project1    # Change to "project1"
rm -rf *                   # Delete everything there
– What if "project1" did not exist and there was an error?
• Your entire current directory would get deleted!
• Use of exit codes avoids this problem (see the sketch below)
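– A minimal sketch of the safer version, stopping the script when the cd fails (same hypothetical directory as above):
cd /home/sn34w/project1 || exit 1   # Stop here if the directory does not exist
rm -rf *                            # Only runs if the cd succeeded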
61
Shell scripting tips and tricks
(…contd)
• The "$?" special variable stores an error message
after every linux command, has value of 0 if
command was successful, otherwise 1 or more (see
error code array)
cd /home/sn34w/project1
echo $?
if [[ $? eq 0 ]]; then
rm –rf *
fi
62
Useful Shell scripting tips
• Pipes (|) send the output of one command to
another as Standard input so that powerful
constructs for operating on data become possible
– Order of execution is from left to right
cat sequence.fa | grep "ACTTTA" | wc -l
• A linux command can be split across multiple lines by
using the "\" character at the end of the line
cat sequence.fa | grep \
"ACTTTA" | wc -l
63
Useful Shell scripting tips (…contd)
• Shell expansion with wild cards
• Input and Output redirection with "<", ">", and ">>"
• Tab completion
• Combining options/flags
• Using flag names with "--"
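• A few illustrative one-liners for these tips (the filenames are examples only):
ls *.fa                       # Wildcard expansion: all .fa files in the current directory
wc -l < sequence.fa           # "<" redirects a file into a command's standard input
sort names.txt > sorted.txt   # ">" overwrites a file, ">>" appends to it
ls -ltr                       # Combined flags: long listing, sorted by time, reversed
ls --all                      # Long flag name, equivalent to -a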
64
Useful Shell scripting tips (…contd)
• Copying and pasting clipboard with left and
right mouse clicks
• Using multiple shells at the same time
• Using semi-colon to run commands on same
line
• Evaluating linux commands with backticks
• Conditional execution of commands with &&
and ||
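• Illustrative one-liners for these tips:
cd /tmp; ls; cd -                          # Semi-colons run several commands on one line
today=`date +%F`; echo "Today is $today"   # Backticks capture a command's output
mkdir results && cd results                # "&&" runs the second command only if the first succeeds
cd /no/such/dir || echo "cd failed"        # "||" runs the second command only if the first fails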
65
Shell scripts in our home directory
• Users of the bash shell have scripts in their home
directory that control shell behaviors
– .bashrc, executed with new interactive terminal session
– .bash_profile, executed with new login session
– .bash_history, contains history of commands – saves
commands on exit and loads them upon start of session
– .bash_logout, contains things to do upon logout
• To look at any of these, say .bashrc, do:
ls -a ~          # Display hidden files in home dir
vi ~/.bashrc     # Open .bashrc file in home dir
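• A typical customization is adding an alias or a PATH entry to ~/.bashrc; a minimal sketch, where the alias and directory are examples only:
echo "alias ll='ls -ltr'" >> ~/.bashrc         # Add a convenient alias
echo 'export PATH=$PATH:~/bin' >> ~/.bashrc    # Make scripts in ~/bin findable
source ~/.bashrc                               # Re-read the file in the current session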
66
Shell script example: Downloading
the human genome
• The hg19 build of the human genome can be
downloaded from the UCSC website, but before it is
usable, it has to be unzipped, "cleaned up", etc.
vi make_hg19.sh
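• A sketch of what make_hg19.sh might contain; the UCSC URL and archive name below are assumptions for illustration, not taken from the slide:
#!/usr/bin/bash
# make_hg19.sh - download and assemble the hg19 genome (illustrative only)
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz || exit 1
tar -xzf chromFa.tar.gz        # Unpack the per-chromosome FASTA files
cat chr*.fa > hg19.fa          # Concatenate into a single genome file
rm chr*.fa chromFa.tar.gz      # Clean up intermediate files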
67
Using Shell script programs on the
cluster
• Shell scripts can easily be submitted as jobs to be run on the MGHPCC infrastructure
• Basic understanding of Linux commands is required,
and an account on the cluster
• Lots of useful and account registration information at
www.umassrc.org
• Feel free to reach out to Research Computing for
help
[email protected]
68
What is a computing "Job"?
• A computing "job" is an instruction to the HPC
system to execute a command or script
– Simple linux commands or shell/R scripts that can be executed within milliseconds would probably not qualify to be submitted as a "job"
– Any command that is expected to take up a big portion of
CPU or memory for more than a few seconds on a node
would qualify to be submitted as a "job". Why? (Hint:
multi-user environment)
69
How to submit a "job"
• The basic syntax is:
bsub <valid linux command>
• bsub: LSF command for submitting a job
• Let's say a user wants to execute a shell script. On a linux PC, the command is
bash countDNA.sh
• To submit a job to do the work, do
bsub bash countDNA.sh
70
Specifying more "job" options
• Jobs can be marked with options for better job
tracking and resource management
– Job should be submitted with parameters such as queue
name, estimated runtime, job name, memory required,
output and error files, etc.
• These can be passed on in the bsub command
bsub -q short -W 1:00 -R rusage[mem=2048] -J "Myjob" -o hpc.out -e hpc.err bash countDNA.sh
71
Job submission "options"
Option flag   Description
-q            Name of queue to use. On our systems, possible values are "short" (<= 4 hrs execution time), "long" and "interactive"
-W            Allocation of node time. Specify hours and minutes as HH:MM
-J            Job name. Eg. "Myjob"
-o            Output file. Eg. "hpc.out"
-e            Error file. Eg. "hpc.err"
-R            Resources requested from assigned node. Eg: "-R rusage[mem=1024]", "-R hosts[span=1]"
-n            Number of cores to use on assigned node. Eg. "-n 8"
72
Why use the correct queue?
• Match requirements to resources
• Jobs dispatch quicker
• Better for the entire cluster
• Helps GHPCC staff determine when new resources are needed
73
Questions?
• How can we help further?
• Please check out books we recommend as
well as web references (next 2 slides)
74
Shell script Books
• Shell script books which may be helpful
– http://shop.oreilly.com/product/9781118983843.do
• Linux Command Line and Shell Scripting Bible, 3rd Edition
– http://shop.oreilly.com/product/9781118004425.do
• Linux Command Line and Shell Scripting Bible, 2nd Edition
– http://shop.oreilly.com/product/9781782162742.do
• Linux Shell Scripting Cookbook, 2nd Edition
– http://shop.oreilly.com/product/9780764583209.do
• Beginning Shell Scripting
75
Shell script References
• http://en.wikipedia.org/wiki/Shell_script
• http://linuxcommand.org/writing_shell_scripts.php
• http://www.freeos.com/guides/lsst/
• http://www.steve-parker.org/sh/sh.shtml
• http://linuxconfig.org/bash-scripting-tutorial
76