Introduction to Perl scripting

Part 1: Basic Perl

What is Perl?

- Scripting language
- Practical Extraction and Report Language
- Pathologically Eclectic Rubbish Lister

How do I use Perl?

    $ vi hello.pl
    print "hello world\n";
    $ perl hello.pl
    hello world
    $ vi add.pl
    print $ARGV[0] + $ARGV[1], "\n";
    $ perl add.pl 17 25
    42

Why Perl?

- Fast text processing
- Simple scripting language
- Cross-platform
- Many extensions for biological data

TMTOWTDI

- Motto: TMTOWTDI (There's More Than One Way To Do It)
- This can be frustrating to new users
- Focus on understanding what you are doing; don't worry about all the other ways yet

Getting started

- Primitives
  - String: "string", 'string'
  - Numeric: 10, 12e4, 1e-3, 120.0123
- Data types
  - scalars: $var = "a"; $num = 10;
  - lists: @lst = ('apple', 'orange');
  - hashes: %hash = (1 => 'apple', 2 => 'orange');

Starter Code

    # assign a variable
    $var = 12;
    print "var is $var\n";
    # concatenate strings
    $x = "Alice";
    $y = $x . " & Alex are cousins\n";
    print $y;
    # print can print lists of variables
    print $y, "var is ", $var, "\n";

Tidbits

- To print to screen: print "string"
- Special chars
  - newline: "\n"
  - tab: "\t"
- String and numeric conversion is automatic
- All about context
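A small sketch of the last two points, automatic conversion and context. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# The operator decides whether a value is treated as a number or a string.
my $sum  = "10" + 5;     # numeric addition: 15
my $text = "10" . 5;     # string concatenation: "105"

# The same array behaves differently depending on context.
my @fruit = ('apple', 'orange', 'pear');
my @copy  = @fruit;      # list context: all three items are copied
my $count = @fruit;      # scalar context: the number of items (3)

print "sum=$sum text=$text count=$count\n";   # prints: sum=15 text=105 count=3
```

Here `+` forces numbers and `.` forces strings, so the programmer picks the operator rather than converting the value.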

Math

- Standard arithmetic: +, -, *, /
- mod operator %: 4 % 2 = 0; 5 % 2 = 1
- Operate in place: $num += 3
- Increment or decrement a variable: $a++, $a--
- Power **: 2^5 = 2**5
- Square root: sqrt(9)
- Natural log: log_e(5) = log(5)
- Other bases: log_10(100) = log(100) / log(10)

Precision

- Truncate (drop the fraction, toward zero): int($x)
- Round up: POSIX::ceil($x)
- Round down: POSIX::floor($x)
- Formatted printing: printf / sprintf
  - %d, %f, %5.2f, %g, %e
  - More coverage later on
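The difference between int, floor and ceil only shows up with negative numbers. A minimal sketch follows; the sample values are arbitrary.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

my $x = 3.7;
my $y = -3.7;

# int() drops the fractional part, so it moves toward zero.
my @int   = (int($x),   int($y));     # (3, -3)
# floor() always rounds down, ceil() always rounds up.
my @floor = (floor($x), floor($y));   # (3, -4)
my @ceil  = (ceil($x),  ceil($y));    # (4, -3)

# sprintf formats a number into a string; printf prints it directly.
my $fixed = sprintf("%5.2f", 3.14159);   # " 3.14" (width 5, 2 decimals)
my $whole = sprintf("%d", 42.9);         # "42"

print "int: @int  floor: @floor  ceil: @ceil\n";
printf("%s|%s\n", $fixed, $whole);
```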

Some Math Code

    # Pythagorean theorem
    my $a = 3;
    my $b = 4;
    my $c = sqrt($a**2 + $b**2);
    # what's left over from the division
    my $x = 22;
    my $y = 6;
    my $div = int( $x / $y );
    my $mod = $x % $y;
    print $div, " ", $mod, "\n";

    output: 3 4

Logic & Equality

- if / unless / elsif / else

    if( TEST ) { DO SOMETHING }
    elsif( TEST ) { SOMETHING ELSE }
    else { DO SOMETHING ELSE IN CASE }

- Equality: == (numbers) and eq (strings)
- Less/Greater than: <, <=, >, >=
  - lt, le, gt, ge for string (lexical) comparisons

Testing equality

    $str1 = "mumbo";
    $str2 = "jumbo";
    if( $str1 eq $str2 ) {
        print "strings are equal\n";
    }
    if( $str1 lt $str2 ) {
        print "less\n";
    } else {
        print "more\n";
    }
    if( $y >= $x ) {
        print "y is greater or equal\n";
    }

Boolean Logic

- AND: && or and
- OR: || or or
- NOT: ! or not

    if( $a > 10 && $a <= 20 ) { }

Loops

- while( TEST ) { }
- until( ! TEST ) { }
- for( $i = 0; $i < 10; $i++ ) { }
- foreach $item ( @list ) { }
- for $item ( @list ) { }

Using logic

    for( $i = 0; $i < 20; $i++ ) {
        if( $i == 0 ) {
            print "$i is 0\n";
        } elsif( $i % 2 == 0 ) {
            print "$i is even\n";
        } else {
            print "$i is odd\n";
        }
    }

What is truth?

- True
  - if( "zero" ) { }
  - if( 23 || -1 || ! 0 ) { }
  - $x = "0 or none"; if( $x ) { }
- False
  - if( 0 || undef || '' || "0" ) { }
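The rules above can be checked directly. This sketch adds two values that are not on the slide, "0.0" and " ", which are both true because only the exact string "0" counts as false.

```perl
use strict;
use warnings;

my @expected_true  = ("zero", 23, -1, "0 or none", "0.0", " ");
my @expected_false = (0, '', "0", undef);

for my $v (@expected_true) {
    print "'$v' is ", ($v ? "true" : "false"), "\n";
}
for my $v (@expected_false) {
    my $shown = defined $v ? "'$v'" : "undef";
    print "$shown is ", ($v ? "true" : "false"), "\n";
}

# grep in scalar context counts the items that pass the test.
my $true_count  = grep { $_ }  @expected_true;    # 6
my $false_count = grep { !$_ } @expected_false;   # 4
```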

Special variables

- This is why many people dislike Perl
- Too many little silly things to remember
- perldoc perlvar for detailed info

Some special variables

      $!

$, $/ - error messages here - separator when doing print “@array”; - record delimiter (“\n” usually) $a,$b - used in sorting $_ - implicit variable perldoc perlvar for more info
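A short sketch of three of these variables in action. Note that `$,` applies to the arguments of print, while `$"` applies when an array is interpolated into a double-quoted string. The file path and the record data are hypothetical.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

my $with_quote;
{
    # local limits the change to this block.
    local $, = '-';    # placed between the arguments of print
    local $" = ':';    # placed between elements when "@array" is interpolated
    print @bases, "\n";      # prints A-C-G-T- followed by a newline
    print "@bases\n";        # prints A:C:G:T
    $with_quote = "@bases";
}

# $! holds the system error message after a failed operation.
my $missing = '/no/such/dir/no_such_file.txt';   # hypothetical path
open(my $fh, '<', $missing) or print "open failed: $!\n";

# $/ decides where one "line" ends when reading.
my $data = "rec1 line1\nrec1 line2\n//\nrec2 line1\n//\n";
open(my $in, '<', \$data) or die $!;   # read from a string as if it were a file
my @records;
{
    local $/ = "//\n";       # read up to each "//" line instead of each newline
    @records = <$in>;
}
close($in);
print scalar(@records), " records\n";   # prints: 2 records
```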

The Implicit variable

- Implicit variable is $_

    for ( @list ) { print $_ }
    while( <STDIN> ) { print $_ }

Input/Output: Getting and Writing Data

Getting Data from Files

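    # read line by line into a named variable
    open(HANDLE, "filename") || die $!;
    $line1 = <HANDLE>;
    while(defined($line = <HANDLE>)) {
        chomp($line);    # drop the trailing newline before comparing
        if( $line eq 'line stuff' ) { }
    }

    # read line by line into $_
    open(HANDLE, "filename") || die $!;
    while(<HANDLE>) {
        print "line is $_";
    }

    # read the whole file into an array
    open(HANDLE, "filename") || die $!;
    @slurp = <HANDLE>;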
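The same line-by-line pattern, written as a sketch with a lexical filehandle and the three-argument form of open, which is a common alternative style and not what the slides use. Each line read from a file still carries its trailing newline, so a comparison with eq only succeeds after chomp. The temporary file and its contents are made up so the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the example is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print $tmp_fh "first line\nline stuff\nlast line\n";
close($tmp_fh);

# Three-argument open with a lexical filehandle.
open(my $in, '<', $path) or die "cannot open $path: $!";

my $line_count = 0;
my $matches    = 0;
while (defined(my $line = <$in>)) {
    chomp $line;                       # remove the trailing "\n" before comparing
    $line_count++;
    $matches++ if $line eq 'line stuff';
}
close($in);

print "$line_count lines, $matches exact match\n";   # prints: 3 lines, 1 exact match
```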

Data from Streams

    while(<STDIN>) {
        print "stdin read: $_";
    }

    # the trailing | runs the command and reads its output
    open(GREP, "grep '>' $filename |") || die $!;
    my $i = 0;
    while(<GREP>) {
        $i++;
    }
    close(GREP);
    print "$i sequences in file\n";

Can pass data into a program

    while(<STDIN>) {
        print "stdin read: $_";
    }

    # the trailing | runs the command and reads its output
    open(GREP, "grep '>' $filename |") || die $!;
    my $i = 0;
    while(<GREP>) {
        $i++;
    }
    close(GREP);
    print "$i sequences in file\n";
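A runnable sketch of reading from a command, assuming a Unix-like system with grep on the PATH. It uses the list form of a pipe open ('-|'), which is one way to avoid shell quoting around the '>' pattern. The FASTA-style test data is invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway FASTA-style file; the sequences are made up for illustration.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print $tmp_fh ">seq1\nATGC\n>seq2\nGGCC\n>seq3\nTTAA\n";
close($tmp_fh);

# '-|' means "run this command and let me read its output".
# Passing the command as a list keeps '>' away from the shell.
open(my $grep, '-|', 'grep', '>', $filename) or die "cannot run grep: $!";

my $i = 0;
while (my $header = <$grep>) {
    $i++;      # one header line per sequence
}
close($grep);

print "$i sequences in file\n";   # prints: 3 sequences in file
```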

Writing out data

    open(OUT, ">outname") || die $!;
    print OUT "sequence report\n";
    close(OUT);
    # appending with >>
    open(OUT, ">>outname") || die $!;
    print OUT "appended this\n";
    close(OUT);

Filehandles as variables

- $var = \*STDIN;
- open($fh, ">report.txt") || die $!;
  print $fh "line 1\n";
- open($fh2, "report") || die $!;
  $fh = $fh2;
  while(<$fh>) { }
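Writing, appending and filehandles in variables can be combined in one sketch. It writes into a temporary directory so nothing is left behind; the file name report.txt inside it is only an example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $report = "$dir/report.txt";       # hypothetical file name

# '>' creates or overwrites the file.
open(my $out, '>', $report) or die "cannot write $report: $!";

# The same print statement works for any handle stored in a variable,
# here the file and then the screen.
for my $fh ($out, \*STDOUT) {
    print $fh "sequence report\n";
}
close($out);

# '>>' adds to the end instead of overwriting.
open(my $append, '>>', $report) or die "cannot append to $report: $!";
print $append "appended this\n";
close($append);

# Read the file back to confirm both lines are there.
open(my $in, '<', $report) or die "cannot read $report: $!";
my @lines = <$in>;
close($in);

print "report has ", scalar(@lines), " lines\n";   # prints: report has 2 lines
```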

String manipulation

Some string functions

 .

– - concatenate strings $together = $one . “ “. $two;  reverse - reverse a string (or array)  length  uc - get length of a string - uppercase or lc - lowercase a string
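These functions in one place, as a minimal sketch with made-up strings. One detail worth knowing: reverse works on lists by default, so reversing the characters of a single string needs scalar context.

```perl
use strict;
use warnings;

my $one = "Hello";
my $two = "World";

my $together = $one . " " . $two;        # "Hello World"
my $len      = length($together);        # 11
my $upper    = uc($together);            # "HELLO WORLD"
my $lower    = lc($together);            # "hello world"

# In scalar context reverse flips the characters of the string.
my $backwards = scalar reverse($together);   # "dlroW olleH"
# In list context it flips the order of the items.
my @flipped   = reverse('a', 'b', 'c');      # ('c', 'b', 'a')

print "$together ($len chars)\n$upper\n$lower\n$backwards\n@flipped\n";
```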

split/join

- split: separate a string into a list based on a delimiter
  - @lst = split("-", "hello-there-mr-frog");
- join: make a string from a list using a delimiter
  - $str = join(" ", @lst);
  - Solves the fencepost problem nicely (want to put something between each pair of items in a list)
  - print join("\t", @lst), "\n";
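To see why join solves the fencepost problem, compare it with a hand-written loop. This is only a sketch; the loop is one of several ways to write the manual version.

```perl
use strict;
use warnings;

my @lst = split("-", "hello-there-mr-frog");   # ('hello','there','mr','frog')
my $str = join(" ", @lst);                     # "hello there mr frog"

# By hand: a special case is needed to avoid a separator after the last item.
my $manual = '';
for my $i (0 .. $#lst) {
    $manual .= $lst[$i];
    $manual .= "\t" if $i < $#lst;
}

# join handles it in one step.
my $joined = join("\t", @lst);

print "$str\n";
print "same result\n" if $manual eq $joined;
```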

index

- index(STRING, SUBSTRING, [STARTINGPOS])
- Find the position of a substring within a string (left to right scanning)

    $codon = 'ATG';
    $str = 'AGCGCATCGCATGGCGATGCAGATG';
    $first = index($str, $codon);
    $second = index($str, $codon, $first + length($codon));

- rindex: same as index, but right to left scanning
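Repeating the $first / $second step in a loop finds every occurrence. A minimal sketch using the same codon and sequence; it relies on index returning -1 when nothing more is found.

```perl
use strict;
use warnings;

my $codon = 'ATG';
my $str   = 'AGCGCATCGCATGGCGATGCAGATG';

# Collect every position of $codon, restarting the search after each hit.
my @positions;
my $pos = index($str, $codon);
while ($pos >= 0) {
    push @positions, $pos;
    $pos = index($str, $codon, $pos + length($codon));
}

my $last = rindex($str, $codon);   # scanning from the right finds the last one

print "found at: @positions\n";    # prints: found at: 10 16 22
print "last at: $last\n";          # prints: last at: 22
```

Positions are zero-based, so the first ATG starts at the 11th base of the sequence.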

substr

- substr(STRING, START, [LENGTH], [REPLACE]);
- Extract a substring from a larger string

    $orf = substr($str,10,40);
    $end = substr($str,40); # get end

- Replace string
  - substr($str,21,10,'NNNNNNNNNNN');
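A small worked example of each form, as a sketch. The 25-base string is the one from the index slide; the offsets are chosen only to make the results easy to check by eye.

```perl
use strict;
use warnings;

my $str = 'AGCGCATCGCATGGCGATGCAGATG';

my $first5 = substr($str, 0, 5);     # "AGCGC" (start 0, length 5)
my $codon  = substr($str, 10, 3);    # "ATG"
my $end    = substr($str, 20);       # "AGATG" (no length: up to the end)

# The four-argument form replaces in place and returns what was removed.
my $masked  = $str;
my $removed = substr($masked, 5, 5, 'NNNNN');   # removes "ATCGC"

print "$first5 $codon $end\n";
print "$removed -> $masked\n";   # prints: ATCGC -> AGCGCNNNNNATGGCGATGCAGATG
```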

Zero based economy...

- 1st number is '0' for an index or the 1st character in a string in most programming languages
- Biologists often number the 1st base in a sequence as '1' (GenBank, BioPerl)
- Interbase coordinates (Kent-UCSC, Chado-GMOD)

Coordinate systems

- Zero based, interbase coordinates

     A T G G G T A G A
    0 1 2 3 4 5 6 7 8 9

- 1 based coordinates

    A T G G G T A G A
    1 2 3 4 5 6 7 8 9
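Converting between the two systems is a one-line calculation. The sketch below assumes 1-based coordinates are inclusive at both ends, which is the usual convention; the chosen range 4 to 6 is arbitrary.

```perl
use strict;
use warnings;

my $seq = 'ATGGGTAGA';

# 1-based, inclusive coordinates as a biologist might write them: bases 4 to 6.
my ($start1, $end1) = (4, 6);

# Interbase (zero-based) equivalent: the start moves down by one, the end stays.
my ($start0, $end0) = ($start1 - 1, $end1);     # (3, 6)

# In interbase coordinates the length is simply end minus start.
my $length = $end0 - $start0;                   # 3
my $sub    = substr($seq, $start0, $length);    # "GGT"

print "1-based $start1..$end1 = interbase $start0..$end0 = $sub\n";
```

The interbase start and length plug straight into substr, which is why zero-based systems are convenient in code.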

Arrays and Lists

- Lists are sets of items
- Can be mixed types of scalars (numbers, strings, floats)
- Perl uses lists extensively
- Variables are prefixed by @

List operations

- reverse - reverse list order
- $list[$n] - get the $n-th item
  - $two = $list[2];
- scalar - get length of array
  - $len = scalar @list;
  - $last_index = $#list;
- delete $list[10] - delete entry

Autovivification

- Automatically allocate space for an item

    $array[0] = 'apple';
    print scalar @array, " ";
    $array[4] = 'elephant';
    $array[25] = 'zebra fish';
    print scalar @array, " ";
    delete $array[25];
    print scalar @array, "\n";

    output: 1 26 5

pop, push, shift, unshift

    # remove last item
    $last = pop @list;
    # remove first item
    $first = shift @list;
    # add to end of list
    push @list, $last;
    # add to beginning of list
    unshift @list, $first;

splicing an array

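    splice ARRAY,OFFSET,LENGTH,LIST
    splice ARRAY,OFFSET,LENGTH
    splice ARRAY,OFFSET
    splice ARRAY

    @list = ('alice','chad','rod');
    # remove 2 items starting at index 1
    ($x,$y) = splice(@list,1,2);   # $x = 'chad', $y = 'rod'; @list is now ('alice')
    # insert 2 items at index 1 without removing anything
    splice(@list, 1,0, ('marvin','alex'));
    newlist: ('alice','marvin','alex')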
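Because splice changes the array in place, the order of calls matters. This sketch walks through insert, remove and replace on one list and keeps a snapshot after each step; the name 'rosa' is invented.

```perl
use strict;
use warnings;

my @list = ('alice', 'chad', 'rod');

# Insert: LENGTH 0 removes nothing, LIST is added at OFFSET.
splice(@list, 1, 0, 'marvin', 'alex');
my @after_insert = @list;    # ('alice','marvin','alex','chad','rod')

# Remove: take 2 items starting at index 1; the removed items are returned.
my ($x, $y) = splice(@list, 1, 2);
my @after_remove = @list;    # ('alice','chad','rod')

# Replace: remove 1 item at index 2 and put another in its place.
splice(@list, 2, 1, 'rosa');

print "after insert: @after_insert\n";
print "removed: $x $y\n";
print "after remove: @after_remove\n";
print "after replace: @list\n";   # prints: after replace: alice chad rosa
```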

Sorting with sort

    @list = ('tree', 'frog', 'log');
    @sorted = sort @list;
    # reverse order
    @sorted = sort { $b cmp $a } @list;
    # sort based on numerics
    @list = (25,21,12,17,9,8);
    @sorted = sort { $a <=> $b } @list;
    # reverse order of sort
    @revsorted = sort { $b <=> $a } @list;

How would you sort based on part of string in list?

    @list = ('E1', 'F3', 'A2');
    @sorted = sort @list; # sort lexical
    @sorted = sort { substr($a,1,1) <=> substr($b,1,1) } @list;

Filter with grep

    @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');
    @sl = grep { length($_) == 3 } @list;
    @oo = grep { index($_, "oo") >= 0 } @list;
    # use it to count
    my $ct = grep { substr($_,1,1) eq 'a' } @list;

Transforming with map

    @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');
    @lens = map { length($_) } @list;
    # uppercase the first character of each item
    # (the block's last value is what map collects)
    @upper = map { uc(substr($_,0,1)) . substr($_,1) } @list;
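grep and map can be chained, since each takes a list and returns a list. A short sketch on the same animal list; ucfirst is a built-in that uppercases the first character.

```perl
use strict;
use warnings;

my @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');

# grep keeps the items for which the block is true.
my @short = grep { length($_) == 3 } @list;         # ('cat', 'dog')

# map builds a new list from the value of the block for each item.
my @lens  = map { length($_) } @list;               # (8, 6, 3, 3, 4, 8)
my @upper = map { ucfirst($_) } @list;              # ('Aardvark', 'Baboon', ...)

# Chained, read right to left: keep names containing "oo", then uppercase them.
my @oo = map { uc($_) } grep { index($_, "oo") >= 0 } @list;   # ('BABOON', 'KANGAROO')

print "@short\n@lens\n@upper\n@oo\n";
```

None of these calls modifies @list; each returns a new list.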

More list action

    @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');
    for $animal ( @list ) {
        if( length($animal) <= 3 ) {
            print "$animal is noisy\n";
        } else {
            print "$animal is quiet\n";
        }
    }

Sort complicated stuff

    # want to sort these by gene number
    @list = ('CG1000.1', 'CG0789.1', 'CG0321.1', 'CG1227.2');
    @sorted = sort {
        ($locus_a) = split(/\./, $a);
        ($locus_b) = split(/\./, $b);
        substr($locus_a, 0, 2, '');   # strip the 'CG' prefix
        substr($locus_b, 0, 2, '');
        $locus_a <=> $locus_b;        # compare as numbers
    } @list;
    print "sorted are ", join(",", @sorted), "\n";
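The sort block runs many times, so it can help to work out each gene number once beforehand. The sketch below is an alternative to the code above, not a replacement for it; it stores the numbers in a hash (the name %gene_number is made up) and also runs under use strict.

```perl
use strict;
use warnings;

my @list = ('CG1000.1', 'CG0789.1', 'CG0321.1', 'CG1227.2');

# Work out the numeric part of each name once, before sorting.
my %gene_number;
for my $name (@list) {
    my ($locus) = split(/\./, $name);           # 'CG1000.1' -> 'CG1000'
    $gene_number{$name} = substr($locus, 2);    # drop the 'CG' prefix -> '1000'
}

# The sort block now only has to compare two stored numbers.
my @sorted = sort { $gene_number{$a} <=> $gene_number{$b} } @list;

print "sorted are ", join(",", @sorted), "\n";
# prints: sorted are CG0321.1,CG0789.1,CG1000.1,CG1227.2
```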

Scope

- The section of program a variable is valid for
- Defined by braces { }
- use strict;
- Use 'my' to declare variables

    #!/usr/bin/perl -w
    use strict;
    my $var = 10;
    my $var2 = 'monkey';
    print "(outside) var is $var\n".
          "(outside) var2 is $var2\n";
    {
        my $var;
        $var = 20;
        print "(inside) var is $var\n";
        $var2 = 'ape';
    }
    print "(outside) var is $var\n".
          "(outside) var2 is $var2\n";

Good practices

- Declare variables with 'my'
- Always 'use strict'
- 'use warnings' to get warnings

Let’s practice (old code)

    @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');
    for $animal ( @list ) {
        if( length($animal) <= 3 ) {
            print "$animal is noisy\n";
        } else {
            print "$animal is quiet\n";
        }
    }

Let’s practice

    #!/usr/bin/perl
    use warnings;
    use strict;
    my @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');
    for my $animal ( @list ) {
        if( length($animal) <= 3 ) {
            print "$animal is noisy\n";
        } else {
            print "$animal is quiet\n";
        }
    }

Editors

- vi filename - begin by using this editor

Make a perl script

    $ pico hello.pl
    #!/usr/bin/perl
    print "hello world\n";
    [Control-O, Enter, Control-X, Enter]
    $ perl hello.pl
    hello world
    $ chmod +x hello.pl
    $ ./hello.pl