Bioinformatics in Computer Sciences at NJIT
Perl online references
• http://www.rexswain.com/perl5.html
• http://www.perl.com/pub/q/documentation
• http://www-cgi.cs.cmu.edu/cgi-bin/perl-man
• http://www.squirrel.nl/pub/perlref-5.004.1.pdf
• http://perldoc.perl.org/index-language.html
Perl variables
• Scalar
– Number
– String
• Examples
– $myname = "Roshan";
– $year = 2006;
Number operators
Strings
• Perl supports very powerful operations on strings. Basic operators below.
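A few of the basic string operators and functions can be sketched as follows. The variable names and sequence values are made up for illustration; the operators themselves (the dot, x, length, substr, lc, reverse) are standard Perl.

```perl
use strict;
use warnings;

my $dna   = "ATG" . "CCG";        # concatenation with the dot operator: ATGCCG
my $polyA = "A" x 5;              # repetition operator: AAAAA
my $len   = length($dna);         # number of characters: 6
my $codon = substr($dna, 0, 3);   # 3 characters starting at offset 0: ATG
my $lower = lc($dna);             # lower case: atgccg
my $rev   = reverse($dna);        # in scalar context, reverses the string: GCCGTA

print "$dna $polyA $len $codon $lower $rev\n";
```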
Arrays
• Array variables begin with @.
• Example
– @myarray=(1,2,3,4,5);
– @s=("dna", "rna", "protein");
– print @s outputs dnarnaprotein
Array operations
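As a minimal sketch, some commonly used array operations look like this. The array contents are illustrative, and the selection of operations is an assumption about what is most useful to show, not a complete list.

```perl
use strict;
use warnings;

my @s = ("dna", "rna", "protein");

push(@s, "lipid");           # add an element to the end
my $last  = pop(@s);         # remove it from the end again: lipid
unshift(@s, "gene");         # add an element to the front
my $first = shift(@s);       # remove it from the front again: gene

my $count  = scalar(@s);     # number of elements: 3
my $second = $s[1];          # indexing starts at 0, so this is rna
my @sorted = sort @s;        # alphabetical order
my @back   = reverse @s;     # in list context, reverses the element order

print "@sorted\n";           # inside double quotes, elements are joined by spaces
```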
Hashes
• Hashes are like arrays, except that they are indexed by user-defined keys instead of non-negative integers.
• Hash variables begin with %.
• Examples
– %h=('red', 1, 'blue', 2, 'green', 3);
– %h=("firstname", "usman", "lastname", "roshan");
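Looking up, adding, and deleting hash entries might look like the sketch below. It reuses the color hash from the example; the added key yellow is an illustrative assumption.

```perl
use strict;
use warnings;

my %h = ('red', 1, 'blue', 2, 'green', 3);

$h{'yellow'} = 4;                          # add a new key
my $blue = $h{'blue'};                     # look up a value by key: 2
delete $h{'red'};                          # remove a key and its value
my $has_red = exists $h{'red'} ? 1 : 0;    # 0, because red was deleted

# keys returns the keys in no particular order, so sort them for stable output
foreach my $k (sort keys %h) {
    print "$k => $h{$k}\n";
}
```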
Input and Output
• printf: C-like formatted output, very powerful
• print: unformatted output
• Example:
– printf "My name is %s\n", $name;
– printf "Today is %s %d %d\n", $month, $day, $year;
open(IN, "in.txt");
$line=<IN>;
File I/O
open(OUT, ">>out.txt");
print OUT "line2\n";
close OUT;
File I/O
open(IN, "in.txt");
@lines=<IN>;
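The same write, append, and read steps can be sketched with lexical filehandles, the three-argument form of open, and an error check on each open. This is one common style rather than the only correct one. A scratch file created with File::Temp stands in for in.txt and out.txt, which is an assumption made so the sketch runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file standing in for in.txt / out.txt (assumption for this sketch)
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

open(my $out, '>', $path) or die "Cannot write $path: $!";      # > overwrites
print $out "line1\n";
close $out;

open($out, '>>', $path) or die "Cannot append to $path: $!";    # >> appends
print $out "line2\n";
close $out;

open(my $in, '<', $path) or die "Cannot read $path: $!";
my $line  = <$in>;      # scalar context: reads a single line
my @lines = <$in>;      # list context: reads all remaining lines
close $in;

chomp($line, @lines);   # strip the trailing newlines
print "first: $line, rest: @lines\n";
```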
Control structures-- if else block
• if (c) { s1; s2; ...; }
• if (c) { s1; s2; ...; } else { s1; s2; ...; }
• if (c1) { s1; s2; ...; } elsif (c2) { ... } elsif (c3) { ... } else { ... }
Control structures-- if else block
This program computes the maximum of two numbers.
• $max=0;
• if($x > $y){ $max = $x;}
• else {$max = $y;}
Control structures-- for loop
• for(init_exp; test_exp; iterate_exp) { s1; s2; ...; }
• foreach $i (@some_list) { s1; s2; ...; }
• Examples:
– for ($i=0; $i < @arr; $i++) { print $arr[$i]; }
– foreach $e (@arr) { print $e; }
Control structures-- while loop
• while (condition) { s1; s2; ...; }
• do { s1; s2; ...; } while (condition);
• while (condition) { s1; if (c) { last; } s2; ...; }
– # last means exit the loop
• while (condition) { s1; if (c) { next; } s2; ...; }
– # next means go to the next iteration
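A small sketch of last and next inside a while loop follows. The list of reads and the STOP marker are invented for illustration.

```perl
use strict;
use warnings;

my @reads = ("ATGC", "", "GGCC", "STOP", "TTAA");   # illustrative data
my @kept;
my $i = 0;

while ($i < @reads) {
    my $r = $reads[$i];
    $i++;
    next if $r eq "";        # skip empty entries and go to the next iteration
    last if $r eq "STOP";    # leave the loop entirely
    push(@kept, $r);
}

print "@kept\n";
```

Only the entries before STOP that are not empty end up in @kept; TTAA is never examined because last ends the loop first.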
Subroutines
• sub name {
– ...
– ...
– return value;
• }
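As a concrete sketch of this template, the max-of-two-numbers program from the if-else section can be wrapped in a subroutine. The name max_of is hypothetical.

```perl
use strict;
use warnings;

# max_of is a hypothetical name chosen for this sketch
sub max_of {
    my ($x, $y) = @_;      # the arguments arrive in the array @_
    if ($x > $y) { return $x; }
    else         { return $y; }
}

my $max = max_of(3, 7);
print "$max\n";
```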
Scope
• You can create local variables using my.
• Otherwise all variables are assumed global, i.e. they can be accessed and modified by any part of the code.
• Local variables are only accessible within the scope (the enclosing block, subroutine, or file) in which they are declared.
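The difference between the two kinds of variable might be illustrated as below. The sketch runs under use strict, which requires globals to be declared explicitly (here with our); without strict, an undeclared variable is simply global. The names $count, $step, and bump are hypothetical.

```perl
use strict;
use warnings;

our $count = 0;           # global: visible to every part of the code

sub bump {
    my $step = 5;         # local: exists only inside bump
    $count += $step;      # the subroutine modifies the global
    return $step;
}

bump();
bump();

my $x = 1;
{
    my $x = 2;            # a separate variable, local to this block
    $count += $x;         # uses the inner $x
}
# here $x refers to the outer variable again and is still 1

print "$count $x\n";
```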
Pass by value and pass by reference
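• Arguments reach a subroutine through the array @_. Copying them into local variables, as in my ($x) = @_;, passes them by value: changes to the copy do not affect the original variable outside the subroutine. (Assigning directly to an element of @_, such as $_[0], does change the caller's variable.)
• You can pass arrays by reference using \. This is like passing the memory location of the array, so the subroutine works on the original array rather than a copy.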
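A minimal sketch contrasting the two styles follows. The subroutine names add_one_copy and append_stop are hypothetical.

```perl
use strict;
use warnings;

# Works on a copy of the scalar: the caller's variable is untouched.
sub add_one_copy {
    my ($n) = @_;
    $n++;
    return $n;
}

# Receives an array reference: changes are visible to the caller.
sub append_stop {
    my ($aref) = @_;
    push(@$aref, "TAA");       # @$aref dereferences the reference
    return scalar(@$aref);
}

my $n = 1;
my $m = add_one_copy($n);              # $n stays 1, $m is 2

my @codons = ("ATG", "GGC");
my $size = append_stop(\@codons);      # @codons now has 3 elements

print "$n $m @codons\n";
```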
Splitting and joining strings
• split: splits a string by a regular expression and returns an array (with no string argument, split operates on $_)
– @s = split(/,/);
– @s = split(/\s+/);
• join: joins the elements of an array and returns a string (the opposite of split)
– $seq=join("", @pieces);
– $seq=join("X", @pieces);
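A short sketch of split and join with an explicit string argument follows. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $record = "ATG,CCG,TAA";
my @codons = split(/,/, $record);     # ("ATG", "CCG", "TAA")
my $seq    = join("", @codons);       # ATGCCGTAA
my $dashed = join("-", @codons);      # ATG-CCG-TAA

my $text  = "dna  rna\tprotein";
my @words = split(/\s+/, $text);      # split on runs of whitespace

my @bases = split(//, "ATG");         # empty pattern: one character per element

print "$seq $dashed @words @bases\n";
```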
Searching and substitution
• $x =~ /$y/ : true if the expression $y is found in $x
• $x =~ /ATG/ : true if the start codon ATG is found in $x
• $x !~ /GC/ : true if GC is not found in $x
• $x =~ s/T/U/g : replace all T’s with U’s
• $x =~ s/g/G/g : convert all lower case g to upper case G
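These operators can be tried on a short sequence, as in the sketch below. The sequence and variable names are illustrative. The tr/A// counting idiom at the end is an addition not covered above.

```perl
use strict;
use warnings;

my $dna = "ccATGgcTTA";                      # illustrative sequence

my $has_start = ($dna =~ /ATG/) ? 1 : 0;     # 1: ATG occurs in $dna
my $no_gc     = ($dna !~ /GC/)  ? 1 : 0;     # 1: matching is case sensitive, only "gc" occurs
my $gc_any    = ($dna =~ /GC/i) ? 1 : 0;     # 1: the i modifier ignores case

my $rna = $dna;
my $n_t = ($rna =~ s/T/U/g);                 # s///g returns the number of replacements

(my $upper_g = $dna) =~ s/g/G/g;             # copy $dna, then substitute in the copy

my $a_count = ($dna =~ tr/A//);              # tr with an empty replacement only counts

print "$rna $n_t $upper_g $a_count\n";
```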
DNA regular expressions
Taken from Jagota’s Perl for Bioinformatics
Exercises
• Write a program to read in two DNA sequences and print the number of matches and mismatches.
• Write a program to translate a DNA string into amino acids.
• Write a program that searches for the open reading frame in a DNA sequence.
• Write a program called count.pl that reads in a set of strings from a file called string.txt and prints the number of A’s, B’s, C’s, and D’s. For example, if the file looks like
AABCAA
CDAAD
CCC
then your program should output
A 6
B 1
C 5
D 2
• Write a program to prompt the user for a Yes or No response. Read in the user response using the STDIN file handle and print “OK” if the user enters “Yes”, “I hear you” if the user enters “No”, and “Make up your mind!” if the user enters something other than “Yes” or “No”.
• Create a script called foods.pl that asks the user for their favorite foods. Ask the user to enter at least 5 foods, each separated by a space (or some other delimiter). Store their answer in a scalar. Split the scalar into an array. Once the array is created, have the script do the following:
– Print the array
– Print the number of elements in the array
– Create an array slice from three elements of the array and print the new array
• Write a program to open a file containing SSNs and read in the first five records into an array.
• A. Input your name into a variable called $name and then print "Hello, (your name here)".
• B. Input your age into a variable called $age. Have the computer print out how old you'll be eight years from now, ten years from now, twenty-four years from now.
• C. Input a word to a variable. Then store the first two letters of that word into another variable and print those two letters to the screen.
• D. Use the length function to find the length of a string. Then print the last half of the string to the screen.
• E. Input your firstname and lastname as two separate variables called $firstname and $lastname respectively. Concatenate them together using the dot operator '.' into a new variable called $wholename. Then print out $wholename.
• A. Modify if1.pl to read a number from the user for $x, prompting with print "Enter a number: ";
• B. Fix if2.pl so that the else branch actually runs.
• C. Write a program that adds two numbers and then prints out whether the sum of those two numbers is positive or negative.
• D. Input a number and put it into your program. Have a friend attempt to guess the number. Have your program tell your friend whether their guess was high, low, or correct.
• E. Enter your age and your score on the permit test (out of 15). Have the computer output whether or not you're able to get your permit based on being at least 16 years old and having a score of at least 12.
• A. Write a for() loop that counts from 0 to 15 by twos and prints out the number that it is on.
• B. Write a while() loop that counts from 0 to 15 by threes and prints out the number that it is on.
• C. Guessing Game revisited: Alter your guessing game to repeat itself until the user guesses the correct number. Output the number of guesses that the person made.
• D. Make an endless loop. If you are successful, you will need to know that control+c is the keyboard command to kill the program.
• E. Print out a multiplication table from 0 to 9. (Hint, use nested loops.)
• A. Write a program that asks a user for five groceries, and then store that list in an array called @grocerylist. Print out the items in the array alphabetically.
• B. Write a program that asks the user for some numbers. Store them in an array called @numarray. Print out the numbers in the opposite order that they were entered. Stop accepting input when the user enters a letter.
• A. Write a program that asks the user for five groceries, and then write them to a file in alphabetical order. (Hint: This is similar to an array exercise.)
• B. Write a program that reads the file written in the last exercise, and then prints the contents of the file to the screen in the opposite order from the one in which they appeared in the file.
• C. Still working on the grocery list, ask the user for the name of a file that contains their grocery list. Then display the contents of that file, and ask the user if they wish to leave the list alone, add to the list, or replace the list. If the user wishes to add to or replace the list, take the input and write it to the file, as appropriate.
Exercise 3.1
Write a program to print out the cube root of a command line argument. Remember that taking the Nth root of a number is equivalent to raising it to the 1/N power.
Exercise 3.2
Given a string and a number as command line arguments, write a program to print the string out as many times as the number indicates. (You don’t need a loop.)
Exercise 3.3
Write a program to add up the size (in bytes) of all the text files whose names are entered on the command line. Print out the total.
Exercise 4.1
Prompt the user for a first name and last name, and read in both names at the same time into a single variable. Assume that the input will be in the correct format with precisely one space between the names. Use index to locate the space, and use substr to parse the name. Convert the last name to UPPERCASE and the first name to Title Case. Print the results in the format “LAST, First”.
Exercise 4.2
Modify Exercise 4.1 to prompt for and read in the names inside of a loop. Open a file for writing and print each name, formatted as “LAST, First”, to the output file. Continue looping until the user enters the null string in response to the prompt for a name.
Exercise 4.3
Print out two lines in the following format (adjusting for today’s date):
Today is "Tuesday, December 8, 1998" already.
It's day 342 of the year.
Use localtime in a list context to determine today’s date. Assign the names of the twelve months of the year to one array and the names of the seven days of the week to another.
Exercise 4.4
Using Perl’s srand and rand functions write a program to simulate a Pick 6 Lotto program. Assuming a range of numbers from 1 through 46, pick 6 numbers and store them in an array. Make sure the same number is not selected more than once. Once the array is filled with 6 random values display the 6 numbers.
Exercise 5.1
Use the range operator to fill an array with the numbers 30 to 40.
@ary = (30..40);
Compare what happens when you use the reverse operator in a list context versus a scalar context.
@new = reverse @ary; # list context
$var = reverse @ary; # scalar context
Print out the values of the @new array and the $var scalar.
Exercise 5.2
Fill an array with the names of some animals commonly found in a zoo. Experiment with some of the list functions:
1. Delete one animal from the end of the array. Print out the name of the animal deleted.
2. Add two new animals to the beginning of the array.
3. Add one new animal to the end of the array.
4. Delete one animal from the beginning of the array and add it to the end of the array. Repeat this rotation two more times.
5. Use the grep function to select all the animal names with lengths greater than 5 characters. Assign to a new array and print it.
6. Use the map function to convert each animal name to uppercase. Assign the results back to the original array.
7. Print out the entire array by deleting one element at a time from the beginning of the array until the array is emptied.
Exercise 5.3
Write a program that captures the outputs of two different commands in two separate arrays. Produce a merged listing of their respective outputs. That is, print line #1 of the first program, then line #1 of the second, then line #2 of the first program, then line #2 of the second, etc.
For Unix systems, try using the commands who and w.
Exercise 5.4
Modify Exercise 4.4 (Pick 6 Lotto) to sort the 6 random numbers numerically before displaying.
Exercise 5.5
Fill an array with the 26 letters of the alphabet, and print the letters in random order. (Hint: Try using splice with a random subscript into the array to extract and delete one letter at a time.)
Exercise 6.1
Write a program that prints out each string entered on the command line once only. For example,
$ ex6.1 red green red blue green yellow red
red green blue yellow
Use a hash to keep track of which strings have already been seen.
Exercise 6.2
Write a program to print out a word frequency count of the strings entered on the command line, sorted from most frequently to least frequently seen string. For example,
$ ex6.2 red green red blue green yellow red
3 red
2 green
1 blue
1 yellow
Exercise 6.3
Write a program to create a reverse mapping of a hash. Starting with a %phone hash, such as:
%phone = (
tom => '456-7123',
larry => '321-5500',
sriram => '998-9469',
);
that uses names as keys and phone numbers as values, build a new hash that uses the phone numbers as keys and names as values. Print out the key-value pairs of the new hash.
What will happen if two or more people in the %phone hash have the same phone number?
Exercise 6.4
Modify Exercise 4.4 (Pick 6 Lotto) to use a hash for keeping track of which random numbers have already been seen. Store the chosen numbers by using them as keys in the hash. Although some value must be assigned for each key to associate to, the value used is irrelevant. For example, to “store” the number 26 as a key:
$pick{26} = 'foobar';
When printing out the 6 random numbers, be sure to sort the keys numerically.
Exercise 6.5
Given two lists of non-repeating elements (i.e., no list has the same element more than once), find and print the union and intersection of those lists. For example, given:
@a = (1, 3, 5, 6, 7, 8);
@b = (0, 2, 3, 5, 7, 9);
The union of the values in the two arrays would be:
(0, 1, 2, 3, 5, 6, 7, 8, 9)
The intersection of the values in the two arrays would be:
(3, 5, 7)
Use hash variables to build the union and intersection sets.