CS790 – Introduction to Bioinformatics

CS 480/680 – Comparative Languages
Standard Types and Regular Expressions
Numbers
 Most integers are Fixnum objects
• When they grow too large, they are converted to Bignum objects
– A Bignum is an arbitrary-length list of fixnums
• Ruby 2.4 and later merge both into a single Integer class
 Literals:
• 12345 – decimal
– Underscores are ignored (12_345 == 12345) (Why?)
• 0377 – octal (leading 0)
• 0x3F7A – hex
• 0b110111010001 – binary
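The literal forms above can be checked directly. This is a small sketch; the method name literal_values is made up for illustration, and the class printed for the large value depends on the Ruby version in use.

```ruby
# Hypothetical helper: the same kinds of literals listed above,
# keyed by the base they are written in.
def literal_values
  {
    decimal: 12_345,           # underscores are ignored by the parser
    octal:   0377,             # leading 0
    hex:     0x3F7A,           # leading 0x
    binary:  0b110111010001    # leading 0b
  }
end

literal_values.each { |base, value| puts "#{base}: #{value}" }

# Integers grow past the machine word size automatically.
big = 2**100
puts big          # 1267650600228229401496703205376
puts big.class    # Integer on Ruby 2.4+, Bignum on older versions
```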
Types & Regular Expressions
Numeric Classes
 Integer classes support a number of iterators
• 3.times { … }
• 1.upto(5) { … }
• 99.downto(7) { … }
• 50.step(80, 5) { … } = 50, 55, 60, 65, …, 80
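One way to see what each iterator yields is to call it without a block and convert the resulting Enumerator to an array. A minimal sketch, with a made-up method name (iterator_values) and a shorter downto range than the slide uses:

```ruby
# Without a block, each iterator returns an Enumerator, so to_a
# shows exactly which values the block would have received.
def iterator_values
  {
    times:  3.times.to_a,          # [0, 1, 2]
    upto:   1.upto(5).to_a,        # [1, 2, 3, 4, 5]
    downto: 99.downto(95).to_a,    # [99, 98, 97, 96, 95]
    step:   50.step(80, 5).to_a    # [50, 55, 60, 65, 70, 75, 80]
  }
end

iterator_values.each { |name, values| puts "#{name}: #{values.inspect}" }

# With a block, the values are passed in one at a time.
total = 0
1.upto(5) { |i| total += i }
puts total    # 15
```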
Strings
 A String is a sequence of 8-bit bytes
• Usually holds ASCII characters, but this is not required; it can also hold binary data (numbers)
 String literals
• Single quotes: the only escapes are \\ and \'
• Double quotes:
– Escape sequences like \n
– Any Ruby expression, written inside #{ }:
#{var1}
#{2*$var2+var3/7}
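The difference between the two quoting styles is easiest to see side by side. The values given to var1, $var2 and var3 below are assumptions chosen only so the slide's expressions can be evaluated.

```ruby
var1  = 6
var3  = 14
$var2 = 10   # global variable, as in the second expression above

# Single quotes: \\ and \' are processed, everything else is literal.
single = 'only \\ and \' are special here: \n #{var1}'

# Double quotes: escape sequences and #{...} are both processed.
double = "value: #{var1}\ncomputed: #{2 * $var2 + var3 / 7}"

puts single    # only \ and ' are special here: \n #{var1}
puts double    # value: 6
               # computed: 22
```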
String Literals
 If you want to use another delimiter, you can use %q (single quotes) or %Q (double quotes)
• %q/string string "string"/
• %Q(This 'is' a #{var2} string)
 The character after %q or %Q sets the closing delimiter
• Opening bracket, brace, parenthesis, or less-than sign: the matching closing character
• Anything else: the same character
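A short sketch of the delimiter rules; the value of var2 and the third string are made up for illustration.

```ruby
var2 = "delimited"

a = %q/string string "string"/          # single-quote rules, / ... / delimiters
b = %Q(This 'is' a #{var2} string)      # double-quote rules, ( ... ) delimiters
c = %q<paired <delimiters> may nest>    # < is closed by the matching >

puts a    # string string "string"
puts b    # This 'is' a delimited string
puts c    # paired <delimiters> may nest
```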
“Here Documents”
 Specify a delimiter string using <<STRING
aString = <<END_OF_STRING
    The body of the string
    is the input lines up to
    one ending with the same
    text that followed the '<<'
END_OF_STRING
• The string includes the newlines and spaces
 Delimiter must be in first column
• <<-STRING allows an indented delimiter
print <<-STRING1, <<-STRING2
   Concat
   STRING1
      enate
      STRING2
produces:
   Concat
      enate
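The two forms can be compared by inspecting the strings they build. A minimal sketch, wrapped in two hypothetical methods so the results are easy to print:

```ruby
# <<ID: the terminator must start in column one, even inside a method.
def plain_heredoc
  <<END_OF_STRING
    The body keeps its
    leading spaces and newlines
END_OF_STRING
end

# <<-ID: the terminator may be indented; the body is still taken verbatim.
def indented_heredoc
  <<-END_OF_STRING
    Concat
  END_OF_STRING
end

p plain_heredoc       # two lines, each still indented and ending in "\n"
p indented_heredoc    # one indented line ending in "\n"
```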
String Methods
 String is one of the largest classes in Ruby
• Over 75 standard methods
 Many of the more powerful methods use
regular expressions, so we’ll come back to the
topic of String Methods after we discuss regular
expressions in more detail…
Ranges
 In Ruby ranges can be used for sequences,
conditions, and intervals
 1..5 = 1, 2, 3, 4, 5
 1…5 = 1, 2, 3, 4 (0…x is useful for arrays)
 Stored efficiently – a range object only stores
the min and max values as Fixnums
 Can convert to an array with to_a
• (1..5).to_a  [1, 2, 3, 4, 5]
• (‘bar’..’bat’).to_a  [‘bar’, ‘bas’, ‘bat’]
Types & Regular Expressions
8
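The two range forms, and the array-index use of the exclusive form, can be sketched as follows; the letters array is made up for illustration.

```ruby
p (1..5).to_a            # [1, 2, 3, 4, 5]   two dots: end included
p (1...5).to_a           # [1, 2, 3, 4]      three dots: end excluded
p ('bar'..'bat').to_a    # ["bar", "bas", "bat"]

# 0...x visits every valid index of an array of length x.
letters = %w[a b c]
(0...letters.length).each { |i| puts "#{i}: #{letters[i]}" }
```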
Range Methods and Iterators
 A few useful operations on ranges:
digits = 0..9
digits.include?(5) » true
digits.min » 0
digits.max » 9
digits.reject {|i| i < 5 } » [5, 6, 7, 8, 9]
digits.each do |digit|
  dial(digit)
end
Range Contents
 Ranges can even be created on objects that you
define, provided that your class…
• Implements the succ() method, providing the
next object in the sequence, and
• Defines <=> (the "spaceship operator"), so that its objects are comparable
– <=> returns -1/0/1 depending on whether the first object is less-than/equal-to/greater-than the second
Ranges of objects
 VU holds a volume level, 0 to 9
class VU
  include Comparable
  attr_reader :volume

  def initialize(volume) # Should be 0..9
    @volume = volume
    # ERROR CHECKING HERE!
  end

  def inspect
    # Prints out as ######...
    '#' * @volume
  end

  # Support for ranges
  def <=>(other)
    self.volume <=> other.volume
  end

  def succ
    raise(IndexError, "Too loud") if @volume >= 9
    VU.new(@volume.succ)
  end
end
Volume Example
 A VU object prints as a row of 0 to 9 # characters
 Can make ranges of volume objects, since they
follow the rules
medium = VU.new(4)..VU.new(7)
medium.to_a » [####, #####, ######, #######] (actually, four VU objects)
medium.include?(VU.new(3)) » false
Conditions and Intervals
 Ranges can also be used as conditions and as
intervals for controlling loops
 We’ll see these uses when we talk about loops
in Ruby
Regular Expressions
 Regular expressions are a powerful tool for
matching patterns against strings
 Available in many languages (AWK, Sed, Perl,
Python, C/C++, others)
 Matching strings with regexps is usually efficient and fast
 In Ruby, regexps are objects, like everything else
RegExp literals
 There are three ways to create a regular
expression
• a = Regexp.new('pattern')
• b = /pattern/
• c = %r(pattern)
 Match a Regexp against a string using
• exp.match(string)
• string =~ exp (positive match)
• string !~ exp (negative match)
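The three creation forms produce equal Regexp objects, and the three matching forms differ only in what they return. A minimal sketch, with a made-up sample string:

```ruby
a = Regexp.new('pattern')
b = /pattern/
c = %r(pattern)
puts(a == b && b == c)        # true: same source, same options

text = "a simple pattern here"
m = b.match(text)             # MatchData on success, nil on failure
puts m[0]                     # pattern
puts(text =~ b)               # 9, the index where the match starts
puts(text !~ b)               # false
puts("no digits" !~ /\d/)     # true
```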
String Matching
 =~ and !~ are also defined for strings
• The string on the right is converted to a Regexp (current Ruby versions raise TypeError instead, so pass a Regexp)
 =~ returns the position of the first match, or nil
• Zero-based
a = "Fats Waller"
a =~ /a/ » 1
a =~ /z/ » nil
a =~ "ll" » 7
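On current Ruby versions a String on the right-hand side of =~ raises TypeError, so the sketch below builds a Regexp from the string instead. Regexp.escape and String#index are standard library calls; the variable name needle is made up.

```ruby
a = "Fats Waller"

p(a =~ /a/)     # 1, zero-based index of the first match
p(a =~ /z/)     # nil
p(a =~ /ll/)    # 7

# To search for the contents of a string, build a Regexp from it.
# Regexp.escape protects characters such as "." that are special in patterns.
needle = "ll"
p(a =~ Regexp.new(Regexp.escape(needle)))    # 7
p a.index(needle)                            # 7, plain substring search
```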
Regular Expression Patterns
 Most characters match themselves
 Wildcard: . (period) = any character
 Anchors
• ^ = “start of line”
• $ = “end of line”
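Because ^ and $ anchor to lines rather than to the whole string, they can match in the middle of a multi-line string. A small sketch with a made-up two-line string:

```ruby
text = "first line\nsecond line"

p(text =~ /^second/)        # 11: ^ matches at the start of any line
p(text =~ /line$/)          # 6: $ matches just before the newline
p(text =~ /^line/)          # nil: no line starts with "line"
p(text =~ /s.cond/)         # 11: . stands for any single character
p(text =~ /line.second/)    # nil: by default . does not match a newline
```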
Character Classes
 Character classes: appear within [] pairs
• Most special Regexp characters (^, $, etc) turned off
• Escape sequences (\n etc) still work
• [aeiou]
• [0-9]
• ^ as first character = negate the class
• You can use the literal characters ] and - if they appear first: []-abn-z]
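A few character classes in action. The sample strings are made up, and the last line escapes ] and - with backslashes, which is an alternative to placing them first in the class.

```ruby
p "hello world".scan(/[aeiou]/)    # ["e", "o", "o"]
p("Route 66" =~ /[0-9]/)           # 6
p("aabbcc" =~ /[^ab]/)             # 4: first character that is not a or b
p 'a.b$c'.scan(/[.$]/)             # [".", "$"]: both are literal inside []
p 'x]-y'.scan(/[\]\-]/)            # ["]", "-"]
```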
Predefined character classes
 These work inside or outside []’s:
• \d = digit = [0-9]
• \D = non-digit = [^0-9]
• \s = whitespace, \S = non-whitespace
• \w = word character [a-zA-Z0-9_]
• \W = non-word character
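The predefined classes combine naturally with scan. A short sketch on a made-up string:

```ruby
st = "room 101, floor_3"

p st.scan(/\d+/)      # ["101", "3"]
p st.scan(/\w+/)      # ["room", "101", "floor_3"]
p st.scan(/\s/)       # [" ", " "]
p st.scan(/[\d,]/)    # ["1", "0", "1", ",", "3"]: \d used inside []
```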
Repetition in Regexps
 These quantify the preceding character or class:
• * = zero or more
• + = one or more
• ? = zero or one
• {m,n} = at least m and at most n
• {m,} = at least m
 High precedence – a quantifier applies to only one character or class, unless grouped:
• /^ran*$/ vs. /^r(an)*$/
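The effect of grouping on the two patterns above can be checked against a few made-up words:

```ruby
ungrouped = /^ran*$/      # * applies to "n" only
grouped   = /^r(an)*$/    # * applies to the whole group "an"

%w[r ra rannn ranan].each do |word|
  u = word =~ ungrouped ? "match" : "no match"
  g = word =~ grouped   ? "match" : "no match"
  puts "#{word}: ungrouped #{u}, grouped #{g}"
end
# r:     ungrouped no match, grouped match
# ra:    ungrouped match,    grouped no match
# rannn: ungrouped match,    grouped no match
# ranan: ungrouped no match, grouped match

p "aaaaa"[/a{2,3}/]    # "aaa": at least 2, at most 3
p "aaaaa"[/a{2,}/]     # "aaaaa": at least 2
```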
Alternation
 | is like “or” – matches either the regexp before
the | or the one after
 Low precedence – alternates entire regexps
unless grouped
• /red ball|angry sky/ – the alternatives are "red ball" and "angry sky", not "red ball sky" and "red angry sky"
• /red (ball|angry) sky/ matches the latter two
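Printing the matched text shows which alternative was used. In this sketch the names loose and tight are made up; String#[] with a Regexp returns the matched substring, or nil.

```ruby
loose = /red ball|angry sky/      # "red ball" OR "angry sky"
tight = /red (ball|angry) sky/    # "red ball sky" OR "red angry sky"

["red ball", "angry sky", "red ball sky", "red angry sky"].each do |text|
  puts "#{text.inspect}: loose=#{text[loose].inspect} tight=#{text[tight].inspect}"
end
# "red ball": loose="red ball" tight=nil
# "angry sky": loose="angry sky" tight=nil
# "red ball sky": loose="red ball" tight="red ball sky"
# "red angry sky": loose="angry sky" tight="red angry sky"
```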
Side Effects (Ruby Magic)
 After you match a regular expression some
“special” Ruby variables are automatically set:
• $& – the part of the string that matched the pattern
• $` – the part of the string before the match
• $' – the part of the string after the match
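The three variables can be combined to mark where a match occurred. The helper below is a hypothetical sketch (the name show_re is made up), similar in spirit to the showRE calls that appear later in these slides.

```ruby
# Hypothetical helper: wraps the matched part of a string in << >>.
def show_re(string, pattern)
  if string =~ pattern
    "#{$`}<<#{$&}>>#{$'}"    # before, match, after
  else
    "no match"
  end
end

puts show_re("Fats Waller", /ll/)    # Fats Wa<<ll>>er
puts show_re("Fats Waller", /z/)     # no match
```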
Side effects and grouping
 When you use ()’s for grouping, Ruby assigns
the match within the first () pair to:
• \1 within the pattern
• $1 outside the pattern
"mississippi" =~ /^.*(iss)+.*$/ » $1 = "iss"
/([aeiou][aeiou]).*\1/ – a pair of vowels that appears again later in the string
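Both uses of a group can be tried out as follows. The strings "good food" and "banana" are made up for illustration; MatchData#[] is used here as an alternative to $1.

```ruby
# $1 outside the pattern
if "mississippi" =~ /^.*(iss)+.*$/
  puts $1                                  # iss
end

# \1 inside the pattern: the same two vowels must occur again
m = "good food".match(/([aeiou][aeiou]).*\1/)
puts m[0]                                  # ood foo
puts m[1]                                  # oo
p("banana" =~ /([aeiou][aeiou]).*\1/)      # nil: no two adjacent vowels
```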
Repetition and greediness
 By default, repetition is greedy, meaning that it will match as many characters as possible.
 You can make a repetition modifier non-greedy by adding '?'
a = "The moon is made of cheese"
showRE(a, /\w+/) » <<The>> moon is made of cheese
showRE(a, /\s.*\s/) » The<< moon is made of >>cheese
showRE(a, /\s.*?\s/) » The<< moon >>is made of cheese
showRE(a, /[aeiou]{2,99}/) » The m<<oo>>n is made of cheese
showRE(a, /mo?o/) » The <<moo>>n is made of cheese
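The same contrast can be seen without a helper method by extracting the matched text with String#[]. A minimal sketch:

```ruby
a = "The moon is made of cheese"

p(a[/\s.*\s/])     # " moon is made of "  greedy: runs to the last space
p(a[/\s.*?\s/])    # " moon "             non-greedy: stops at the next space
p(a[/o+/])         # "oo"
p(a[/o+?/])        # "o"
```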
String Methods Revisited
 s.split(regexp) – returns a list of substrings, with regexp as a delimiter
• Can assign to an array, or use multiple assignment
songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  songs.append(Song.new(title, name, length))
end
 s.squeeze(string) – reduces each run of a repeated character from string to a single character
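The split and squeeze steps can be tried on a single record. The line below is a made-up example in the same pipe-separated layout that the loop above assumes.

```ruby
# Hypothetical input line: file | length | name | title
line = "/jazz/j00132.mp3  | 3:45 | Fats     Waller     | Ain't Misbehavin'\n"

# The delimiter is a "|" together with any whitespace around it.
file, length, name, title = line.chomp.split(/\s*\|\s*/)
p file      # "/jazz/j00132.mp3"
p length    # "3:45"
p name      # "Fats     Waller"
p title     # "Ain't Misbehavin'"

p name.squeeze(" ")            # "Fats Waller"
p "bookkeeper".squeeze         # "bokeper": no argument squeezes every run
p "bookkeeper".squeeze("o")    # "bokkeeper": only runs of "o"
```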
String Methods
 s.scan(regexp) – returns a list of parts that
match the pattern
st = "123 45 hello out67there what's 23up?"
a = st.scan(/\d+/)
puts a
»
123
45
67
23
Many more in Built-in Classes and Methods!
Regexp substitutions
 a.sub (one replacement) & a.gsub (global)
 Replace a regular expression with a string
 The replacement string can include \1, \2, etc. to refer to the parts of the match captured by groups in the pattern
 See substitutions.rb & Ruby book: Standard
Types
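A short sketch of sub, gsub, and group references in the replacement; the sample string is made up. The replacement is written in single quotes so that \1 and \2 reach gsub unchanged.

```ruby
a = "fred:smith and wilma:jones"

p a.sub(/:/, " ")                    # "fred smith and wilma:jones"  first match only
p a.gsub(/:/, " ")                   # "fred smith and wilma jones"  every match
p a.gsub(/(\w+):(\w+)/, '\2, \1')    # "smith, fred and jones, wilma"
```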