CS790 – Introduction to Bioinformatics
CS 480/680 – Comparative Languages
Standard Types and Regular Expressions
Numbers
Most integers are Fixnum objects
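• When they grow too large, they are converted to
Bignum objects
Arbitrary-length integers, limited only by available memory
• Since Ruby 2.4, Fixnum and Bignum are unified as the single class Integer, so the conversion is invisible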
Literals:
• 12345 – decimal
Underscores ignored (12_345 == 12345) (Why?)
• 0377 – octal (leading 0)
• 0x3F7A – hex
• 0b110111010001 – binary
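A quick check of the literal forms listed above. The values in the comments are plain base conversions. The `2**100` line shows automatic growth past the machine word size; it is an Integer on any Ruby version, whether the concrete class is Bignum (older Ruby) or Integer (2.4+).

```ruby
# Integer literal forms from the slide, compared with their decimal values
decimal = 12_345         # underscores are ignored by the parser
octal   = 0377           # leading 0  => octal
hex     = 0x3F7A         # leading 0x => hexadecimal
binary  = 0b110111010001 # leading 0b => binary

puts decimal  # 12345
puts octal    # 255
puts hex      # 16250
puts binary   # 3537

# Integers grow past machine-word size automatically
big = 2**100
puts big.is_a?(Integer)  # true
```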
Types & Regular Expressions
Numeric Classes
Integer classes support a number of iterators
• 3.times { … }
• 1.upto(5) { … }
• 99.downto(7) { … }
• 50.step(80, 5) { … } = 50, 55, 60, 65, …, 80
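The sketch below runs each iterator from the list and collects what it yields, so the sequences can be inspected directly. The arrays are only there for illustration.

```ruby
# Collect what each iterator yields so the sequences are visible
times = []
3.times { |i| times << i }          # yields 0, 1, 2

up = []
1.upto(5) { |i| up << i }           # yields 1 through 5

down = []
99.downto(7) { |i| down << i }      # yields 99, 98, and so on down to 7

steps = []
50.step(80, 5) { |i| steps << i }   # yields every 5th value from 50 to 80

p times       # [0, 1, 2]
p up          # [1, 2, 3, 4, 5]
p down.size   # 93
p steps       # [50, 55, 60, 65, 70, 75, 80]
```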
Strings
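A String is a sequence of 8-bit bytes (since Ruby 1.9, each String also carries an encoding)
• Usually holds printable characters, but that isn't required:
a string can hold arbitrary binary data
String literals
• Single quotes: the only escapes are \\ (backslash) and \' (single quote)
• Double quotes:
Escape sequences like \n
Any Ruby expression, interpolated with #{ }:
– #{var1}
– #{2*$var2+var3/7}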
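Here is a small comparison of the two quoting styles. It uses a made-up variable `name`. Single quotes leave `\n` and `#{...}` as literal text, and double quotes process both.

```ruby
name = "Ruby"

single = 'Hello\n#{name}'   # no interpolation; \n stays a backslash and an n
double = "Hello\n#{name}"   # \n becomes a newline; #{name} is evaluated

puts single.length   # 14
puts double          # prints "Hello" and "Ruby" on two lines

# The only escapes recognized inside single quotes
quote = 'It\'s'      # It's
slash = 'a\\b'       # a\b (a single backslash)

# Any expression can go inside #{ }
puts "2 + 3 = #{2 + 3}"   # 2 + 3 = 5
```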
String Literals
If you want to use another delimiter, you can
use %q (single quotes) or %Q (double quotes)
• %q/string string "string"/
• %Q(This 'is' a #{var2} string)
If the opening delimiter is a bracket, brace, parenthesis, or less-than sign,
the string ends at the matching closing character
Any other delimiter character ends the string at its next occurrence
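The sketch below runs the two slide examples, with `var2` given a made-up value. It adds a nested-brace case and a non-bracket delimiter to show both delimiter rules.

```ruby
var2 = "fancy"

a = %q/string string "string"/       # like single quotes: no interpolation
b = %Q(This 'is' a #{var2} string)   # like double quotes: interpolation works
c = %q{nested {braces} are fine}     # paired delimiters may nest
d = %q!any other character works!    # non-bracket delimiter: ends at the next '!'

puts a   # string string "string"
puts b   # This 'is' a fancy string
puts c   # nested {braces} are fine
puts d   # any other character works
```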
“Here Documents”
Specify a delimiter string using <<STRING
aString = <<END_OF_STRING
The body of the string
is the input lines up to
one ending with the same
text that followed the '<<'
END_OF_STRING
Includes newlines and spaces
Delimiter must be in first column
• <<-STRING allows an indented delimiter
print <<-STRING1, <<-STRING2
Concat
    STRING1
enate
    STRING2
produces:
Concat
enate
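The sketch below uses made-up contents to show the two rules from this slide. A plain `<<` terminator sits in column 0. With `<<-` the terminator may be indented, but the body is kept exactly as written, leading spaces included.

```ruby
# Plain <<: the terminator must start in column 0; the body keeps its spaces
text = <<END_OF_STRING
  two leading spaces
no leading spaces
END_OF_STRING

# <<-: the terminator may be indented, but the body is still kept verbatim
indented = <<-EOS
    four leading spaces
    EOS

p text      # "  two leading spaces\nno leading spaces\n"
p indented  # "    four leading spaces\n"
```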
String Methods
String is one of the largest classes in Ruby
• Over 75 standard methods
Many of the more powerful methods use
regular expressions, so we’ll come back to the
topic of String Methods after we discuss regular
expressions in more detail…
Ranges
In Ruby ranges can be used for sequences,
conditions, and intervals
1..5 = 1, 2, 3, 4, 5
1...5 = 1, 2, 3, 4 (0...x is useful for arrays: 0...arr.length covers every index)
Stored efficiently: a range object stores only its two endpoints
(and whether the end is excluded), not every value in between
Can convert to an array with to_a
• (1..5).to_a » [1, 2, 3, 4, 5]
• ('bar'..'bat').to_a » ['bar', 'bas', 'bat']
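The conversions above can be checked directly. The example also shows the array-index idiom and a very large range. That range is cheap to create because only its endpoints are stored. `arr` and `huge` are illustrative names.

```ruby
inclusive = (1..5).to_a           # [1, 2, 3, 4, 5]
exclusive = (1...5).to_a          # [1, 2, 3, 4]
words     = ('bar'..'bat').to_a   # ["bar", "bas", "bat"], built with String#succ

# 0...size covers exactly the valid indexes of an array
arr = %w[a b c d]
indexes = (0...arr.size).to_a     # [0, 1, 2, 3]

# Only the endpoints are stored, so a huge range costs almost nothing
huge = 1..1_000_000_000
puts huge.last                    # 1000000000
puts huge.include?(42)            # true, decided by comparing with the endpoints
```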
Range Methods and Iterators
A few useful operations on ranges:
digits = 0..9
digits.include?(5) » true
digits.min » 0
digits.max » 9
digits.reject {|i| i < 5 } » [5, 6, 7, 8, 9]
digits.each do |digit|
  dial(digit)   # dial stands for any method you define
end
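Here is a runnable version of the operations above. The slide's `dial` is not defined anywhere, so this sketch records each digit in an array instead.

```ruby
digits = 0..9

puts digits.include?(5)           # true
puts digits.min                   # 0
puts digits.max                   # 9
p digits.reject { |i| i < 5 }     # [5, 6, 7, 8, 9]

# Stand-in for the hypothetical dial method: record each digit
dialed = []
digits.each do |digit|
  dialed << digit
end
p dialed.size   # 10
```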
Range Contents
Ranges can even be created on objects that you
define, provided that your class…
• Implements the succ() method, providing the
next object in the sequence, and
• Objects are comparable using <=> (the “spaceship
operator”)
Returns -1/0/1 depending on whether the first object is
less-than/equal-to/greater-than the second
Ranges of objects
VU holds a volume level, 0 to 9
class VU
  include Comparable
  attr_reader :volume

  def initialize(volume)   # Should be 0..9
    # Reject volumes outside the supported range
    raise(ArgumentError, "volume must be 0..9") unless (0..9).include?(volume)
    @volume = volume
  end

  def inspect
    # Prints out as ######...
    '#' * @volume
  end

  # Support for ranges
  def <=>(other)
    self.volume <=> other.volume
  end

  def succ
    raise(IndexError, "Too loud") if @volume >= 9
    VU.new(@volume.succ)
  end
end
Volume Example
Volume object print out as 0 to 9 #’s
Can make ranges of volume objects, since they
follow the rules
medium = VU.new(4)..VU.new(7)
medium.to_a
» [####, #####, ######, #######]
Actually, four VU objects
medium.include?(VU.new(3)) » false
Conditions and Intervals
Ranges can also be used as conditions and as
intervals for controlling loops
We’ll see these uses when we talk about loops
in Ruby
Regular Expressions
Regular expressions are a powerful tool for
matching patterns against strings
Available in many languages (AWK, Sed, Perl,
Python, C/C++, others)
Matching strings against regexps is generally fast
and efficient
In Ruby, regexps are objects (of class Regexp), like
everything else
RegExp literals
There are three ways to create a regular
expression
• a = Regexp.new('pattern')
• b = /pattern/
• c = %r(pattern)
Match a Regexp against a string using
• exp.match(string)
• string =~ exp (positive match)
• string !~ exp (negative match)
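The three creation styles and three matching operations can be compared side by side. The pattern `an` and the string "banana" are arbitrary choices. `match` returns a MatchData object (or nil), `=~` returns a position (or nil), and `!~` returns true or false.

```ruby
a = Regexp.new('an')   # built from a string
b = /an/               # regexp literal
c = %r(an)             # %r literal, handy when the pattern contains slashes

# All three match at the same position
p ["banana" =~ a, "banana" =~ b, "banana" =~ c]   # [1, 1, 1]

m = b.match("banana")  # MatchData object, or nil if there is no match
p m[0]                 # "an"
p m.begin(0)           # 1

p("banana" !~ /z/)     # true: no match
p("banana" !~ /n/)     # false: there is a match
```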
String Matching
=~ and !~ also work with the string on the left
• In current Ruby the right-hand side must be a Regexp: a String there raises a TypeError (use String#index for a plain substring search)
=~ returns the position of the first match, or nil
• Zero-based
a = "Fats Waller"
a =~ /a/   » 1
a =~ /z/   » nil
a =~ /ll/  » 7
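The matches above can be run as written. The sketch also demonstrates the TypeError raised when a plain String sits on the right of `=~`, and the `String#index` alternative.

```ruby
a = "Fats Waller"

p(a =~ /a/)        # 1
p(a =~ /z/)        # nil
p(a =~ /ll/)       # 7
p a.index("ll")    # 7: plain substring search, no Regexp needed

# A String on the right of =~ is rejected in current Ruby
string_rhs_error =
  begin
    a =~ "ll"
    nil
  rescue TypeError => e
    e.class
  end
p string_rhs_error # TypeError
```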
Regular Expression Patterns
Most characters match themselves
Wildcard: . (period) = any character except a newline
Anchors
• ^ = “start of line”
• $ = “end of line”
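The wildcard and anchors can be tried on small made-up strings. Note that Ruby's `^` and `$` match at line boundaries, and `\A` anchors to the start of the whole string. The `m` flag lets `.` match a newline.

```ruby
p("cat" =~ /c.t/)     # 0: . matches the "a"
p("c\nt" =~ /c.t/)    # nil: . does not match a newline by default
p("c\nt" =~ /c.t/m)   # 0: the m flag lets . match newlines too

text = "one\ntwo"
p(text =~ /^two$/)    # 4: ^ and $ match at line boundaries in Ruby
p(text =~ /\Atwo/)    # nil: \A anchors to the start of the whole string
```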
Character Classes
Character classes: appear within [] pairs
• Most special Regexp characters (^, $, etc.) are turned off
• Escape sequences (\n etc.) still work
• [aeiou]
• [0-9]
• ^ as the first character negates the class
• You can use the literal characters ] and - if they appear first: []-abn-z]
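The rules above can be applied to a few made-up strings. A leading `-` is used for the literal-hyphen case, and the unescaped-`]` form is left aside.

```ruby
p("hello" =~ /[aeiou]/)      # 1: the first vowel is the "e"
p "abc123".scan(/[0-9]/)     # ["1", "2", "3"]
p "hello".scan(/[^aeiou]/)   # ["h", "l", "l"]: ^ first negates the class
p "a$b^c".scan(/[$^]/)       # ["$", "^"]: these are literal inside the class
p "x-y".scan(/[-xy]/)        # ["x", "-", "y"]: a leading - is literal
p("tab\there" =~ /[\t]/)     # 3: escape sequences still work
```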
Predefined character classes
These work inside or outside []’s:
• \d = digit = [0-9]
• \D = non-digit = [^0-9]
• \s = whitespace, \S = non-whitespace
• \w = word character [a-zA-Z0-9_]
• \W = non-word character
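The predefined classes can be used on their own or inside `[]`, as the slide says. The strings here are made up.

```ruby
p "Room 101, floor 3".scan(/\d+/)   # ["101", "3"]
p "abc123"[/\D+/]                   # "abc": String#[] returns the first match
p "a_b-c d".scan(/\w+/)             # ["a_b", "c", "d"]: - and space are \W
p "x  =  1".split(/\s*=\s*/)        # ["x", "1"]
p "1 2a".scan(/[\d\s]/)             # ["1", " ", "2"]: also valid inside []
```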
Repetition in Regexps
These quantify the preceding character or class:
• * = zero or more
• + = one or more
• ? = zero or one
• {m,n} = at least m and at most n (no spaces inside the braces)
• {m,} = at least m
High precedence – Only matches one character
or class, unless grouped:
• /^ran*$/ vs. /^r(an)*$/
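The precedence difference between `/^ran*$/` and `/^r(an)*$/` shows up when the two are tested against the same made-up inputs. The last two lines exercise `{m,n}` and `?`.

```ruby
short = /^ran*$/     # * applies only to the n
group = /^r(an)*$/   # * applies to the whole group "an"

p ["ra", "rannn", "ranan"].map { |s| !!(s =~ short) }   # [true, true, false]
p ["r", "ranan", "rannn"].map { |s| !!(s =~ group) }    # [true, true, false]

p "aaaa"[/a{2,3}/]                  # "aaa": at most 3
p "color colour".scan(/colou?r/)    # ["color", "colour"]: ? = zero or one
```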
Alternation
| is like “or” – matches either the regexp before
the | or the one after
Low precedence – alternates entire regexps
unless grouped
• /red ball|angry sky/ matches “red ball” or “angry
sky”, not “red ball sky” or “red angry sky”
• /red (ball|angry) sky/ matches “red ball sky” or “red angry sky”
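The two patterns from the slide can be run on a few strings. An unanchored match still finds one whole alternative inside a longer string.

```ruby
loose   = /red ball|angry sky/     # "red ball" OR "angry sky"
grouped = /red (ball|angry) sky/   # "red ball sky" OR "red angry sky"

p "angry sky"[loose]         # "angry sky"
p "red angry sky"[loose]     # "angry sky": only that alternative is present
p "red angry sky"[grouped]   # "red angry sky"
p "angry sky"[grouped]       # nil
```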
Side Effects (Ruby Magic)
After you match a regular expression, some
“special” Ruby variables are automatically set:
• $& – the part of the string that matched the
pattern
• $` – the part of the string before the match
• $' – the part of the string after the match
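Here are the three variables after one match on an example sentence. The sketch also shows the MatchData methods `pre_match` and `post_match`, which return the same pieces without relying on globals.

```ruby
if "The moon is made of cheese" =~ /moon/
  p $&    # "moon"
  p $`    # "The "
  p $'    # " is made of cheese"
end

# The same pieces are available from a MatchData object
m = /moon/.match("The moon is made of cheese")
p m.pre_match    # "The "
p m.post_match   # " is made of cheese"
```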
Side effects and grouping
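When you use ()’s for grouping, Ruby assigns
the match within the first () pair to:
• \1 within the pattern
• $1 outside the pattern
The second pair goes to \2 and $2, and so on
"mississippi" =~ /^.*(iss)+.*$/
» $1 = "iss"
/([aeiou][aeiou]).*\1/ matches a pair of vowels that appears again later in the string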
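The grouping rules can be run on made-up inputs. This covers `$1` after a match, a back-reference `\1` inside the pattern, and several groups numbered from left to right.

```ruby
"mississippi" =~ /^.*(iss)+.*$/
p $1     # "iss"

# \1 inside the pattern: a vowel pair that appears again later
p("beautiful beauty" =~ /([aeiou][aeiou]).*\1/)   # 1
p $1     # "ea"
p("banana" =~ /([aeiou][aeiou]).*\1/)             # nil: no adjacent vowels

# Further groups are numbered left to right
"2024-06-01" =~ /(\d+)-(\d+)-(\d+)/
p [$1, $2, $3]   # ["2024", "06", "01"]
```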
Repetition and greediness
By default, repetition is greedy, meaning that it
will match as many characters as possible.
You can make a repetition modifier non-greedy
(matching as few characters as possible) by adding ‘?’
showRE (a helper not shown here) marks the matched part of the string with << >>
a = "The moon is made of cheese"
showRE(a, /\w+/)            » <<The>> moon is made of cheese
showRE(a, /\s.*\s/)         » The<< moon is made of >>cheese
showRE(a, /\s.*?\s/)        » The<< moon >>is made of cheese
showRE(a, /[aeiou]{2,99}/)  » The m<<oo>>n is made of cheese
showRE(a, /mo?o/)           » The <<moo>>n is made of cheese
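The slides never define showRE. The version below is a minimal sketch, assuming it only wraps the first match in `<< >>` using the `$``, `$&` and `$'` variables described earlier. It reproduces the outputs listed above.

```ruby
# Hypothetical helper: wraps the first match in << >>, or reports no match
def showRE(str, re)
  if str =~ re
    "#{$`}<<#{$&}>>#{$'}"
  else
    "no match"
  end
end

a = "The moon is made of cheese"
puts showRE(a, /\s.*\s/)    # The<< moon is made of >>cheese
puts showRE(a, /\s.*?\s/)   # The<< moon >>is made of cheese
puts showRE(a, /xyz/)       # no match
```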
String Methods Revisited
s.split(regexp) – returns a list of substrings,
with regexp as a delimiter
• Can assign to an array, or use multiple assignment
songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  songs.append(Song.new(title, name, length))
end
s.squeeze(string) – reduces any run of a repeated character
from string to a single character
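The split-and-assign step from the loop above can be run on one made-up input line in the same "file | length | name | title" layout. `squeeze` then collapses the extra spaces in the name. `songFile`, `songs` and `Song` are not needed for this sketch.

```ruby
# Made-up input line in the "file | length | name | title" layout
line = "/jazz/j00132.mp3  | 3:45 | Fats     Waller     | Ain't Misbehavin'\n"

file, length, name, title = line.chomp.split(/\s*\|\s*/)
p file     # "/jazz/j00132.mp3"
p length   # "3:45"
p name     # "Fats     Waller"
p title    # "Ain't Misbehavin'"

p name.squeeze(" ")       # "Fats Waller": runs of spaces become one
p "mississippi".squeeze   # "misisipi": with no argument, every run is squeezed
```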
String Methods
s.scan(regexp) – returns a list of the parts that
match the pattern
st = "123 45 hello out67there what's 23up?"
a = st.scan(/\d+/)
puts a
produces:
123
45
67
23
Many more in Built-in Classes and Methods!
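The sketch below repeats the scan on the same string and adds two variations. The second one uses groups: scan then returns one array of captures per match instead of the whole matched text.

```ruby
st = "123 45 hello out67there what's 23up?"
p st.scan(/\d+/)                 # ["123", "45", "67", "23"]
p st.scan(/[a-z]+/).first(2)     # ["hello", "out"]

# With groups, scan returns one array of captures per match
p "a=1, b=2".scan(/(\w)=(\d)/)   # [["a", "1"], ["b", "2"]]
```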
Regexp substitutions
a.sub (first match only) & a.gsub (every match, global)
Replace a regular expression with a string
The string can include \1, \2, etc. to insert the text matched
by the corresponding () groups of the original pattern
See substitutions.rb & Ruby book: Standard Types
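substitutions.rb is not included here. The following is an independent sketch with made-up strings. The replacement `'\2 \1'` is single-quoted so the backslashes reach `sub` intact. It also shows the in-place `gsub!` variant.

```ruby
s = "Fats Waller"
p s.sub(/a/, "o")    # "Fots Waller": first match only
p s.gsub(/a/, "o")   # "Fots Woller": every match

# \1, \2 insert the captured groups; single quotes keep the backslashes
p "Waller, Fats".sub(/(\w+), (\w+)/, '\2 \1')   # "Fats Waller"

# sub and gsub return new strings; sub! and gsub! modify in place
t = "a-b-c"
t.gsub!("-", "+")
p t                  # "a+b+c"
```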