Introduction to Ruby - Web Information Retrieval / Natural


Introduction to Ruby
WING Group Meeting
9 Jun 2006
Min-Yen Kan
What is Ruby?

An interpreted language
  a.k.a. dynamic, scripting
  e.g., Perl
Object oriented
  Single inheritance
High level
  Good support for system calls, regex and CGI
Relies heavily on convention for syntax
Hello World
#!/usr/bin/env ruby
puts "Hello world"

$ chmod a+x helloWorld.rb
$ ./helloWorld.rb
Hello world
$

#!/usr/bin/env ruby: shell script directive to run ruby; needed to run any shell script
puts: call to method puts to write out "Hello world" followed by a newline
chmod a+x: make program executable
Basic Ruby




Everything is an object
Variables are not typed
Automatic memory allocation and garbage collection
Comments start with # and go to the end of the line
  Inside string and regexp literals # is not a comment; write \# only to stop #{…} interpolation
Newlines mark the end of statements
Methods marked with def … end
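The points above can be tried in a few lines. This is only a small sketch; the method name describe is made up for illustration.

```ruby
# Everything is an object, including literals and nil.
puts 42.class            # Integer on current Rubies (Fixnum on older ones)
puts "ruby".length
puts nil.to_s.empty?

# A # inside a string literal does not start a comment.
puts "issue #12"

# Variables are not typed: the same name can refer to different kinds of objects.
x = 10
x = "ten"

# Methods are delimited by def ... end; the last evaluated expression is returned.
def describe(value)
  "#{value.inspect} is a #{value.class}"
end

puts describe(x)
puts describe(3.14)
```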
Control structures
if … elsif … else … end
  elsif keeps blocks at same level
case … when <condition> then <value> … else … end
  case good for checks on multiple values of same expression; can use ranges
unless <condition> … end
while <condition> … end
until <condition> … end
  Looping constructs use end (same as class definitions)
#.times (e.g. 5.times)
#.upto(#) (e.g. 3.upto(6))
<collection>.each {block}
  Various iterators allow code blocks to be run multiple times

grade = case score
        when 90..100 then "A"
        when 80..90 then "B"
        else "C"
        end
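A short sketch of these constructs working together. The helper names grade_for and squares_between are hypothetical and exist only to make the snippet easy to try.

```ruby
# Wraps the case expression from the slide in a method.
def grade_for(score)
  case score
  when 90..100 then "A"
  when 80..90  then "B"   # 90 itself is already caught by the first branch
  else "C"
  end
end

# upto runs the block once for each integer from low to high, inclusive.
def squares_between(low, high)
  squares = []
  low.upto(high) { |n| squares << n * n }
  squares
end

count = 0
5.times { count += 1 }          # block runs five times

until count.zero?               # until loops while the condition is false
  count -= 1
end

[95, 85, 40].each { |s| puts "#{s} -> #{grade_for(s)}" }
p squares_between(3, 6)
```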
Ruby Naming Conventions

Initial characters
  Local variables, method parameters, and method names -> lowercase letter or underscore
  Global variable -> $
  Instance variable -> @
  Class variable -> @@
  Class names, module names, constants -> uppercase letter
Multi-word names
  Instance variables -> separate words with underscores
  Class names -> use MixedCase
End characters
  ? Indicates method that returns true or false to a query
  ! Indicates method that modifies the object in place rather than returning a copy (destructive, but usually more efficient)
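The conventions can be seen side by side in a small sketch. The class WordCounter and all names in it are invented for illustration.

```ruby
$debug_level = 1              # global variable: leading $

class WordCounter             # class name: MixedCase
  MAX_WORDS = 1000            # constant: leading uppercase letter
  @@instances = 0             # class variable: leading @@

  def initialize(text)
    @word_list = text.split   # instance variable: leading @, words joined by _
    @@instances += 1
  end

  def empty?                  # trailing ?: answers a yes/no query
    @word_list.empty?
  end

  def self.instances
    @@instances
  end
end

name = "ruby"
shout = name.upcase           # returns a modified copy; name is unchanged
name.upcase!                  # trailing !: modifies name in place
puts shout == name
```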
Another Example
class Temperature
  Factor = 5.0/9
  def store_C(c)
    @celsius = c
  end
  def store_F(f)
    @celsius = (f - 32)*Factor
  end
  def as_C
    @celsius
  end
  def as_F
    (@celsius / Factor) + 32
  end
end # end of class definition

Factor is a constant; 5.0 makes it a float
4 methods that get/set an instance variable
Last evaluated statement is considered the return value
Second Try
class Temperature
  Factor = 5.0/9
  attr_accessor :c
  def f=(f)
    @c = (f - 32) * Factor
  end
  def f
    (@c / Factor) + 32
  end
  def initialize(c)
    @c = c
  end
end
t = Temperature.new(25)
puts t.f # 77.0
t.f = 60 # invokes f=()
puts t.c # 15.56 (approx.)

attr_accessor creates setter and getter methods automatically for an instance variable
initialize is the name for a class's constructor
Don't worry - you can always override these methods if you need to
Calls to methods don't need () if unambiguous
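Overriding a generated accessor might look like the sketch below. The class CheckedTemperature, its constant and the range check are assumptions added for illustration, not part of the original example.

```ruby
class CheckedTemperature
  ABSOLUTE_ZERO_C = -273.15
  attr_accessor :c            # generates the methods c and c=

  def initialize(c)
    self.c = c                # go through the setter so the check applies
  end

  # Defining c= again replaces the generated setter.
  def c=(value)
    raise ArgumentError, "below absolute zero" if value < ABSOLUTE_ZERO_C
    @c = value
  end
end

t = CheckedTemperature.new 25   # parentheses omitted: the call is unambiguous
t.c = 30
puts t.c
```

Writing self.c = c inside initialize matters: a bare c = c would only assign a local variable and skip the setter.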
Regular Expressions
/[A-Z]/i
/[^>]+/
/<A[^>]+/im
/<A[^>]+>.*<\/A>/im
/[^\t]/
/([^\t]+)/
/^([^\t]+)\t([^\t]+)$/

Regular expressions are first-class objects in Ruby
Sandwiched between //
Quiz: How do you make a regexp for <A>…</A>?
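A brief sketch of using two of the patterns above. The constant TSV_ROW, the helper split_row and the sample strings are made up for illustration.

```ruby
# A regexp literal is an ordinary object that can be stored and passed around.
TSV_ROW = /^([^\t]+)\t([^\t]+)$/   # exactly two tab-separated fields

# Returns the two captured fields, or nil when the line does not match.
def split_row(line)
  m = TSV_ROW.match(line)
  m ? [m[1], m[2]] : nil
end

p split_row("apple\tred")
p split_row("no tabs here")
puts TSV_ROW.class

# =~ returns the match position, or nil; the i flag ignores case.
html = "<a href='x'>link</a>"
puts "anchor found" if html =~ /<A[^>]+>.*<\/A>/im
```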
Input and Output - tsv files
f = File.open ARGV[0]
while ! f.eof?
  line = f.gets
  if line =~ /^#/
    next
  elsif line =~ /^\s*$/
    next
  else
    puts line
  end
end
f.close

ARGV is a special array holding the command-line tokens
f.gets gets a line
If it's not a comment or a blank line, print it
Processing TSV files
h = Hash.new
f = File.open ARGV[0]
while ! f.eof?
  line = f.gets.chomp
  if line =~ /^\#/
    next
  elsif line =~ /^\s*$/
    next
  else
    tokens = line.split(/\t/)
    h[tokens[2]] = tokens[1]
  end
end
f.close
keys = h.keys.sort {|a,b| a <=> b}
keys.each {|k| puts "#{k}\t#{h[k]}" }

Declare a hash table
Get lines without \n or \r\n - chomp
Split lines into fields delimited with tabs
Store some data from each field into the hash
Sort the keys - sort method takes a block of code as input
each creates an iterator in which k is set to a value at each pass
#{…} outputs the evaluated expression in the double quoted string
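The same processing can be written more compactly with File.foreach, which opens the file, yields each line and closes it afterwards. This is a sketch of an alternative, not the slide's code; the helper name load_tsv is hypothetical.

```ruby
# Maps the third column to the second, as in the example above.
def load_tsv(path)
  table = {}
  File.foreach(path) do |line|
    line = line.chomp
    next if line =~ /^#/ || line =~ /^\s*$/   # skip comments and blank lines
    tokens = line.split(/\t/)
    table[tokens[2]] = tokens[1]
  end
  table
end

# Only runs when a file name is given on the command line.
if ARGV[0]
  table = load_tsv(ARGV[0])
  table.keys.sort.each { |k| puts "#{k}\t#{table[k]}" }
end
```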
Blocks


Allow passing chunks of code in to methods
Receiving method uses "yield" command to call passed code (can call yield multiple times)
Single line blocks enclosed in {}
Multi-line blocks enclosed in do…end
Can use parameters

[ 1, 3, 4, 7, 9 ].each {|i| puts i }
keys = h.keys.sort {|a,b| a <=> b }
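The receiving side of a block can be sketched as follows. The method names twice and transform are invented for illustration.

```ruby
# Calls the passed block twice via yield, handing it a parameter each time.
def twice
  yield 1
  yield 2
end

twice { |n| puts "pass #{n}" }          # single-line block in {}

# Yields each element and collects whatever the block returns.
def transform(items)
  result = []
  items.each do |item|                  # multi-line block in do ... end
    result << yield(item)
  end
  result
end

p transform([1, 3, 4]) { |i| i * 10 }
```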
Running system commands
require 'find'
Find.find('.') do |filename|
  if filename =~ /\.txt$/i
    url_output = filename.gsub(/\.txt$/i, ".html")
    url = `cat #{filename}`.chomp
    cmd = "curl #{url} -o #{url_output}"
    puts cmd
    `#{cmd}`
  end
end

require reads in another ruby file - in this case a module called Find
Find.find walks the directory tree and passes each path to the block; filename is the block parameter that takes each path in turn
We create a new variable to hold a new filename with the same base but different .html extension
We use backticks `` to run a system command and (optionally) save the output into a variable
curl is a command in mac os to retrieve a URL to a file, like wget in unix
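The traversal and the extension rewrite can be tried without touching the network. The sketch below assumes a Unix-like shell for the echo command; the helper name html_targets and the temporary file are made up for illustration.

```ruby
require 'find'
require 'tmpdir'

# Lists the .txt files under dir together with the .html name each would get.
def html_targets(dir)
  pairs = {}
  Find.find(dir) do |filename|
    next unless filename =~ /\.txt$/i
    pairs[filename] = filename.gsub(/\.txt$/i, ".html")
  end
  pairs
end

# Work in a throwaway directory that is removed when the block ends.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "page.txt"), "http://www.ruby-lang.org\n")
  html_targets(dir).each { |src, dest| puts "#{src} -> #{dest}" }
end

# Backticks run a command and return its standard output as a string.
greeting = `echo hello`.chomp
puts greeting
```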
CGI example
require 'cgi'
cgi = CGI.new("html3")
size = cgi.params.size
if size > 0 # processing form
  name = cgi.params['t'].first.untaint
  cgi.out { cgi.html { cgi.head
    cgi.body { "Welcome, #{name}!" }
  } }
else
  puts <<FORM
Content-type: text/html

<HTML><BODY><FORM>
Enter your name: <INPUT TYPE=TEXT
NAME=t><INPUT TYPE=SUBMIT>
</FORM></BODY></HTML>
FORM
end

require 'cgi' loads the CGI library
CGI.new creates the CGI object
If parameters passed: process variable t (named name here because in is a reserved word in Ruby)
untaint variables if using them in commands (taint checking was removed in Ruby 3.2, so untaint applies to older versions only)
No parameters? Create form using here document "<<"
Reflection
...to examine aspects of the program from within the program itself.
# print out all of the classes in our system
ObjectSpace.each_object(Class) {|c| puts c}
# get all the methods on an object
"Some String".methods
# see if an object responds to a certain method
obj.respond_to?(:length)
# see if an object is a type
obj.kind_of?(Numeric)
obj.instance_of?(Fixnum) # Integer in Ruby 2.4 and later
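The queries above can be combined into a small runnable sketch. The helper inspect_object and the keys of the hash it returns are hypothetical.

```ruby
# Collects a few facts an object can report about itself.
def inspect_object(obj)
  {
    class_name:   obj.class.name,
    has_length:   obj.respond_to?(:length),
    is_numeric:   obj.kind_of?(Numeric),     # true for any Numeric subclass
    exact_string: obj.instance_of?(String)   # true only for String itself
  }
end

p inspect_object("Some String")
p inspect_object(42)
puts "Some String".methods.include?(:upcase)
```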
Summary






High-level interpreted OO language
No types; objects can reflect
Native support of regex, iterators, system calls and CGI
Relies on convention to limit variability
OO encourages modular, maintainable code (if needed)
Strong support for CGI and unit testing
Only the basics here; mixins (Ruby's stand-in for multiple inheritance), exception handling, threading and testing are not covered
Modules









Marshal - serialize data for load/save between program executions
Date
GetoptLong - manipulate command line arguments and switches
Tempfile
Mutex - semaphores for parallel code
Net::URI
Socket - different socket types
CGI
CGI::Session
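As one example from this list, a Marshal round trip might look like the sketch below. The helper name round_trip and the sample data are made up for illustration; Marshal.load should only be given data from a trusted source.

```ruby
# Serializes an object to a binary string and rebuilds a copy from it.
def round_trip(obj)
  bytes = Marshal.dump(obj)   # a String that could be written to a file
  Marshal.load(bytes)         # an equal but independent object
end

settings = { "name" => "wing", "visits" => [1, 2, 3] }
restored = round_trip(settings)

puts restored == settings       # same contents
puts restored.equal?(settings)  # but not the same object
```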
References






rdoc - ruby javadoc
rake - ruby make
eruby - embedded ruby (in HTML; think php)
rubygems - ruby package manager
irb - interactive ruby (good for debugging)
Ruby on Rails (rails)
  For building web applications
  Based on Model-View-Controller architecture
  Further integration of ruby with database backend

Programming Ruby by Dave Thomas (the Pickaxe Book)
http://www.ruby-lang.org
http://www.rubyforge.org
http://www.rubycentral.org
http://www.ruby-doc.org
http://www.rubygarden.org
http://www.stlruby.org
http://www.rubyquiz.com
http://www.zenspider.com/Languages/Ruby/QuickRef.html