Structured programming 4
Day 34
LING 681.02
Computational Linguistics
Harry Howard
Tulane University
Course organization
 http://www.tulane.edu/~ling/NLP/
16-Nov-2009
LING 681.02, Prof. Howard, Tulane University
Structured programming
NLPP §4
Today's topics
 Defensive programming
 Debugging
 Algorithm design
Defensive programming
 Brainstorm with pseudocode
 Careful naming conventions
 Bottom-up construction
Functional decomposition
 Comment, comment, comment
 Regression testing
Brainstorm with pseudocode
 Before you write the first line of Python
code, write what your program does as
pseudocode.
 That is to say, before writing a program that
Python understands, write it in a way that
people understand.
An example of pseudocode
 SPOT, move forward about 10 inches, turn
left 90 degrees, and start moving forward,
then start looking for a black object with
your ultrasonic sensor, because I want you
to stop when you find a black object, then
turn right 90 degrees, and move backward 2
feet, OK?
 What is good or bad about this example?
A different phrasing of
the example
 SPOT, move forward about 10 inches and stop.
 Now turn left 90 degrees.
 Start moving forward, and turn on your ultrasonic
sensor.
 Stop when you find a black object.
 Turn right 90 degrees and stop.
 Move backward 2 feet and stop.
 What is good or bad about this example?
Pseudo and real code
 The main advantage of the second phrasing
is that we can match up the commands in
each line to elements in the programming
language.
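As a rough illustration of that matching, each line of the second phrasing could map onto one or two calls like the ones below. The Robot class and all of its method names are hypothetical stand-ins invented for this sketch, not a real robotics API; the class only records the commands it receives, and its sensor is assumed to find the object immediately.

```python
class Robot:
    """Hypothetical robot that just logs each command it is given."""

    def __init__(self):
        self.log = []

    def move(self, direction, distance=None):
        self.log.append(("move", direction, distance))

    def turn(self, direction, degrees):
        self.log.append(("turn", direction, degrees))

    def stop(self):
        self.log.append(("stop",))

    def sensor_on(self):
        self.log.append(("sensor_on",))

    def sees_black_object(self):
        # Assumption for the sketch: the object is found immediately.
        return True


spot = Robot()

# SPOT, move forward about 10 inches and stop.
spot.move("forward", distance="10 in")
spot.stop()
# Now turn left 90 degrees.
spot.turn("left", 90)
# Start moving forward, and turn on your ultrasonic sensor.
spot.move("forward")
spot.sensor_on()
# Stop when you find a black object.
while not spot.sees_black_object():
    pass
spot.stop()
# Turn right 90 degrees and stop.
spot.turn("right", 90)
spot.stop()
# Move backward 2 feet and stop.
spot.move("backward", distance="2 ft")
spot.stop()
```

Each pseudocode line survives as a comment above the code that carries it out.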
Careful naming
conventions
 Choose meaningful variable and function
names.
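A small before-and-after sketch of what this can look like. Both function names are made up for illustration, and the code assumes Python 3 division.

```python
# Hard to read: the names say nothing about what the values mean.
def f(x):
    return len(set(x)) / len(x)


# Easier to read: the names document the computation.
def lexical_diversity(tokens):
    """Return the ratio of distinct word types to total tokens."""
    distinct_types = set(tokens)
    return len(distinct_types) / len(tokens)


tokens = ["the", "cat", "saw", "the", "dog"]
print(lexical_diversity(tokens))  # 0.8
```

Both functions compute the same value; only the second tells the reader what that value is.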
Bottom-up construction
 Instead of writing a 20-line program and
then testing it,
build and test smaller units,
and then combine them.
 In general, these smaller units should be
functions.
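One possible shape for this, sketched with three small hypothetical functions that are each tested before being combined. The whitespace tokenizer is a deliberate simplification, not the tokenizer from NLPP.

```python
def tokenize(text):
    """Split text into lowercase tokens on whitespace."""
    return text.lower().split()


def count_words(tokens):
    """Map each token to the number of times it occurs."""
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


def most_frequent(counts):
    """Return the token with the highest count."""
    return max(counts, key=counts.get)


# Test each unit on its own first ...
assert tokenize("The cat") == ["the", "cat"]
assert count_words(["a", "b", "a"]) == {"a": 2, "b": 1}
assert most_frequent({"a": 2, "b": 1}) == "a"


# ... and only then combine them.
def top_word(text):
    return most_frequent(count_words(tokenize(text)))


print(top_word("the cat saw the dog"))  # the
```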
NLP pipeline
Fig. 3.1
Commenting
 Add comments to every line,
unless what a line does is so obvious that a
comment would get in the way.
 Your pseudocode could become the
comments on your real code.
Regression testing
 Keep a suite of test cases.
 As your program gets bigger, it should still work
on previous test cases.
 If it stops working, it has 'regressed'.
 A change in code has the (unintended) side effect of
breaking something that used to work.
 Python's doctest module automates this kind of testing.
 It runs the examples in a docstring as if they were typed in
interactive mode and checks that the output still matches.
 See the doctest documentation.
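A minimal sketch of a test suite kept in a docstring. The plural function and its rules are invented for this example and are far from complete English morphology.

```python
import doctest


def plural(word):
    """Return a naive English plural of word.

    >>> plural("cat")
    'cats'
    >>> plural("fox")
    'foxes'
    >>> plural("fly")
    'flies'
    """
    if word.endswith("y"):
        return word[:-1] + "ies"
    if word.endswith(("s", "x")):
        return word + "es"
    return word + "s"


if __name__ == "__main__":
    # Re-run every docstring example; silence means nothing has regressed.
    doctest.testmod()
```

If a later change to plural breaks one of the three cases, doctest reports the expected and the actual output for that example.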
Debugging topics
 Check your assumptions
 Exception > stack trace
 Interactive debugging
 Python's debugger
 Prediction
Debugging
 "Most code errors result from the programmer
making incorrect assumptions". (NLPP:158)
 When you find an error, first check your
assumptions.
 Add print statements to show
 values of variables and
 how far the program progresses.
 Reduce input to smallest amount needed to cause
the error.
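A sketch of these steps on a made-up function. The DEBUG prefix is just a convention chosen here so the temporary print statements are easy to find and remove later.

```python
def average_word_length(words):
    # Assumption to check: the list is not empty.
    print("DEBUG words =", words)    # value of a variable
    total = sum(len(word) for word in words)
    print("DEBUG total =", total)    # how far the program got
    return total / len(words)


# A small input keeps the debugging output short.
print(average_word_length(["a", "bcd"]))  # 2.0

# average_word_length([]) would raise ZeroDivisionError:
# the assumption that the list is not empty does not hold there.
```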
Stack trace
 A runtime error (Python exception) gives a
stack trace that pinpoints the location of
program execution at the time of the error.
 But the error may actually be upstream.
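To make the upstream point concrete, here is a small sketch with two hypothetical functions. The stack trace points into total_count, but the actual mistake was made earlier, in load_counts.

```python
import traceback


def load_counts():
    # The real mistake is here: one count is stored as a string.
    return {"the": "2", "cat": 1}


def total_count(counts):
    # The exception is raised here, downstream of the mistake.
    return sum(counts.values())


try:
    total_count(load_counts())
except TypeError:
    trace = traceback.format_exc()
    print(trace)
```

The printed trace names total_count as the place where execution stopped, so the line it reports is where to start looking, not necessarily where to fix.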
Python's debugger
 Invoke it:
 import pdb
 import mymodule
 pdb.run('mymodule.myfunction()')  # myfunction: placeholder for the function to debug
 It lets you
 monitor execution of program,
 specify line numbers where program should stop
(breakpoints), and
 step through the sections of code inspecting values of
variables.
Prediction
 Try to predict the effect of a potential bugfix
before re-running the program.
 "If the bug isn't fixed, don't fall into the trap of
blindly changing the code in the hope that it will
magically start working again." (NLPP:159)
 For each change, try to articulate what is wrong
and how the change will fix the problem.
 Undo the change if it doesn't work.
 "Programs don't magically work; they magically
don't work." (Robert Goldman)
Algorithm design
NLPP 4.7
Algorithms
 Divide and conquer
 Start with something that works
 Iteration
 Recursion
Divide and conquer
 Divide a problem of size n into two
problems of size n/2.
 Binary search - dictionary example.
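A sketch of the dictionary example, assuming the word list is already sorted alphabetically. The function name and the word list are made up for illustration.

```python
def binary_search(sorted_words, target):
    """Return the index of target in sorted_words, or -1 if absent."""
    low, high = 0, len(sorted_words) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_words[middle] == target:
            return middle
        if sorted_words[middle] < target:
            low = middle + 1      # target can only be in the upper half
        else:
            high = middle - 1     # target can only be in the lower half
    return -1


dictionary = ["apple", "banana", "cherry", "grape", "melon", "peach"]
print(binary_search(dictionary, "grape"))  # 3
print(binary_search(dictionary, "kiwi"))   # -1
```

Each pass through the loop halves the part of the list still under consideration, which is the n to n/2 step described above.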
Start with known
 Transform task into something that already
works.
To find duplicates in a list,
first sort the list,
then check for identity of adjacent pairs.
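The sort-then-compare idea could be sketched as follows. The function name is hypothetical, and reporting each duplicated value only once is a choice made for this example.

```python
def find_duplicates(items):
    """Return the sorted list of values that occur more than once."""
    ordered = sorted(items)
    duplicates = []
    # After sorting, equal items sit next to each other.
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current and (not duplicates or duplicates[-1] != current):
            duplicates.append(current)
    return duplicates


print(find_duplicates(["dog", "cat", "dog", "emu", "cat", "dog"]))  # ['cat', 'dog']
```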
Iteration vs. recursion
 For some function ƒ…
Iteration
Repeat ƒ some number of times.
 Calling ƒ in a for loop.
Recursion
ƒ calls itself some number of times:
 NP → the N PP.
 PP → P NP.
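A sketch contrasting the two. All function names are invented here, and the base case (an NP of just "the N" at depth 0) is an assumption added so that the recursion terminates; the two rules above have no stopping point on their own. Note that the recursion is indirect: noun_phrase calls prepositional_phrase, which calls noun_phrase again.

```python
def repeat_iteratively(f, value, times):
    """Iteration: call f in a for loop."""
    for _ in range(times):
        value = f(value)
    return value


def noun_phrase(depth):
    """Recursion: NP -> the N PP, where PP -> P NP.

    Assumption for the sketch: at depth 0 the NP is just 'the N',
    so the recursion has a base case and terminates.
    """
    if depth == 0:
        return "the N"
    return "the N " + prepositional_phrase(depth)


def prepositional_phrase(depth):
    """PP -> P NP; the embedded NP is one level shallower."""
    return "P " + noun_phrase(depth - 1)


print(repeat_iteratively(lambda n: n * 2, 1, 3))  # 8
print(noun_phrase(2))  # the N P the N P the N
```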
Next time
Start NLPP §6
Learning to classify text