BNF data structure design

Download Report

Transcript BNF data structure design

BNF Grammar
Example design of a data structure
26-Jul-16
Form of BNF rules










<BNF rule> ::= <nonterminal> "::=" <definitions>
<nonterminal> ::= "<" <words> ">"
<terminal> ::= <word> | <punctuation mark> | ' " ' <any chars> ' " '
<words> ::= <word> | <words> <word>
<word> ::= <letter> | <word> <letter> | <word> <digit>
<definitions> ::= <definition> | <definitions> "|" <definition>
<definition> ::= <empty> | <term> | <definition> <term>
<empty> ::=
<term> ::= <terminal> | <nonterminal>
Not defined here (but you know what they are) : <letter>, <digit>,
<punctuation mark>, <any chars> (any printable nonalphabetic character
except double quote)
Notes on terminology




A grammar typically has a “top level” element
A string is a “sentential form” if it satisfies the syntax
of the top level element
A string is a “sentence” if it is a sentential form and
is composed entirely of terminals
A string “belongs to” a grammar if it satisfies the
rules of that grammar
Uses of a grammar

A BNF grammar can be used in two ways:

To generate strings belonging to the grammar


To do this, start with a string containing a nonterminal;
while there are still nonterminals in the string {
replace a nonterminal with one of its definitions
}
To recognize strings belonging to the grammar


This is the way programs are compiled--a program is a
string belonging to the grammar that defines the language
Recognition is much harder than generation
Generating sentences

I want to write a program that reads in a grammar,
stores it in some appropriate data structure, then
generates random sentences belonging to that grammar

I need to decide:



How to store the grammar
What operations to provide on the grammar
These decisions are intertwined!

How I store the grammar determines what operations are easy
and efficient (and even possible!)
Development approaches

Bad approach:
Design a general representation for grammars and a complete set
of operations on them



Actually, this is a good approach if you are writing a general-purpose
package for general use--for example, for inclusion in the Java API
Otherwise, it just makes your program much more complex
Good approach:
Decide the operations you need for this program, and design a
representation for grammars that supports these operations


It’s a nice bonus if the design can later be extended for other purposes
Remember the Extreme Programming slogan YAGNI: “You ain’t gonna
need it.”
Requirements and constraints

We need to read the grammar in



We need to look up the definitions of nonterminals


But we don’t need to modify it later
Any tools for building the grammar structure can be private
We need this because we will need to replace each
nonterminal with one of its definitions
We need to know the top level element of the grammar


But we can just assume that we know what it is
For example, we can insist that the top-level element be
<sentence>
First cut




public class Grammar implements Iterable
List<String> rule;
// a single alternative for a nonterminal
List<List<String>> definition;
// all the rules for one nonterminal
Map<String, List<List<String>>> grammar; // rules for all the nonterminals







public Grammar() { grammar = new TreeMap<String, List<String>>(); }
public void addRule(String rule) throws IllegalArgumentException
public List<List<String>> getDefinition(String nonterminal)
public List<String> getOneRule(String nonterminal) // random choice
public Iterator iterator()
public void print()
First cut: Evaluation

Advantages




Small, easily learned interface
Just one class
Can be made to work
Disadvantages



As designed, <foo> ::= bar | baz is two rules, requiring two calls to
addRule; hence requires caller to do some of the parsing, to separate out
the left-hand side
Requires some fairly complicated use of generics
ArrayList implements List (hence is a List), but consider:
List<List<String>> definition = makeList();

This statement is legal if makeList() returns an ArrayList<List<String>>

It is not legal if makeList() returns an ArrayList<ArrayList<String>>
Second cut: Overview




We can eliminate the compound generics by using more
than one class
public class Grammar implements Iterable
Map<String, Definition> grammar; // all the rules
public class Definition
List<Rule> definition;
public class Rule
String lhs;
// the definiendum
List<String> rhs; // the definiens
Second cut: More detail



public class Grammar implements Iterable
Map<String, Definition> grammar; // rules for all the nonterminals
public Grammar() { grammar = new TreeMap<String, Definition>(); } // constructor
public void addRule(String rule) throws IllegalArgumentException
public Definition getDefinition(String nonterminal)
public Iterator iterator()
public void print()
public class Definition
List<Rule> definition; // all definitions for some unspecified nonterminal
Definition() // constructor
void addRule(Rule rule)
Rule getOneRule()
public String toString()
public class Rule
String lhs;
// the definiendum
List<String> rhs; // the definiens
Rule(String text) // constructor
public String getLeftHandSide()
public List<String> getRightHandSide()
public String toString()
Second cut: Evaluation

Advantages:


Simplifies use of generics
Disadvantages:


Many more methods
Definitions are “unattached” from nonterminal being defined




This makes it easier to parse definitions
Seems a bit unnatural
Need to pass the tokenizer around as an additional argument
Doesn’t help with the problem that the caller still has to
separate out the definiendum from the definiens
Third (very brief) cut

Definition and Rule are basically both just lists of strings


Why not just have them implement List?
Methods to implement:

public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
boolean add(Object o)
void add(int index, Object element)
boolean addAll(Collection c)
boolean addAll(int index, Collection c)
void clear()
boolean contains(Object o)
boolean containsAll(Collection c)
Object get(int index)
int indexOf(Object o)
boolean isEmpty()
Iterator iterator()
int lastIndexOf(Object o)
ListIterator listIterator()
ListIterator listIterator(int index)
boolean remove(Object o)
Object remove(int index)
boolean removeAll(Collection c)
boolean retainAll(Collection c)
Object set(int index, Object element)
int size()
List subList(int fromIndex, int toIndex)
Object[] toArray()
Object[] toArray(Object[] a)
Fourth cut, not quite as brief


The class AbstractList “provides a skeletal
implementation of the List interface...the programmer
needs only to extend this class and provide
implementations for the get(int index) and size()
methods.”
I tried this, but...



If I don’t know how AbstractList is implemented, how can I
write these methods?
No book or API class that I looked at provided any clues
I may be missing something, but it looks like the only thing
to do is to look at the source code for some of Java’s classes
(like ArrayList) to see how they do it

Doable, but too much work!
Letting go of a constraint





It is good practice to use a more general class or
interface if you don’t need the services of a more
specific class
In this problem, I want to use lists, but I don’t care
whether they are ArrayLists, or LinkedLists, or
something else
Hence, I generally prefer declarations like
List<String> list = new ArrayList<String>();
In this case, however, trying to do this just seems to be
the cause of many of the problems
What happens if I just make all lists ArrayLists?
Fifth (and final) cut



public class Grammar

Map<String, Definitions> grammar; // rules for all the nonterminals

public Grammar() { grammar = new TreeMap<String, Definitions>(); }

public void addRule(String rule) throws IllegalArgumentException

public Definitions getDefinitions(String nonterminal)

public void print()

private void addToGrammar(String lhs, SingleDefinition singleDefinition)

private static boolean isNonterminal(String s) { return s.startsWith("<"); }
public class Definitions extends ArrayList<SingleDefinition>

@Override public String toString()
public class SingleDefinition extends ArrayList<String>

@Override public String toString()
Final version: Evaluation

Advantages:





Grammar has one constructor and three public methods
Definitions and SingleDefinition are just ArrayLists, so there are no new
methods to learn
All rule parsing is consolidated into a single public method,
addRule(String rule)
I was able to come up with more meaningful names for classes
Disadvantages:

User has to do a bit more list manipulation; in particular, choosing a
random element from a list

This doesn’t seem like an appropriate thing to have in a grammar, anyway
Morals


“Weeks of programming can save you hours of planning.”
The mistake most programmers make is to use the first design
that comes to mind



Much as we would like to pretend otherwise, programming is an
iterative process--we design, then try to implement, then change
the design, then try to implement....
TDD (Test-Driven Development) is a “lightweight” (low cost)
way to try out a design



This usually can be made to work, but it’s seldom optimal
For example, in my first design, I discovered how difficult it was to write
tests that used the complex generics
Consequently, I never even tried to implement this first design
Morals to take home:


Be flexible; try out more than one design
Do TDD
Aside: Tokenizing the input grammar

I wrote a BnfTokenizer class that returns every token
as a String




Nonterminals keep their angle brackets, and may be multiword
Double-quoted strings are returned as a single token (minus
the double quotes)
::= and | are returned as single tokens
BnfTokenizer uses StreamTokenizer


It provides two constructors,
BnfTokenizer() and BnfTokenizer(String text)
And two methods,
void tokenize(text) and String nextToken()
The End