Transcript cs2110-15-trees
Data Structures: Trees and Grammars
Readings:
Sections 6.1, 7.1-7.4
(more from Ch. 6 later) 1
Goals for this Unit
• Continue focus on data structures and algorithms • Understand concepts of reference-based data structures (e.g. linked lists, binary trees) – Some implementation for binary trees • Understand the usefulness of trees and hierarchies as data models – Recursion used to define data organization 2
Taxonomy of Data Structures
• From the text: – Data type: collection of values and operations • Compare to Abstract Data Type!
– Simple data types vs. composite data types • Book: Data structures are composite data types – Definition: a collection of elements that are some combination of primitive and other composite data types 3
Book’s Classification of Data Structures
• Four groupings: – Linear Data Structures – Hierarchical – Graph – Sets and Tables • When defining these, note an element has: – one or more information fields – relationships with other elements 4
Note on Our Book and Our Course
• Our book’s strategy – In Ch. 6, discuss principles of Lists • Give an interface, then implement from scratch – In Ch. 7, discuss principles of Trees – Later, in Ch. 9, see what Java gives us • Our course’s strategy – We did Ch. 9 first. Saw List interfaces and operations – Then, Ch. 8 on maps and sets – Now, trees with some implementation too 5
Trees Represent…
• Concept of a tree very common and important.
• Tree terminology: – Nodes have one parent – A node’s children • Leaf nodes: no children • Root node: top or start; no parent • Data structures that store trees • Execution or processing that can be expressed as a tree – E.g. method calls as a program runs – Searching a maze or puzzle 6
Trees are Important
• Trees are important for cognition and computation – computer science – language processing (human or computer) • parse trees – knowledge representation (or modeling of the “real world”) • E.g. family trees; the Linnaean taxonomy (kingdom, phylum, …, species); etc.
7
Another Tree Example: File System
[Figure: a directory tree rooted at C:\ containing the folders CS216, CS120, MyMail, lab1, lab2, lab3, school, and pers, and the files list.h, list.cpp, and calc.cpp]
• What about file links (Unix) or shortcuts (Windows)?
8
Another Tree Example: XML and HTML documents
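[Slide figure: a small HTML document whose markup was lost in extraction; the surviving text is “My Page”, “Blah blah blah”, and “End”]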
How is this a tree?
What are the leaves?
9
Tree Data Structures
• Why this now?
– Very useful in coding • TreeMap in Java Collections Framework – Example of recursive data structures – Methods are recursive algorithms 10
Tree Definitions and Terms
• First, general trees vs. binary trees – Each node in a binary tree has at most two children • General tree definition: – Set of nodes T (possibly empty?) with a distinguished node, the root – All other nodes form a set of disjoint subtrees T_i • each a tree in its own right • each connected to the root with an edge • Note the recursive definition – Each node is the root of a subtree 11
Picture of Tree Definition
[Figure: a root node r with subtrees T_1, T_2, and T_3 attached below it]
• And all subtrees are recursively defined as: – a node with… – subtrees attached to it 12
Tree Terminology
• A node’s parent • A node’s children – Binary tree: left child and right child – Sibling nodes – Descendants, ancestors • A node’s degree (how many children) • Leaf nodes or terminal nodes • Internal or non-terminal nodes 13
Recursive Data Structure
• A recursive data structure is one that contains a pointer or reference to an instance of its own type, e.g. public class TreeNode
• Recursion is a natural way to express many algorithms.
• For recursive data-structures, recursive algorithms are a natural choice 14
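A minimal sketch of a recursive data structure, assuming hypothetical names (`SketchTreeNode`, `data`, `left`, `right`) that are not taken from the textbook. The `size()` method illustrates how a recursive algorithm follows the shape of the structure.

```java
// Hypothetical node class: each node refers to other nodes of its own type.
class SketchTreeNode<T> {
    T data;                     // the item stored at this node
    SketchTreeNode<T> left;     // root of the left subtree, or null
    SketchTreeNode<T> right;    // root of the right subtree, or null

    SketchTreeNode(T data) {
        this.data = data;
    }

    // Recursive algorithm on a recursive structure:
    // size = this node + size of each subtree that exists.
    int size() {
        int count = 1;
        if (left != null) count += left.size();
        if (right != null) count += right.size();
        return count;
    }
}
```

Calling `size()` on the root counts the whole tree; calling it on any other node counts only that node's subtree.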
General Trees
• Representing general trees is a bit harder – Each node has a list of child nodes • Turns out that: – Binary trees are simpler and still quite useful • From now on, let’s focus on binary trees only 15
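For comparison, here is one possible way to represent a general tree node with a list of children. The class and method names (`GeneralNodeSketch`, `addChild`, `degree`) are illustrative assumptions, not the book's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical general-tree node: any number of children, kept in a list.
class GeneralNodeSketch<T> {
    T data;
    List<GeneralNodeSketch<T>> children = new ArrayList<>();

    GeneralNodeSketch(T data) {
        this.data = data;
    }

    // Create a child holding childData, attach it, and return it.
    GeneralNodeSketch<T> addChild(T childData) {
        GeneralNodeSketch<T> child = new GeneralNodeSketch<>(childData);
        children.add(child);
        return child;
    }

    // Degree = number of children; a leaf has degree 0.
    int degree() { return children.size(); }
    boolean isLeaf() { return children.isEmpty(); }

    // Count this node plus every node in every child subtree.
    int size() {
        int count = 1;
        for (GeneralNodeSketch<T> child : children) count += child.size();
        return count;
    }
}
```

A binary tree node replaces the list with exactly two references, which is why its code is simpler.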
ADT Tree
• Remember definition of an ADT?
– Model of information: we just covered that – Operations? See pages 405-406 in textbook • Many are similar to ADT List or any data structure – The “CRUD” operations: create, read, update, delete • Important points about this list of operations – some are in terms of one specified node, e.g. hasParent() – others are “tree-wide”, e.g. size(), traversal 16
Classes for Binary Trees (pp. 416-431)
• class LinkedBinaryTree (p. 425, bottom) – reference to root BinaryTreeNode – methods: tree-level operations • class BinaryTreeNode (p. 416) – data: an object (of some type) – left: references root of left-subtree (or null) – right: references root of right-subtree (or null) – parent: references this node’s parent node • Could this be null? When should it be?
– methods: node-level operations 17
Two-class Strategy for Recursive Data Structures
• Common design: use two classes for a Tree or List • “Top” class – has reference to “first” node – other things that apply to the whole data-structure object (e.g. the tree-object) • both methods and fields • Node class – Recursive definitions are here as references to other node objects – Also data (of course) – Methods defined in this class are recursive 18
Binary Tree and Node Class
• LinkedBinaryTree class has: – reference to root node – reference to a current node, a cursor – non-recursive methods like: boolean find(tgt) // see if tgt is in the whole tree • Node class has: – data, references to left and right subtrees – recursive versions of methods like find: boolean find(tgt) // is tgt here or in my subtrees?
• Note: BinaryTree.find() just calls Node.find() on the root node!
– Other methods work this way too 19
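The delegation described above can be sketched as follows. This is an illustrative outline with hypothetical class names (`FindTree`, `FindNode`), not the textbook's code; it searches a general binary tree, so both subtrees may need to be checked.

```java
// Node class: holds the recursive method.
class FindNode {
    Object data;
    FindNode left, right;

    FindNode(Object data, FindNode left, FindNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    // Is tgt here or in one of my subtrees?
    boolean find(Object tgt) {
        if (data.equals(tgt)) return true;
        if (left != null && left.find(tgt)) return true;
        return right != null && right.find(tgt);
    }
}

// "Top" class: non-recursive, tree-level method that delegates to the root node.
class FindTree {
    FindNode root;

    FindTree(FindNode root) {
        this.root = root;
    }

    // Is tgt anywhere in the whole tree? An empty tree contains nothing.
    boolean find(Object tgt) {
        return root != null && root.find(tgt);
    }
}
```

The tree-level method only handles the empty-tree case; all the real work happens in the node-level method of the same name.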
Why Does This Matter Now?
• This illustrates (again) important design ideas • The tree itself is what we’re interested in – There are tree-level operations on it (“ADT level” operations) • The implementation is a recursive data structure – There are recursive methods inside the lower-level classes that are closely related (same name!) to the ADT-level operation • Principles? abstraction (hiding details), delegation (helper classes, methods) 20
ADT Tree Operations: “Navigation”
• Positioning: – toRoot(), toParent(), toLeftChild(), toRightChild(), find(Object o) • Checking: – hasParent(), hasLeftChild(), etc.
– equals(Object tree2) • Book calls this a “deep compare” • Do two distinct objects have the same structure and contents?
21
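A deep compare can be written as a short recursive method. The sketch below uses a hypothetical `CompareNode` class and a static helper `sameTree`; the book's `equals` may be organized differently.

```java
import java.util.Objects;

// Hypothetical node class used only to illustrate a deep compare.
class CompareNode {
    Object data;
    CompareNode left, right;

    CompareNode(Object data, CompareNode left, CompareNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    // Two (sub)trees match if both are empty, or if their roots hold equal
    // data and their left and right subtrees match recursively.
    static boolean sameTree(CompareNode a, CompareNode b) {
        if (a == null || b == null) return a == b;
        return Objects.equals(a.data, b.data)
                && sameTree(a.left, b.left)
                && sameTree(a.right, b.right);
    }
}
```

Note that two distinct objects can be "the same" here: the comparison looks at structure and contents, not at object identity.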
ADT Tree Operations: Mutators
• Mutators: – insertRight(Object o), insertLeft(Object o) • create a new node containing new data • make this new node be the child of the current node • Important: We use these to build trees!
– prune() • delete the subtree rooted by the current node 22
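The following sketch shows how cursor-based mutators might fit together. It is not the textbook's LinkedBinaryTree: the class name `CursorTreeSketch` is hypothetical, and two behaviors are assumptions made here (inserting replaces any existing child on that side, and `prune()` moves the cursor to the parent).

```java
// Hypothetical cursor-based binary tree; not the textbook's LinkedBinaryTree.
class CursorTreeSketch {
    private static class Node {
        Object data;
        Node left, right, parent;
        Node(Object data, Node parent) { this.data = data; this.parent = parent; }
    }

    private Node root;
    private Node current;   // the cursor

    CursorTreeSketch(Object rootData) {
        root = new Node(rootData, null);
        current = root;
    }

    // Mutators: create a node holding o and make it a child of the current node.
    // Assumption: an existing child on that side is replaced.
    void insertLeft(Object o)  { current.left = new Node(o, current); }
    void insertRight(Object o) { current.right = new Node(o, current); }

    // Positioning: each returns false (and leaves the cursor alone) if it cannot move.
    boolean toLeftChild()  { return moveTo(current.left); }
    boolean toRightChild() { return moveTo(current.right); }
    boolean toParent()     { return moveTo(current.parent); }
    void toRoot()          { current = root; }

    private boolean moveTo(Node target) {
        if (target == null) return false;
        current = target;
        return true;
    }

    // Delete the subtree rooted at the current node.
    // Assumption: the cursor then moves to the parent. Pruning the root
    // empties the tree, which this sketch does not support using further.
    void prune() {
        Node parent = current.parent;
        if (parent == null) { root = null; current = null; return; }
        if (parent.left == current) parent.left = null; else parent.right = null;
        current = parent;
    }

    int size() { return size(root); }

    private static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }
}
```

A tree is built by alternating positioning calls with inserts: move the cursor to a node, then attach children to it.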
Next: Implementation
• Next (in the book) – How to implement Java classes for binary trees – Class for node, another class for BinTree – Interface for both, then two implementations (array and reference) • But for us: – We’ll skip some of this, particularly the array version – We’ll only look at the reference-based implementation – After that: concept of a binary search tree 23
Understanding Implementations
• Let’s review some of the methods on pp. 416-431 – (Done in class, looking at code in book.) • Some topics discussed: – Node class. Parent reference or not?
– Are two trees equal?
– Traversal strategies: Section 7.3.2 in book – visit() method and callback (also 7.3.2) 24
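As a rough illustration of traversal strategies with a visit callback, the sketch below uses `java.util.function.Consumer` as a stand-in for the book's visit() mechanism, which may look different. The class name `TraversalNode` is hypothetical.

```java
import java.util.function.Consumer;

// Hypothetical node class with the three classic depth-first traversals.
class TraversalNode<T> {
    T data;
    TraversalNode<T> left, right;

    TraversalNode(T data, TraversalNode<T> left, TraversalNode<T> right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    // Pre-order: visit this node, then the left subtree, then the right subtree.
    void preOrder(Consumer<T> visit) {
        visit.accept(data);
        if (left != null) left.preOrder(visit);
        if (right != null) right.preOrder(visit);
    }

    // In-order: left subtree, then this node, then the right subtree.
    void inOrder(Consumer<T> visit) {
        if (left != null) left.inOrder(visit);
        visit.accept(data);
        if (right != null) right.inOrder(visit);
    }

    // Post-order: left subtree, then the right subtree, then this node.
    void postOrder(Consumer<T> visit) {
        if (left != null) left.postOrder(visit);
        if (right != null) right.postOrder(visit);
        visit.accept(data);
    }
}
```

The callback decides what "visiting" means (printing, counting, collecting into a list), so the traversal code is written once and reused.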
Binary Search Trees
• We often need collections that store items – Maybe a long series of inserts or deletions • We want fast lookup, and often we want to access in sorted order – Lists: O(n) lookup – Could sort them for O(lg n) lookup • Cost to sort is O(n lg n) and we might need to re-sort often as we insert, remove items • Solution: search tree 25
Binary Search Trees
• Associated with each node is a key value that can be compared.
• Binary search tree property: – every node in the left subtree has a key whose value is less than the value of the root’s key value, and – every node in the right subtree has a key whose value is greater than the value of the root’s key value.
26
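One way to check the binary search tree property is to pass down the range of keys each subtree is allowed to hold. This is a sketch with integer keys and a hypothetical `KeyNode` class; it applies the property at every node, not only at the root.

```java
// Hypothetical node with an int key, used to test the BST property.
class KeyNode {
    int key;
    KeyNode left, right;

    KeyNode(int key, KeyNode left, KeyNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    // An empty tree trivially satisfies the property.
    static boolean isBst(KeyNode n) {
        return isBst(n, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every key in this subtree must lie strictly between low and high.
    // long bounds are used so that any int key fits inside the initial range.
    private static boolean isBst(KeyNode n, long low, long high) {
        if (n == null) return true;
        if (n.key <= low || n.key >= high) return false;
        return isBst(n.left, low, n.key) && isBst(n.right, n.key, high);
    }
}
```

Comparing each node only with its own children is not enough: a key deep in the left subtree must still be smaller than the root's key.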
Example
[Figure: a binary search tree holding the keys 5, 4, 1, 7, 3, 8, 11] 27
Counterexample
[Figure: a binary tree holding the keys 8, 2, 5, 7, 6, 10, 4, 15, 11, 18, 20, 21, labeled NOT A BINARY SEARCH TREE] 28
Find and Insert in BST
• Find: look for where it should be • If not there, that’s where you insert 29
Recursion and Tree Operations
• Recursive code for tree operations is simple, natural, elegant
• Example: pseudo-code for Node.find()
boolean find(Comparable tgt) {
    Node next = null;
    if (this.data matches tgt) return true
    else if (tgt’s data < this.data) next = this.leftChild
    else // tgt’s data > this.data
        next = this.rightChild
    // next points to left or right subtree
    if (next == null) return false // no subtree
    else return next.find(tgt) // search on
}
30
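A runnable Java rendering of the find pseudo-code, together with an insert that follows the same path, might look like the sketch below. The class name `BstNode` and the choice to reject duplicate keys are assumptions made for this example.

```java
// Hypothetical BST node; K must be comparable so keys can be ordered.
class BstNode<K extends Comparable<K>> {
    K data;
    BstNode<K> leftChild, rightChild;

    BstNode(K data) {
        this.data = data;
    }

    // Search: go left for smaller targets, right for larger ones.
    boolean find(K tgt) {
        int cmp = tgt.compareTo(data);
        if (cmp == 0) return true;                     // found it here
        BstNode<K> next = (cmp < 0) ? leftChild : rightChild;
        return next != null && next.find(tgt);         // no subtree means not found
    }

    // Insert: look for where tgt should be; the empty spot is where it goes.
    // Assumption: duplicates are rejected and reported by returning false.
    boolean insert(K tgt) {
        int cmp = tgt.compareTo(data);
        if (cmp == 0) return false;
        if (cmp < 0) {
            if (leftChild == null) { leftChild = new BstNode<>(tgt); return true; }
            return leftChild.insert(tgt);
        }
        if (rightChild == null) { rightChild = new BstNode<>(tgt); return true; }
        return rightChild.insert(tgt);
    }
}
```

Each call discards one whole subtree, so the number of steps is bounded by the height of the tree rather than by the number of nodes.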
Order in BSTs
• How could we traverse a BST so that the nodes are visited in sorted order?
– Does one of our traversal strategies work?
• A very useful property about BSTs • Consider Java’s TreeSet and TreeMap – A search tree (not a plain BST, but one of its better “cousins”) • In CS2150: AVL trees, Red-Black trees – Guarantee: search times are O(lg n) 31
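The sketch below compares an in-order traversal of a plain, unbalanced BST with iteration over a `java.util.TreeSet`. The class `SortedOrderSketch` and its helper methods are hypothetical names for this example; the sample keys are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SortedOrderSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) BST insert; duplicates are ignored.
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // In-order: left subtree, then this node, then right subtree.
    static void inOrder(Node n, List<Integer> out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.add(n.key);
        inOrder(n.right, out);
    }

    // Build a BST from the keys and list them with an in-order traversal.
    static List<Integer> viaBst(int[] keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    // Same keys through Java's TreeSet, which iterates in ascending order.
    static List<Integer> viaTreeSet(int[] keys) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int k : keys) set.add(k);
        return new ArrayList<>(set);
    }
}
```

Both methods return the keys in ascending order, whatever order they were inserted in.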
Deleting from a BST
• Removing a node requires – Moving its left and right subtrees – Not hard if one is not there – But if both are there?
• Answer: not too tough, but wait for CS2150 to see!
• In CS2110, we’ll not worry about this 32
Next: Grammars, Trees, Recursion
• Languages are governed by a set of rules called a grammar – Is a statement legal?
– Generate or derive a new legal statement • Natural language grammars – Language processing by computers • But grammars are used a lot in computing – Grammar for a programming language – Grammar for valid inputs, messages, data, etc.
33
Backus-Naur Form
• http://en.wikipedia.org/wiki/Backus-Naur_form • BNF is a widely-used notation for describing the grammar or formal syntax of programming languages or data • BNF specifies a grammar as a set of derivation rules of this form:
34
BNF for Postal Address
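[Slide figure: five numbered BNF rules for a postal address; the rule text was lost in extraction. A surviving fragment of the sample address reads: Hooville, VA 22901]
• Where’s the recursion?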
35
Grammars in Language
• Rule-based grammars describe – how legal statements can be produced – how to tell if a statement is legal • Study textbook, pp. 389-391, to see a rule-based grammar for simple Java-like arithmetic expressions – four rules for expressions, terms, factors, and letters – Study how a (possibly) legal statement is parsed to generate a parse tree 36
Computing Parse-Tree Example
• Expression: a * b + c 37
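To make the parsing step concrete, here is a small recursive-descent parser sketch. The grammar it uses is an assumption, since the textbook's exact rules are not reproduced in these slides: an expression is one or more terms joined by +, a term is one or more factors joined by *, and a factor is a single letter. It builds a simplified tree (operators as internal nodes, letters as leaves) rather than a full parse tree with one node per rule.

```java
// Hypothetical parser for an assumed grammar:
//   expression ::= term { "+" term }
//   term       ::= factor { "*" factor }
//   factor     ::= letter
class ExprParserSketch {
    static class Node {
        String label;
        Node left, right;
        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
        // Prefix form, e.g. (+ (* a b) c); a leaf prints as its letter.
        String show() {
            return left == null ? label
                    : "(" + label + " " + left.show() + " " + right.show() + ")";
        }
    }

    private final String input;
    private int pos = 0;

    ExprParserSketch(String text) {
        input = text.replace(" ", "");   // ignore blanks
    }

    // Parse the whole input; anything left over means the statement is not legal.
    Node parse() {
        Node tree = expression();
        if (pos != input.length())
            throw new IllegalArgumentException("unexpected input at position " + pos);
        return tree;
    }

    private Node expression() {
        Node left = term();
        while (peek() == '+') { pos++; left = new Node("+", left, term()); }
        return left;
    }

    private Node term() {
        Node left = factor();
        while (peek() == '*') { pos++; left = new Node("*", left, factor()); }
        return left;
    }

    private Node factor() {
        char c = peek();
        if (!Character.isLetter(c))
            throw new IllegalArgumentException("letter expected at position " + pos);
        pos++;
        return new Node(String.valueOf(c), null, null);
    }

    // Next character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }
}
```

Each grammar rule becomes one method, and the methods call each other just as the rules refer to each other, which is why the resulting structure is a tree.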
Grammar Terms and Concepts
• First, this is what’s called a context free grammar – For CS2110, let’s not worry about what this means! (But in CS2102, you learn this.) • A CFG has – a starting symbol – a set of variables (AKA non-terminals) – a set of terminal symbols – a set of productions 38
Previous Parse Tree
• Terminal symbols: –
Natural Language Parse Tree
• Statement: The man bit the dog 40
How Can We Use Grammars?
• Parsing – Is a given statement a valid statement in the language? (Is the statement recognized by the grammar?) – Note this is what the Java compiler does as a first step toward creating an executable form of your program. (Find errors, or build executable.) • Production – Generate a legal statement for this grammar – Demo: generate random statements!
• See link on website next to slides 41
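Production can be sketched in a few lines: start from a symbol, pick one of its rules at random, and expand each part recursively until only terminals remain. The tiny grammar below is made up for illustration (it is not the demo's data file), and the names `RandomStatementSketch`, `RULES`, and `generate` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomStatementSketch {
    // Non-terminal -> list of alternative right-hand sides.
    // Any symbol without an entry here is treated as a terminal.
    static final Map<String, List<List<String>>> RULES = new HashMap<>();
    static {
        RULES.put("SENTENCE", Arrays.asList(
                Arrays.asList("NOUN_PHRASE", "VERB", "NOUN_PHRASE")));
        RULES.put("NOUN_PHRASE", Arrays.asList(
                Arrays.asList("the", "NOUN")));
        RULES.put("NOUN", Arrays.asList(
                Arrays.asList("man"), Arrays.asList("dog")));
        RULES.put("VERB", Arrays.asList(
                Arrays.asList("bit"), Arrays.asList("saw")));
    }

    // Expand symbol into a string of terminals, choosing rules at random.
    static String generate(String symbol, Random rng) {
        List<List<String>> options = RULES.get(symbol);
        if (options == null) return symbol;            // terminal: emit as-is
        List<String> chosen = options.get(rng.nextInt(options.size()));
        StringBuilder sb = new StringBuilder();
        for (String part : chosen) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(generate(part, rng));            // recursive expansion
        }
        return sb.toString();
    }
}
```

Calling `RandomStatementSketch.generate("SENTENCE", new Random())` yields a sentence of the form "the NOUN VERB the NOUN" with words drawn from the rules above.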
Demo’s Poem-grammar data file
{
42