cs2110-15-trees


Data Structures: Trees and Grammars

Readings: Sections 6.1, 7.1-7.4 (more from Ch. 6 later)

Goals for this Unit

• Continue focus on data structures and algorithms
• Understand concepts of reference-based data structures (e.g. linked lists, binary trees)
  – Some implementation for binary trees
• Understand the usefulness of trees and hierarchies as data models
  – Recursion used to define data organization

Taxonomy of Data Structures

• From the text:
  – Data type: collection of values and operations
    • Compare to Abstract Data Type!
  – Simple data types vs. composite data types
• Book: Data structures are composite data types
  – Definition: a collection of elements that are some combination of primitive and other composite data types

Book's Classification of Data Structures

• Four groupings:
  – Linear Data Structures
  – Hierarchical
  – Graph
  – Sets and Tables
• When defining these, note an element has:
  – one or more information fields
  – relationships with other elements

Note on Our Book and Our Course

• Our book's strategy
  – In Ch. 6, discuss principles of Lists
    • Give an interface, then implement from scratch
  – In Ch. 7, discuss principles of Trees
  – Later, in Ch. 9, see what Java gives us
• Our course's strategy
  – We did Ch. 9 first. Saw List interfaces and operations
  – Then, Ch. 8 on maps and sets
  – Now, trees with some implementation too

Trees Represent…

• The concept of a tree is very common and important.
• Tree terminology:
  – Nodes have one parent
  – A node's children
  – Leaf nodes: no children
  – Root node: top or start; no parent
• Data structures that store trees
• Execution or processing that can be expressed as a tree
  – E.g. method calls as a program runs
  – Searching a maze or puzzle

Trees are Important

• Trees are important for cognition and computation
  – computer science
  – language processing (human or computer)
    • parse trees
  – knowledge representation (or modeling of the “real world”)
    • E.g. family trees; the Linnaean taxonomy (kingdom, phylum, …, species); etc.

Another Tree Example: File System

[Figure: file system tree rooted at C:\, with entries CS216, CS120, MyMail, lab1, lab2, lab3, school, pers, list.h, list.cpp, calc.cpp]

• What about file links (Unix) or shortcuts (Windows)?

Another Tree Example: XML and HTML documents

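[Example: a small HTML page whose markup was lost in extraction; the surviving text is "My Page", "Blah", "blah blah", "End"]

• How is this a tree?
• What are the leaves?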

Tree Data Structures

• Why this now?
  – Very useful in coding
    • TreeMap in Java Collections Framework
  – Example of recursive data structures
  – Methods are recursive algorithms

Tree Definitions and Terms

• First, general trees vs. binary trees
  – Each node in a binary tree has at most two children
• General tree definition:
  – Set of nodes T (possibly empty?) with a distinguished node, the root
  – All other nodes form a set of disjoint subtrees T_i
    • each a tree in its own right
    • each connected to the root with an edge
• Note the recursive definition
  – Each node is the root of a subtree

Picture of Tree Definition

[Figure: root node r with subtrees T_1, T_2, T_3 attached below it]

• And all subtrees are recursively defined as:
  – a node with…
  – subtrees attached to it

Tree Terminology

• A node's parent
• A node's children
  – Binary tree: left child and right child
  – Sibling nodes
  – Descendants, ancestors
• A node's degree (how many children)
• Leaf nodes or terminal nodes
• Internal or non-terminal nodes

Recursive Data Structure

• Recursive data structure: a data structure that contains a pointer or reference to an instance of its own type

    public class TreeNode<T> {
        T nodeItem;
        TreeNode<T> left, right;
        TreeNode<T> parent;
        …
    }

• Recursion is a natural way to express many algorithms.
• For recursive data structures, recursive algorithms are a natural choice

General Trees

• Representing general trees is a bit harder
  – Each node has a list of child nodes
• Turns out that:
  – Binary trees are simpler and still quite useful
• From now on, let's focus on binary trees only
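To make "each node has a list of child nodes" concrete, here is a minimal sketch of a general-tree node. The class name GeneralTreeNode and its methods are hypothetical, not taken from the textbook.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical general-tree node: any number of children, kept in a list.
class GeneralTreeNode<T> {
    T item;
    List<GeneralTreeNode<T>> children = new ArrayList<>();

    GeneralTreeNode(T item) {
        this.item = item;
    }

    // Attach a new child holding childItem and return it so callers can keep building.
    GeneralTreeNode<T> addChild(T childItem) {
        GeneralTreeNode<T> child = new GeneralTreeNode<>(childItem);
        children.add(child);
        return child;
    }

    // Count this node plus all of its descendants, recursively.
    int size() {
        int count = 1;
        for (GeneralTreeNode<T> child : children) {
            count += child.size();
        }
        return count;
    }
}
```

Compared with a binary node's fixed left and right fields, every operation here has to loop over a list, which is one reason the binary case is simpler to work with.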

ADT Tree

• Remember the definition of an ADT?
  – Model of information: we just covered that
  – Operations? See pages 405-406 in textbook
• Many are similar to ADT List or any data structure
  – The “CRUD” operations: create, read, update, delete
• Important about this list of operations
  – some are in terms of one specified node, e.g. hasParent()
  – others are “tree-wide”, e.g. size(), traversal

Classes for Binary Trees (pp. 416-431)

• class LinkedBinaryTree (p. 425, bottom)
  – reference to root BinaryTreeNode
  – methods: tree-level operations
• class BinaryTreeNode (p. 416)
  – data: an object (of some type)
  – left: references root of left subtree (or null)
  – right: references root of right subtree (or null)
  – parent: references this node's parent node
    • Could this be null? When should it be?
  – methods: node-level operations

Two-class Strategy for Recursive Data Structures

• Common design: use two classes for a Tree or List
• “Top” class
  – has reference to “first” node
  – other things that apply to the whole data-structure object (e.g. the tree object)
    • both methods and fields
• Node class
  – Recursive definitions are here as references to other node objects
  – Also data (of course)
  – Methods defined in this class are recursive

Binary Tree and Node Class

• LinkedBinaryTree class has:
  – reference to root node
  – reference to a current node, a cursor
  – non-recursive methods like:
      boolean find(tgt) // see if tgt is in the whole tree
• Node class has:
  – data, references to left and right subtrees
  – recursive versions of methods like find:
      boolean find(tgt) // is tgt here or in my subtrees?
• Note: BinaryTree.find() just calls Node.find() on the root node!
  – Other methods work this way too
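The delegation described above can be sketched as follows. This is an illustration only: the names SketchBinaryTree and SketchNode are hypothetical stand-ins for the book's classes, and the cursor is left out. Because this is a plain binary tree (no ordering assumed), the node-level find has to look in both subtrees.

```java
// Node class: the recursive work happens here.
class SketchNode<T> {
    T data;
    SketchNode<T> left, right;

    SketchNode(T data) {
        this.data = data;
    }

    // Is tgt here or in one of my subtrees?
    boolean find(T tgt) {
        if (data.equals(tgt)) {
            return true;
        }
        if (left != null && left.find(tgt)) {
            return true;
        }
        return right != null && right.find(tgt);
    }
}

// "Top" class: holds the root and delegates to the node-level method.
class SketchBinaryTree<T> {
    SketchNode<T> root;

    // Tree-level find: not recursive itself, it just hands off to the root node.
    boolean find(T tgt) {
        return root != null && root.find(tgt);
    }
}
```

The tree-level method also handles the one case the node class cannot: an empty tree, where there is no node to call find() on.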

Why Does This Matter Now?

• This illustrates (again) important design ideas
• The tree itself is what we're interested in
  – There are tree-level operations on it (“ADT level” operations)
• The implementation is a recursive data structure
  – There are recursive methods inside the lower-level classes that are closely related (same name!) to the ADT-level operation
• Principles? abstraction (hiding details), delegation (helper classes, methods)

ADT Tree Operations: Navigation

• Positioning:
  – toRoot(), toParent(), toLeftChild(), toRightChild(), find(Object o)
• Checking:
  – hasParent(), hasLeftChild(), etc.
  – equals(Object tree2)
    • Book calls this a “deep compare”
    • Do two distinct objects have the same structure and contents?

ADT Tree Operations: Mutators

• Mutators:
  – insertRight(Object o), insertLeft(Object o)
    • create a new node containing new data
    • make this new node be the child of the current node
    • Important: We use these to build trees!
  – prune()
    • delete the subtree rooted at the current node
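A rough sketch of how cursor-based mutators might look. The method names follow the slides, but the class CursorTree, the helpers insertRoot() and currentItem(), and several behaviors are assumptions rather than the book's code: inserting over an existing child replaces it, prune() moves the cursor to the parent, and every method expects the cursor to be on a node.

```java
class CursorTree<T> {
    // Minimal node with a parent link so prune() can detach a subtree.
    private static class Node<T> {
        T data;
        Node<T> left, right, parent;

        Node(T data, Node<T> parent) {
            this.data = data;
            this.parent = parent;
        }
    }

    private Node<T> root;
    private Node<T> current; // the cursor

    // Assumption: an empty tree gets its first node through this helper.
    void insertRoot(T o) {
        root = new Node<>(o, null);
        current = root;
    }

    // Create a new node and make it the left/right child of the current node.
    void insertLeft(T o)  { current.left  = new Node<>(o, current); }
    void insertRight(T o) { current.right = new Node<>(o, current); }

    boolean hasLeftChild()  { return current != null && current.left != null; }
    boolean hasRightChild() { return current != null && current.right != null; }

    void toRoot()       { current = root; }
    void toLeftChild()  { current = current.left; }
    void toRightChild() { current = current.right; }

    // Hypothetical accessor for the item under the cursor.
    T currentItem() { return current.data; }

    // Delete the subtree rooted at the cursor; the cursor moves up to the parent.
    void prune() {
        Node<T> p = current.parent;
        if (p == null) {
            root = null;            // pruning the root empties the tree
        } else if (p.left == current) {
            p.left = null;
        } else {
            p.right = null;
        }
        current = p;
    }
}
```

Building a tree is then a sequence of cursor moves and inserts, for example insertRoot("root"), insertLeft("L"), toLeftChild(), insertRight("LR").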

Next: Implementation

• Next (in the book)
  – How to implement Java classes for binary trees
  – Class for node, another class for BinTree
  – Interface for both, then two implementations (array and reference)
• But for us:
  – We'll skip some of this, particularly the array version
  – We'll only look at the reference-based implementation
  – After that: concept of a binary search tree

Understanding Implementations

• Let's review some of the methods on pp. 416-431
  – (Done in class, looking at code in book.)
• Some topics discussed:
  – Node class. Parent reference or not?
  – Are two trees equal?
  – Traversal strategies: Section 7.3.2 in book
  – visit() method and callback (also 7.3.2)
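As a sketch of the traversal-plus-callback idea, the three standard depth-first orders can each take the "visit" action as a parameter. The book's own visit() interface is not reproduced here; java.util.function.Consumer stands in for it, and TraversalNode is a hypothetical class name.

```java
import java.util.function.Consumer;

class TraversalNode<T> {
    T data;
    TraversalNode<T> left, right;

    TraversalNode(T data, TraversalNode<T> left, TraversalNode<T> right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    // Pre-order: visit this node, then the left subtree, then the right subtree.
    void preOrder(Consumer<T> visit) {
        visit.accept(data);
        if (left != null) left.preOrder(visit);
        if (right != null) right.preOrder(visit);
    }

    // In-order: left subtree, then this node, then right subtree.
    void inOrder(Consumer<T> visit) {
        if (left != null) left.inOrder(visit);
        visit.accept(data);
        if (right != null) right.inOrder(visit);
    }

    // Post-order: both subtrees first, this node last.
    void postOrder(Consumer<T> visit) {
        if (left != null) left.postOrder(visit);
        if (right != null) right.postOrder(visit);
        visit.accept(data);
    }
}
```

The traversal code never needs to know what "visiting" means; a caller can pass System.out::println to print each item, or someList::add to collect them.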

Binary Search Trees

• We often need collections that store items
  – Maybe a long series of inserts or deletions
• We want fast lookup, and often we want to access in sorted order
  – Lists: O(n) lookup
  – Could sort them for O(lg n) lookup
    • Cost to sort is O(n lg n) and we might need to re-sort often as we insert, remove items
• Solution: search tree

Binary Search Trees

• Associated with each node is a key value that can be compared.
• Binary search tree property:
  – every node in the left subtree has a key whose value is less than the value of the root's key, and
  – every node in the right subtree has a key whose value is greater than the value of the root's key.
  – The same must hold at every node, since each node is the root of its own subtree.

Example

[Figure: a binary search tree containing the keys 1, 3, 4, 5, 7, 8, 11]

Counterexample

[Figure: a binary tree containing the keys 2, 4, 5, 6, 7, 8, 10, 11, 15, 18, 20, 21 that is NOT a binary search tree]

Find and Insert in BST

• Find: look for where it should be
• If not there, that's where you insert
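A minimal sketch of "insert where find falls off the tree", using int keys for brevity. The class name BstInsertNode is hypothetical, and rejecting duplicate keys is an assumption, not something the slides specify.

```java
class BstInsertNode {
    int key;
    BstInsertNode left, right;

    BstInsertNode(int key) {
        this.key = key;
    }

    // Walk down exactly as a search would; attach a new leaf at the empty
    // spot where the search ends. Returns false if the key is already present.
    boolean insert(int newKey) {
        if (newKey == key) {
            return false;                       // already there, nothing to insert
        }
        if (newKey < key) {
            if (left == null) {
                left = new BstInsertNode(newKey);
                return true;
            }
            return left.insert(newKey);
        }
        if (right == null) {
            right = new BstInsertNode(newKey);
            return true;
        }
        return right.insert(newKey);
    }
}
```

New keys always become leaves, so the shape of the tree depends on insertion order; inserting keys in sorted order produces a long chain rather than a bushy tree.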

Recursion and Tree Operations

• Recursive code for tree operations is simple, natural, elegant
• Example: pseudo-code for Node.find()

    boolean find(Comparable tgt) {
        Node next = null;
        if (this.data matches tgt)
            return true
        else if (tgt's data < this.data)
            next = this.leftChild
        else // tgt's data > this.data
            next = this.rightChild
        // next points to left or right subtree
        if (next == null)
            return false // no subtree
        else
            return next.find(tgt) // search on
    }
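The pseudo-code translates to compilable Java almost line for line. In this sketch the class name BstFindNode and the constructor are assumptions; compareTo() plays the role of "matches", "<" and ">".

```java
class BstFindNode<T extends Comparable<T>> {
    T data;
    BstFindNode<T> leftChild, rightChild;

    BstFindNode(T data, BstFindNode<T> leftChild, BstFindNode<T> rightChild) {
        this.data = data;
        this.leftChild = leftChild;
        this.rightChild = rightChild;
    }

    boolean find(T tgt) {
        int cmp = tgt.compareTo(data);
        if (cmp == 0) {
            return true;                        // this.data matches tgt
        }
        // Smaller targets can only be in the left subtree, larger ones in the right.
        BstFindNode<T> next = (cmp < 0) ? leftChild : rightChild;
        if (next == null) {
            return false;                       // no subtree to search
        }
        return next.find(tgt);                  // search on
    }
}
```

Each call discards one whole subtree, so the number of comparisons is at most the height of the tree plus one.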

Order in BSTs

• How could we traverse a BST so that the nodes are visited in sorted order?
  – Does one of our traversal strategies work?
• A very useful property about BSTs
• Consider Java's TreeSet and TreeMap
  – A search tree (not a plain BST, but one of its better “cousins”)
• In CS2150: AVL trees, Red-Black trees
  – Guarantee: search times are O(lg n)
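One way to check the sorted-order question is to try it: an in-order traversal (left subtree, node, right subtree) of a BST visits keys in ascending order, which is also the order a TreeSet iterates in. The class InOrderDemo and the keys below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class InOrderDemo {
    static class Node {
        int key;
        Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // In-order: everything smaller (left), then this key, then everything larger (right).
    static void inOrder(Node n, List<Integer> out) {
        if (n == null) {
            return;
        }
        inOrder(n.left, out);
        out.add(n.key);
        inOrder(n.right, out);
    }

    public static void main(String[] args) {
        // Hand-built BST: 5 at the root, 3 (with left child 1) and 8 below it.
        Node root = new Node(5,
                new Node(3, new Node(1, null, null), null),
                new Node(8, null, null));
        List<Integer> visited = new ArrayList<>();
        inOrder(root, visited);
        System.out.println(visited);                 // prints [1, 3, 5, 8]

        // A TreeSet built from the same keys iterates in the same ascending order.
        TreeSet<Integer> set = new TreeSet<>(Arrays.asList(8, 1, 5, 3));
        System.out.println(new ArrayList<>(set));    // prints [1, 3, 5, 8]
    }
}
```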

Deleting from a BST

• Removing a node requires
  – Moving its left and right subtrees
  – Not hard if one is not there
  – But if both are there?
• Answer: not too tough, but wait for CS2150 to see!
• In CS2110, we'll not worry about this

Next: Grammars, Trees, Recursion

• Languages are governed by a set of rules called a grammar
  – Is a statement legal?
  – Generate or derive a new legal statement
• Natural language grammars
  – Language processing by computers
• But, grammars are used a lot in computing
  – Grammar for a programming language
  – Grammar for valid inputs, messages, data, etc.

Backus-Naur Form

• http://en.wikipedia.org/wiki/Backus-Naur_form
• BNF is a widely-used notation for describing the grammar or formal syntax of programming languages or data
• BNF specifies a grammar as a set of derivation rules of this form: <symbol> ::= <expression>
• Look at website and example there (also on next slide)
  – How are trees involved here? Is it recursive?

BNF for Postal Address

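[Grammar: five numbered BNF rules for a postal address; the angle-bracketed symbol names were lost in extraction. Surviving structure: rule 2 has two alternatives, one ending in the literal "."; rule 3 has two alternatives and an optional part in [ ]; rule 4 has an optional part in [ ]; rule 5 contains the literal ","]

Example:
    Ann Marie G. Jones
    123 Main St.
    Hooville, VA 22901

Where's the recursion?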

Grammars in Language

• Rule-based grammars describe
  – how legal statements can be produced
  – how to tell if a statement is legal
• Study textbook, pp. 389-391, to see a rule-based grammar for simple Java-like arithmetic expressions
  – four rules for expressions, terms, factors, and letter
  – Study how a (possibly) legal statement is parsed to generate a parse tree

Computing Parse-Tree Example

• Expression: a * b + c
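To show how grammar rules turn into recursive code, here is a small recursive-descent parser sketch. The four rules in the comments are an assumed grammar in the spirit of "expressions, terms, factors, and letter", not the textbook's exact rules, and the tree it builds is simplified: operators are internal nodes and letters are leaves. ExprParser is a hypothetical class name.

```java
class ExprParser {
    // Tree node: a leaf holds a letter; an internal node holds '+' or '*'.
    static class Node {
        char label;
        Node left, right;

        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        // Fully parenthesized form, which makes the tree shape visible.
        @Override
        public String toString() {
            return (left == null) ? String.valueOf(label)
                                  : "(" + left + " " + label + " " + right + ")";
        }
    }

    private final String src;
    private int pos = 0;

    ExprParser(String input) {
        this.src = input.replace(" ", "");
    }

    // Parse a whole string; reject it if anything is left over.
    static Node parse(String input) {
        ExprParser p = new ExprParser(input);
        Node tree = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return tree;
    }

    // expression ::= term | term + expression
    private Node expression() {
        Node left = term();
        if (peek() == '+') {
            pos++;
            return new Node('+', left, expression());
        }
        return left;
    }

    // term ::= factor | factor * term
    private Node term() {
        Node left = factor();
        if (peek() == '*') {
            pos++;
            return new Node('*', left, term());
        }
        return left;
    }

    // factor ::= letter | ( expression )
    private Node factor() {
        char c = peek();
        if (Character.isLetter(c)) {
            pos++;
            return new Node(c, null, null);
        }
        if (c == '(') {
            pos++;
            Node inner = expression();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ) at " + pos);
            }
            pos++;
            return inner;
        }
        throw new IllegalArgumentException("expected a letter or ( at " + pos);
    }

    // Next unread character, or '\0' at end of input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }
}
```

Calling ExprParser.parse("a * b + c").toString() gives "((a * b) + c)": the multiplication ends up deeper in the tree because term is nested inside expression in the grammar. Each method mirrors one rule, and the recursion in the rules becomes recursion in the code.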

Grammar Terms and Concepts

• First, this is what's called a context-free grammar (CFG)
  – For CS2110, let's not worry about what this means! (But in CS2102, you learn this.)
• A CFG has
  – a starting symbol
  – a set of variables (AKA non-terminals)
  – a set of terminal symbols
  – a set of productions

Previous Parse Tree

• Terminal symbols: + * a b c
  – could be:
  – could be:
• Production: |

Natural Language Parse Tree

• Statement: The man bit the dog

How Can We Use Grammars?

• Parsing
  – Is a given statement a valid statement in the language? (Is the statement recognized by the grammar?)
  – Note this is what the Java compiler does as a first step toward creating an executable form of your program. (Find errors, or build executable.)
• Production
  – Generate a legal statement for this grammar
  – Demo: generate random statements!
    • See link on website next to slides
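Generation from a grammar can be sketched in a few lines: start from the start symbol and keep replacing each non-terminal with one of its productions, chosen at random. The grammar below, its symbol names, and the class RandomSentence are all hypothetical; this is not the course demo's code or file format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class RandomSentence {
    // Hypothetical grammar: each non-terminal maps to its alternative right-hand sides.
    static final Map<String, String[]> RULES = new HashMap<>();
    static {
        RULES.put("<sentence>", new String[] { "The <noun> <verb> <adverb>" });
        RULES.put("<noun>", new String[] { "waves", "flowers", "slugs" });
        RULES.put("<verb>", new String[] { "sigh", "die" });
        RULES.put("<adverb>", new String[] { "warily", "grumpily" });
    }

    // Expand one symbol. A terminal is returned unchanged; a non-terminal picks
    // one production at random and recursively expands each symbol in it.
    static String expand(String symbol, Random rng) {
        String[] alternatives = RULES.get(symbol);
        if (alternatives == null) {
            return symbol;                          // terminal: nothing to expand
        }
        String chosen = alternatives[rng.nextInt(alternatives.length)];
        StringBuilder out = new StringBuilder();
        for (String part : chosen.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(expand(part, rng));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("<sentence>", new Random()));
    }
}
```

The sequence of choices made by expand() traces out a parse tree for the sentence it produces. With a recursive production the same code still works, as long as some alternative eventually bottoms out in terminals.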

Demo's Poem-grammar data file

{ The tonight }
{ sigh portend like die }
{ waves big yellow flowers slugs }
{ warily grumpily }

• Note: no recursive productions in this example!