Learning a language - Oregon Institute of Technology

Learning a language
• Identify the problem domain
– Graphical? AI? Web? Mobile app? Scientific? Business?
• Identify the program model
– imperative
– functional
– logical
– object-oriented
– distributed/parallel/concurrent
– event-driven
• Identify the syntax and semantics
CST223
2_syntax
1
Syntax & Semantics
• Syntax
– What does a legal program look like in this programming language?
x = y;
• Semantics
– What does it mean?
Make x’s value equivalent to y’s value.
Syntax
• Rules that describe how things fit together legally
• Think of it as pattern matching
• Start with the rules (the grammar) and check whether a statement/sentence fits
• It doesn’t have to make sense; it just has to look right
Syntax vs. Semantics
• Do the following C++ statements look right?
int ab125cds;
int _xy;
int ______________;
int ___;
___ = 30;
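The "does it look right?" question for names can be sketched as a small pattern check. This is a minimal sketch, not a compiler's actual lexer: the function name looksLikeIdentifier is hypothetical, it assumes plain ASCII names, and it deliberately ignores keywords. Names that contain a double underscore are syntactically fine but reserved for the implementation in C++, which is a rule about meaning and use, not about shape.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if `name` has the shape of a C++ identifier,
// i.e. a letter or underscore followed by letters, digits, or underscores.
// Keywords and reserved names are deliberately not checked here.
bool looksLikeIdentifier(const std::string& name) {
    if (name.empty()) return false;
    const unsigned char first = static_cast<unsigned char>(name[0]);
    if (!std::isalpha(first) && first != '_') return false;
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '_') return false;
    }
    return true;
}
```

Under this check, ab125cds, _xy, and ___ all have a legal shape, while a name such as 125ab does not, because it starts with a digit.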
Syntax vs. Semantics
• How about?
int array1[10];
array1[3] = 20;
3[array1] = 30;
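Both subscript forms are legal because, for built-in arrays, a[i] is defined as *(a + i), and the addition can be written in either order. A minimal sketch that runs the three statements; the function name subscriptDemo is hypothetical, and the array is zero-initialized here only so that every element has a known value.

```cpp
// Runs the slide's statements and returns the final value of element 3.
int subscriptDemo() {
    int array1[10] = {};   // zero-initialize so every element has a known value
    array1[3] = 20;        // ordinary subscript: *(array1 + 3) = 20
    3[array1] = 30;        // same element:       *(3 + array1) = 30
    return array1[3];
}
```

The second assignment overwrites the first, since both expressions name the same element.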
Syntax vs. Semantics
• What about this?
int main()
{
    int x;
    x = 1;
};
Syntax
• How do we describe what a legal program looks like in a particular programming language (PL)?
– Patterns. Rules.
– Grammar: describes the rules of a language.
• Two equivalent formal notations are used to describe the rules of statements in a PL.
– BNF and EBNF (Backus-Naur Form, also called Backus Normal Form, and Extended BNF): John Backus and Peter Naur
– Context-Free Grammar: Noam Chomsky
John Backus – Fortran 1957
• Fortran is generally regarded as the first widely used high-level programming language.
• For whatever reason, memory perhaps, Fortran I treats whitespace as insignificant: blanks are simply ignored.
• This makes the compiler’s job a little harder because it has to tell tokens apart purely from context.
Fortran Example
• For loop in Fortran
Do I = 1, 10
…. //body of for loop
End Do
• Whitespace doesn’t mean anything
• Implicit typing: if you don’t declare the type of something, the compiler decides what type it is.
Fortran Example
• Whitespace doesn’t mean anything, so the same loop can be written as:
DoI=1,10…EndDo
Fortran Example
• What if you made a typo and typed a period instead of the comma?
DoI=1.10…EndDo
• DoI is not declared, but under implicit typing, variable names that start with I through N are integer and all other names are real, so DoI is a real variable.
• DoI = 1.10 is a perfectly legal assignment; it’s just not a for loop!
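The classic Fortran implicit typing rule is that an undeclared name whose first letter is I through N is an integer, and any other undeclared name is real. It can be sketched in a few lines. This is an illustrative helper only: the function name implicitType and its string return values are hypothetical, and it ignores IMPLICIT statements that can change the defaults.

```cpp
#include <cctype>
#include <string>

// Classic Fortran implicit typing: a first letter of I through N means INTEGER,
// any other first letter means REAL. Names that do not start with a letter
// are reported as "invalid".
std::string implicitType(const std::string& name) {
    if (name.empty()) return "invalid";
    const unsigned char raw = static_cast<unsigned char>(name[0]);
    if (!std::isalpha(raw)) return "invalid";
    const char first = static_cast<char>(std::toupper(raw));
    return (first >= 'I' && first <= 'N') ? "INTEGER" : "REAL";
}
```

With this rule the loop counter I is an integer, while the accidental name DoI starts with D and is therefore real, which is why assigning 1.10 to it raises no complaint.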
Syntax Descriptions
• Backus-Naur Form (BNF)
• Extended Backus-Naur Form (EBNF)
– Developed by computer scientists out of the need for more precise syntax descriptions of programming languages.
– PLs tend to have lots of options and many levels of nesting.
– Ambiguity is not tolerated in a programming language, unlike in natural languages.
– Having a formal description format allows compiler writers to develop efficient parsing techniques.
Syntax Descriptions
• Chomsky’s Language Hierarchy
– Chomsky is a linguist, not a computer scientist.
– He studies languages and how we communicate.
– Levels of languages
• Regular
• Context-Free
• Context-Sensitive
• Recursively Enumerable (almost like natural languages, but not quite)
– Regular and Context-Free languages are exactly what the programming languages community needs to build compilers!
Parsing = Pattern Matching
• Once you have a description of the syntax, then a
part of the compiler’s job is to match the actual
program with the description to determine what is
legal and what’s not.
• This is called parsing.
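• At this point in the compilation process, the parser is only concerned with the syntax: what the program looks like, not what it means.
• Hence the grammar is called context-free: a rule can be applied wherever its non-terminal appears, regardless of the surrounding context.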
Context-Free vs. Context-Sensitive
• Example:
x = y;
• Looks like a legal assignment statement.
– It has a name (identifier) on the left-hand side
– It has an = operator
– It has another identifier on the right-hand side
– It has a semicolon.
• The parser’s job is done. If it looks okay, it’s
okay.
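The four checks above can be sketched as a tiny hand-written matcher. This is only an illustration of "if it looks okay, it's okay": the function names are hypothetical, it recognizes just the shape identifier = identifier ; and it says nothing about whether the names are declared or what types they have.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Skips spaces and tabs starting at position i.
void skipBlanks(const std::string& s, std::size_t& i) {
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) ++i;
}

// Consumes one identifier (letter or '_' followed by letters, digits, '_').
bool matchIdentifier(const std::string& s, std::size_t& i) {
    if (i >= s.size()) return false;
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (!std::isalpha(c) && c != '_') return false;
    ++i;
    while (i < s.size()) {
        c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && c != '_') break;
        ++i;
    }
    return true;
}

// Consumes the single character `expected` at position i.
bool matchChar(const std::string& s, std::size_t& i, char expected) {
    if (i < s.size() && s[i] == expected) { ++i; return true; }
    return false;
}

// True if `s` has the shape: identifier '=' identifier ';'
bool looksLikeAssignment(const std::string& s) {
    std::size_t i = 0;
    skipBlanks(s, i);
    if (!matchIdentifier(s, i)) return false;   // name on the left-hand side
    skipBlanks(s, i);
    if (!matchChar(s, i, '=')) return false;    // the = operator
    skipBlanks(s, i);
    if (!matchIdentifier(s, i)) return false;   // name on the right-hand side
    skipBlanks(s, i);
    if (!matchChar(s, i, ';')) return false;    // the semicolon
    skipBlanks(s, i);
    return i == s.size();                       // nothing may follow
}
```

The matcher accepts "x = y;" even if neither x nor y exists anywhere in the program, which is exactly the point: at this stage only the shape is checked.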
Context-Free vs. Context-Sensitive
• The compiler’s job, however, is not done.
x = y;
• The compiler still needs to determine what it
means.
– Are x and y declared before use?
– Are x and y the same type?
– If not, can one type be converted into the other?
– Are there consts involved?
– etc.
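The questions above are answered by looking things up, not by looking at the shape of the statement. A minimal sketch of such a check, under stated assumptions: the Symbol struct, the SymbolTable alias, and checkAssignment are hypothetical names, types are represented as plain strings, and the only implicit conversion this sketch allows is int to double. A real compiler's rules are far richer.

```cpp
#include <map>
#include <string>

// Hypothetical symbol-table entry: the declared type and whether it is const.
struct Symbol {
    std::string type;
    bool isConst;
};

using SymbolTable = std::map<std::string, Symbol>;

// Checks the statement `lhs = rhs;` against the declarations in `table`.
// Returns an empty string if the assignment is acceptable, otherwise a message.
std::string checkAssignment(const SymbolTable& table,
                            const std::string& lhs,
                            const std::string& rhs) {
    const auto l = table.find(lhs);
    if (l == table.end()) return lhs + " is not declared";
    const auto r = table.find(rhs);
    if (r == table.end()) return rhs + " is not declared";
    if (l->second.isConst) return lhs + " is const and cannot be assigned";
    if (l->second.type != r->second.type) {
        // Assumption for this sketch: the only implicit conversion is int -> double.
        const bool convertible =
            (l->second.type == "double" && r->second.type == "int");
        if (!convertible) {
            return "cannot convert " + r->second.type + " to " + l->second.type;
        }
    }
    return "";
}
```

The same text, x = y;, passes or fails depending on what the table says about x and y. That dependence on surrounding declarations is the "context" that the parser never looks at.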
Context-Free vs. Context-Sensitive
• Type checking is adding context to an assignment
statement.
x = y;
• The parser only deals with syntax. No context necessary. No semantics. Just what it looks like.
• Semantics must also be handled in the compiler, but as a separate step from the syntax description (rules, grammar).
• A context-free grammar only deals with syntax.
Grammar
• Two different (but very similar) notations for
expressing syntax:
– BNF or EBNF
– Context-Free Grammar
• You should be able to express the syntax of a
language construct using a grammar notation.
• You should also be able to parse an input string
given a grammar
Grammar: Write a Rule
• An English Sentence
• A C union construct
Grammar of a PL P
<program>    -> begin <stmt_list> end
<stmt_list>  -> <stmt>
              | <stmt> ; <stmt_list>
<stmt>       -> <var> := <expression>
<var>        -> A | B | C
<expression> -> <var> + <var>
              | <var> - <var>
              | <var>
Syntax
• Words or characters that actually appear in the
program:
begin  end  ;  :=  A  B  C  +  -
– These are called tokens or terminals.
• Words that don’t appear in the program, but help us to describe what a legal program looks like:
<program> <stmt> <var> <expression> ...
– These are called non-terminals. Any string of terminals and non-terminals derived from the start symbol is called a sentential form.
Grammar Basics
• Each non-terminal can have multiple rules
– Choices in how constructs are fit together.
• A non-terminal can also be optional, meaning it may derive the empty string
– Denoted by a lambda (λ)
• The | denotes “or”
Grammar Basics
• Let’s try a simpler grammar:
<Start> -> a <Start> b
         | λ
• What does this match?
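One way to explore the question is to turn the rule directly into a recursive function and try strings against it. A minimal sketch (the names matchStart and derivable are hypothetical): each call either consumes an a, recurses, and then demands a b, or takes the λ alternative and consumes nothing.

```cpp
#include <cstddef>
#include <string>

// Tries to match <Start> beginning at position i, advancing i past the match.
// <Start> -> a <Start> b | λ
bool matchStart(const std::string& s, std::size_t& i) {
    if (i < s.size() && s[i] == 'a') {
        ++i;                                   // consume 'a'
        if (!matchStart(s, i)) return false;   // nested <Start>
        if (i < s.size() && s[i] == 'b') {     // consume the matching 'b'
            ++i;
            return true;
        }
        return false;                          // an 'a' without its 'b'
    }
    return true;                               // λ: match nothing
}

// True if the whole string is derivable from <Start>.
bool derivable(const std::string& s) {
    std::size_t i = 0;
    return matchStart(s, i) && i == s.size();
}
```

Trying a few inputs shows the pattern: the empty string, "ab", and "aabb" are accepted, while "aab", "abb", and "ba" are rejected. Every a must be balanced by a later b, so the grammar matches some number of a's followed by the same number of b's.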
Grammar
• A grammar for a PL contains rules for
constructing legal statements in the language.
• A grammar is also used for recognizing whether a
particular program is legal.
• Parsing: the process to determine if a program is
legal by using the rules of the grammar.
• Parse tree/Derivation tree: a tree structure used in
parsing.
Parse Tree
• Given the grammar for PL P.
• Given the following program:
begin A:= B+C; B:=C end
• Start from <program>
• Substitute <program> with begin <stmt_list> end
• Substitute <stmt_list> with <stmt> ; <stmt_list>
• Continue the substitution process until you have all tokens (nothing else to substitute)
• If the parse is successful then there shouldn’t be any
non-terminals left and all of the tokens in the parse tree
should match the ones in the program.
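The substitution process can be mechanized by writing one function per non-terminal of PL P, a technique usually called recursive descent. The sketch below only answers "legal or not" and does not build the tree. The names tokenize, Parser, and isLegalProgram are hypothetical, and the tokenizer is a simplification that treats any run of letters as one word.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits the source text into tokens: words (begin, end, A, B, C), ":=",
// and single characters such as ';', '+', '-'. Unknown tokens are kept
// as-is and simply fail to match any rule later.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c)) {
            const std::size_t start = i;
            while (i < src.size() &&
                   std::isalpha(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back(src.substr(start, i - start));
        } else if (src.compare(i, 2, ":=") == 0) {
            tokens.push_back(":=");
            i += 2;
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// One member function per non-terminal; each consumes tokens from pos_.
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    // <program> -> begin <stmt_list> end
    bool program() {
        return accept("begin") && stmtList() && accept("end") &&
               pos_ == toks_.size();
    }

private:
    // Consumes the next token if it equals t.
    bool accept(const std::string& t) {
        if (pos_ < toks_.size() && toks_[pos_] == t) { ++pos_; return true; }
        return false;
    }
    // <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
    bool stmtList() {
        if (!stmt()) return false;
        if (accept(";")) return stmtList();
        return true;
    }
    // <stmt> -> <var> := <expression>
    bool stmt() { return var() && accept(":=") && expression(); }
    // <var> -> A | B | C
    bool var() { return accept("A") || accept("B") || accept("C"); }
    // <expression> -> <var> + <var> | <var> - <var> | <var>
    bool expression() {
        if (!var()) return false;
        if (accept("+") || accept("-")) return var();
        return true;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// True if src is a legal PL P program according to the grammar.
bool isLegalProgram(const std::string& src) {
    return Parser(tokenize(src)).program();
}
```

Calling isLegalProgram("begin A:= B+C; B:=C end") succeeds, while "begin A := B; end" is rejected because the grammar requires another statement after every semicolon.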
Parsing Steps
Substitution Process:
<program>
-> begin <stmt_list> end
-> begin <stmt> ; <stmt_list> end
-> begin <var> := <expression>; <stmt_list> end
-> begin A := <expression>; <stmt_list> end
-> …
Parse Tree
Program:
begin A:= B+C; B:=C end
<program>
    begin
    <stmt_list>
        <stmt>
            <var>
                A
            :=
            <expression>
                ...
        ;
        <stmt_list>
    end
Parse Tree
Program:
begin A:= B+C; B:=C end
Left-Most Derivation: Always expand the left-most non-terminal in the tree first!
<program>
    begin
    <stmt_list>
        <stmt>
            <var>
                A
            :=
            <expression>
                ...
        ;
        <stmt_list>
    end
Semantics
• The parser only checks for structural correctness.
• The program doesn’t have to make sense to the parser.
• The parser sends the parse tree to the next step, semantic analysis, which checks whether the program is meaningful.
Semantics
• Examples:
char * string1 = "Hello";
• Allocate space in global (static) storage to store "Hello".
• Allocate space on the stack to hold the pointer string1.
• Assign the address of the character 'H' in "Hello" to that space on the stack.
string1[1] = 'a';
• Find the memory location of the second character of the string
– *(value of string1 plus 1)
• Replace it with the character 'a'
Semantics
int num = string1[2];
• Allocate space for num on the stack
• Take the third character in "Hello"
• Convert it to its ASCII value
• Assign the value to num
3[string1] = 'a';
3["Hello"] = 'a';
• ???
• Find the memory location by taking the value 3 and adding to it the value of string1 (the address of "Hello")
• Assign 'a' to that memory location
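The steps above can be tried out in a small function. One assumption is swapped in and should be read as such: the sketch stores the text in a writable char array instead of a char * pointing at a string literal, because in standard C++ writing through a pointer to a string literal is undefined behavior. The function name semanticsDemo is hypothetical.

```cpp
#include <string>

// Runs the slide's statements on a writable array copy of "Hello" and
// returns the final text. The integer value read from the string is
// stored in `num`.
std::string semanticsDemo(int& num) {
    char string1[] = "Hello";   // array of 6 chars: 'H','e','l','l','o','\0'
    string1[1] = 'a';           // *(string1 + 1) = 'a'  -> "Hallo"
    num = string1[2];           // the char 'l' converted to its integer code
    3[string1] = 'a';           // *(3 + string1) = 'a'  -> "Halao"
    return std::string(string1);
}
```

After the call, the text is "Halao" and num holds the character code of 'l', which is 108 in ASCII. The statement 3[string1] = 'a' has the same meaning as string1[3] = 'a', since both expand to the same pointer addition.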