CPS 506 Comparative Programming Languages

CPS 506
Comparative Programming Languages
Type Systems, Semantics and Data Types
Type Systems
• A completely defined language:
Defined syntax, semantics and type
system
• Type: A set of values and operations
– int
• Values=Z
• Operations={+, -, *, /, mod}
– Boolean
• Values={true, false}
• Operations={AND, OR, NOT, XOR}
Type Systems
• Type System
– A system of types and their associated
variables and objects in a program
– To formalize the definition of data types
and their usage in a programming language
– A bridge between syntax and semantics
• Type checking at compile time: a part of
syntax analysis
• Type checking at run time: a part of semantics
Type Systems (con’t)
• Statically Typed: each variable is
associated with a single type
throughout its lifetime at run time.
– Declaration can be explicit or
implicit
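– Example: C and Java (Perl is usually classed as dynamically typed: its sigils fix only whether a variable is a scalar, array, or hash)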
– Type rules are defined on abstract
syntax (Static Semantics)
Type Systems (con’t)
• Dynamically Typed: a variable’s type
can change at run time
– Example: LISP, JavaScript, PHP
JavaScript example:
List = [10.2 , 3.5]
…
List = 47
– Less reliable, difficult to debug
– More flexible
– Fast compilation
– Slow execution (type checking at run time)
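The JavaScript example above carries over to other dynamically typed languages. A minimal sketch in Python (used here only for illustration; the name `values` is hypothetical):

```python
# The same name is bound to values of different types over its lifetime.
values = [10.2, 3.5]
first_type = type(values)      # list

values = 47                    # rebinding changes the type seen through the name
second_type = type(values)     # int

print(first_type.__name__, "->", second_type.__name__)
```

No declaration fixes the type of `values`, so any check on how it is used has to happen at run time.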
Type Systems (con’t)
• Type Error: an operation on a variable
that is not well defined at run time
– Example: union in C
union flexType {
int i;
float f;
};
union flexType u;
float x;
…
u.i = 10;
x = u.f;
…
– Another example in C ?
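The effect of the union example can be imitated in Python by reinterpreting the same bytes under two types. This is only a sketch, and it assumes a 32-bit int and an IEEE 754 32-bit float, as on most common platforms:

```python
import struct

# Store the integer 10 as 4 raw bytes, like writing u.i = 10 ...
raw = struct.pack("<i", 10)

# ... then read the same bytes back as a 32-bit float, like reading u.f.
(x,) = struct.unpack("<f", raw)

print(x)  # a tiny positive number, not 10.0
```

The bit pattern of the integer 10 is a valid float encoding, so nothing fails; the value is simply meaningless, which is what makes this a type error.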
Type Systems (con’t)
• Strongly Typed: All type errors are detected at compile time or
at run time, before the offending operation executes
– More reliable
– Example: Java is nearly strongly typed, but C is not:
C accepts x+1 regardless of the type of x
– Coercion (implicit type conversion) rules weaken
strong typing
• Weak type example
x = 2;
y = "5";
print x+y
Visual Basic: 7
JavaScript: "25"
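For contrast, a language without this coercion rejects the same expression. A minimal sketch in Python, where the variable names other than `x` and `y` are illustrative:

```python
x = 2
y = "5"

try:
    result = x + y          # int + str is not defined in Python
except TypeError as err:
    result = None
    message = str(err)

# The programmer must choose the conversion explicitly.
as_number = x + int(y)      # 7, the Visual Basic answer
as_text = str(x) + y        # "25", the JavaScript answer
```

Both weakly typed answers remain available, but only through an explicit conversion.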
Type Systems (con’t)
• Type Safe: A language in which no
type errors can occur
– Strongly Typed -> Type Safe
– Example: Java, Haskell, and ML
Type Binding
• The process of associating an
attribute (name, location, value, or
type) with an object
• Example
int i;      Identifier i is bound to the integer type and to a location specified by the underlying compiler
i = 10;     Identifier i is bound to value 10, or value 10 is bound to a location
Type Binding (con’t)
• Binding time
– Language definition time
• Java: Integers are bound to int, and real numbers are
bound to float
– Language implementation time
• Binding real values to the IEEE 754 standard
– Program writing time
• Declaration of variables
– Compile/Load time
• Binding static objects to stack or fixed memory
• Execution code is assigned to a memory block
– Run time
• Values are bound to variables
Type Binding (con’t)
• Early binding
– An element is bound to a property as early as
possible
– The earlier the binding the more efficient the
language
• Late Binding
– Delay binding until the last possible time
– The later the binding the more flexible the language
– Supports overloading and overriding in Object
Oriented languages
– C++ example ?
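The slide asks for a C++ example; the same idea can be sketched in Python, where every method call is bound late (comparable to a C++ virtual function). The class and function names below are hypothetical:

```python
class Shape:
    def name(self):
        return "shape"

class Circle(Shape):
    def name(self):          # overrides Shape.name
        return "circle"

def describe(s):
    # Which name() runs is decided at run time from the object's class.
    return s.name()

names = [describe(obj) for obj in (Shape(), Circle())]
```

The call `s.name()` appears once in the source but resolves to different code depending on the object passed in.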
Type Checking
• Type checking is the activity of ensuring that
the operands of an operator are of
compatible types
• A compatible type is one that is either legal
for the operator, or is allowed under language
rules to be implicitly converted, by compiler-generated code, to a legal type
• If all type bindings are static, nearly all type
checking can be static
• If type bindings are dynamic, type checking
must be dynamic
Type Conversion
• A narrowing conversion is one that
converts an object to a type that
cannot include all of the values of
the original type e.g. float to int
• A widening conversion is one in
which an object is converted to a
type that can include at least
approximations to all of the values
of the original type e.g. int to float
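Both directions can be observed in a short Python sketch. It assumes that Python's `float` is an IEEE 754 double with a 53-bit significand, which holds on common platforms:

```python
# Narrowing: float -> int drops the fractional part.
narrowed = int(3.99)                 # 3

# Widening: int -> float keeps the magnitude but may only approximate it.
big = 2**53 + 1                      # needs 54 bits of precision
widened = float(big)                 # a double has a 53-bit significand
lost_precision = int(widened) != big
```

This is why the definition of widening speaks of "at least approximations": the range is preserved, the exact value need not be.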
Type Conversion (con’t)
• Implicit type conversion
(Coercion)
– Decreases type error detection
ability
– In most languages, all numeric
types are coerced in expressions,
using widening conversions
– Ada has no implicit conversion
Type Conversion (con’t)
– C
double d;
long l;
int i;
…
d = i;
l = i;
if (d == l)
    d = 2 * l;
– Java
int x;
double d;
x = 5;
d = x + 2;
Type Conversion (con’t)
• Explicit type conversion (Casting)
– ( type-name ) cast-expression
• C
double d = 3.14;
int i = (int) d;
• Java
boolean t = true;
byte b = (byte) (t ? 1 : 0);
• Ada (similar to function call)
3 * Integer(2.0)
2.0 + Float(2)
Semantic Domains
• Semantic Domain
– A set with well-defined properties and
operations
– Environment
• A set of pairs <variable, location>
– Memory
• A set of pairs <location, value>
• State
– Product of environment and its memory
σ = { <Var1, Val1>, <Var2, Val2>,…, <Varn, Valn>}
Semantic Domains (con’t)
• Three ways to define the
meaning of a program
–Operational Semantics
• Program is interpreted as a set of
sequences of computational steps
• A set of execution rules
Premise -> Conclusion
σ(x) => 4 and σ(y) => 2 -> σ(x+y) => 6
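The environment, memory, and state defined earlier, together with the execution rule above, can be sketched with Python dictionaries. The location numbers are made up for illustration:

```python
# Environment: variable -> location (locations are made-up integers here).
environment = {"x": 100, "y": 104}

# Memory: location -> value.
memory = {100: 4, 104: 2}

# State: compose the two maps to get variable -> value pairs.
state = {var: memory[loc] for var, loc in environment.items()}

# The rule: if x => 4 and y => 2 in the state, then x + y => 6.
total = state["x"] + state["y"]
```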
Semantic Domains (con’t)
• Three ways to define the meaning of
a program
– Operational Semantics (con’t)
• Usage
– Language manuals and textbooks
– Teaching programming languages
• Structural: define program behavior in
terms of the behavior of its parts
• Natural: define program behavior in terms
of its overall effects, and not from its
single steps
Semantic Domains (con’t)
– Axiomatic Semantics
• The program does what it is supposed to do
• Agreement of the program result and
specification
• Formal verification of a program using logic
expressions, assertions
• Hoare triple
{Pre-condition} s {Post-condition}
• Example
{a = 2} b = a; {b = 2}
• Weakest Pre-condition
{?} a = b+1; {a > 1}
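For an assignment, the weakest pre-condition is obtained by substituting the right-hand side into the post-condition: replacing a by b+1 in a > 1 gives b+1 > 1, that is, b > 0. The brute-force check below is only a sanity test over a small range of integers, not a proof; the function names are hypothetical:

```python
def post(a):
    return a > 1            # post-condition {a > 1}

def candidate_pre(b):
    return b > 0            # candidate weakest pre-condition

def run(b):
    a = b + 1               # the statement a = b + 1
    return a

# Sound: every b satisfying the pre-condition ends in a state satisfying post.
sound = all(post(run(b)) for b in range(-50, 51) if candidate_pre(b))

# Weakest: every b violating the pre-condition also violates post.
weakest = all(not post(run(b)) for b in range(-50, 51) if not candidate_pre(b))
```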
Semantic Domains (con’t)
– Axiomatic Semantics (con’t)
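• Axioms
– Rule of Consequence
$$\frac{\{P\}\,a\,\{Q\},\quad P' \Rightarrow P,\quad Q \Rightarrow Q'}{\{P'\}\,a\,\{Q'\}}$$
– Rule of Conjunction
$$\frac{\{P\}\,a\,\{Q\},\quad \{P\}\,a\,\{R\}}{\{P\}\,a\,\{Q \wedge R\}}$$
– Rule of Assignment (s : b = a)
$$\frac{\text{true}}{\{Q[a \backslash b]\}\,s\,\{Q\}}$$
– Rule of Sequence
$$\frac{\{P\}\,s_1\,\{R\},\quad \{R\}\,s_2\,\{Q\}}{\{P\}\,s_1 s_2\,\{Q\}}$$
– Rule of Condition (s : if c then a else b)
$$\frac{\{P \wedge c\}\,a\,\{Q\},\quad \{P \wedge \neg c\}\,b\,\{Q\}}{\{P\}\,s\,\{Q\}}$$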
Semantic Domains (con’t)
– Axiomatic Semantics (con’t)
• Axioms
– Rule of Loop (s : while c do b end)
$$\frac{\{I \wedge c\}\,b\,\{I\}}{\{I\}\,s\,\{I \wedge \neg c\}}$$
– I is the loop invariant
– Loop Invariant is true before the loop, at
the bottom of the loop in each iteration,
and when the loop is terminated.
– Find the loop invariant to prove the
correctness of the loop
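The three points at which an invariant must hold can be made explicit with assertions. The sketch below uses a hypothetical `multiply` function (multiplication by repeated addition) with the invariant r = a * i and i <= b:

```python
def multiply(a, b):
    """Compute a * b for b >= 0 by repeated addition, checking the invariant."""
    i, r = 0, 0
    assert r == a * i and i <= b          # invariant holds before the loop
    while i < b:
        r = r + a
        i = i + 1
        assert r == a * i and i <= b      # invariant holds at the bottom
    # On exit: invariant and not (i < b), hence i == b and r == a * b.
    assert r == a * i and not (i < b)
    return r
```

The exit assertion has the shape I and not c from the loop rule, and together with i <= b it yields the desired result r = a * b.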
Semantic Domains (con’t)
– Denotational Semantics
• Define the meaning of a statement as a state-transforming mathematical function
• A state of a program indicates the current
values of the active objects
• Example
– Denotational semantics of Integer arithmetic
expressions
» Production rules:
Number ::= N D | D
Digit ::= 0 | 1 | … | 9
Expression ::= E1 + E2 | E1 – E2 | E1 * E2 | E1 / E2 | (E) | N
Semantic Domains (con’t)
– Denotational Semantics (con’t)
– Semantic domain:
Integer = { …, -1, 0, 1, …}
– Semantic functions:
Value: Number => Number
Digit: Digit => Number
Expr: Expression => Integer
– Auxiliary functions:
plus: Number + Number => Number
…
– Semantic equations:
Expr[[E1+E2]] = plus(Expr[[E1]], Expr[[E2]])
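The semantic functions above can be written almost literally as Python functions. This is a sketch under stated assumptions: numerals are strings of digits, a compound expression is a tuple ("+", e1, e2), and only the + equation is covered:

```python
def digit(d):
    """Digit: Digit => Number."""
    return "0123456789".index(d)

def value(numeral):
    """Value: Number => Number, following Number ::= N D | D."""
    if len(numeral) == 1:
        return digit(numeral)
    return 10 * value(numeral[:-1]) + digit(numeral[-1])

def plus(m, n):
    """Auxiliary function on the semantic domain."""
    return m + n

def expr(e):
    """Expr: Expression => Integer. An expression is a numeral string
    or a tuple ("+", e1, e2); the other operators are omitted."""
    if isinstance(e, str):
        return value(e)
    op, e1, e2 = e
    if op == "+":
        return plus(expr(e1), expr(e2))
    raise ValueError("operator not covered by this sketch: " + op)
```

For example, `expr(("+", "12", ("+", "3", "4")))` denotes the integer 19.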
Data Types
• Elements of a data type
– Set of possible values
– Set of operations
– Internal representation
– External representation
• Type information
– Implicit
• 5 is implicitly integer
• A variable named I is implicitly integer in Fortran (names beginning with I through N default to integer)
– Explicit
• Using variable or function declaration
Data Types (con’t)
• Data type classifications
– Built-in
• Included in the language definition
– Primitive
– Composite
– Recursive
– User-defined
• Data types defined by users
• Declared and defined before usage
Primitive Data Types
• Unstructured and indivisible entities
• Integer, Real, Boolean, Char
• Depend on the application domain of the
language
– COBOL: fixed-length strings and
fixed-point numbers
– SNOBOL: variable-length strings
– Scheme: integer, rational, real, complex
Primitive Data Types (con’t)
• Example
– C
• int, float, char
– Java
• int, float, char, boolean
– Pascal
• Integer, Char, Real, Longint
– ML
• bool, real, int, word, char
– Scheme
• integer?, real?, boolean?, char?
Primitive Data Types (con’t)
• Integer
– Almost always an exact reflection of
the hardware so the mapping is
trivial
– There may be as many as eight
different integer types in a language
– Java’s signed integer sizes: byte, short,
int, long
Primitive Data Types (con’t)
• Float
– Model real numbers, but only as
approximations
– Languages for scientific use support at
least two floating-point types (e.g.,
float and double); sometimes more
– Usually exactly like the hardware, but
not always
– IEEE Floating-Point Standard 754
Primitive Data Types (con’t)
• Complex
– Some languages support a complex
type, e.g., C99, Fortran, and Python
– Each value consists of two floats,
the real part and the imaginary part
– Literal form (in Python):
(7 + 3j), where 7 is the real part and
3 is the imaginary part
Primitive Data Types (con’t)
• Decimal
– For business applications (money)
• Essential to COBOL
• C# offers a decimal data type
– Store a fixed number of decimal digits,
in coded form (BCD) (Binary-Coded
Decimal)
– Advantage: accuracy
– Disadvantages: limited range, wastes
memory
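The accuracy advantage can be seen by comparing binary floating point with a decimal type. The sketch uses Python's standard `decimal` module, which stores decimal digits but is not necessarily BCD-coded internally:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
float_exact = (float_sum == 0.3)                 # False

# A decimal type stores decimal digits, so the same sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_exact = (decimal_sum == Decimal("0.3"))  # True
```

For money amounts this difference matters, which is why business-oriented languages provide a decimal type.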
Primitive Data Types (con’t)
• Boolean
–Simplest of all
–Range of values: two elements,
one for “true” and one for “false”
–Could be implemented as bits, but
often as bytes
Primitive Data Types (con’t)
• Character
– Stored as numeric codings
– Most commonly used coding: ASCII
– An alternative, 16-bit coding: Unicode
(UCS-2) (Universal Character Set)
• Includes characters from most natural
languages
• Originally used in Java
• C# and JavaScript also support Unicode
– 32-bit Unicode (UCS-4)
• Supported by Fortran, starting with 2003
Composite Data Types
• Structured or compound types
• Array, String, Enumeration, Pointer,
Record, List, Function
• Homogeneous like Array
• Heterogeneous like Record
• Fixed size like Array
• Dynamic size like Linked List
• Inside the core or as a separate
library
Composite Data Types (con’t)
• Example
– C
• Array ([]), Pointer (*),
Struct, enum
– Java
• String, Array
– Pascal
• Record, Array, Pointer (^)
Composite Data Types (con’t)
• String
– C and C++
• Not primitive
• Use char arrays and a library of functions that provide
operations
– SNOBOL4 (a string manipulation language)
• Primitive
• Many operations, including elaborate pattern matching
– Fortran and Python
• Primitive type with assignment and several operations
– Java
• Primitive via the String class
– Perl, JavaScript, Ruby, and PHP
• Provide built-in pattern matching, using regular expressions
Composite Data Types (con’t)
• String length option
– Static: COBOL, Java’s String class
– Limited Dynamic Length: C and C++
• In these languages, a special character is used
to indicate the end of a string’s characters,
rather than maintaining the length
– Dynamic (no maximum): SNOBOL4, Perl,
JavaScript
– Ada supports all three string length options
Composite Data Types (con’t)
• String Implementation
– Static length: compile-time descriptor
– Limited dynamic length: may need a
run-time descriptor for length (but
not in C and C++)
– Dynamic length: need run-time
descriptor; allocation/de-allocation is
the biggest implementation problem
Composite Data Types (con’t)
• Enumeration
– All possible values, which are named constants, are
provided in the definition
– C# example
enum days {mon, tue, wed, thu, fri, sat, sun};
– Design issues
• Is an enumeration constant allowed to appear in more
than one type definition, and if so, how is the type of an
occurrence of that constant checked?
• Are enumeration values coerced to integer?
• Any other type coerced to an enumeration type?
Composite Data Types (con’t)
• Enumeration (con’t)
– Aid to readability, e.g. no need to code a color as a
number
enum Colors {Red, Blue, Green, Yellow};
– Aid to reliability, e.g. compiler can check:
• operations (don’t allow colors to be added)
• No enumeration variable can be assigned a value outside
its defined range
• Ada, C#, and Java 5.0 provide better support for
enumeration than C++ because enumeration type
variables in these languages are not coerced into integer
types
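Python's standard `enum` module behaves like the stricter languages listed above, so it can illustrate both reliability checks. The `Color` class is a hypothetical counterpart of the `Colors` enumeration:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    BLUE = 2
    GREEN = 3
    YELLOW = 4

try:
    Color.RED + 1                 # enumeration values are not coerced to int
    addition_allowed = True
except TypeError:
    addition_allowed = False

try:
    Color(7)                      # no member has this value
    out_of_range_allowed = True
except ValueError:
    out_of_range_allowed = False
```

In Python these checks happen at run time; in Ada, C#, or Java a compiler reports the corresponding errors earlier.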
Composite Data Types (con’t)
• Sub-range Types
– An ordered contiguous subsequence of an ordinal
type
• Example: 12..18 is a sub-range of integer type
– Ada’s design
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Day1: Days;
Day2: Weekdays;
Day2 := Day1;
Composite Data Types (con’t)
• Enumeration and Sub-range
implementation
– Enumeration types are implemented as
integers
– Sub-range types are implemented like the
parent types with code inserted (by the
compiler) to restrict assignments to subrange variables
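The inserted restriction code can be pictured as a range check in front of every assignment. A minimal sketch, in which the `Subrange` class is hypothetical and stands in for what a compiler would generate:

```python
class Subrange:
    """Holds an int restricted to low..high, checked on every assignment."""

    def __init__(self, low, high, value):
        self.low, self.high = low, high
        self.assign(value)

    def assign(self, value):
        # This is the check a compiler would insert before each assignment.
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} not in {self.low}..{self.high}")
        self.value = value

index = Subrange(1, 100, 42)      # like Ada's Index range 1..100
try:
    index.assign(101)
    rejected = False
except ValueError:
    rejected = True
```

The stored value is an ordinary integer, which matches the point that sub-range types are implemented like their parent types.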
Composite Data Types (con’t)
• Array
– An array is an aggregate of homogeneous
data elements in which an individual
element is identified by its position in the
aggregate, relative to the first element.
– A heterogeneous array is one in which the
elements need not be of the same type
• Supported by Perl, Python, JavaScript, and Ruby
Composite Data Types (con’t)
• Array Index Type
– FORTRAN, C: integer only
– Ada: integer or enumeration (includes Boolean and
char)
– Java: integer types only
– Index range checking
• C, C++, Perl, and Fortran do not specify range checking
• Java, ML, C# specify range checking
• In Ada, the default is to require range checking, but it
can be turned off
Composite Data Types (con’t)
• Array Initialization
– C-based languages
int list [] = {1, 3, 5, 7};
char *names [] = {"Mike", "Fred", "Mary Lou"};
– Ada
List : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
– Python
List comprehensions
list = [x ** 2 for x in range(12) if x % 3 == 0]
puts [0, 9, 36, 81] in list
Composite Data Types (con’t)
• Array Operations
– APL provides the most powerful array processing
operations for vectors and matrixes as well as
unary operators (for example, to reverse column
elements)
– Ada allows array assignment and also concatenation
– Python supports array assignment, but assignments are only
reference changes. Python also supports array
concatenation and element membership operations
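The three Python operations mentioned above can be seen in a few lines, using Python lists as the array type:

```python
a = [1, 2, 3]
b = a                  # assignment copies the reference, not the elements
b.append(4)
aliased = (a == [1, 2, 3, 4]) and (a is b)

c = a + [5, 6]         # concatenation builds a new list
member = 5 in c        # element membership
```

Because `b = a` only changes a reference, the append through `b` is visible through `a` as well.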
Composite Data Types (con’t)
• Array Operations (con’t)
– Ruby also provides array concatenation
– Fortran provides elemental operations,
so called because they operate between pairs
of array elements
– For example, the + operator between two
arrays results in an array of the sums of
the element pairs of the two arrays
Composite Data Types (con’t)
• Rectangular and Jagged Arrays
– A rectangular array is a multi-dimensioned array in
which all of the rows have the same number of
elements and all columns have the same number of
elements
– A jagged matrix has rows with varying number of
elements
• Possible when multi-dimensioned arrays actually appear as
arrays of arrays
– C, C++, and Java support jagged arrays
– Fortran, Ada, and C# support rectangular arrays
(C# also supports jagged arrays)
Composite Data Types (con’t)
• Slices
– A slice is some substructure of an array; nothing
more than a referencing mechanism
– Slices are only useful in languages that have array
operations
– Fortran 95
Integer, Dimension (10) :: Vector
Integer, Dimension (3, 3) :: Mat
Integer, Dimension (3, 3, 4) :: Cube
Vector (3:6) is a four element array
– Ruby supports slices with the slice method
list.slice(2, 2) returns the third and fourth
elements of list
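The Fortran and Ruby slices above have direct Python counterparts. One difference to keep in mind: a Python list slice is a copy rather than a pure referencing mechanism. The list `vector` below is an illustrative stand-in for the Fortran array:

```python
vector = list(range(1, 11))      # elements 1..10, like Vector(1:10)

# Fortran's Vector(3:6) is 1-based and inclusive; Python's is 0-based, end-exclusive.
fortran_like = vector[2:6]       # [3, 4, 5, 6], four elements

# Ruby's list.slice(2, 2) means "start at index 2, take 2 elements".
ruby_like = vector[2:2 + 2]      # third and fourth elements: [3, 4]
```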
Composite Data Types (con’t)
• Array Access
– Access function maps subscript expressions to an
address in the array
– Access function for single-dimensioned arrays:
address(list[k]) = address (list[lower_bound])
+ ((k-lower_bound) * element_size)
– Two common ways to store multi-dimensional arrays:
• Row major order (by rows) – used in most languages
• Column major order (by columns) – used in Fortran
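The access function and the two storage orders can be written as small Python functions. The function names and the 0-based indices in the two-dimensional cases are assumptions made for this sketch:

```python
def address_1d(base, k, lower_bound, element_size):
    """address(list[k]) = address(list[lower_bound]) + (k - lower_bound) * element_size"""
    return base + (k - lower_bound) * element_size

def offset_row_major(i, j, n_cols):
    """Element offset of [i][j] (0-based) when rows are stored one after another."""
    return i * n_cols + j

def offset_column_major(i, j, n_rows):
    """Element offset of [i][j] (0-based) when columns are stored one after another."""
    return j * n_rows + i
```

For a 3 by 4 array, element [1][2] sits at offset 6 in row major order and at offset 7 in column major order.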
Composite Data Types (con’t)
• Record
– A record is a possibly heterogeneous aggregate of
data elements in which the individual elements are
identified by names
– COBOL uses level numbers to show nested records;
others use recursive definition
01 EMP-REC.
02 EMP-NAME.
05 FIRST PIC X(20).
05 MID PIC X(10).
05 LAST PIC X(20).
02 HOURLY-RATE PIC 99V99.
Composite Data Types (con’t)
• Record (con’t)
– Ada
type Emp_Rec_Type is record
First: String (1..20);
Mid: String (1..10);
Last: String (1..20);
Hourly_Rate: Float;
end record;
Emp_Rec: Emp_Rec_Type;
Composite Data Types (con’t)
• Record (con’t)
– Pascal
MonthType = (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec);
DateType = record
Month : MonthType;
Day : 1..31;
Year : 1900..2000;
end;
Composite Data Types (con’t)
• Record (con’t)
– C
struct student_type {
char name[20];
int ID;
};
Composite Data Types (con’t)
• Record (con’t)
– Java: traditionally no record type (records were added only in Java 16). A record is defined using a
class.
class Person {
String name;
int id_number;
Date birthday;
int age;
}
Composite Data Types (con’t)
• Pointer and Reference Types
– A pointer type variable has a range of values that
consists of memory addresses and a special value,
nil
– Provide the power of indirect addressing
– Provide a way to manage dynamic memory
– A pointer can be used to access a location in the
area where storage is dynamically created (usually
called a heap)
Composite Data Types (con’t)
• Pointer Design Issues
– What are the scope and lifetime of a pointer
variable?
– Are pointers restricted as to the type of value to
which they can point?
– Are pointers used for dynamic storage
management, indirect addressing, or both?
– Should the language support pointer types,
reference types, or both?
Composite Data Types (con’t)
• Pointer Operations
– Two fundamental operations: assignment and
dereferencing
– Assignment is used to set a pointer variable’s value
to some useful address
– Dereferencing yields the value stored at the
location represented by the pointer’s value
• Dereferencing can be explicit or implicit
• C++ uses an explicit operation via *
j = *ptr
sets j to the value located at ptr
Composite Data Types (con’t)
• Pointer Illustration
– The assignment operation j = *ptr
Composite Data Types (con’t)
• Pointer Problems
– Dangling pointers (dangerous)
• A pointer points to a heap-dynamic variable that has been
de-allocated
– Lost heap-dynamic variable
• An allocated heap-dynamic variable that is no longer
accessible to the user program (often called garbage)
– Pointer p1 is set to point to a newly created heap-dynamic variable
– Pointer p1 is later set to point to another newly created heap-dynamic variable, leaving the first one unreachable
– The process of losing heap-dynamic variables is called memory
leakage
Composite Data Types (con’t)
• Pointer Problems (con’t)
– Ada
• Some dangling pointers are disallowed because
dynamic objects can be automatically de-allocated at the end of the pointer type's scope
– C, C++
• Extremely flexible but must be used with care
• Pointers can point at any variable regardless of
when or where it was allocated
• Used for dynamic storage management and
addressing
Composite Data Types (con’t)
• Pointer Problems (con’t)
– C, C++
• Pointer arithmetic is possible
• Explicit dereferencing and address-of
operators
• Domain type need not be fixed (void *)
void * can point to any type and can be type
checked (cannot be de-referenced)
Composite Data Types (con’t)
• Pointer Arithmetic in C, C++
float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]
Composite Data Types (con’t)
• Reference Types
– C++ includes a special kind of pointer type called a
reference type that is used primarily for formal
parameters
• Advantages of both pass-by-reference and pass-by-value
– Java extends C++’s reference variables and allows
them to replace pointers entirely
• References are references to objects, rather than being
addresses
– C# includes both the references of Java and the
pointers of C++
Composite Data Types (con’t)
• Heap Management
– A very complex run-time process
– Single-size cells vs. variable-size cells
– Two approaches to reclaim garbage
• Reference counters (eager approach):
reclamation is gradual
• Mark-sweep (lazy approach): reclamation
occurs when the list of available space
becomes empty
Composite Data Types (con’t)
• Heap Management (con’t)
– Reference counters
• Maintain a counter in every cell that stores the
number of pointers currently pointing at the cell
• Disadvantages: space required, execution time
required, complications for cells connected
circularly
• Advantage: it is intrinsically incremental, so
significant delays in the application execution
are avoided
Composite Data Types (con’t)
• Heap Management (con’t)
– Mark-Sweep
• The run-time system allocates storage cells as requested and
disconnects pointers from cells as necessary; mark-sweep then
begins
• Every heap cell has an extra bit used by collection algorithm
• All cells initially set to garbage
• All pointers traced into heap, and reachable cells marked as not
garbage
• All garbage cells returned to list of available cells
• Disadvantages: in its original form, it was done too infrequently.
When done, it caused significant delays in application execution.
Contemporary mark-sweep algorithms avoid this by doing it more
often—called incremental mark-sweep
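The mark and sweep phases described above can be sketched over a toy heap. The representation is an assumption made for illustration: the heap is a dictionary from a cell id to the cell ids it points to, and `roots` holds the pointers outside the heap:

```python
def mark_sweep(heap, roots):
    """heap maps a cell id to the list of cell ids it points to.
    Returns (reachable, garbage) as sets of cell ids."""
    marked = set()                      # all cells start out as garbage
    stack = list(roots)
    while stack:                        # mark: trace pointers from the roots
        cell = stack.pop()
        if cell not in marked:
            marked.add(cell)
            stack.extend(heap[cell])
    garbage = set(heap) - marked        # sweep: unmarked cells are reclaimed
    return marked, garbage

# Cells 3 and 4 point at each other but nothing reachable points at them.
heap = {1: [2], 2: [], 3: [4], 4: [3]}
reachable, garbage = mark_sweep(heap, roots=[1])
```

Cells 3 and 4 form a cycle, the case that complicates reference counting, yet tracing from the roots still identifies them as garbage.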
Recursive Data Types
• Recursive or circular data types
• Type composed from objects of
the same type
• Example
–Linked list in C and Pascal
–ML
datatype intlist = nil | cons of int * intlist
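The ML datatype above has a close Python counterpart: a class whose field refers to the class itself. In this sketch, `IntList` and `length` are hypothetical names and `None` plays the role of nil:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntList:
    head: int
    tail: Optional["IntList"]   # None plays the role of nil

def length(lst):
    return 0 if lst is None else 1 + length(lst.tail)

# cons(5, cons(10, nil))
numbers = IntList(5, IntList(10, None))
```

Functions over a recursive type, such as `length`, typically follow the same recursive shape as the type definition.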
Exercises
1. Determine whether each of the following
programming languages is statically typed
(explain by example):
– Ada
– Perl
– Python
– Haskell
– Prolog
– Fortran
– Ruby
Exercises
2. Give another example of a type error in C.
3. Show two examples for early and late binding
in a language.
4. Is there any programming language which
does not allow implicit type conversion, say
int to float?
5. Which type of coercions is not safe?
6. Compute the Weakest Pre-condition of
{?} a = b * -1; {a > 10}
Exercises
2. Using an example, show the rule of
consequence in axiomatic semantics.
$$\frac{\{P\}\,a\,\{Q\},\quad P' \Rightarrow P,\quad Q \Rightarrow Q'}{\{P'\}\,a\,\{Q'\}}$$
3. Find the loop invariant of the following while
loop.
i = 1;
s = 0;
while (i <= 10) {
s = s + i;
i = i + 1;
}
Exercises
7. Which programming languages, other than Ada
and the different versions of C, support pointers?
8. What are the rules of call-by-value and call-by-reference in Pascal? Give examples.
9. Name two programming languages which have
automatic garbage collection. What are the
negative and positive effects of this
operation in a language?