Lecture Data Structures and Practise


Lecture 6
Concepts of Programming Languages
Arne Kutzner
Hanyang University / Seoul, Korea
Topics
• Primitive Data Types
• Character String Types
• User-Defined Ordinal Types
• Array Types
• Associative Arrays
• Record Types
• Union Types
• Pointer and Reference Types
Concepts of Programming Languages
L4.2
Introduction
• A data type defines a collection of data objects and a set of predefined operations on those objects
• Design issue for data types: What operations are defined and how are they specified?
Primitive Data Types
• Primitive data types:
Data types that are not defined in terms of
other data types
– Some primitive data types are merely reflections of the hardware
– Others require only a little non-hardware support for their implementation (e.g., code that emulates floating-point arithmetic on a processor without floating-point hardware)
• Almost all programming languages provide a
set of primitive data types
Primitive Data Types: Integer
• For representing numbers without a fractional part (no decimal point)
• Often there are several different integer
types with different sizes:
– E.g.: Java’s signed integer sizes: byte,
short, int, long
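The different sizes determine the range of representable values. A small sketch, assuming two's-complement representation (which Java specifies for its integer types); the helper `signed_range` is hypothetical and only computes the range for a given bit width:

```python
def signed_range(bits):
    """Smallest and largest value of a two's-complement signed integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Bit widths of Java's signed integer types
JAVA_INT_BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}

for name, bits in JAVA_INT_BITS.items():
    low, high = signed_range(bits)
    print(f"{name:>5}: {bits:2d} bits, {low} .. {high}")
```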
Primitive Data Types: Floating Point
• Model real numbers (but only as approximations)
• Many languages support at least two floating-point types (e.g., float and double) of different sizes
• Standard for floating-point numbers: IEEE Floating-Point Standard 754
Representation of Floating-Point Values
[Figure: bit layouts of the 32-bit single-precision and the 64-bit double-precision float formats]
Floating Point Example
• IEEE representation scheme: sign, exponent and fraction fields
[Figure: example bit pattern with labels for bit 23 and bit 1; in the example the exponent is 1]
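The fields can be inspected directly. A single-precision value has 1 sign bit, 8 exponent bits (bias 127) and 23 fraction bits; a double has 1, 11 (bias 1023) and 52. The sketch below uses Python's standard `struct` module to get the raw 32-bit pattern; the helper names `float32_fields` and `float32_value` are hypothetical, and `float32_value` only handles normalized numbers (not zero, subnormals, infinities or NaN):

```python
import struct


def float32_fields(x):
    """Round x to IEEE 754 single precision and split it into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31               # 1 sign bit
    exponent = (bits >> 23) & 0xFF  # 8 exponent bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF      # 23 fraction bits (the leading 1 is implicit)
    return sign, exponent, fraction


def float32_value(sign, exponent, fraction):
    """Rebuild a normalized value: (-1)^sign * 1.fraction * 2^(exponent - 127)."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


for x in (1.0, -0.75, 0.1):
    s, e, f = float32_fields(x)
    print(f"{x:>5}: sign={s} exponent={e} (unbiased {e - 127}) fraction={f:#08x}")
```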
Problems with Floating-Point Values
• Many decimal values, e.g., 0.1, have no exact binary representation, so the IEEE scheme can only store a close approximation; this leads to unexpected results in decimal computations
• Experiment: add 0.1 one million times and check the result
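The experiment can be run as follows (a minimal sketch; `repeated_sum` is a hypothetical helper name). Python floats are IEEE 754 doubles, so every addition rounds, and the small error in the stored value of 0.1 accumulates. For comparison, the standard-library function `math.fsum` tracks the lost low-order bits and rounds only once:

```python
import math


def repeated_sum(value, count):
    """Add `value` to a running total `count` times, as a simple loop would."""
    total = 0.0
    for _ in range(count):
        total += value  # every addition rounds to the nearest double
    return total


naive = repeated_sum(0.1, 1_000_000)
print(naive)               # close to, but not exactly, 100000
print(naive == 100000.0)   # False

# math.fsum avoids the accumulated rounding error of the loop
print(math.fsum([0.1] * 1_000_000))  # 100000.0
```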
Primitive Data Types: Complex
• Some languages support a complex type (e.g., Fortran and Python)
• Representation:
Each value consists of two floats, the real part
and the imaginary part
• Example:
Literal form in Python:
(7 + 3j), where 7 is the real part and 3 is
the imaginary part
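A short Python illustration of the literal form and of the two float parts (the values are arbitrary examples):

```python
z = 7 + 3j               # real part 7, imaginary part 3
print(z.real, z.imag)    # 7.0 3.0 (both parts are stored as floats)
w = z * (1 - 2j)         # (7 + 3j)(1 - 2j) = 7 - 14j + 3j + 6 = 13 - 11j
print(w)                 # (13-11j)
print(abs(3 + 4j))       # 5.0, the magnitude
```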
Primitive Data Types: Decimal
• Stores a number using decimal digits, with 4 bits per digit (binary-coded decimal, BCD)
• Advantage: accuracy (decimal values such as 0.1 are stored exactly)
• Disadvantages: limited range, wastes memory
• Supported by, e.g.:
– COBOL
– C#
• Important in the context of business applications (money)
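The 4-bits-per-digit idea and the accuracy argument can be sketched as below. `to_bcd` is a hypothetical helper that packs one decimal digit per nibble; Python's standard `decimal` module is used as an example of a decimal type, though it manages its digits in its own internal format, not necessarily as BCD:

```python
from decimal import Decimal


def to_bcd(n):
    """Pack a non-negative integer as binary-coded decimal: 4 bits per decimal digit."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    packed, shift = 0, 0
    while True:
        packed |= (n % 10) << shift  # lowest remaining digit goes into the next nibble
        n //= 10
        shift += 4
        if n == 0:
            return packed


print(hex(to_bcd(1995)))              # 0x1995: each hex digit is one decimal digit
print(bin(9999), bin(to_bcd(9999)))   # 14 bits in binary vs. 16 bits in BCD

# Binary floating point cannot store 0.10 exactly; a decimal type can
print(0.10 + 0.20 == 0.30)                                   # False
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```

The second line shows the memory cost: a nibble can hold 16 patterns but BCD uses only 10 of them.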
Primitive Data Types: Boolean
• Simplest of all
• Range of values: “true” and “false”
• Could be implemented as single bits, but is often stored as a byte, the smallest addressable unit of memory
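Storing booleans as bits is possible but needs shifting and masking on every access, which is one reason a whole byte is often used instead. A minimal sketch with hypothetical helper names `pack_flags` and `get_flag`:

```python
def pack_flags(flags):
    """Store a sequence of booleans in one integer, one bit per flag."""
    packed = 0
    for position, flag in enumerate(flags):
        if flag:
            packed |= 1 << position
    return packed


def get_flag(packed, position):
    """Read back the boolean stored at bit `position` (shift, then mask)."""
    return bool((packed >> position) & 1)


flags = [True, False, True, True]
packed = pack_flags(flags)
print(bin(packed))                                        # 0b1101
print([get_flag(packed, i) for i in range(len(flags))])  # [True, False, True, True]
```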
Primitive Data Types: Character
• Stored as numeric codes
• Most commonly used coding: ASCII
• An alternative coding: Unicode (originally 16 bits per character; code points now range up to U+10FFFF)
– Includes characters from most natural languages
– Used from the start in Java
– C# and JavaScript also support Unicode
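Because code points above U+FFFF no longer fit into 16 bits, Unicode text is stored with variable-length encodings such as UTF-8 and UTF-16 (where such characters need two 16-bit code units). A small sketch using only Python built-ins; `encoding_sizes` is a hypothetical helper:

```python
def encoding_sizes(ch):
    """Return (code point, UTF-8 bytes, UTF-16 code units) for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2  # each UTF-16 code unit is 2 bytes
    return ord(ch), utf8_bytes, utf16_units


for ch in ["A", "\u00e9", "\ud55c", "\U0001F600"]:  # A, é, 한, and an emoji
    code, u8, u16 = encoding_sizes(ch)
    print(f"U+{code:04X}: {u8} UTF-8 byte(s), {u16} UTF-16 unit(s)")
```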
Character String Types
• Values are sequences of characters
• Design issues:
– Supported as
• a primitive data type or
• built on top of other data types (e.g., arrays)
– Support for immutable strings
– Pattern matching as part of the language definition
Character String Types: Operations
• Typical operations:
– Assignment and copying
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching (support for regular expressions)
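Each typical operation has a direct counterpart in Python, where strings are a primitive, immutable type and pattern matching comes from the standard `re` module. The string values are arbitrary examples:

```python
import re

s = "Hanyang University"          # assignment
t = s                             # copying the reference is safe: strings are immutable
print(s == t, "abc" < "abd")      # comparison: True True
u = s + ", Seoul"                 # catenation
print(u[0:7])                     # substring reference: Hanyang
m = re.search(r"[A-Z]\w+ity", u)  # pattern matching with a regular expression
print(m.group())                  # University
```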
Character String Types in Certain Languages
• C and C++
– Not primitive
– Use char arrays and a library of functions that provide operations
• C++
– Not primitive; two approaches: C-strings and the std::string class
• Fortran and Python
– Primitive type with assignment and several operations
• Java
– Semi-primitive via the String and StringBuffer classes
• Perl, JavaScript, Ruby, and PHP
– Primitive type
– Built-in pattern matching using regular expressions
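The C approach (a string is a char array terminated by a NUL character) can be observed from Python through the standard `ctypes` module, whose `create_string_buffer` allocates such an array. A small sketch:

```python
import ctypes

buf = ctypes.create_string_buffer(b"hello")  # C-style char array, NUL-terminated
print(ctypes.sizeof(buf))   # 6: five characters plus the terminating '\0'
print(buf.raw)              # b'hello\x00'
print(buf.value)            # b'hello' (the bytes up to the first NUL)
buf[1] = b"a"               # change one char in place, like buf[1] = 'a' in C
print(buf.value)            # b'hallo'
```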
Ordinal Types
• An ordinal type is one in which the range of
possible values can be easily associated with
the set of positive integers
• Examples of primitive types that represent
ordinal types
– integer, char, boolean
• User defined ordinal types
– Enumerated types
– Subrange types
Enumeration Types
• Possible values are named constants
– The named constants are listed in the definition of the data type
– Examples:
C#:
enum days {mon, tue, wed, thu, fri, sat, sun};
Ada:
type Days is (mon, tue, wed, thu, fri, sat, sun);
• Represent user-defined ordinal types
Enumeration Types (cont.)
• Design Issue: are enumeration values
coerced to integer?
– In C, C++ coerced to integer types
– In Ada, C#, Java 5.0 no coercion to integer
types
• Purposes:
– Aid to readability, e.g., no need to code a color as
a number
– Aid to reliability through improved type safety (but only if values are not coerced to integers)
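Python's standard `enum` module happens to offer both design choices, which makes the difference easy to see: `Enum` members are not coerced to integers, while `IntEnum` members behave like integers, as in C and C++. The type names `Color` and `Day` are illustrative:

```python
from enum import Enum, IntEnum


class Color(Enum):       # no coercion to integers (cf. Ada, C#, Java)
    RED = 1
    GREEN = 2
    BLUE = 3


class Day(IntEnum):      # values are coerced to integers (cf. C, C++)
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7


print(Color.RED == 1)     # False: an enumeration value is not an integer
print(Day.MON == 1)       # True
print(Day.FRI - Day.MON)  # 4: integer arithmetic is allowed
try:
    Color.RED + 1
except TypeError:
    print("Color.RED + 1 is rejected with TypeError")
```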
Subrange Types
• Ordered contiguous subsequence of an ordinal
type
• Ada examples:
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Purposes of subrange types
• Improves readability
– Informs readers that variables can only store values from a certain range
• Improves reliability
– Allows automatic range checking: assigning a value outside the subrange is detected as an error
Implementation of User-Defined Ordinal Types
• Enumeration types are normally
implemented as integers
• Subrange types are implemented like
the parent types with code inserted (by
the compiler) for range checking
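The idea that a subrange variable is stored like its parent type, plus an inserted range check, can be mimicked in Python. `Subrange` and `check` are hypothetical names; an Ada compiler would raise Constraint_Error where this sketch raises `ValueError`:

```python
class Subrange:
    """Hypothetical helper mimicking: subtype Index is Integer range 1..100;
    values stay plain ints (the parent type); only the range check is added."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def check(self, value):
        # The kind of test a compiler inserts before assigning to a subrange variable
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value


Index = Subrange(1, 100)
i = Index.check(42)            # within range: stored as an ordinary int
try:
    i = Index.check(i + 100)   # 142 violates the subrange
except ValueError as err:
    print(err)                 # 142 is outside 1..100
```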
Array Types
• An array is an aggregate of
homogeneous data elements in which
an individual element is identified by its
position in the aggregate, relative to the
first element.
Array Indexing
• Indexing (or subscripting) is a mapping from indices to elements:
array_name(index_value_list) → an element
• Index Syntax
– FORTRAN, PL/I, Ada use parentheses
– Most other languages use brackets
Array Indexing (cont.)
• Types of array indices:
– Integer only: Java, C, C++, Fortran
– Ordinal types in general: Ada
• Index range checking
– By default: Java, C#, Ada (in Ada the checks can be suppressed)
– Not at all: C, Fortran
– On demand: C++ (e.g., std::vector::at)
Integration of Arrays
• Direct via the language definition
Examples: C, Fortran, Perl
• Indirect via an appropriate class
definition
Examples: Java, C#
– In this case arrays represent objects.
• Languages can support both concepts
simultaneously
Example: C++
Arrays Memory Aspects
• Location of array:
– Static (allocation during compile time)
– Stack (allocation during run time)
– Heap (allocation during run time)
• Size management
– Static array sizes (known during compile
time)
– Dynamic management during runtime
(automatic expansion if required)
Multidimensional arrays
Rectangular versus Jagged Arrays
• A rectangular array is a multidimensioned array in which all of the
rows have the same number of
elements and all columns have the
same number of elements
• A jagged array has rows with a varying number of elements
– Possible when multi-dimensioned arrays
actually appear as arrays of arrays
Addressing Schemes for Rectangular Arrays
• Two common ways:
– Row-major order (by rows): used in most languages
– Column-major order (by columns): used in Fortran
Row-Major Addressing Scheme
• Formula, where n is the number of elements per row:
Location(a[i,j]) = address of a[row_lb,col_lb]
+ (((i - row_lb) * n) + (j - col_lb)) * element_size
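The formula translates directly into code. The sketch below also adds, by symmetry, a column-major variant (with m elements per column) for comparison; both function names and the example numbers (a 3 x 4 array of 4-byte integers at address 1000) are hypothetical:

```python
def row_major_address(base, i, j, row_lb, col_lb, n, element_size):
    """Address of a[i, j] stored by rows, with n elements per row."""
    return base + (((i - row_lb) * n) + (j - col_lb)) * element_size


def column_major_address(base, i, j, row_lb, col_lb, m, element_size):
    """Address of a[i, j] stored by columns (as in Fortran), with m elements per column."""
    return base + (((j - col_lb) * m) + (i - row_lb)) * element_size


# 3 x 4 array of 4-byte integers starting at address 1000, indices starting at 0
print(row_major_address(1000, 1, 2, 0, 0, n=4, element_size=4))     # 1024
print(column_major_address(1000, 1, 2, 0, 0, m=3, element_size=4))  # 1028

# Cross-check: row-major order is what you get by laying out the rows one after another
rows = [[10 * r + c for c in range(4)] for r in range(3)]
flat = [x for row in rows for x in row]
offset = (row_major_address(1000, 1, 2, 0, 0, 4, 4) - 1000) // 4
print(flat[offset] == rows[1][2])   # True
```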
Array Initialization
• Many languages allow an array initialization
in the context of the array definition.
Example for C and C++ (Java also accepts this form; C# requires int[] list = {4, 5, 7, 83};):
int list [] = {4, 5, 7, 83};
Heterogeneous Arrays
• A heterogeneous array is one in which
the elements need not be of the same
type
• Supported by Perl, Python, JavaScript,
and Ruby
Slices
• A slice is some substructure of an array
• It is essentially a referencing mechanism
• Supported, e.g., by newer editions of Fortran (Fortran 95 and later)
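Python can show both behaviors side by side: a `memoryview` slice refers to the original storage, as a slice in the sense above does, whereas slicing a `bytearray` directly copies the elements. A small sketch:

```python
data = bytearray(b"abcdef")

view = memoryview(data)[1:4]   # slice that refers to data[1], data[2], data[3]
view[0] = ord("X")             # writing through the slice changes the original
print(data)                    # bytearray(b'aXcdef')

copy = data[1:4]               # slicing the bytearray itself makes a new object
copy[0] = ord("Y")
print(data)                    # still bytearray(b'aXcdef')
print(copy)                    # bytearray(b'Ycd')
```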
Slices (cont.)
Associative Arrays
• An associative array is an array where the index (subscript) is not of an ordinal type
– E.g., indices of type string
• Design issue:
– What is the form of references to elements?
Associative Arrays in Perl
• Names begin with %; literals are delimited
by parentheses
%hi_temps = ("Mon" => 77, "Tue" => 79,
"Wed" => 65);
• Subscripting is done using braces and keys
$hi_temps{"Wed"} = 83;
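For comparison, Python's built-in `dict` is also an associative array; this sketch mirrors the Perl example, with Perl's `delete` and `exists` noted in the comments:

```python
hi_temps = {"Mon": 77, "Tue": 79, "Wed": 65}  # like %hi_temps in Perl
hi_temps["Wed"] = 83       # subscripting with a key, like $hi_temps{"Wed"} = 83
hi_temps["Thu"] = 80       # assigning to a new key adds an element
del hi_temps["Tue"]        # like Perl's delete $hi_temps{"Tue"}
print("Mon" in hi_temps)   # True (like Perl's exists)
print(hi_temps)            # {'Mon': 77, 'Wed': 83, 'Thu': 80}
```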
Record Types / Structures
• A record is a possibly heterogeneous
aggregate of data elements in which the
individual elements are identified by
names
• Design issues:
– Syntactic form of references to fields
– Support for elliptical references
C++ Example
• Definition of data type Emp_Rec_Type:
#include <string>
struct Emp_Rec_Type {
    std::string First;
    std::string Mid;
    std::string Last;
    float Hourly_Rate;
};
Data definition:
Emp_Rec_Type Emp_Rec;
Ada Example
• Definition of data type Emp_Rec_Type:
type Emp_Rec_Type is record
    First: String (1 .. 20);  -- record components need fixed bounds
    Mid: String (1 .. 10);
    Last: String (1 .. 20);
    Hourly_Rate: Float;
end record;
Data definition:
Emp_Rec: Emp_Rec_Type;
COBOL Example
• COBOL uses level numbers for creating
nested records
• Definition of record EMP-REC (01, 02 and 05 are level numbers; PIC X(20) denotes a string of 20 characters, PIC 99V99 four digits with an implied decimal point):
01 EMP-REC.
    02 EMP-NAME.
        05 FIRST PIC X(20).
        05 MID   PIC X(10).
        05 LAST  PIC X(20).
    02 HOURLY-RATE PIC 99V99.
References to Record/Structure Fields
• Record field references
– Dot notation (almost all languages):
record.field_name
Example:
Emp_Rec.Mid
– Exception: COBOL
field_name OF record
Example:
FIRST OF EMP-NAME OF EMP-REC
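The nested COBOL record can be modeled with Python's standard `dataclasses`, where fields are referenced with dot notation. The class and field names and the sample values are hypothetical counterparts of EMP-REC and EMP-NAME:

```python
from dataclasses import dataclass


@dataclass
class EmpName:           # plays the role of the COBOL group item EMP-NAME
    first: str
    mid: str
    last: str


@dataclass
class EmpRec:            # plays the role of EMP-REC
    emp_name: EmpName
    hourly_rate: float


emp_rec = EmpRec(EmpName("Jane", "Q", "Doe"), 12.50)
print(emp_rec.emp_name.first)   # Jane; in COBOL: FIRST OF EMP-NAME OF EMP-REC
emp_rec.hourly_rate = 13.25     # fields are also assigned by name
```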
Fully Qualified versus Elliptical References
• Fully qualified references must include all record names
• Elliptical references allow leaving out record names as long as the reference is unambiguous
– Example in COBOL:
FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are elliptical references to the employee’s first name
Comparison to Arrays
• Records are used when the collection of
data values is heterogeneous
• Access to array elements is somewhat slower than access to record fields, because subscripts are dynamic (computed at run time), whereas field names are static (their offsets are known at compile time)
Union Types
• A union is a type whose variables are allowed to store different type values at different times during execution
• Design issues
– Should type checking be performed for unions?
– Should unions be embedded in records/structures?
Discriminated vs. Free Unions
• Fortran, C, and C++ provide union
constructs without support of type
checking for unions
– This type of union is called free union
• Unions with type checking (during runtime) are called tagged unions or discriminated unions
– Require an additional type indicator (tag or discriminant)
– Supported, e.g., by Ada
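A free union can be reproduced with `ctypes.Union` from Python's standard library, which lays out its fields like a C union. In this sketch (the class name `IntOrFloat` is hypothetical) a float is stored and then read back through the integer field; no check prevents this, and the raw IEEE 754 bits come out:

```python
import ctypes


class IntOrFloat(ctypes.Union):
    """A free union: both fields share the same 4 bytes and there is no tag."""
    _fields_ = [("i", ctypes.c_int32), ("f", ctypes.c_float)]


u = IntOrFloat()
u.f = 1.0                          # store a float ...
print(hex(u.i))                    # ... read it as an int: 0x3f800000, no error raised
print(ctypes.sizeof(IntOrFloat))   # 4: storage for the largest member only
```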
Ada Example for Record with Union
type Shape is (Circle, Triangle, Rectangle);
type Colors is (Red, Green, Blue);
type Figure (Form: Shape) is record
Filled: Boolean;
Color: Colors;
case Form is
when Circle => Diameter: Float;
when Triangle =>
Leftside, Rightside: Integer;
Angle: Float;
when Rectangle => Side1, Side2: Integer;
end case;
end record;
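The run-time check of a discriminated union can be sketched in Python by storing the tag alongside the data and consulting it on every access to a variant field. This is a hypothetical model of the Ada type Figure (`VARIANTS` and `field` are illustrative names); where it raises `TypeError`, Ada raises Constraint_Error:

```python
class Figure:
    """Hypothetical sketch of a discriminated union modeled on the Ada type Figure.
    The tag `form` decides which variant fields exist; every access to a
    variant field is checked against the tag at run time."""

    VARIANTS = {
        "circle": ("diameter",),
        "triangle": ("leftside", "rightside", "angle"),
        "rectangle": ("side1", "side2"),
    }

    def __init__(self, form, filled, color, **variant):
        if set(variant) != set(self.VARIANTS[form]):
            raise ValueError(f"a {form} needs exactly the fields {self.VARIANTS[form]}")
        self.form = form          # discriminant (the tag)
        self.filled = filled      # fields common to all variants
        self.color = color
        self._variant = variant   # variant part

    def field(self, name):
        # Run-time check: the price discriminated unions pay for safety
        if name not in self.VARIANTS[self.form]:
            raise TypeError(f"no field {name!r} when form is {self.form!r}")
        return self._variant[name]


fig = Figure("circle", filled=True, color="red", diameter=3.0)
print(fig.field("diameter"))   # 3.0
try:
    fig.field("side1")         # a free union would silently reinterpret memory here
except TypeError as err:
    print(err)                 # no field 'side1' when form is 'circle'
```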
Example graphically illustrated
Unions - Discussion
• Free unions are unsafe
– Do not support type checking during runtime
• Java and C# do not support unions
– Reflective of growing concerns for safety in programming languages
• Discriminated unions are safer than free unions, but you have to pay a price
– Type checks during runtime consume additional computational resources
Pointer and Reference Types
• A pointer type variable has a range of values
that consists of memory addresses and a
special value, nil/null/NULL
• Provides the power of indirect addressing /
memory access
• Provides a way for dynamic memory
management
– Heap allocation / access
Issues with Pointers
• Existence of untyped pointer variables
(something like void * in C)
• Existence of reference types (e.g. C++)
• Explicit or implicit dereferencing
– C/C++: explicit via *
– Java: implicit (all non-primitive types are references)
• In the case of heap management via pointers:
– Allocation/deallocation scheme
1. Explicitly, by hand, e.g., using new and delete
2. Automatic allocation and automatic deallocation (automatic garbage collection)
Pointer Operations
• Two fundamental operations
– Assignment is used to set a pointer
variable’s value to some useful address
– Dereferencing yields the value stored at
the location represented by the pointer’s
value
Pointer Assignment Illustrated
The assignment operation j = *ptr
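Both pointer operations can be tried from Python through the standard `ctypes` module, which provides C-style pointers. A small sketch (the value 42 is arbitrary):

```python
import ctypes

x = ctypes.c_int(42)
ptr = ctypes.pointer(x)    # pointer assignment: ptr now holds the address of x
j = ptr.contents.value     # dereferencing, like j = *ptr in C
ptr[0] = 7                 # assignment through the pointer, like *ptr = 7 in C
print(j, x.value)          # 42 7
print(ctypes.addressof(x) == ctypes.addressof(ptr.contents))  # True: same memory cell
```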
Problems with Pointers and Heaps
• Dangling pointers
– A pointer points to a heap-dynamic variable that has been deallocated
• Lost heap-dynamic variables / lost heap space
– An allocated heap-dynamic variable is no longer referenced by any pointer, so it can neither be accessed nor deallocated
• The resulting effect of “losing memory” is called memory leakage
Pointers in C and C++
• Extremely flexible but must be used with care
• Pointers can point at any variable regardless of when
or where it was allocated
• Used for dynamic storage management and
addressing
• Pointer arithmetic is possible
• Explicit dereferencing and address-of operators
• Untyped pointers (void *)
– A void * can point to a value of any type
Reference Types
• C++ includes a special kind of pointer type
called a reference type that is used primarily
for formal parameters
– Advantages of both pass-by-reference and pass-by-value
• Java extends C++’s reference variables and
allows them to replace pointers entirely
– References are references to objects, rather than
being addresses
• C# includes both the references of Java and
the pointers of C++
Dangling Pointer Problem
• Tombstone: extra heap cell that is a pointer to the
heap-dynamic variable
– The actual pointer variable points only at tombstones
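– When the heap-dynamic variable is deallocated, the tombstone remains but is set to nil
– Costly in time and space
• Locks-and-keys: pointer values are represented as (key, address) pairs
– Heap-dynamic variables are represented as the variable plus a cell for an integer lock value
– When a heap-dynamic variable is allocated, a lock value is created and placed both in the lock cell and in the key cell of the pointer
– Copies of the pointer also copy the key; every dereference compares the pointer’s key with the variable’s lock, and deallocation sets the lock to an illegal value, so later accesses through stale pointers are detected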
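Both techniques can be simulated on a toy heap in Python. This is a hypothetical sketch (`Tombstone`, `LockedHeap` and their methods are illustrative names), not how a real run-time system is written; it only shows why a stale pointer is detected instead of silently reaching freed memory:

```python
class Tombstone:
    """Tombstone: every pointer refers to this extra cell, never to the variable itself."""

    def __init__(self, value):
        self.target = [value]   # stands in for the heap-dynamic variable

    def deref(self):
        if self.target is None:
            raise RuntimeError("dangling pointer detected: tombstone is nil")
        return self.target[0]

    def deallocate(self):
        self.target = None      # the tombstone remains, but is set to nil


class LockedHeap:
    """Locks-and-keys: pointers are (key, address) pairs; each cell carries a lock."""

    ILLEGAL_LOCK = 0

    def __init__(self):
        self.cells = {}         # address -> [lock, value]
        self.next_address = 0
        self.next_lock = 1      # a fresh lock per allocation, never ILLEGAL_LOCK

    def allocate(self, value):
        address, lock = self.next_address, self.next_lock
        self.next_address += 1
        self.next_lock += 1
        self.cells[address] = [lock, value]
        return (lock, address)  # the pointer's key is a copy of the lock

    def deref(self, pointer):
        key, address = pointer
        lock, value = self.cells[address]
        if key != lock:
            raise RuntimeError("dangling pointer detected: key does not match lock")
        return value

    def deallocate(self, pointer):
        _, address = pointer
        self.cells[address][0] = self.ILLEGAL_LOCK


# Tombstones: a copied pointer refers to the same tombstone
p = Tombstone(42)
q = p
p.deallocate()

# Locks-and-keys: a copied pointer carries the same key
heap = LockedHeap()
r = heap.allocate("record")
s = r
print(heap.deref(s))    # record
heap.deallocate(r)

for stale_access in (q.deref, lambda: heap.deref(s)):
    try:
        stale_access()
    except RuntimeError as err:
        print(err)
```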
Automatic Heap Management
• Complex run-time process
• Single-size cells vs. variable-size cells
• Two approaches to reclaim garbage
– Reference counters (eager approach):
reclamation is gradual
– Mark-sweep (lazy approach): reclamation occurs only when memory runs short
Reference Counter
• Reference counters:
Maintain a counter in every cell that stores the number of pointers currently pointing at the cell
– Advantage: intrinsically incremental, so significant delays in the application execution are avoided
– Disadvantage: complications for cells connected circularly (their counters never drop to zero, even when the whole cycle is unreachable)
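A simulated heap makes both the advantage and the disadvantage visible. In this hypothetical sketch (`RefCountedHeap` and its methods are illustrative names), cells are integer ids; dropping the last pointer frees a cell immediately, but a cycle keeps its own counters above zero:

```python
class RefCountedHeap:
    """Simulated heap in which every cell carries a reference counter."""

    def __init__(self):
        self.count = {}      # cell id -> number of pointers to the cell
        self.children = {}   # cell id -> ids of cells this cell points to
        self.freed = []      # cells returned to the free list, in order
        self._next_id = 0

    def allocate(self):
        cell = self._next_id
        self._next_id += 1
        self.count[cell] = 0
        self.children[cell] = []
        return cell

    def add_ref(self, cell):
        self.count[cell] += 1

    def drop_ref(self, cell):
        self.count[cell] -= 1
        if self.count[cell] == 0:              # reclaimed immediately (eager)
            for child in self.children.pop(cell):
                self.drop_ref(child)           # the freed cell no longer points anywhere
            del self.count[cell]
            self.freed.append(cell)

    def link(self, src, dst):
        """Store a pointer to dst inside cell src."""
        self.children[src].append(dst)
        self.add_ref(dst)


rc_heap = RefCountedHeap()

# Chain: program variable -> a -> b; dropping the variable's pointer frees both cells
a, b = rc_heap.allocate(), rc_heap.allocate()
rc_heap.add_ref(a)
rc_heap.link(a, b)
rc_heap.drop_ref(a)
print(rc_heap.freed)     # [1, 0]: b is freed while a is being freed

# Cycle c <-> d: after the variable's pointer is dropped, both counters stay at 1
c, d = rc_heap.allocate(), rc_heap.allocate()
rc_heap.add_ref(c)
rc_heap.link(c, d)
rc_heap.link(d, c)
rc_heap.drop_ref(c)
print(rc_heap.count)     # {2: 1, 3: 1}: unreachable but never reclaimed
```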
Mark-Sweep
• The run-time system allocates storage cells
as requested and disconnects pointers from
cells as necessary. Method:
– Every heap cell has an extra bit used by collection
algorithm
– All cells initially set to garbage
– All pointers traced into heap, and reachable cells
marked as not garbage
– All garbage cells returned to list of available cells
• Disadvantage: in its original form it runs only when memory is exhausted, and then causes significant delays during application execution
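The method above can be sketched on a simulated heap. This is a hypothetical illustration (`mark_sweep` is an illustrative name): the marking phase here is a plain depth-first traversal with an explicit stack, which is not necessarily the marking algorithm presented in the lecture. Unlike reference counting, an unreachable cycle is collected:

```python
def mark_sweep(heap, roots):
    """heap maps each cell id to the ids it points to; roots are the program's pointers.
    Deletes unreachable cells from heap and returns their ids, sorted."""
    marked = {cell: False for cell in heap}    # initially every cell counts as garbage
    stack = list(roots)
    while stack:                               # mark: trace all pointers into the heap
        cell = stack.pop()
        if not marked[cell]:
            marked[cell] = True
            stack.extend(heap[cell])
    garbage = sorted(cell for cell, is_marked in marked.items() if not is_marked)
    for cell in garbage:                       # sweep: return garbage cells to the free list
        del heap[cell]
    return garbage


# cell 0 -> 1 is reachable; 2 <-> 3 is an unreachable cycle; 4 is unreachable
cells = {0: [1], 1: [], 2: [3], 3: [2], 4: []}
print(mark_sweep(cells, roots=[0]))   # [2, 3, 4]
print(sorted(cells))                  # [0, 1]
```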
Marking Algorithm
Variable-Size Cells
• All the difficulties of single-size cells plus
more
• Required by most programming languages
• If mark-sweep is used, the marking process
becomes more difficult