Lecture note 18-19


MT311 Java Programming and
Programming Languages
Li Tak Sing (李德成)
Programming Languages

You need to buy the following book:
Concepts of Programming Languages (7th
Edition) by Robert W. Sebesta.
Programming Languages

We are not teaching programming in this
module. We are teaching different features
of different programming languages and
comparing their advantages and
disadvantages.
Motivation to learn programming
languages


Simulate a useful facility, which is not
supported in the language you are forced to
use. For example, after studying how virtual
methods are implemented in C++, you may
try to simulate this facility in C.
Choose a suitable language for a project.
After you know the criteria for critically
evaluating a programming language, you are
much more capable of choosing an
appropriate language.
Motivation to learn programming
languages

Learn or design a new language more easily.
Most languages share common fundamental
principles. For example, after studying the
object-oriented language paradigm, you
should find it easier to learn C++ or Java.
Motivation to learn programming
languages

Write code that is more efficient or fix a
bug more easily. By studying the
implementation and mechanism of different
constructs, you should find that you are able
to understand a language more deeply. For
example, after studying the mechanism of
parameter passing, you would avoid passing
a large array as a value parameter to a
function in Pascal.
Criteria for language evaluation




readability
writability
reliability and
cost.
Readability

Readability of a program measures how
easily it can be understood. If a program is
difficult to read, then it is difficult to
implement and maintain. Since the
maintenance stage is the longest stage in the
software life cycle, it is important for a
language to have high readability. The
following reading discusses the factors that
affect the readability of a language.
Readability
Five characteristics of programming
languages can affect their readability:
1 Overall simplicity of a programming language.
There are three factors that determine overall
simplicity.
– Number of basic components. For example, there
are 31 different binary operators in C.
Readability
Even a competent programmer would find it difficult
to remember all of their functions.
– Alternative means for accomplishing the same
operation. The method that is used by the author
may not be one with which the reader is also
familiar. For example, in C, the 3rd element of an
array A can be addressed in two ways:
Readability
A[2] or *(A+2) (Note that the 1st element is
A[0])
If the programmer uses the latter form, then it is
much less readable.
Readability
– The meanings of operators are re-definable.
Programs are difficult to read if such definitions
do not follow common sense. For example, suppose A and
B are records and A+B is defined to be the sum of
one of the fields of the two records (like salary).
Unless you trace back to the definition of the
operator, it is highly unlikely that you will
understand this statement.
Readability
2 Number of exceptional rules (orthogonality).
If a programming language has only a small
number of rules and the exceptions to these
rules are few, then the language is easier to
learn and therefore easier to read. For
example, all data in Smalltalk are objects and
all respond to the message class, which
Readability
returns the class type of the object. Therefore,
you should have no problem understanding
the meaning when you encounter such a
statement. On the other hand, in Java, some
data are primitive data while others are
objects. Therefore you have to remember a
different set of rules for primitive data and
objects.
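The following short Java sketch (the class name is made up for illustration) shows the extra rule you must remember: objects respond to getClass(), but primitive values do not.
public class OrthogonalityDemo {
    public static void main(String[] args) {
        int primitive = 42;       // primitive data: not an object
        Integer boxed = 42;       // object: wraps the same value

        System.out.println(boxed.getClass());   // prints "class java.lang.Integer"
        // primitive.getClass() would not compile: primitive data do not
        // respond to methods, so a separate set of rules applies to them.
    }
}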
Readability
3 Control statements. It is known that program
readability is severely reduced by the
indiscriminate use of goto statements. If
a programming language has sufficient
control constructs, the need for goto
statements can be nearly eliminated. This
increases the readability.
Readability
4 Data types and data structures. The
availability of appropriate facilities for
defining user-defined data types and data
structures in a language can also improve
program readability significantly. For
example, assume that an inventory record
has two fields: item number and price. In
Pascal, you would have defined it as:
Readability
type inventory_record = record
  item_number: integer;
  price: integer;
end;
If you want 100 such records, you would have
to declare the array as:
var inventory_records: array[1..100] of
  inventory_record;
Readability
Therefore, the item number and price of the 8th
record would be written as:
inventory_records[8].item_number
inventory_records[8].price
Readability
Their meanings are self-explanatory. However,
since FORTRAN does not allow you to
declare a record, the array of the records has
to be separated into two arrays:
integer itemno(100)
integer price(100)
The item number and price of the 8th record
would be written as: itemno(8) and price(8)
respectively.
Readability
Unless you are the author of this program, it is
unlikely that you will notice that price(8) is the
price of the item with item number itemno(8).
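As a rough Java analogue of this contrast (the class and field names are made up), a record-like class keeps the two fields together, whereas parallel arrays rely on the reader to connect them:
class InventoryDemo {
    static class InventoryRecord {
        int itemNumber;
        int price;
    }

    public static void main(String[] args) {
        // One array of records keeps related fields together.
        InventoryRecord[] inventoryRecords = new InventoryRecord[100];
        inventoryRecords[7] = new InventoryRecord();
        inventoryRecords[7].itemNumber = 1234;
        inventoryRecords[7].price = 50;

        // The FORTRAN-style alternative: two parallel arrays whose
        // relationship exists only in the programmer's head.
        int[] itemNo = new int[100];
        int[] price = new int[100];
        itemNo[7] = 1234;
        price[7] = 50;
    }
}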
5 Syntax of a programming language. Syntax
affects readability in two ways:
– Identifier forms. If a language allows longer
names, the program would be easier to
understand. For example, inventory_level is much
more self-explanatory than invlev.
Readability
– Special words.
Consider the C program:
for (int i=1; i<10; i++) {
  if (a==1) {
    .....
  }
}
and the corresponding Ada program:
Readability
for i in 1..10 loop
  if a = 1 then
    .....
  end if;
end loop;
The special words end if and end loop are used
to end the if block and the for loop. So it is
easier to read than the C version.
Writability


Writability is a measure of how easily a
language can be used to write programs.
Program writability is affected by the
following factors:
1 Simplicity and fewer exceptional rules. If a
language consists of a small number of primitive
constructs and there are only a few exceptional
rules, then it is easier to write because there is
less to remember.
Writability
2 Support for abstraction. Abstraction allows
programmers to hide the implementation details
of complicated data structures and operations,
and thus simplifies their use. For example, a stack can be
implemented using pointers or arrays. The user of
the stack would not be affected if this underlying
structure were changed.
Writability
– Expressivity. This means that a language has
convenient ways of specifying computation. For
example, there is a multiple assignment
statement in C like:
a=b=c=1;
If this is to be done in Pascal, it has to be written
as three separate assignment statements:
a:=1;
b:=1;
c:=1;
Reliability


A program is reliable if it performs its
specified operations under all conditions.
The following features can affect program
reliability:
Reliability
– Type checking. If the parameters passed to a
function are not those expected, then errors will
occur. Therefore, it is important to check that the
types are correct. In most languages this is done
by requiring the programmer to declare a function
before it can be used. Languages like FORTRAN and the
original C language do not do any type checking
for parameters of function calls. Therefore, they
are very unreliable.
Reliability
– Exception handling. This is a mechanism that
intercepts run-time errors, takes corrective
action and then continues the program
execution. Without this, a program that
encounters errors might stop, or continue to do
something that would cause damage to the
system.
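A minimal Java sketch of this idea (config.txt is a made-up file name):
import java.io.FileReader;
import java.io.IOException;

public class ExceptionDemo {
    public static void main(String[] args) {
        try {
            FileReader reader = new FileReader("config.txt"); // may fail at run time
            // ... use the file ...
            reader.close();
        } catch (IOException e) {
            // Intercept the error, take corrective action, and let the
            // program continue instead of crashing.
            System.err.println("Could not read config.txt, using defaults");
        }
        System.out.println("Program continues after the error is handled");
    }
}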
Reliability
– Aliasing. This refers to the situation of having two
or more different referencing methods for the
same memory location. Aliasing significantly
reduces program reliability. In Pascal, you can
define a variant record like this:
type Shape = (Square, Rectangle);
     Dimensions = record
       case WhatShape: Shape of
         Square: (Side1: real);
         Rectangle: (Length, Width: real)
     end;
Reliability
In this example, Dimensions can have two forms:
one for a Square and the other for a Rectangle.
This means that the memory occupied by this
record may be accessed by either the Side1 field
or the Length field. However, Pascal will not
complain when a Dimensions record of a Square
is used as Rectangle. This would, of course, be
an error because the Length and Width fields
would be rubbish.
Reliability
– Readability and Writability. Usually, the easier a
program is to write, the more likely it is to be correct.
However, we will later show that sometimes
writability can decrease reliability. The easier a
program is to read, the easier it is to spot errors.
Cost

The seven factors that influence cost are:
– The training cost. If the language is simpler, with
fewer exceptional rules, and the programmer is
more experienced, then the cost is lower.
– The cost of writing programs. Higher writability
reduces the cost of writing programs.
Cost
– The cost of running programs. Interpreted
languages like Basic have higher running costs
than compiled languages like Fortran. But, of
course, Basic programs need not be compiled in
the first place and therefore have no compilation
cost. There are always tradeoffs between
compilation cost and running cost.
Cost
– The cost of the language implementation system.
Some languages like OCCAM require a special
hardware platform, known as the transputer, to run.
Therefore they are more expensive.
Cost
– The cost of reliability. For critical systems, this
cost is very high.
– The cost of maintaining programs in the language.
Higher readability reduces this cost, as the
program can be easily understood, which makes
enhancement and bug fixing easier.
Cost
– The cost of compiling programs. This includes the
time and resources required to compile
a program. For example, some compilers have to
read a program source twice (known as two-pass
compilers). They are therefore more time
consuming and require more resources than
those that read a program source only once.
Influences on language design

There are two more factors that can affect
the design of a language:
– computer architecture
– software development methodology.
Computer architecture

Computer architecture may affect the design
of a language. For example, a 32-bit or a 64-bit processor may affect the size of an
integer.
Software development
methodologies

There are mainly three kinds of software
development methodologies:
– Top-down design and stepwise refinement
– Object-oriented design
– Process-oriented design
Top-down design and stepwise
refinement

The main task is subdivided into smaller
tasks. These smaller tasks are further
subdivided until they are small enough to be
implemented as a module or a subprogram.
For example, the program for controlling an
automatic teller machine can be subdivided
into the following sub-tasks: password
validation, balance checking, money delivery,
updating account. Then these tasks may be
further divided.
Object-oriented design

Objects that exist in the real world are
modeled by encapsulating their attributes
and operations. Then the whole system is
modeled by the interactions between these
objects. For example, one of the objects in
the automatic teller machine system is the
customer. The attributes of the customer are
customer number, name, etc. An operation of
a customer is money withdrawal.
Process-oriented design

Different processes are modeled separately and the
whole system is modeled by the interactions
between these processes. This is mainly used in real
time concurrent systems. For example, in a nuclear
power plant system, there should be a process that
regulates the water temperature and a process
waiting to receive commands from a control panel.
When the latter process receives a command to
regulate the water temperature, it passes a signal to
the former process to do the job.
The stepwise refinement
methodology

The stepwise refinement methodology
promotes the use of subprograms and
modules. For example, standard Pascal does
not support modules and therefore is not
very useful when this methodology is used.
The object-oriented methodology

The object-oriented methodology promotes
the use of object-oriented programming
languages. Therefore, when this
methodology becomes more popular, object-oriented
programming languages would also
gain in popularity. At this moment, you have
at least learned one object-oriented
programming language, namely Java.
Process-oriented methodology

The process-oriented programming
methodology promotes the use of process-oriented programming languages like CSP or
Ada.
Language categories


Imperative languages. The procedures on
how to perform the computation are stated
explicitly, e.g., C, Pascal;
Object-oriented languages. The behaviors of
different objects are stated explicitly, e.g.,
Smalltalk, Java;
Language categories


Functional programming languages. The
result, but not the procedure, of applying a
function to some parameters is stated
explicitly, e.g., LISP;
Logic programming languages. The rules
that have to be observed are stated explicitly,
e.g., Prolog.
Variables, data types and
expressions

Variables have six attributes:
– name
– address
– value
– type
– lifetime
– scope.
Variable Name


The maximum length of a name. Short
names reduce readability. For example,
item_number would be much more self-explanatory
than itemno.
The connector characters that are allowed
to be used in a name. You can see that the
use of ‘_’ in names also increases
readability. For example, item_number would
be easier to read than itemnumber.
Variable Name

The case sensitivity of names. Languages
that are case sensitive have the potential of
being misused. For example, if Edge and
edge are two variables, then both the
programmer and reader of the program may
be confused.
Variable name

Reserved words or keywords in the
language. If a language has too many
reserved words, then a program in this
language is more difficult to write. This
problem is particularly severe with COBOL,
which has more than 300 reserved words. It
is a headache for a COBOL programmer to
avoid using a reserved word as an identifier.
On the other hand, PL/1 does not have
reserved words. It only has keywords.
Variable name

Therefore, you can write statements like this:
IF IF=ELSE THEN THEN=ELSE; ELSE
ELSE=END; END
The disadvantage of no reserved words is
that if a very common keyword, like IF, is
given a new meaning, then the program
would be very difficult to read.
Variable name

There are two advantages of having no
reserved words:
– The programmer does not have the problem of
accidentally using an identifier that has been reserved.
– If reserved words are used and the language is
extended, then there may be new reserved words.
Existing code may then no longer be valid.
Variable address, type and value

Address is the location of the memory space
for a variable. The time when this address is
fixed is called the binding time. It can be fixed
at load time (this occurs with variables in
FORTRAN) or it can be fixed at run time (as
for local variables of a sub-program in Pascal
or C). A more detailed discussion of binding
time is given in the next section of the unit.
Variable address, type and value

Type is the range of possible values of a
variable. Primitive data types are defined
when the language is designed. User defined
types are defined when a program is written.
Variable address, type and value

Some languages do not check whether a
value stored in a variable is within the
defined range because doing so would
greatly decrease the efficiency of the
program. Most languages do not check
whether a variable has been initialized before
it is used. It is therefore the programmer’s
responsibility to write extra code to do this if
he/she really wants such checking.
Lifetime and scope

The remaining two attributes of variables are
lifetime and scope. The lifetime of a variable
begins when memory space is allocated to
the variable and ends when the space
becomes unavailable. The scope of a
variable is the range of statements in which
the variable can be referenced.
Lifetime and scope
For example, in this C function:
double area_of_circle(double radius)
{
double area;
area = 3.1415 * radius * radius;
return area;
}
Lifetime and scope

The lifetime of the variable area begins when
the function is invoked (when storage is
created for area ) and ends when the
function returns its value to the caller (when
the storage for area is to be returned to the
system). The scope of area begins just after
it is declared as a double and ends at the
end of the function.
The concept of binding

Binding is the abstract concept of the
association between an attribute and an
entity (e.g. the association of a type to a
variable), or between an operation and a
symbol (e.g. the association of the
multiplication meaning to the symbol *). We
call the time when the binding happens the
binding time.
Binding

Consider the following C statements:
void myfunction(int a)
{
    int x;
    …………
    x = a + 3;
    …………
    otherfunction(x);
}
Binding

Bindings can happen at any of these times:
– Language design time. For example, the meaning
of the operation ‘+’ is defined when the language
is designed.
– Language implementation time. The possible
range of int is usually defined when the language
is implemented. Some C compilers may define int
to have the range –32768 to 32767, while others
may define it differently.
Binding
– Link time. For example, the relative address of the
function otherfunction is bound when the program
is linked. The address is relative because the program can
be loaded anywhere in memory when it
is executed.
– Compile time. For example, the type of the
variable x is defined when this program is
compiled.
Binding
– Load time. For example, the absolute address of
the function otherfunction is bound when the
program is loaded.
– Run time. For example, the address of the
variable x is bound when myfunction is called
during execution. Note that if myfunction is a
recursive function, then there can be multiple
instances of x at one time.
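The last point can be illustrated with a small Java sketch (myFunction is a made-up recursive method): each activation binds its own storage for x at run time.
public class RecursionDemo {
    // A hypothetical recursive method: each call binds a fresh x at run time.
    static int myFunction(int a) {
        int x = a + 3;                       // storage for x is bound when the call is made
        if (a > 0) {
            return x + myFunction(a - 1);    // the recursive call has its own x
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(myFunction(2));   // three activations, three instances of x
    }
}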
Static binding or dynamic binding

We can distinguish two types of bindings. A binding
is called static if it occurs before run time and the
binding is unchanged throughout program execution.
On the other hand, a binding is dynamic if it occurs
during run time or can change during program
execution. When we look at the designs of
languages, we can distinguish whether bindings are
static or dynamic. Each binding type can be
examined according to data type and storage.
Type binding

To appreciate the designs of different
programming languages and explore type
bindings, let’s ask these questions:
– How and when is the data type of a variable in a
program decided?
– Does the language allow the data type of a
variable to change during program execution?
Static binding

Many programming languages like C, Pascal,
FORTRAN and COBOL have static type
bindings:
– the data type of a variable is defined using
explicit declaration or implicit declaration, and
the compiler takes note of the data type. Type
binding occurs at compilation time;
– the data type of a variable does not change
throughout the program execution.
Static binding

This method of binding has one advantage
that the type of a variable is known at
compile time and therefore the compiler can
detect errors resulting from using
incompatible types. For example, consider
the following Pascal fragment:
Static binding
var a:boolean;
i:integer;
...
a:=i+1;

The Pascal compiler would complain about this
because an integer value is assigned to a Boolean
variable. This is possible because the types of the
variables are known during compile time and they
cannot be changed in the run time.
Dynamic binding

Some programming languages like APL,
SNOBOL4 have dynamic type binding:
– the data type of a variable is determined at run
time (by the interpreter) — type binding occurs at
run time;
– the data type of a variable can change in the
course of program execution.
Dynamic binding

Dynamic type binding has one advantage:
– It enables us to write generic subroutines, i.e. the
same subroutine can be used for many different
variables. For example, the same sorting
subroutine can be used to sort different types of
variable. On the other hand, if static type binding
is used, then usually, a subroutine is only written
for a particular type of variable.
Dynamic binding

Dynamic type binding has two disadvantages:
– the type checking mechanism of static type
binding cannot operate;
– the running cost is higher.
Storage bindings and the lifetime
of a variable

In order to see how the designs of
programming languages can differ, let’s
tackle these two questions:
– How and when does the address of a variable get
assigned?
– How is the lifetime of a variable determined in the
language?
Storage bindings and the lifetime
of a variable

The lifetime of a variable is the time during
which the variable is bound to specific
memory locations. It begins when memory
locations are taken from a pool of available
memory to be used by the variable —
allocation. It ends when the memory
locations previously bound to the variable are
placed back to the pool of available
memory — deallocation.
Storage bindings and the lifetime
of a variable

Four categories of variables can be
distinguished according to their lifetime:
– Static variables — they are bound to specific
memory locations before program execution
begins and remain bound to the same locations
until program execution ends. In Java, all static
attributes of a class are static variables.
public class AClass {
static int staticVar;
}
Storage bindings and the lifetime
of a variable

Stack-dynamic variables — they are bound
to storage at run time, when their declaration
statements are elaborated. Elaboration of a
variable declaration occurs when execution
reaches the code that does the allocation of
memory spaces and binding to the variable.
The memory for stack-dynamic variables is
allocated from the runtime stack.
Storage bindings and the lifetime
of a variable

In Java, local variables of primitive types are
stack-dynamic, e.g. the local variable sum
below. Storage is allocated when the
execution reaches the declaration statement,
and it is deallocated when execution reaches
the end of the scope of the variable.
Storage bindings and the lifetime
of a variable
1. class AClass {
2.   public void aMethod(){
3.     double sum=0;
4.     for (int i=0; i<10; i++){
5.       sum+=i;
6.     }
7.   }
8. }
Storage bindings and the lifetime
of a variable

In the above example, memory is allocated
to sum when execution reaches line 3, and it is
deallocated when execution reaches the end of
the method at line 7. Although the storage bindings
for stack-dynamic variables are dynamic (the storage
is bound and unbound in the course of
program execution), their type bindings are
still static (the data type is bound at compile
time and is fixed during program execution).
Storage bindings and the lifetime
of a variable

Explicit heap-dynamic variables — their storage is
allocated and deallocated by using system functions
in a program. The memory is usually allocated from
the heap. They cannot be referenced by simple
variable names, and are referenced using pointer
variables. In Java, object variables are explicit heap-dynamic
variables. Their storage is allocated when
the ‘new’ operator is applied to a class. In most other
languages, pointers are explicit heap-dynamic
variables.
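A minimal Java sketch of this point:
public class HeapDemo {
    public static void main(String[] args) {
        // 'buffer' is a reference variable; the StringBuilder object it
        // refers to is allocated on the heap when 'new' is evaluated.
        StringBuilder buffer = new StringBuilder();
        buffer.append("explicit heap-dynamic");

        // Dropping the last reference makes the heap storage eligible for
        // garbage collection; in Java deallocation is implicit.
        buffer = null;
    }
}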
Storage bindings and the lifetime
of a variable

Implicit heap-dynamic variables — the
storage, type and value are bound at runtime only when they are assigned values. We
have an example of this kind of variable in
APL:
PRICE ← 48.5
Storage bindings and the lifetime
of a variable

PRICE is only allocated storage when
execution reaches the above assignment
statement. The data type is also set to be
real at the same time.
Type checking


Some languages do type checking while
others do not.
Type checking means that the compiler will
report errors when the actual type of a
variable does not match the expected type.
Strong typing




A language is strongly typed if all type errors
are reported either during runtime or compile
time.
Fortran is not a strongly typed language
because it does not report type errors.
Pascal is a nearly strongly typed language
because it reports most type errors.
Java is a strongly typed language because it
reports all type errors.
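A small Java sketch of how the two kinds of reporting look (the class name is made up):
public class StrongTypingDemo {
    public static void main(String[] args) {
        // Compile-time type error (rejected by the compiler if uncommented):
        // boolean a = 1 + 1;

        // Run-time type error: an invalid downcast is reported as a
        // ClassCastException instead of silently producing garbage.
        Object o = "a string";
        Integer n = (Integer) o;   // throws ClassCastException at run time
    }
}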
Type compatibility
Consider the following declaration statements:
type arraytype1=array [1..10] of integer;
arraytype2=array [1..10] of integer;
var
A,B:arraytype1;
C: arraytype2;
Type compatibility
There are two fundamental rules for type
compatibility:
 Name type compatibility. Two variables are
of the same type only if they are declared
with the same type name. A, B in the above
example are of the same type. However, C is
not of the same type as A.
Type compatibility

Structure type compatibility. Two variables
are of the same type if they are of the same
structure. A and C would be of the same
type.
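Java class types, for example, follow name type compatibility; a rough sketch with made-up class names:
public class CompatibilityDemo {
    static class ArrayType1 { int[] data = new int[10]; }
    static class ArrayType2 { int[] data = new int[10]; }   // same structure, different name

    public static void main(String[] args) {
        ArrayType1 a = new ArrayType1();
        ArrayType1 b = a;                 // fine: same type name
        // ArrayType2 c = a;              // rejected: same structure, different name
    }
}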
A comparison of the two types of
compatibility
Writability
 Name type compatibility has lower writability
because it is more restrictive.
 Structure type compatibility has higher
writability because the programmer can now
treat more data types to be the same and
therefore can manipulate them together.
A comparison of the two types of
compatibility
Implementation cost
 Name type compatibility has lower
implementation cost because it is easy to
check two variables to be of the same type.
 Structure type compatibility has higher
implementation cost because it is more
difficult to check whether two variables are of
the same type.
A comparison of the two types of
compatibility
Reliability
 Name type compatibility is more restrictive
which means that it is less likely that a value will
mistakenly be assigned to the wrong variable. So it is
more reliable.
 Structure type compatibility is less restrictive
and therefore is less reliable.
Type compatibility

Few programming languages use strict name type
compatibility or strict structure type compatibility.
Most use combinations of the two. For example,
Pascal uses a slight variation of name type
compatibility called declaration equivalence — a
programmer may define a data type to be equivalent
to another type, then the two data types are
compatible even though they have different names.
C uses a variation of structure type compatibility,
while C++ and Ada use a variation of name type
compatibility.
Scope

The scope of a variable is the range of
statements in which the variable can be
referenced. The next reading discusses the
scope method that is used in most
programming languages — static scope.
Static scope

In languages with static scoping, the
compiler can determine the scope of each
variable by inspecting the program.
Static scope
 Using Pascal
syntax:
procedure sub1;
var x, y : integer;
  procedure sub2;
  var x : integer;
  begin
    x := 1;
    y := 2;
  end;
begin
  ...
end;
Static scope
Since sub2 is nested in sub1, we call sub1 the
static parent of sub2. Any static parent of
sub2, and their parents, and so forth,
is a static ancestor of sub2. y is
called a non-local variable of sub2 since the
declaration of y is not located in sub2.
Static scope

An important task of a compiler of a language with
static scope is to find the correct declaration of any
variable it encounters. For example, in the above
Pascal program fragment, when the compiler sees
“y:=2”, it has to search for the declaration statement
of y. It will try to find the declaration of y in sub2 (the
current procedure), then try sub1 and every static
ancestor of sub2 until it finds the declaration. In this
case, the declaration is in the static parent sub1.
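Java has no nested procedures, but the same search happens across its nested scopes; a rough sketch (the names are chosen to mirror the Pascal example):
public class ScopeDemo {
    static int y;                 // declared in the enclosing (class) scope

    static void sub2() {
        int x;                    // local declaration found first
        x = 1;
        y = 2;                    // no local y: the compiler searches the
                                  // enclosing scope and finds the field y
    }

    public static void main(String[] args) {
        sub2();
        System.out.println(y);    // prints 2
    }
}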
Dynamic scope

With dynamic scoping, the scope of variables
can only be determined at run time. The
calling sequence of the subprograms in a
program will affect the scope of a variable.
Dynamic scope
Compared to static scoping, dynamic scoping
has the following disadvantages:

Dynamic scoping is less reliable because all local
variables of the calling subprogram are visible from
the called subprogram. This would make information
hiding impossible. For example, when subprogram A
calls subprogram B, then all the local variables of A
are visible to B. There is no way to prevent B from
changing the values of those variables.
Dynamic scope



The compiler cannot check type compatibility
because it does not know where a non-local variable
is declared. This information is only available at run
time.
Referencing non-local variables is more expensive;
The program is more difficult to read because the
identity of a nonlocal variable is difficult to trace by
just reading the source program.
Scope and lifetime



Scope is something about 'where' while
lifetime is something about 'when'.
Scope is the position in a program where a
variable is visible.
The lifetime of a variable is the time from when it
begins to exist until it ceases to exist.
Scope and lifetime
For example, consider the following C code:
void meth() {
    static int a = 2;
    int b = 3;
    ....
}
The scope of a is the statements after its
declaration. Its lifetime is from the
beginning of the program till the program ends.

Scope and lifetime

The scope of b is the statements after its
declaration. The lifetime of the variable is
when the function is invoked till the function
ends.
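A rough Java analogue of the C fragment above (the class and method names are made up): the static field plays the role of the C static local, while b is created afresh on every call.
public class LifetimeDemo {
    static int a = 2;            // static variable: lives for the whole program run

    static void meth() {
        int b = 3;               // stack-dynamic: created on each call, gone on return
        a += b;                  // a keeps its value between calls
    }

    public static void main(String[] args) {
        meth();
        meth();
        System.out.println(a);   // prints 8: a accumulated across both calls
    }
}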
Referencing environments

The scope of a variable is the range of
statements that can reference the variables.
The referencing environment is just the same
concept, but in this case we look from the
point of view of a program statement. The
referencing environment of a statement is the
collection of all variables that are visible in
the statement.
Data types

Primitive data types: Primitive data types not
only are useful by themselves; they also
become building blocks for defining user-defined data types, e.g. record structures,
arrays, in languages that allow them. The
following primitive data types are commonly
available:
Primitive data type



Numeric types — integer, floating-point and
decimal. The size of an integer is usually one
word. The size of a floating-point number is usually four
bytes.
Boolean types — usually have a size of one
byte for efficient access.
Character types — usually have a size of one
byte, except those for the Unicode character set.
Primitive data type

The language C is special in that the
differences between these three primitive
types are very vague. First of all, it has no
Boolean types, and variables of both numeric
types and character types can be used
where a Boolean expression is required.
Primitive data types

Secondly, variables of character types and
integer types are interchangeable. The only
constraint regarding this is the size difference
between an integer variable and a character
variable. This philosophy makes the
language very flexible. For example, we can
change the value of a character variable from
‘a’ to ‘b’ by adding 1 to it.
Primitive data types

With other languages, you have to call a
function to do that. The disadvantage is that
the type checking mechanism of the
compiler is defeated because a mixture of
different primitive types in an expression is
still considered to be valid. This is another
example of the conflict between writability
and reliability of a language.
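For comparison, a short Java sketch: Java also allows the arithmetic, but the compiler keeps char and int distinct, so an explicit cast is needed before the result can be stored back into a char.
public class CharDemo {
    public static void main(String[] args) {
        char c = 'a';
        int next = c + 1;          // char silently widens to int in arithmetic
        // char d = c + 1;         // rejected: the int result is not a char
        char d = (char) (c + 1);   // an explicit cast is required
        System.out.println(d);     // prints b
    }
}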
Character string types

The key questions that you should ask as
you analyse the design of character string
types in a programming language are:
– Are character strings a primitive type in the
language or are they constructed as an array of
characters?
– Are character strings in the language declared
with fixed lengths, or can they have variable
lengths?
– What operations are allowed on the character
string type?
User-defined ordinal types

The two kinds of user-defined ordinal types
are the enumeration type and the subrange
type. The main advantage of using these
types is the improved readability and
reliability of the program. However, the
enumeration type provided in C only
increases readability because the data of an
enumeration type is internally converted into an
integer.
User-defined ordinal types

Therefore, a function that accepts a parameter
of an enumeration type would also accept
any integer. Consequently, reliability is not
increased by using the enumeration type in C.
Array types

The key points in the design of array types in
a language can be emphasized by asking
these questions:
– What types are legal as subscripts? Readability
and reliability increase if enumerated types are
accepted as subscripts.
Array types
– Are subscript ranges checked at run time? Some
compilers will include run-time range checks in the
generated code to check if an array reference is
out of range. Some compilers, including most C
compilers, will not. Such checking increases both
reliability and running cost.
Array types
– When are subscript ranges bound? Some arrays
can have sizes determined at compile time, others must
be determined at run time.
– When is storage allocated? The storage can be
bound statically (at compile time) or dynamically
(at run time). For a dynamically bound array, the
storage could be allocated from the stack or from
the heap.
Array types
– How many subscripts are allowed? Most modern
languages do not put any limit on the number of
subscripts.
– Can arrays be initialized at storage allocation?
Allowing this would increase the writability
because if a language does not have this facility
then initialization has to be done with a number of
assignment statements.
Array types
Is there a way of defining an array type with no
subscript bounds? Consider the case when we
need to write a subprogram to sort an array of
integers. In Pascal, we would have the following
fragment:
type arr_type = array [1..10] of integer;
......
procedure sort(var a: arr_type);
begin
.......
Array types
The problem with this code is that sort is only
suitable for sorting arrays that are of type
arr_type. This means that it cannot be used to
sort an array of integers that has anything other
than ten members. We would need another
procedure for sorting an array with 11 members
and one for 12 members, etc. Ada solves this
problem by defining an unconstrained array. The
same fragment in Ada would be:
Array types
type arr_type is array (Integer range <>) of
Integer;
......
procedure sort(a : in out arr_type) is
begin
.......
Array type
– Now, arr_type is an array type and its subscript range
is not specified. If we declare two variables
A and B as:
A: arr_type(0..9);
B: arr_type(3..11);
Then both A and B are of type arr_type and
therefore can be sorted by using sort. Within sort,
the lower and upper bounds of the array can be
accessed using different standard attributes of
arrays in Ada:
Array type
A’First is the index of the first element in A.
A’Last is the index of the last element in A.
Since C uses pointers to access arrays, the problem
does not apply. However, there is a problem of
getting the size of the array. Therefore, in C, we
have to explicitly pass the size of the array to the
function. The same fragment in C
would be:
Array Type
void sort(int *a, int size) {
.. .. .. ..
}
We can see that if there is a way of defining an
array type without bounds, then the writability
would be increased.
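In Java the problem largely disappears because every array object carries its own length; a small sketch (the sort is a deliberately simple exchange sort):
import java.util.Arrays;

public class SortDemo {
    // One method handles arrays of any length: the bound travels with the array.
    static void sort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[i]) {          // swap out-of-order elements
                    int tmp = a[i];
                    a[i] = a[j];
                    a[j] = tmp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] small = {3, 1, 2};
        int[] large = {9, 4, 7, 1, 8, 2};
        sort(small);
        sort(large);
        System.out.println(Arrays.toString(small));   // [1, 2, 3]
        System.out.println(Arrays.toString(large));   // [1, 2, 4, 7, 8, 9]
    }
}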
Row-major order

In row-major storage, a multidimensional
array in linear memory is accessed such that
rows are stored one after the other. It is the
approach used by the C programming
language as well as many other languages,
with the notable exception of Fortran. When
using row-major order, the difference
between addresses of array cells in
increasing rows is larger than the difference between
addresses of cells in increasing columns.
Row-major order
For example, consider this 2×3 array:
1 2 3
4 5 6
Declaring this array in C as
int A[2][3];

Row-major order
would find the array laid out in linear memory
as:
1 2 3 4 5 6
Row-major order
The difference in offset from one column to the
next is 1 and from one row to the next is 3.
The linear offset from the beginning of the
array to any given element A[row][column]
can then be computed as:
offset = row*NUMCOLS + column
where NUMCOLS represents the number of
columns in the array—in this case, 3.
Row-major order
To generalize the above formula, if we have the
following C array:
int A[n1][n2][n3][n4][n5];
then the offset of the element
A[m1][m2][m3][m4][m5] is:
offset = m1*n2*n3*n4*n5 + m2*n3*n4*n5 +
m3*n4*n5 + m4*n5 + m5
Column-major order
Column-major order is a similar method of
flattening arrays onto linear memory, but the
columns are listed in sequence. The
programming language Fortran uses column-major ordering.
Column-major order
The array
1 2 3
4 5 6
7 8 9
if stored in memory with column-major order
would look like the following:
1 4 7 2 5 8 3 6 9
Column-major order
With columns listed first. The memory offset
could then be computed as:
offset = row + column*NUMROWS
where NUMROWS is the number of rows in
the array.
Column-major order
To generalize the above formula, if we have the
following C array:
int A[n1][n2][n3][n4][n5];
then the offset of the element
A[m1][m2][m3][m4][m5] is:
offset = m1 + m2*n1 + m3*n2*n1 +
m4*n3*n2*n1 + m5*n4*n3*n2*n1
Example
Consider the following array:
int A[3][7][8];

Assume that A[0][0][0] is at address 20000.
What is the address of A[2][3][4]
(i) if row-major order is used?
(ii) if column-major order is used?
Example


(i) Assuming an integer occupies 4 bytes, the address of
A[2][3][4] is:
20000 + (2*7*8 + 3*8 + 4)*4 = 20560
(ii) If column-major order is used, the address
is:
20000 + (2 + 3*3 + 4*3*7)*4 = 20380
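A small Java sketch that recomputes the two addresses in this example (the base address 20000 and the 4-byte integer size come from the example's assumptions):
public class OffsetDemo {
    public static void main(String[] args) {
        int base = 20000, elemSize = 4;
        int n1 = 3, n2 = 7, n3 = 8;        // dimensions of A[3][7][8]
        int m1 = 2, m2 = 3, m3 = 4;        // indices of A[2][3][4]

        // Row-major: the rightmost subscript varies fastest.
        int rowMajor = base + (m1 * n2 * n3 + m2 * n3 + m3) * elemSize;

        // Column-major: the leftmost subscript varies fastest.
        int colMajor = base + (m1 + m2 * n1 + m3 * n1 * n2) * elemSize;

        System.out.println(rowMajor);   // 20560
        System.out.println(colMajor);   // 20380
    }
}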