Transcript MT311 Java Application Development and Programming Languages Li Tak Sing (李德成)
Data types
Primitive data types are not only useful by themselves; they are also the building blocks for user-defined data types, such as records and arrays, in languages that support them. The following primitive data types are commonly available:
Primitive data type
Numeric types: integer, floating-point and decimal. An integer is usually one word in size, and a single-precision floating-point number is usually four bytes.
Boolean types: usually one byte in size. A single bit would suffice, but a byte can be accessed more efficiently.
Character types: usually one byte in size, except for types that hold Unicode characters, which need two or more bytes.
The language C is unusual in that the distinctions between these three primitive types are blurred. First, the original language has no Boolean type (_Bool was only added in C99). Variables of both numeric and character types can be used wherever a Boolean expression is required: zero counts as false and any non-zero value counts as true.
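The following is a minimal C sketch of this point. The function names count_nonzero and my_strlen are hypothetical, chosen only for illustration. In each loop an int or a char is used directly as the condition, and zero means false:

```c
#include <stddef.h>

/* Counts the non-zero entries of an int array. The int a[i] is used
   directly as a condition: zero is false, any other value is true. */
int count_nonzero(const int *a, size_t n) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i])              /* int where a Boolean is expected */
            count++;
    }
    return count;
}

/* Computes the length of a string. The char *s is used directly as a
   condition: the loop stops at the terminating '\0', whose value is 0. */
size_t my_strlen(const char *s) {
    size_t len = 0;
    while (*s) {               /* char where a Boolean is expected */
        s++;
        len++;
    }
    return len;
}
```

For instance, count_nonzero on {0, 3, 0, -1, 7} counts three entries, because -1 is non-zero and therefore true.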
Second, variables of character types and integer types are interchangeable. The only constraint is that a character variable is smaller than an integer variable, so it can hold only a smaller range of values. This philosophy makes the language very flexible. For example, we can change the value of a character variable from 'a' to 'b' by adding 1 to it.
In other languages you have to call a function to do that; in Pascal, for example, succ('a') returns 'b'. The disadvantage is that the compiler's type checking is weakened, because an expression that mixes different primitive types is still accepted as valid. This is another example of the conflict between the writability and the reliability of a language.
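The following C sketch illustrates both sides of this trade-off. The function names next_char and mixed_expression are hypothetical, and the values in the comments assume an ASCII character set:

```c
/* Returns the character after c. Because char is an integer type in C,
   this is ordinary addition; with ASCII, next_char('a') is 'b'. */
char next_char(char c) {
    return (char)(c + 1);
}

/* Mixes char, int and double in one expression. The compiler accepts it:
   c is promoted to int (65 for 'A' in ASCII), then the product to double. */
double mixed_expression(void) {
    char c = 'A';
    int n = 2;
    return c * n + 0.5;        /* 65 * 2 + 0.5 in ASCII */
}
```

next_char needs no conversion function, which is the gain in writability. mixed_expression compiles silently even though it multiplies a character by an integer, which is the loss in reliability.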
Character string types
The key questions that you should ask as you analyse the design of character string types in a programming language are:
– Are character strings a primitive type in the language, or are they constructed as an array of characters?
– Are character strings in the language declared with fixed lengths, or can they have variable lengths?
– What operations are allowed on the character string type?
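C answers these questions as follows:
- Strings are not a primitive type. They are arrays of char terminated by a '\0' character.
- The array has a fixed length, although the string stored in it may be shorter.
- Operations are library functions from <string.h>, not built-in operators.

The sketch below illustrates this. The buffer size and the function name build_greeting are assumptions made for illustration:

```c
#include <string.h>

/* The array length is fixed at 16, but the string stored in it can be
   any length up to 15, because one char is needed for the '\0'. */
#define BUF_SIZE 16

/* Builds "Hello, <name>" in buf using library functions, since C has no
   built-in string operators. Returns the resulting length, or -1 if the
   result would not fit in the fixed-length array. */
int build_greeting(char buf[BUF_SIZE], const char *name) {
    const char *prefix = "Hello, ";
    if (strlen(prefix) + strlen(name) + 1 > BUF_SIZE)
        return -1;             /* would overflow the fixed-length array */
    strcpy(buf, prefix);       /* assignment is a function call */
    strcat(buf, name);         /* concatenation is a function call */
    return (int)strlen(buf);   /* length is found by scanning for '\0' */
}
```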
User-defined ordinal types
The two kinds of user-defined ordinal types are the enumeration type and the subrange type. Their main advantage is improved readability and reliability of the program. However, the enumeration type in C improves only readability, because values of an enumeration type are treated internally as integers.
Consequently, a function that accepts a parameter of an enumeration type will also accept any integer, so using enumeration types in C does not increase reliability.
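The following minimal C sketch shows this weakness. It uses a hypothetical enumeration colour and a hypothetical function is_declared_colour. A C compiler accepts the call with 42 because the integer is implicitly converted to the enumeration type:

```c
/* A hypothetical enumeration type. Internally RED, GREEN and BLUE
   are just the int constants 0, 1 and 2. */
enum colour { RED, GREEN, BLUE };

/* Returns 1 if c is one of the declared enumerators, 0 otherwise.
   This check is needed only because C lets any int through. */
int is_declared_colour(enum colour c) {
    return c == RED || c == GREEN || c == BLUE;
}

/* Compiles without error: 42 is implicitly converted to enum colour. */
int pass_an_integer(void) {
    return is_declared_colour(42);
}
```

A language with a stricter enumeration type, such as Pascal or Ada, would reject the equivalent call at compile time.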
Array types
The key points in the design of array types in a language can be emphasized by asking these questions:
– What types are legal as subscripts?
Readability and reliability increase if enumerated types are accepted as subscripts.
– Are subscript ranges checked at run time?
Some compilers include run-time range checks in the generated code to detect array references that are out of range. Others, including most C compilers, do not. Such checking increases reliability, but it also increases the running cost.
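C leaves range checking to the programmer, so a hand-written check is one way to recover the reliability. The sketch below uses a hypothetical function checked_get. The extra comparison on every access is exactly the running cost described above:

```c
#include <stddef.h>

/* Reads a[index] only if index is within 0..size-1.
   Returns 1 and stores the element in *out on success; returns 0 and
   leaves *out unchanged if the subscript is out of range. */
int checked_get(const int *a, size_t size, size_t index, int *out) {
    if (index >= size)
        return 0;              /* out of range: do not touch memory */
    *out = a[index];
    return 1;
}
```

Without such a check, reading a[3] from a three-element array compiles but has undefined behaviour in C.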
– When are subscript ranges bound?
Some arrays have sizes that are fixed at compile time; others have sizes that are determined at run time.
– When is storage allocated?
The storage can be allocated statically (at compile time) or dynamically (at run time). For a dynamically allocated array, the storage can come from the stack or from the heap.
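The C sketch below shows four of the storage options just described:
- a static array;
- a fixed-size stack array;
- a stack array whose size is bound at run time;
- a heap array.

The run-time-sized stack array is a C99 variable-length array. C11 made that feature optional, so some compilers may not support it. All function names are hypothetical:

```c
#include <stdlib.h>

/* Static storage: the size is fixed at compile time and the storage
   exists for the whole run of the program (zero-initialized). */
static int counters[10];

int static_demo(void) {
    counters[0]++;             /* value persists between calls */
    return counters[0];
}

/* Stack, size bound at compile time: allocated on each call. */
int stack_fixed(void) {
    int a[10];
    for (int i = 0; i < 10; i++)
        a[i] = i;
    return a[9];
}

/* Stack, size bound at run time: a C99 variable-length array.
   Assumes n > 0. */
int stack_vla(int n) {
    int a[n];
    for (int i = 0; i < n; i++)
        a[i] = i;
    return a[n - 1];
}

/* Heap, size bound at run time: allocated with malloc and released
   with free. Assumes n > 0; returns -1 if allocation fails. */
int heap_array(int n) {
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        a[i] = i;
    int last = a[n - 1];
    free(a);
    return last;
}
```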
– How many subscripts are allowed?
Most modern languages do not put any limit on the number of subscripts.
– Can arrays be initialized at storage allocation?
Allowing this increases writability: if a language lacks this facility, initialization has to be done with a series of assignment statements.
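The following C sketch shows the difference, using a hypothetical table of month lengths for a non-leap year. The first function initializes the array in its declaration, while the second needs twelve assignment statements:

```c
/* Initialization at allocation: one declaration sets all 12 elements.
   month is assumed to be in 1..12. */
int days_init(int month) {
    const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return days[month - 1];
}

/* Without that facility, each element needs an assignment statement. */
int days_assign(int month) {
    int days[12];
    days[0] = 31;  days[1] = 28;  days[2] = 31;  days[3] = 30;
    days[4] = 31;  days[5] = 30;  days[6] = 31;  days[7] = 31;
    days[8] = 30;  days[9] = 31;  days[10] = 30; days[11] = 31;
    return days[month - 1];
}

/* C also zero-fills any elements not listed in the initializer. */
int sum_partially_initialized(void) {
    int a[5] = {1, 2};         /* a is {1, 2, 0, 0, 0} */
    int total = 0;
    for (int i = 0; i < 5; i++)
        total += a[i];
    return total;
}
```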
– Is there a way of defining an array type with no subscript bounds?
Consider the case when we need to write a subprogram to sort an array of integers. In Pascal, we would have the following fragment:
type
  arr_type = array [1..10] of integer;
{ ...... }
procedure sort(var a: arr_type);
begin
  { ....... }
end;
The problem with this code is that sort can only sort arrays of type arr_type. This means that it cannot be used to sort an integer array with other than ten elements. We would need another procedure for an array of 11 elements, another for 12 elements, and so on. Ada solves this problem by allowing an unconstrained array type. The same fragment in Ada would be:
type arr_type is array (Integer range <>) of Integer;
-- ......
procedure sort(a : in out arr_type) is
begin
   -- .......
end sort;
Now arr_type is an array type whose subscript range is not specified. Suppose we declare two variables A and B as:
A: arr_type(0..9);
B: arr_type(3..11);
Then both A and B are of type arr_type and can therefore be sorted using sort. Within sort, the lower and upper bounds of the array can be obtained through the standard array attributes of Ada:
A'First is the index of the first element of A.
A'Last is the index of the last element of A.
In C, an array parameter is really a pointer to the array's first element, so this problem does not arise. However, the function then has no way of finding out the size of the array, so in C the size must be passed to the function explicitly. The same fragment in C would be:
void sort(int *a, int size) { /* ... */ }
We can see that being able to define an array type without bounds increases writability.
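To make the C fragment concrete, here is one possible body for sort. The choice of insertion sort is an assumption, because the original leaves the algorithm unspecified. The point is that one function handles arrays of any length, since the caller supplies size:

```c
/* Sorts a[0..size-1] into ascending order using insertion sort.
   Only a pointer to the first element is received, so the caller must
   pass the number of elements; in return, any length is accepted. */
void sort(int *a, int size) {
    for (int i = 1; i < size; i++) {
        int key = a[i];
        int j = i - 1;
        /* Shift elements larger than key one place to the right. */
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;        /* drop key into the gap */
    }
}
```

A caller can compute the length as sizeof a / sizeof a[0] only where the array itself is in scope. Inside sort, a is just a pointer, so that expression would give the wrong answer.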
Row-major order
In row-major order, a multidimensional array is stored in linear memory with its rows placed one after the other. This is the approach used by the C programming language and many other languages, with the notable exception of Fortran. In row-major order, elements in adjacent rows are further apart in memory than elements in adjacent columns.
Row-major order
For example, consider this 2 × 3 array:
1 2 3
4 5 6
If this array is declared in C as int A[2][3], it is laid out in linear memory as:
1 2 3 4 5 6
The difference in offset from one column to the next is 1, and from one row to the next is 3. The linear offset (counted in elements, not bytes) from the beginning of the array to any element A[row][column] can then be computed as:
offset = row*NUMCOLS + column
where NUMCOLS is the number of columns in the array, in this case 3.
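The formula can be checked against what a C compiler actually does. In this sketch the function names are hypothetical. actual_offset measures the distance from A[0][0] to A[row][column] in bytes and divides it by the element size, and row_major_offset applies the formula:

```c
#include <stddef.h>

/* Dimensions of the example array int A[2][3]. */
#define NUMROWS 2
#define NUMCOLS 3

/* Offset of A[row][column], counted in elements, by the row-major formula. */
ptrdiff_t row_major_offset(int row, int column) {
    return (ptrdiff_t)row * NUMCOLS + column;
}

/* Offset the compiler actually uses: the byte distance from A[0][0] to
   A[row][column], divided by the size of one element. */
ptrdiff_t actual_offset(int A[NUMROWS][NUMCOLS], int row, int column) {
    const char *first = (const char *)&A[0][0];
    const char *elem = (const char *)&A[row][column];
    return (elem - first) / (ptrdiff_t)sizeof(int);
}
```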
To generalize the above formula, suppose we have the following C array:
int A[n1][n2][n3][n4][n5];
Then the offset of the element A[m1][m2][m3][m4][m5] is:
offset = m1*n2*n3*n4*n5 + m2*n3*n4*n5 + m3*n4*n5 + m4*n5 + m5
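Here is a sketch of the general row-major formula for any number of dimensions. It assumes 0-based arrays m (subscripts) and n (dimension sizes), so m[0] corresponds to m1 above. It uses Horner's rule, which gives the same value as the sum above with fewer multiplications:

```c
#include <stddef.h>

/* Row-major offset, in elements, of the element with subscripts
   m[0..k-1] in an array with dimension sizes n[0..k-1].
   Horner's rule: ((m[0]*n[1] + m[1])*n[2] + m[2])*... + m[k-1],
   which expands to m[0]*n[1]*...*n[k-1] + ... + m[k-1].
   The first dimension size, n[0], never affects the result. */
size_t row_major_offset(const size_t *m, const size_t *n, size_t k) {
    size_t offset = 0;
    for (size_t i = 0; i < k; i++)
        offset = offset * n[i] + m[i];
    return offset;
}
```

For int A[2][3][4][5][6], the subscripts {1, 2, 3, 4, 5} give offset 719, the last of its 720 elements.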
Column-major order
Column-major order is a similar method of flattening arrays onto linear memory, but the columns are stored one after the other. The programming language Fortran uses column-major order.
Consider the following array:
1 2 3
4 5 6
7 8 9
If stored in memory in column-major order, with the columns listed one after another, it would look like this:
1 4 7 2 5 8 3 6 9
The memory offset (in elements) can then be computed as:
offset = row + column*NUMROWS
where NUMROWS is the number of rows in the array.
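C itself stores arrays in row-major order, but a column-major layout can be simulated with a flat buffer. In this sketch the functions col_major_offset and to_column_major are hypothetical. Copying the 3 × 3 example through col_major_offset reproduces the sequence 1 4 7 2 5 8 3 6 9:

```c
/* Dimensions of the 3 x 3 example. */
#define NUMROWS 3
#define NUMCOLS 3

/* Offset, in elements, of element (row, column) in column-major order. */
int col_major_offset(int row, int column) {
    return row + column * NUMROWS;
}

/* Copies a C matrix (stored row-major) into a flat buffer laid out in
   column-major order, the layout Fortran would use. */
void to_column_major(int src[NUMROWS][NUMCOLS], int dst[NUMROWS * NUMCOLS]) {
    for (int r = 0; r < NUMROWS; r++)
        for (int c = 0; c < NUMCOLS; c++)
            dst[col_major_offset(r, c)] = src[r][c];
}
```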
To generalize the above formula, suppose the array int A[n1][n2][n3][n4][n5] is stored in column-major order. (The declaration is written in C syntax, although C itself uses row-major order.) Then the offset of the element A[m1][m2][m3][m4][m5] is:
offset = m1 + m2*n1 + m3*n2*n1 + m4*n3*n2*n1 + m5*n4*n3*n2*n1
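Here is a sketch of the general column-major formula for any number of dimensions. It assumes 0-based arrays m (subscripts) and n (dimension sizes), so m[0] corresponds to m1 above. A running stride accumulates the products n1, n1*n2, and so on:

```c
#include <stddef.h>

/* Column-major offset, in elements, of the element with subscripts
   m[0..k-1] in an array with dimension sizes n[0..k-1]:
   m[0] + m[1]*n[0] + m[2]*n[0]*n[1] + ... + m[k-1]*n[0]*...*n[k-2].
   The last dimension size, n[k-1], never affects the result. */
size_t col_major_offset(const size_t *m, const size_t *n, size_t k) {
    size_t offset = 0;
    size_t stride = 1;         /* product of the earlier dimension sizes */
    for (size_t i = 0; i < k; i++) {
        offset += m[i] * stride;
        stride *= n[i];
    }
    return offset;
}
```

Here the first subscript varies fastest, the reverse of the row-major rule.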
Example
Consider the following array:
int A[3][7][8];
Assume that A[0][0][0] is at address 20000. What is the address of A[2][3][4]:
(i) if row-major order is used?
(ii) if column-major order is used?
(i) Assuming an integer occupies 4 bytes, the address of A[2][3][4] in row-major order is:
20000 + (2*7*8 + 3*8 + 4)*4 = 20000 + 140*4 = 20560
(ii) In column-major order, the address is:
20000 + (2 + 3*3 + 4*3*7)*4 = 20000 + 95*4 = 20380
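The arithmetic can be checked with a short C sketch. BASE and INT_SIZE encode the example's assumptions: a starting address of 20000 and 4-byte integers. The function names are hypothetical. Because C is row-major, actual_offset_2_3_4 also confirms the row-major element offset of 140 on a real compiler, whatever sizeof(int) is there:

```c
#include <stddef.h>

/* Dimensions of int A[3][7][8] from the example. */
#define N1 3
#define N2 7
#define N3 8

/* The example's assumptions: A[0][0][0] is at address 20000 and an
   integer occupies 4 bytes. */
#define BASE 20000L
#define INT_SIZE 4L

/* Address of A[m1][m2][m3] if the array is stored in row-major order. */
long row_major_address(long m1, long m2, long m3) {
    return BASE + (m1 * N2 * N3 + m2 * N3 + m3) * INT_SIZE;
}

/* Address of A[m1][m2][m3] if the array is stored in column-major order. */
long col_major_address(long m1, long m2, long m3) {
    return BASE + (m1 + m2 * N1 + m3 * N1 * N2) * INT_SIZE;
}

/* C is row-major, so a real compiler places A[2][3][4] exactly
   140 elements after A[0][0][0]. */
long actual_offset_2_3_4(void) {
    static int A[N1][N2][N3];
    return (long)(((char *)&A[2][3][4] - (char *)&A[0][0][0]) / (ptrdiff_t)sizeof(int));
}
```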