CS337 Comparative Programming Languages


CI346 Programming Languages

John English (W425) [email protected]

[away 2008-9] Mike Smith (W424) [email protected]

Course content

• History • Imperative (procedural, O-O) languages – comparison of common features – unique or ‘interesting’ features • Declarative languages – functional languages – logic (constraint) languages – others...

Resources

• Recommended text – "Comparative Programming Languages", Wilson & Clark (Addison Wesley 2001) • BURKS 6 – http://burks.bton.ac.uk/burks – many compilers, tutorials, manuals – "Introduction to Programming Languages" by Anthony A. Aaby – Free Online Dictionary of Computing

Assessment

• Coursework – learn two new languages (sufficient to write a small program) – short presentation of significant features • Exam – analyse some aspect of your chosen languages (e.g. the range of data types it provides)

Why study languages?

• Discover alternative forms of expression – ‘if you only have a hammer, everything looks like a nail...’ • Make it easier to learn new languages – can relate features across languages • Cross-fertilisation – inventing new languages

Language goals

• Constructing reliable software – error detection (early!), correctness • Maintainability – readability, modularity • Expressive power – reuse, extensibility, genericity • Ease of use – simplicity, orthogonality (e.g. you can declare arrays and functions, but can you return them as the result of a function?)

Implementation

• Reliable implementations – clear language standards – compiler bugs and conformity tests • Interpret or compile?

– interpreting: easier, more portable – compiling: more efficient, earlier error detection – a mixture of approaches is normal

Non-linguistic aspects

• Compiler quality – error messages, code size and speed • Debugging – step, run, set breakpoints, inspect variables • Development environment – visual design tools, syntax colouring, prettyprinting, wizards

Language evolution

• Languages have always adopted good ideas from their predecessors • Languages have (almost) always learnt from the mistakes of their predecessors – mistakes are usually only discovered after the language has been in use for a while – sometimes mistakes are discovered when someone tries to implement the bright ideas of the language designers...

Language evolution

[Family-tree diagram of imperative languages: Fortran; Algol 60, Cobol, Basic, PL/1, Algol 68, Algol W, Pascal, Modula 2, Oberon, Simula, Smalltalk, CPL, BCPL, B, C, C++, Objective C, Java, C#, Ada]

Language evolution

[Family-tree diagram of other languages: Lisp, ML, Prolog, Algol / C, Perl, Unix (sh), Pop-2, PHP, Ruby, Haskell, COMIT, Snobol, Icon, Forth, PostScript]

Some important ideas

• Algol 60: formal grammars, block structure, stack-based implementation • Algol 68: orthogonality, concurrency, operator overloading • Lisp: garbage collection, recursion • CLU: generics, exception handling • Smalltalk: OOP • Pascal: data types, strong typing • Modula-2: modularity • Eiffel: design by contract

Orthogonality

• How do language features affect one another?

– independent features are ‘orthogonal’, i.e. at ‘right angles’ to each other • Examples: – arithmetic expressions and arrays – arrays and procedure parameters – exception handling and concurrency

Formalism

• Fortran used an English description – ambiguous: a correct Fortran program is a program which a particular compiler accepts • Algol 60 introduced formal syntax (BNF) – compilers could use syntax rules to determine correctness – semantics were described in English, so some ambiguities remained

Formalism

• Algol 68 used van Wijngaarden (‘two level’) grammars – could specify syntax and semantics precisely – could not construct automated parsers – general opinion: elegant, but too complicated for mere mortals

Foot-shootery

• Fortran DO-loop • Algol 60 & Coral comments • PL/1 type conversions • C & C++ equality tests • Algol 60, C: null statements using ‘;’

Awkward design

• Dangling ELSE – C, C++, Java, Pascal, ...

• Inconsistent syntax – semicolons as separators in Algol, Pascal – declarations in Pascal • Ad-hoc restrictions – statement layout, subscripts in Fortran – order of declarations in Pascal

Language issues

• Program structure – control structures – procedures, functions, subroutines – parameter passing – nesting, recursion – classes, methods, polymorphism – concurrency

Language issues

• Variables – lifetime and scope – ‘referential transparency’ • Data types – scalars, arrays, structures, unions...

– range checking • Error handling – throwing and catching exceptions – abort or resume?

Program structure

• Compilation units: – Algol 60/68, Pascal: monolithic – Fortran: subroutine or function – Ada: procedure, function or package – C, C++: source file (usually one or more functions) plus ‘include files’ – Java: class

Control structures

• In the beginning was the GOTO statement...

– reflects the underlying assembly language – can lead to ‘spaghetti code’ • Common requirements: – decisions – loops – special language constructs invented to support these requirements

Decisions

• Fortran: IF (n) a,b,c – a 3-way GOTO: jump to label a, b or c when n is negative, zero or positive • Algol 60: if a > b then c else d; – c cannot be another IF statement – use begin ... end blocks to enclose sequences of statements • C, C++, Java, C#: if (a > b) c; else d; – ambiguity if c is another IF statement – use { ... } to enclose sequences of statements

Loops

• Fortran: GOTO, or DO loop – DO is like modern FOR loops – ‘DO 10 I=1,100,2’ • Algol 60: for & while loops – while loops could replace use of GOTO – ‘for i := 1 step 1 until 10 while a < b do ...’ – floating-point loops also possible: ‘step 0.1’ – where is control variable of ‘for’ loop declared and what is its scope?

Loops

• Snobol: – a string-processing language – each statement can match a string against a pattern – each statement can have two labels to jump to on completion (match succeeds or fails)

Loops

• Icon: – ‘generators’ either succeed and produce a result, or fail – generators which succeed can be called again to produce the ‘next’ result (if any) – loops can iterate over every result of a generator – ‘every i := (1 to 10) | (91 to 100) do write i’ – ‘every write( (1 to 10) | (91 to 100) )’

Block structure

• Algol 60 introduced the notion of ‘blocks’ – sequence of statements (and local variable declarations) enclosed by begin ... end – used where more than one statement is needed inside an if or loop statement – this is usually the case!

– C and its descendants use { ... } instead of begin ... end

Block structure

• C, C++, Java, C#: the ‘dangling else’ problem – if (a) if (b) c; else d; – the else binds to the nearest unmatched if • Algol 60: no dangling ‘else’ – if X then Y: Y cannot be another if statement (but can be a block which contains one) • Algol 68, Ada: all control structures fully bracketed (so no need to use blocks): – Algol 68: if a then b else c fi; – Ada: if a then b else c end if;

Procedures

• Most languages have procedures (no result) and functions (which return a result) – function calls can be used in arithmetic expressions – Fortran called procedures ‘subroutines’ • Algol 68, C, C++, Java, C#: functions can be used as procedures (result is discarded)

Procedures

• Procedures and functions can have parameters – limitations on what types are allowed?

– limitations on what types can be returned by function?

– limitations on what the procedure or function can do with the parameter (e.g. read only, read/write)

Procedures

• Are procedures and functions first-class objects?

– e.g. integers are first-class objects: can be stored in variables, passed as parameters, returned as results – can procedures be stored, passed, returned in this way?

• A related idea: closures – a closure is a function call together with its current scope

Parameter passing

• By value: a copy of the parameter is given to the function (C, C++, Java, Ada) – changing the parameter only changes the copy • By reference: a reference to a variable is given to the function (C++, Ada, Fortran) – changing the parameter changes the original variable – can be done by passing a pointer by value

Parameter passing

• By value/return: a copy of the parameter is given to the function and the updated value is written back to the corresponding variable at the end (Ada) • By name: the parameter is re-evaluated each time it is referenced (Algol 60) – ‘Jensen’s device’: much too complex for practical use

Nesting

• Can functions be defined inside functions?

– Ada, Pascal, Algol: yes – C, C++, Java: no • Nested functions can access variables defined in the surrounding functions – makes life slightly more difficult for the compiler writer

Recursion

• Can functions call themselves?

– yes, except in languages like early Fortran where variables are static and the return address is stored in a static location – a recursive call in Fortran would overwrite the previous return address • Recursion requires dynamic variables – each time you call the function a new set of local variables needs to be created

Classes

• A feature of object-oriented languages – Java, C++, Ada 95, ...

– contains data and methods (functions) – supports inheritance • Multiple inheritance?

– allowed in Eiffel, C++ – not allowed in Smalltalk, Ada 95 – ‘interfaces’ in Java and C#

Classes

• Polymorphism: overriding inherited methods in a derived class – standard behaviour in most languages (e.g. Smalltalk, Java) – only ‘virtual’ methods in C++ • Classes also cross the divide into data types...

Concurrency

• How fine-grained is the concurrency?

– Ada, PL/1: task blocks (coarse grained) – Algol 68, occam: individual statements (fine grained) • How do tasks synchronize their activities?

– PL/1: no general mechanism – Algol 68: semaphores (low-level)

Concurrency

• Higher-level synchronization mechanisms introduced later: – monitors (Concurrent Pascal, Ada, Java) – rendezvous (Ada) – channels (occam)

Variables

• Lifetime: how long does the variable exist for?

– static (same as lifetime of program) – dynamic (same as lifetime of block where the declaration occurs) • Scope: where can the variable name be used?

– inside block where it’s declared – anywhere, or only after declaration?

Variable lifetime

• Fortran: all variables are static • C/C++: global and static variables are static, all others are dynamic • Java: static variables are static, instance variables live as long as the object they belong to, all others are local • Ada: variables in packages are static, all others are dynamic

Variable scope

• Variables can usually only be accessed from the point of declaration to the end of the block containing the declaration – Fortran: COMMON blocks allow several modules to access the same variables – C/C++: extern allows access to variables defined externally (also C++ friends) – Ada: packages allow variables to be shared

Data types

• Fundamental dichotomy for elementary data: – numeric types – text types • Pascal, Ada etc. also allow user-defined ‘enumeration’ types – type Day is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

Data types

• Numeric types: – integer – real (floating point; fixed point) – decimal or binary?

– range and precision?

• Text types: – fixed or variable length?

– bounded or unbounded?

Data types

• Composite types: – arrays – records/structures • Pointer types: – linked lists, trees etc.

– can pointers be converted to other types?

– can pointers be manipulated?

Composite types

• Arrays: – fixed bounds? (e.g. Ada) – static bounds? (e.g. C) – flexible bounds? (e.g. Algol 68) – strings?

• Records: – structural variants?

Composite types

• When are two composite types the same?

– same structure? (Algol 68) – same type name? (Ada) – same array bounds? (Pascal)

Pointers

• Pointers allow dynamic data structures – point anywhere (C) or just to objects which are heap-allocated (Pascal)?

– typed (C) or untyped (PL/1)?

– interconvertible (C) or not (Ada)?

• Reclamation issues: – manual or automatic garbage collection?

Pointers

• Algol 68 uses reference types as a general mechanism – an int is an integer (a constant, e.g. 123) – a ref int is a reference to an integer (an integer variable, which can refer to a particular int value like 123) – a ref ref int is a reference (pointer) to a ref int object (an int variable)...

Pointers

• Java ‘references’ are really pointers – but they can’t be manipulated as in C/C++ • PHP has references too – more like links in a Unix file system • Languages like Lisp use pointers to implement lists – these are not accessible at the language level

Names and types

• Names (for e.g. variables) must be defined (implicitly or explicitly) • Implicit declaration (e.g. Fortran, Perl) – Fortran has implicit typing – everything in Perl is a string, array or hash (identified by first character: $, @ or %) • Problems: – lexical errors (typing errors!) will create new variables

Names and types

• Typeless declaration (e.g. Lisp, Smalltalk) – names must be defined before use – everything in Lisp is a list – everything in Smalltalk is an object • Problems: – type mismatches not detected until runtime

Names and types

• Typed declarations (most languages) – strong typing or weak typing?

• Strong typing: – no automatic type conversion (e.g. Ada) – some escape mechanism is usually required (e.g. Unchecked_Conversion)

Names and types

• Weak typing – automatic type conversions provided (e.g. PL/1) – can disguise problems due to use of a wrong variable • Most languages are a compromise – some automatic conversions (e.g. Algol 68)

Names and types

• Type inferencing (e.g. Haskell) – types inferred from operations used – most general available type used (e.g. Num for numeric types) – use of specific types (e.g. Int where Num is required) will force use of Int throughout

Higher-order types

• Functions as types – e.g. Lisp, Haskell – functions can create new functions and return them as their result • Types as types?

– e.g. Smalltalk has a class called Class – so does Java (reflection API)

Fortran

• Fortran ("FORmula TRANslation") – developed 1954 by John Backus, IBM – designed for scientific calculations • Implementation – based on IBM 704 systems – language features designed ad-hoc to suit IBM 704 – buggy hand-coded compilers (pre-1958)

Fortran

• No formal language specification – compiler was the acid test of correctness • Line-oriented format – based on 80-column punched cards – 1-5: statement label, C for comment – 6: continuation of previous line if not blank – 7-72: statement – 73-80: sequence number

Fortran

• Automatic declarations – I to N: integer – A to H, O to Z: real (floating point) – static (so no recursion) • Control constructs: – GOTO n, GOTO (a,b,c) n – IF (n) a,b,c – DO n I=a,b

Cobol

• COmmon Business Oriented Language – descendant of Flow-matic (1955), Univac I – English-like vocabulary and syntax • Four "divisions" in a program: – identification division – environment division – data division – procedure division

Cobol

• Business oriented features: – file processing – report formatting • Data description via "picture" clauses – PIC 9999: 4-digit decimal number – PIC X(20): 20-character string – PIC 999V99: 5-digit number with 2 decimal places

Algol 60

• The most influential language ever – nearly all imperative languages are descendants of Algol 60 • Descended from IAL (International Algebraic Language), aka Algol 58 – initial Algol 60 report: 1959/60 – revised report: 1963

Algol 60

• Innovations: – formal syntax specification – reference language, publication language, hardware representation – block structure, local variables, recursion – structured control statements – free format

Algol 60

• Problems: – ambitious design ignored compilation issues – compilers did not appear until mid-1960s – only caught on in Europe (USA continued to use Fortran) – no standard input/output – no string manipulation

Algol 60

• Main descendants: – JOVIAL (Jules' Own Version of the IAL) – Simula 67 (ancestor of OOP concepts in C++ and Java) – Algol W (Wirth), hence Pascal, Modula-2, Oberon, Ada – CPL, hence BCPL, B, C, C++, Java – Algol 68

Algol 68

• Successor to Algol 60 – supported by IFIP – "Report on the Algorithmic Language Algol 68", van Wijngaarden et al., 1969 – revised report, 1975 • Committee split by disagreements – Wirth et al. produced minority report – went on to develop Pascal instead

Algol 68

• Main features – highly orthogonal language – expression-based – references used as a way of uniting names, values, pointers etc.

– concurrency support • Descendants: – RTL/2, Bourne shell, SPL

Algol 68

• Syntax described using two-level "van Wijngaarden" grammar – context-free upper level grammar generated infinite set of "hypernotions" – these generated an infinite number of rules by substituting hypernotions into lower level rules – described semantic properties (name bindings, coercions, etc.)

Algol 68

• Problems – grammar seen as too complex (and unparsable) – language seen as too complex • Implementations were slow to appear – only subsets were usually implemented (Algol 68R, Algol 68C, etc.)

PL/1

• IBM's attempt at a general-purpose language – part of System/360 project (1964) – amalgam of Fortran, Cobol, Algol – originally called NPL (New Programming Language), then MPPL (Multi-Purpose Programming Language)

PL/1

• Problems – large, complex, difficult to implement – IBM's personal property (they also copyrighted PL/2 to PL/100) – descendants include PL/360, PL/M

Basic

• "Beginner's All-purpose Symbolic Instruction Code" – Kemeny & Kurtz, early 1960s • Developed as teaching language at Dartmouth College (USA) – simplified Fortran-like syntax – interactive!

APL

• "A Programming Language" (K. Iverson) – array processing using powerful array operators – designed for expressing numerical algorithms – proverbially cryptic, requires extensive character set – ancestor of J

String processing

• Application areas: – natural language processing – symbolic mathematics – general text processing • COMIT – MIT, late 1950s

String processing

• SNOBOL – descendant of COMIT from Bell Labs – Snobol 4 (Ralph Griswold, 1967) – ancestor of Icon • Uses success/failure of string matching to control program flow – generalised into Icon's "generators"

String processing

• Scripting languages – Perl, Python, Tcl, Ruby, ...

– also shell command languages – used to "script" actions of programs in other languages • Macro languages – GPM, Trac, M4, ML/1, PHP, ...

– embedded text processing

List processing

• Application areas – artificial intelligence, natural language processing • IPL-V – Carnegie Institute, late 1950s • LISP – developed by John McCarthy (MIT, later Stanford), co-author of the Algol 60 revised report

List processing

• LISP development: – LISP 1, LISP 1.5 (late 1950s) – Common Lisp (1984; ANSI standard, 1994) • Innovations: – programs are lists too – garbage collection – functional programming style (recursion instead of iteration)

Systems programming

• For implementing "system software" – also real-time or embedded applications – operating systems, compilers, etc.

• Requirements: – efficiency is crucial – low-level access to hardware facilities – bit-twiddling operations

Systems programming

• BCPL – "Basic CPL" – untyped language (or monotyped) • C – descended from BCPL via B – used to implement Unix

Systems programming

• Bliss – used by DEC for writing operating systems and utilities • Coral 66 – UK MoD standard based on Algol 60 – direct access to memory • RTL/2 – real-time language supporting concurrency

Systems programming

• JOVIAL – descendant of IAL (Algol 58) – used extensively by US DoD • Forth – embedded language developed for radio telescope control – ancestor of PostScript (printer page description language)

Object-oriented languages

• Simula 67 – simulation language descended from Algol 60 (Dahl & Nygaard) – introduced the concept of a "class" • Smalltalk – developed by Alan Kay at Xerox PARC, 1972 (revised to produce Smalltalk-80) – interpreted language, message-passing

Object-oriented languages

• Objective-C – mixture of C and Smalltalk by Brad Cox – primary language for NeXT computers • C++ – mixture of C and Simula by Bjarne Stroustrup – Bell Labs: telephone routing simulation etc.

Object-oriented languages

• Java – originally called Oak (James Gosling, early 1990s) – redeveloped at Sun for Internet use, 1995 • Eiffel – developed by Bertrand Meyer, early 1980s – assertion-based with compiler-checked preconditions and postconditions

Other languages

• Functional languages – Lisp, Scheme – FP, ML, Miranda, Haskell • Rationale – break away from von Neumann tradition (mutable variables, iteration etc.) – mathematical purity will mean program correctness can be proved formally

Other languages

• Constraint-based languages – Prolog (Colmerauer and Roussel, 1971) and many derivative languages – unification-based, goal-oriented theorem proving approach • Applications: – natural language, artificial intelligence – basis for Japan's 5th generation project

Other languages

• Other AI languages: – POP 2 (Robin Popplestone, Edinburgh, late 1960s) – Pop-11 (mid 1970s) – POPLOG (Sussex, late 1980s) combined Pop-11, ML, Common Lisp

Other languages

• Visual languages – programs constructed using 2D graphics instead of a 1D text stream • Examples – spreadsheets?

– ProGraph – Lego Mindstorms – Alice

The von Neumann legacy

• Computers all based on von Neumann's design principles – single memory consisting of identical numbered cells – memory used for instructions and data – values in memory are mutable – processor executes one instruction at a time – instruction sequence given explicitly

The von Neumann legacy

• Most languages mirror the von Neumann design – variables: mutable objects in memory – execution sequence given by sequence of instructions, (and loops, ifs, etc.) – single thread of control – pointers: memory cells can reference ("point to") other memory cells – no type information in memory cells

The von Neumann legacy

• Problems: – dangling pointers – uninitialised variables – variables going out of range – can't reason about values of mutating variables – non-optimal sequencing

The von Neumann legacy

• Dangling pointers: – pointer to ‘thin air’?

– pointer to a valid (but wrong) address?

– pointer to right address, but wrong type of data • Related problems: – memory leaks – garbage collection

The von Neumann legacy

• Uninitialised variables: – undetectable, since any value in a memory cell might be valid – value is undefined (note: technical term!) – uninitialised pointers • Could automatically initialise variables – e.g. Java (partly), Basic – loss of efficiency?

The von Neumann legacy

• Out-of-range variables – arithmetic overflow (wrong sign for result) – no hardware support for range checking • Software range checking (like Ada) – added overhead, so reduced efficiency – but: manual checks can't be optimised by compiler

The von Neumann legacy

• Mutating variables – can't apply mathematical proof techniques – values can change arbitrarily • Global variables – can be affected non-locally – hard to debug – rogue pointers + single memory means all memory is effectively global!

The von Neumann legacy

• Explicit sequencing – might be a better sequence?

– optimal sequence might depend on external events • Multithreading – synchronization issues (mutable data again)

The von Neumann legacy

• Data-flow sequencing – instructions can be executable at any time as long as all dependencies are satisfied • Example: making tea – tasks: boil water, put tea in pot, put water in pot, pour milk, pour tea – tasks 1, 2, 4 possibly concurrent – task 3 depends on 1, 5 depends on 3

Functional programming

• Influenced by Church's lambda calculus – mathematical theory of functions – if x and y are lambda expressions, so is the application (x y) – if x is a variable and y is a lambda expression, the abstraction λx.y is a lambda expression • Can describe rules of arithmetic using nothing but lambda expressions

Lambda calculus

• Evaluating: – the application ((λx.y) z) is reduced to y, with all occurrences of x in y replaced by copies of z – like function application (x is the formal parameter, y is the function body, z is the actual parameter)

Lisp

• Original functional language (John McCarthy, ~1960) • A list is a pair (car . cdr) – car = head of list, cdr = tail of list – longer lists possible, e.g. (a . (b . (c . nil))) – abbreviation: write this as (a b c) • Programs are represented by lists, too

Lisp

• Items are atoms or lists – list components are either atoms or other lists – the empty list: (), or "nil" • Basic functions: – (car (a . b)) = a, (cdr (a . b)) = b – (cons a b) = (a . b) – (atom x): true if x is an atom

Lisp

• Booleans: – true represented by atom T, false by () or NIL • Arithmetic: – prefix notation used, e.g. (+ 1 2) = 3 • Control: – COND function for choosing alternatives – recursion instead of loops

Lisp

• Example: program to sum the numbers in a list

(def sum (lambda (x)
  (cond ((null x) 0)
        (t (+ (car x) (sum (cdr x)))))))

– LISP = "Lots of Irritating Superfluous Parentheses"

Lisp

• Important ideas – use of lists to represent data structures – use of lists to represent programs, too – use of recursive functions – garbage collection – functions as first-class objects: (def name lambda-expression)

Lisp

• Problems with Lisp: – inefficient interpreted implementations – too different!

– untyped, so no error-checking at compile time • Non-functional features added later – prog, setq, goto

FP

• John Backus' Turing Award paper – ‘Can programming be liberated from the von Neumann style?’ (Communications of the ACM, 1978) – explicit discussion using a language called FP

Functional languages

• Later developments: – efficient implementations: combinators, tail recursion, better garbage collection – pattern matching notation (ML): fun sum [] = 0 | sum (x :: xs) = x + sum xs – lazy evaluation (operations on infinite lists)

Functional languages

• Other concepts: – ‘curried’ functions (named after Haskell Curry): partial application – higher-order functions: functions that operate on functions – strong typing, type inferencing – monads: sequencing in a pure functional style, I/O using streams

Haskell

• Named after Haskell Curry (again) • An example of a modern functional language – lazy evaluation – strong typing – type inferencing – polymorphism

Haskell

• Data types – basic types (Int, Char etc): 123, 'x' – lists: [t] is a list of type t: [1, 2, 3], [1..5], [2..], [x*x | x <- [1..5]]; [1,2] is the same as (1:(2:[])) – tuples: (t1,t2,t3) is a tuple of different types: (1, "this", 5.0) has type (Int, [Char], Float); (1,2) has type (Int,Int)

Prolog

• Uses a theorem-proving approach to programming • Programs consist of: – a list of starting points (facts) – a list of rules giving relationships between facts – a goal

Prolog

• No explicit sequencing is needed – the interpreter attempts to construct a path to the goal using the rules and facts provided – if it reaches a dead end, it will backtrack and try a different sequence • No variables are needed – values are bound using unification

Prolog

• Example facts: male(albert).

male(edward).

female(alice).

female(victoria).

parent(edward,victoria,albert).

parent(alice,victoria,albert).

Prolog

• Example rules: sibling(X,Y) :- parent(X,A,B), parent(Y,A,B).

sister(X,Y) :- sibling(X,Y), female(Y).

• Example goal: ?- sister(edward,S).

– evaluation proceeds by trying to match this goal using the available rules and facts – variables are unified with matching values

Prolog

• Example evaluation: – sister(edward,S) matches sister(X,Y) where X=edward and Y=S, so...

– new goals: sibling(edward,S) and female(S) – sibling(edward,S) creates new goals: parent(edward,A,B) and parent(S,A,B)

Prolog

• Example evaluation: – parent(edward,A,B) matches parent(edward,victoria,albert), so A=victoria and B=albert – parent(S,victoria,albert) matches with S=edward – female(edward) fails, so backtrack: undo the last match (S=edward) and try S=alice

Prolog

• Pattern matching used for lists: append([],X,X).

append([A|B],C,[A|D]) :- append(B,C,D).

• Example: append([1,2],[3,4],X) – X = [1|D] where append([2],[3,4],D) – X = [1,2|D'] where append([],[3,4],D') – X = [1,2,3,4] • Also: append(X,Y,[1,2,3,4]).

Prolog

• Data structures: – built-in support for integers and integer arithmetic – lists (similar to Haskell) – facts as tree structures, e.g. parent(edward,victoria,albert)

Prolog

• Evaluation sequence: – try to match rules sequentially – bind variables to values as part of this process – if a rule fails, backtrack (undo the last successful match) and try the next rule

Prolog

• Once a variable has been unified, it keeps the same value throughout the current application of the current rule – the same name may be bound to different values in different rules (names are local) – the same name may have a different value in a different instance of the same rule • Exception: the anonymous variable ‘_’

Prolog

• Prevent backtracking using a ‘cut’ – symbolised by ‘!’ – in a rule body ‘X, !, Y’, once X has matched successfully, failures in Y will not backtrack past the cut to search for another value for X

parents(adam,0) :- !.

parents(eve,0) :- !.

parents(X,2).

Prolog

• More cut examples: factorial(N,F) :- N < 0, !, fail.

factorial(0,1) :- !.

factorial(N,F) :- N2 is N-1, factorial(N2,F2), F is N*F2.

– ‘cut and fail’ in first rule prevents negative arguments – cut in second rule prevents attempts to use rule 3 to evaluate factorial 0

Prolog

• Input & output: – special predicates (e.g. write/1) which always succeed – I/O operation performed as a hidden side effect • Storing state information – use asserta/1 to add a rule – use retract/1 to retract a rule

Prolog

• State example: can_go(Place):- here(X), connect(X, Place).

can_go(Place) :- write('You can''t get there from here.'), nl, fail.

move(Place) :- retract(here(X)), asserta(here(Place)).

here(kitchen).

connect(kitchen,hall).

move(hall).

Complexity

• Languages have evolved to be able to cope with problems of increasing complexity – systems of upwards of 100KLOC are common in real life – early languages had problems with systems of this size

Complexity

• The coupling problem – the more one piece of software knows about another, the higher the coupling – can't make changes without affecting other parts of the system that are coupled to it – primary goal is to minimise or abolish coupling between components – analogy: standard connectors in hardware

Complexity

• Spaghetti programming – unstructured, use of goto statements – makes it hard to partition software • Can’t make assumptions about the properties of any block of code – can jump into the middle of any block from somewhere else

Complexity

• Structured programming – program composed of structured blocks – one entry point, one exit point – can reason about code within a block • Structured blocks decouple flow-of-control issues – but not flow-of-data issues

Complexity

• Global variables – can be accessed from anywhere, so hard to reason about values of variables • Local variables – only accessible within a single block, so can reason about them – static variables retain their values from one execution of a block to the next

Complexity

• Subroutines (procedures, functions) – allow a program to be decomposed into functional modules – parameters can be passed to avoid using global variables for communication – data coupling is reduced if global variables are avoided

Complexity

• Separate compilation – allows code reuse through subroutine libraries – can compiler check that an external subroutine is being called correctly?

– requires cooperation from linker...

Complexity

• Fortran – separately compiled subroutines – no link-time checking – static & global variables only (so no recursion) – unstructured coding (e.g. no structured if statement)

Complexity

• Algol 60/68, Pascal – no separate compilation (non-standard language extensions needed) – procedures are nested within main program, so call checking is possible – structured programming possible – local / global variables, no static variables – recursion possible

Complexity

• C – local, global, static variables – no linkage checking – use of common ‘header files’ to allow compiler checking (but not foolproof)

Complexity

• Modules – related groups of procedures, variables etc. – can selectively export some of the contents but can hide others – can create abstract data types – examples: Modula-2, Oberon, Ada (and also O-O languages)

Complexity

• Modules have a public interface – allows compile-time checking of calls – implementation is hidden (and can be changed) – interface specification forms a contract between a module and its users – inter-module coupling is reduced

Complexity

• Object oriented languages – modules become data types! – inheritance, polymorphism • Problems with inheritance – an incorrect signature when overriding an operation silently adds a new one instead (only C# fixes this one so far)

Complexity

• Modules/classes should be client-neutral – should not make any assumptions about their users – should allow code reuse • Exception handling – decouples error detection from error recovery

Complexity

• The goal: ‘software components’ – use like hardware components – buy off the shelf, ready tested, and assemble into a complete system • Generic programming – Ada generics, C++ templates – allows algorithms to be decoupled from the data types they operate on

Errors

• Error detection – at compile time (errors within a module) – at link time (errors resulting from mismatched calls) – at run time – at instantiation time (e.g. C++ templates) – at derivation time (errors in inheritance, inability to override/implement operations)

Errors

• Compile time errors – well known how to detect errors, but what errors does the language allow to be detected? – strong typing allows type mismatches to be detected – preconditions and postconditions (e.g. Eiffel) allow for better static verification – variable-length parameter lists (e.g. C) reduce checkability

Errors

• Link-time errors – do separately-compiled modules include parameter information? – C doesn’t do this – C++ used ‘name mangling’ to add type information to function names for use with standard linkers

Errors

• Run-time errors: – e.g. array bounds checking – use of exceptions to report errors and recover from them – run-time checks decrease program efficiency (extra code means more space and time) – leaving checks out of production code is like leaving your parachute behind once you have your pilot’s licence

Errors

• Instantiation time errors: – happens when trying to reuse code written much earlier – code is generated automatically with generic parameters plugged in – if parameters don’t match the way the code uses them, the compilation fails – error messages usually relate to generic code, not what you wrote – Ada generics have precise contracts to avoid this problem

Errors

• Derivation time errors: – happens when deriving new classes in object-oriented languages – happens long after original class was written – you override an operation but the old one gets called (possible in C++, Ada) – you think you’re overriding but you get it wrong and add a new function (only C# handles this one)

Errors

• Concurrency – allows the use of ‘active objects’ executing independently – can partition program across multiple processors and/or machines – use RPC, CORBA, Ada partitions, ... – occam and transputers

Errors

• Imperative languages have problems with concurrency – problem must be partitioned by hand (the programmer must identify concurrency) – lack of referential transparency means synchronisation needed when updating shared variables – synchronisation easy to omit or get wrong

Errors

• Other problems with concurrency: – errors can be time-dependent – hard to reproduce reliably – hard to diagnose – new failure modes (e.g. deadlock)

Summary

• Languages evolve at the whim of their designers – usually in the direction of increased expressiveness and safety, but not always – some features turn out to be bad, but only in hindsight – there is a tension between ease of use and safety (e.g. strong vs. weak typing) – programmers can be resistant to new ideas