Transcript CS337 Comparative Programming Languages
CI346 Programming Languages
John English (W425) [email protected]
[away 2008-9] Mike Smith (W424) [email protected]
Course content
• History • Imperative (procedural, O-O) languages – comparison of common features – unique or ‘interesting’ features • Declarative languages – functional languages – logic (constraint) languages – others...
Resources
• Recommended text – "Comparative Programming Languages", Wilson & Clark (Addison Wesley 2001) • BURKS 6 – http://burks.bton.ac.uk/burks – many compilers, tutorials, manuals – "Introduction to Programming Languages" by Anthony A. Aaby – Free Online Dictionary of Computing
Assessment
• Coursework – learn two new languages (sufficient to write a small program) – short presentation of significant features • Exam – analyse some aspect of your chosen languages (e.g. the range of data types it provides)
Why study languages?
• Discover alternative forms of expression – ‘if you only have a hammer, everything looks like a nail...’ • Make it easier to learn new languages – can relate features across languages • Cross-fertilisation – inventing new languages
Language goals
• Constructing reliable software – error detection (early!), correctness • Maintainability – readability, modularity • Expressive power – reuse, extensibility, genericity • Ease of use – simplicity, orthogonality (e.g. you can declare arrays and functions, but can you return them as the result of a function?)
Implementation
• Reliable implementations – clear language standards – compiler bugs and conformity tests • Interpret or compile?
– interpreting: easier, more portable – compiling: more efficient, earlier error detection – a mixture of approaches is normal
Non-linguistic aspects
• Compiler quality – error messages, code size and speed • Debugging – step, run, set breakpoints, inspect variables • Development environment – visual design tools, syntax colouring, prettyprinting, wizards
Language evolution
• Languages have always adopted good ideas from their predecessors • Languages have (almost) always learnt from the mistakes of their predecessors – mistakes are usually only discovered after the language has been in use for a while – sometimes mistakes are discovered when someone tries to implement the bright ideas of the language designers...
Fortran
Language evolution
Algol 60 Cobol Basic PL/1 Algol 68 Modula 2 Algol W Pascal Simula CPL BCPL Oberon Smalltalk Ada Objective C Java C++ C# B C
Lisp ML
Language evolution
Prolog Algol / C Perl Unix (sh) Pop-2 PHP Ruby Haskell COMIT Snobol Icon Forth PostScript
Some important ideas
• Algol 60: formal grammars, block structure, stack-based implementation • Algol 68: orthogonality, concurrency, operator overloading • Lisp: garbage collection, recursion • CLU: generics, exception handling • Smalltalk: OOP • Pascal: data types, strong typing • Modula-2: modularity • Eiffel: design by contract
Orthogonality
• How do language features affect one another?
– independent features are ‘orthogonal’, i.e. at ‘right angles’ to each other • Examples: – arithmetic expressions and arrays – arrays and procedure parameters – exception handling and concurrency
Formalism
• Fortran used an English description – ambiguous: a correct Fortran program is a program which a particular compiler accepts • Algol 60 introduced formal syntax (BNF) – compilers could use syntax rules to determine correctness – semantics were described in English, so some ambiguities remained
Formalism
• Algol 68 used van Wijngaarden (‘two level’) grammars – could specify syntax and semantics precisely – could not construct automated parsers – general opinion: elegant, but too complicated for mere mortals
Foot-shootery
• Fortran DO-loop • Algol 60 & Coral comments • PL/1 type conversions • C & C++ equality tests • Algol 60, C: null statements using ‘;’
Awkward design
• Dangling ELSE – C, C++, Java, Pascal, ...
• Inconsistent syntax – semicolons as separators in Algol, Pascal – declarations in Pascal • Ad-hoc restrictions – statement layout, subscripts in Fortran – order of declarations in Pascal
Language issues
• Program structure – control structures – procedures, functions, subroutines – parameter passing – nesting, recursion – classes, methods, polymorphism – concurrency
Language issues
• Variables – lifetime and scope – ‘referential transparency’ • Data types – scalars, arrays, structures, unions...
– range checking • Error handling – throwing and catching exceptions – abort or resume?
Program structure
• Compilation units: – Algol 60/68, Pascal: monolithic – Fortran: subroutine or function – Ada: procedure, function or package – C, C++: source file (usually one or more functions) plus ‘include files’ – Java: class
Control structures
• In the beginning was the GOTO statement...
– reflects the underlying assembly language – can lead to ‘spaghetti code’ • Common requirements: – decisions – loops – special language constructs invented to support these requirements
Decisions
• Fortran: IF (n) a,b,c – this is a 3-way GOTO statement • Algol 60: if a > b then c else d; – c cannot be another IF statement – use begin ... end blocks to enclose sequences of statements • C, C++, Java, C#: if (a > b) c; else d; – ambiguity if c is another IF statement – use { ... } to enclose sequences of statements
Loops
• Fortran: GOTO, or DO loop – DO is like modern FOR loops – ‘DO 10 I=1,100,2’ • Algol 60: for & while loops – while loops could replace use of GOTO – ‘for i := 1 step 1 until 10 while a < b do ...’ – floating-point loops also possible: ‘step 0.1’ – where is control variable of ‘for’ loop declared and what is its scope?
Loops
• Snobol: – a string-processing language – each statement can match a string against a pattern – each statement can have two labels to jump to on completion (match succeeds or fails)
Loops
• Icon: – ‘generators’ either succeed and produce a result, or fail – generators which succeed can be called again to produce the ‘next’ result (if any) – loops can iterate over every result of a generator – ‘every i := (1 to 10) | (91 to 100) do write i’ – ‘every write( (1 to 10) | (91 to 100) )’
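Icon's generators behave much like generators in modern languages. As an illustrative sketch (not Icon itself), here is the `every write( (1 to 10) | (91 to 100) )` example in Python, where alternation becomes chaining two generators:

```python
# Rough Python analogue of an Icon generator: it can be resumed to
# produce its "next" result, and '|' (alternation) becomes yielding
# from the first range, then from the second.
def alternation():
    yield from range(1, 11)    # like (1 to 10)
    yield from range(91, 101)  # like (91 to 100)

# like 'every write(...)': drive the generator to exhaustion
results = [i for i in alternation()]
print(results)
```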
Block structure
• Algol 60 introduced the notion of ‘blocks’ – sequence of statements (and local variable declarations) enclosed by begin ... end – used where more than one statement is needed inside an if or loop statement – this is usually the case!
– C and its descendents use { ... } instead of begin ... end
Block structure
• C, C++, Java, C#: the ‘dangling else’ problem – if (a) if (b) c; else d; (which if does the else belong to?) • Algol 60: no dangling ‘else’ – if X then Y: Y cannot be another if statement (but can be a block which contains one) • Algol 68, Ada: all control structures fully bracketed (so no need to use blocks): – Algol 68: if a then b else c fi; – Ada: if a then b else c end if;
Procedures
• Most languages have procedures (no result) and functions (which return a result) – function calls can be used in arithmetic expressions – Fortran called procedures ‘subroutines’ • Algol 68, C, C++, Java, C#: functions can be used as procedures (result is discarded)
Procedures
• Procedures and functions can have parameters – limitations on what types are allowed?
– limitations on what types can be returned by function?
– limitations on what the procedure or function can do with the parameter (e.g. read only, read/write)
Procedures
• Are procedures and functions first-class objects?
– e.g. integers are first-class objects: can be stored in variables, passed as parameters, returned as results – can procedures be stored, passed, returned in this way?
• A related idea: closures – a closure is a function call together with its current scope
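The closure idea can be sketched in Python, where an inner function captures the variables of the scope it was created in (the names below are invented for the example):

```python
# A closure is a function together with the environment it was created in:
# each call to make_counter() produces a function with its own 'count'.
def make_counter():
    count = 0                  # captured by the inner function
    def next_value():
        nonlocal count
        count += 1
        return count
    return next_value          # the function plus its captured state

c1 = make_counter()
c2 = make_counter()            # independent captured state
print(c1(), c1(), c2())       # prints 1 2 1
```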
Parameter passing
• By value: a copy of the parameter is given to the function (C, C++, Java, Ada) – changing the parameter only changes the copy • By reference: a reference to a variable is given to the function (C++, Ada, Fortran) – changing the parameter changes the original variable – can be done by passing a pointer by value
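Python's own convention (often described as passing object references by value) can illustrate the difference: rebinding a parameter behaves like pass-by-value, while mutating a shared object behaves like pass-by-reference. A hedged sketch with invented names:

```python
def bump(n):
    n = n + 1          # rebinds the local name only; caller unaffected

def extend(xs):
    xs.append(99)      # mutates the shared object; caller sees the change

x = 1
bump(x)
lst = [1, 2]
extend(lst)
print(x, lst)          # x unchanged, lst changed
```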
Parameter passing
• By value/return: a copy of the parameter is given to the function and the updated value is written back to the corresponding variable at the end (Ada) • By name: the parameter is re-evaluated each time it is referenced (Algol 60) – ‘Jensen’s device’: much too complex for practical use
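Call by name can be emulated with zero-argument lambdas ("thunks") that re-evaluate the argument expression on each use. The sketch below (invented names, not Algol 60 itself) shows the essence of Jensen's device: passing `i` and `a[i]` by name lets one routine sum any indexed expression:

```python
# Call-by-name emulated with thunks: 'term' is re-evaluated on every use.
def sigma(lo, hi, set_index, term):
    total = 0
    for k in range(lo, hi + 1):
        set_index(k)       # update the 'by name' loop variable
        total += term()    # re-evaluate the parameter expression
    return total

a = [0, 10, 20, 30]
env = {"i": 0}
result = sigma(1, 3,
               lambda k: env.__setitem__("i", k),   # i passed 'by name'
               lambda: a[env["i"]])                 # a[i] passed 'by name'
print(result)   # a[1] + a[2] + a[3]
```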
Nesting
• Can functions be defined inside functions?
– Ada, Pascal, Algol: yes – C, C++, Java: no • Nested functions can access variables defined in the surrounding functions – makes life slightly more difficult for the compiler writer
Recursion
• Can functions call themselves?
– yes, except in languages like Fortran where variables are static and the return address is stored in a static variable – a recursive call in Fortran would overwrite the previous return address • Recursion requires dynamic variables – each time you call the function a new set of local variables needs to be created
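The point about dynamic variables can be made concrete: each recursive call below gets a fresh local `n` and its own return address on the stack, which is exactly what static allocation (as in early Fortran) cannot provide.

```python
# Each call allocates a fresh activation record, so every invocation
# has its own copy of n; with static variables the recursive call
# would overwrite the caller's n and return address.
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

print(factorial(5))
```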
Classes
• A feature of object-oriented languages – Java, C++, Ada 95, ...
– contains data and methods (functions) – supports inheritance • Multiple inheritance?
– allowed in Eiffel, C++ – not allowed in Smalltalk, Ada 95 – ‘interfaces’ in Java and C#
Classes
• Polymorphism: overriding inherited methods in a derived class – standard behaviour in most languages (e.g. Smalltalk, Java) – only ‘virtual’ methods in C++ • Classes also cross the divide into data types...
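Overriding with run-time dispatch can be sketched in Python (where, as in Smalltalk and Java, all methods behave like C++ ‘virtual’ methods); the class names are invented for the example:

```python
class Shape:
    def area(self):
        return 0.0

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):                 # overrides the inherited method
        return self.side * self.side

# dispatch picks the override at run time, based on each object's class
shapes = [Shape(), Square(3)]
print([s.area() for s in shapes])
```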
Concurrency
• How fine-grained is the concurrency?
– Ada, PL/1: task blocks (coarse grained) – Algol 68, occam: individual statements (fine grained) • How do tasks synchronize their activities?
– PL/1: no general mechanism – Algol 68: semaphores (low-level)
Concurrency
• Higher-level synchronization mechanisms introduced later: – monitors (Concurrent Pascal, Ada, Java) – rendezvous (Ada) – channels (occam)
Variables
• Lifetime: how long does the variable exist for?
– static (same as lifetime of program) – dynamic (same as lifetime of block where the declaration occurs) • Scope: where can the variable name be used?
– inside block where it’s declared – anywhere, or only after declaration?
Variable lifetime
• Fortran: all variables are static • C/C++: global and static variables are static, all others are dynamic • Java: static variables are static, instance variables live as long as the object they belong to, all others are local • Ada: variables in packages are static, all others are dynamic
Variable scope
• Variables can usually only be accessed from the point of declaration to the end of the block containing the declaration – Fortran: COMMON blocks allow several modules to access the same variables – C/C++: extern allows access to variables defined externally (also C++ friends) – Ada: packages allow variables to be shared
Data types
• Fundamental dichotomy for elementary data: – numeric types – text types • Pascal, Ada etc. also allow user-defined ‘enumeration’ types – type Day is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
Data types
• Numeric types: – integer – real (floating point; fixed point) – decimal or binary?
– range and precision?
• Text types: – fixed or variable length?
– bounded or unbounded?
Data types
• Composite types: – arrays – records/structures • Pointer types: – linked lists, trees etc.
– can pointers be converted to other types?
– can pointers be manipulated?
Composite types
• Arrays: – fixed bounds? (e.g. Ada) – static bounds? (e.g. C) – flexible bounds? (e.g. Algol 68) – strings?
• Records: – structural variants?
Composite types
• When are two composite types the same?
– same structure? (Algol 68) – same type name? (Ada) – same array bounds? (Pascal)
Pointers
• Pointers allow dynamic data structures – point anywhere (C) or just to objects which are heap-allocated (Pascal)?
– typed (C) or untyped (PL/1)?
– interconvertible (C) or not (Ada)?
• Reclamation issues: – manual or automatic garbage collection?
Pointers
• Algol 68 uses reference types as a general mechanism – an int is an integer (a constant, e.g. 123) – a ref int is a reference to an integer (an integer variable, which can refer to a particular int value like 123) – a ref ref int is a reference (pointer) to a ref int object (an int variable)...
Pointers
• Java ‘references’ are really pointers – but they can’t be manipulated as in C/C++ • PHP has references too – more like links in a Unix file system • Languages like Lisp use pointers to implement lists – these are not accessible at the language level
Names and types
• Names (for e.g. variables) must be defined (implicitly or explicitly) • Implicit declaration (e.g. Fortran, Perl) – Fortran has implicit typing – everything in Perl is a scalar, array or hash (identified by first character: $, @ or %) • Problems: – lexical errors (typing errors!) will create new variables
Names and types
• Typeless declaration (e.g. Lisp, Smalltalk) – names must be defined before use – everything in Lisp is a list – everything in Smalltalk is an object • Problems: – type mismatches not detected until runtime
Names and types
• Typed declarations (most languages) – strong typing or weak typing?
• Strong typing: – no automatic type conversion (e.g. Ada) – some escape mechanism is usually required (e.g. Unchecked_Conversion)
Names and types
• Weak typing – automatic type conversions provided (e.g. PL/1) – can disguise problems due to use of a wrong variable • Most languages are a compromise – some automatic conversions (e.g. Algol 68)
Names and types
• Type inferencing (e.g. Haskell) – types inferred from operations used – most general available type used (e.g. Num for numeric types) – use of specific types (e.g. Int where Num is required) will force use of Int throughout
Higher-order types
• Functions as types – e.g. Lisp, Haskell – functions can create new functions and return them as their result • Types as types?
– e.g. Smalltalk has a class called Class – so does Java (reflection API)
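Python shows the same "types as types" idea directly: classes are themselves objects whose class is `type`, roughly analogous to Smalltalk's Class or Java's reflection API. A minimal sketch (the class name is invented):

```python
class Point:
    pass

# A class is itself an object; its class is 'type'.
print(type(Point))           # the class of the class Point
print(type(Point) is type)
print(type(type) is type)    # even 'type' is an instance of itself
```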
Fortran
• Fortran ("FORmula TRANslation") – developed 1954 by John Backus, IBM – designed for scientific calculations • Implementation – based on IBM 704 systems – language features designed ad-hoc to suit IBM 704 – buggy hand-coded compilers (pre-1958)
Fortran
• No formal language specification – compiler was the acid test of correctness • Line-oriented format – based on 80-column punched cards – 1-5: statement label, C for comment – 6: continuation of previous line if not blank – 7-72: statement – 73-80: sequence number
Fortran
• Automatic declarations – I to N: integer – A to H, O to Z: real (floating point) – static (so no recursion) • Control constructs: – GOTO n, GOTO (a,b,c) n – IF (n) a,b,c – DO n I=a,b
Cobol
• COmmon Business Oriented Language – descendant of Flow-matic (1955), Univac I – English-like vocabulary and syntax • Four "divisions" in a program: – identification division – environment division – data division – procedure division
Cobol
• Business oriented features: – file processing – report formatting • Data description via "picture" clauses – PIC 9999: 4-digit decimal number – PIC X(20): 20-character string – PIC 999V99: 5-digit number with 2 decimal places
Algol 60
• The most influential language ever – nearly all imperative languages are descendants of Algol 60 • Descended from IAL (International Algebraic Language), aka Algol 58 – initial Algol 60 report: 1959/60 – revised report: 1963
Algol 60
• Innovations: – formal syntax specification – reference language, publication language, hardware representation – block structure, local variables, recursion – structured control statements – free format
Algol 60
• Problems: – ambitious design ignored compilation issues – compilers did not appear until mid-1960s – only caught on in Europe (USA continued to use Fortran) – no standard input/output – no string manipulation
Algol 60
• Main descendants: – JOVIAL (Jules' Own Version of the IAL) – Simula 67 (ancestor of OOP concepts in C++ and Java) – Algol W (Wirth), hence Pascal, Modula-2, Oberon, Ada – CPL, hence BCPL, B, C, C++, Java – Algol 68
Algol 68
• Successor to Algol 60 – supported by IFIP – "Report on the Algorithmic Language Algol 68", van Wijngaarden et al., 1969 – revised report, 1975 • Committee split by disagreements – Wirth et al. produced minority report – went on to develop Pascal instead
Algol 68
• Main features – highly orthogonal language – expression-based – references used as a way of uniting names, values, pointers etc.
– concurrency support • Descendants: – RTL/2, Bourne shell, SPL
Algol 68
• Syntax described using two-level "van Wijngaarden" grammar – context-free upper level grammar generated infinite set of "hypernotions" – these generated an infinite number of rules by substituting hypernotions into lower level rules – described semantic properties (name bindings, coercions, etc.)
Algol 68
• Problems – grammar seen as too complex (and unparsable) – language seen as too complex • Implementations were slow to appear – only subsets were usually implemented (Algol 68R, Algol 68C, etc.)
PL/1
• IBM's attempt at a general-purpose language – part of System/360 project (1964) – amalgam of Fortran, Cobol, Algol – originally called NPL (New Programming Language), then MPPL (Multi-Purpose Programming Language)
PL/1
• Problems – large, complex, difficult to implement – IBM's personal property (they also copyrighted PL/2 to PL/100) – descendants include PL/360, PL/M
Basic
• "Beginner's All-purpose Symbolic Instruction Code" – Kemeny & Kurtz, early 1960s • Developed as teaching language at Dartmouth College (USA) – simplified Fortran-like syntax – interactive!
APL
• "A Programming Language" (K. Iverson) – array processing using powerful array operators – designed for expressing numerical algorithms – proverbially cryptic, requires extensive character set – ancestor of J
String processing
• Application areas: – natural language processing – symbolic mathematics – general text processing • COMIT – MIT, late 1950s
String processing
• SNOBOL – descendant of COMIT from Bell Labs – Snobol 4 (Ralph Griswold, 1967) – ancestor of Icon • Uses success/failure of string matching to control program flow – generalised into Icon's "generators"
String processing
• Scripting languages – Perl, Python, Tcl, Ruby, ...
– also shell command languages – used to "script" actions of programs in other languages • Macro languages – GPM, Trac, M4, ML/1, PHP, ...
– embedded text processing
List processing
• Application areas – artificial intelligence, natural language processing • IPL-V – Carnegie Institute, late 1950s • LISP – developed by John McCarthy (MIT), co-author of the Algol 60 revised report
List processing
• LISP development: – LISP 1, LISP 1.5 (late 1950s) – Common Lisp (ANSI, early 1980s) • Innovations: – programs are lists too – garbage collection – functional programming style (recursion instead of iteration)
Systems programming
• For implementing "system software" – also real-time or embedded applications – operating systems, compilers, etc.
• Requirements: – efficiency is crucial – low-level access to hardware facilities – bit-twiddling operations
Systems programming
• BCPL – "Basic CPL" – untyped language (or monotyped) • C – descended from BCPL via B – used to implement Unix
Systems programming
• Bliss – used by DEC for writing operating systems and utilities • Coral 66 – MOD standard based on Algol 60 – direct access to memory • RTL/2 – real-time language supporting concurrency
Systems programming
• JOVIAL – descendant of IAL (Algol 58) – used extensively by US DoD • Forth – embedded language developed for radio telescope control – ancestor of PostScript (printer page description language)
Object-oriented languages
• Simula 67 – simulation language descended from Algol 60 (Dahl & Nygaard) – introduced the concept of a "class" • Smalltalk – developed by Alan Kay at Xerox PARC, 1972 (revised to produce Smalltalk-80) – interpreted language, message-passing
Object-oriented languages
• Objective-C – mixture of C and Smalltalk by Brad Cox – primary language for NeXT computers • C++ – mixture of C and Simula by Bjarne Stroustrup – Bell Labs: telephone routing simulation etc.
Object-oriented languages
• Java – originally called Oak (James Gosling, early 1990s) – redeveloped at Sun for Internet use, 1995 • Eiffel – developed by Bertrand Meyer, early 1980s – assertion-based with compiler-checked preconditions and postconditions
Other languages
• Functional languages – Lisp, Scheme – FP, ML, Miranda, Haskell • Rationale – break away from von Neumann tradition (mutable variables, iteration etc.) – mathematical purity will mean program correctness can be proved formally
Other languages
• Constraint-based languages – Prolog (Colmerauer and Roussel, 1971) and many derivative languages – unification-based, goal-oriented theorem proving approach • Applications: – natural language, artificial intelligence – basis for Japan's 5th generation project
Other languages
• Other AI languages: – POP 2 (Robin Popplestone, Edinburgh, late 1960s) – Pop-11 (mid 1970s) – POPLOG (Sussex, late 1980s) combined Pop-11, ML, Common Lisp
Other languages
• Visual languages – programs constructed using 2D graphics instead of a 1D text stream • Examples – spreadsheets?
– ProGraph – Lego Mindstorms – Alice
The von Neumann legacy
• Computers all based on von Neumann's design principles – single memory consisting of identical numbered cells – memory used for instructions and data – values in memory are mutable – processor executes one instruction at a time – instruction sequence given explicitly
The von Neumann legacy
• Most languages mirror the von Neumann design – variables: mutable objects in memory – execution sequence given by sequence of instructions, (and loops, ifs, etc.) – single thread of control – pointers: memory cells can reference ("point to") other memory cells – no type information in memory cells
The von Neumann legacy
• Problems: – dangling pointers – uninitialised variables – variables going out of range – can't reason about values of mutating variables – non-optimal sequencing
The von Neumann legacy
• Dangling pointers: – pointer to ‘thin air’?
– pointer to a valid (but wrong) address?
– pointer to right address, but wrong type of data • Related problems: – memory leaks – garbage collection
The von Neumann legacy
• Uninitialised variables: – undetectable, since any value in a memory cell might be valid – value is undefined (note: technical term!) – uninitialised pointers • Could automatically initialise variables – e.g. Java (partly), Basic – loss of efficiency?
The von Neumann legacy
• Out-of-range variables – arithmetic overflow (wrong sign for result) – no hardware support for range checking • Software range checking (like Ada) – added overhead, so reduced efficiency – but: manual checks can't be optimised by compiler
The von Neumann legacy
• Mutating variables – can't apply mathematical proof techniques – values can change arbitrarily • Global variables – can be affected non-locally – hard to debug – rogue pointers + single memory means all memory is effectively global!
The von Neumann legacy
• Explicit sequencing – might be a better sequence?
– optimal sequence might depend on external events • Multithreading – synchronization issues (mutable data again)
The von Neumann legacy
• Data-flow sequencing – instructions can be executable at any time as long as all dependencies are satisfied • Example: making tea – tasks: boil water, put tea in pot, put water in pot, pour milk, pour tea – tasks 1, 2, 4 possibly concurrent – task 3 depends on 1, 5 depends on 3
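The tea-making slide above can be sketched as a toy data-flow scheduler: a task becomes executable as soon as all the tasks it depends on have completed. The dependency table below follows the slide (task 3 depends on 1, task 5 depends on 3); the function is an invented illustration, not a real scheduler:

```python
# Toy data-flow scheduling: run whatever has all its dependencies met.
deps = {
    "boil_water":   [],
    "tea_in_pot":   [],
    "pour_milk":    [],
    "water_in_pot": ["boil_water"],
    "pour_tea":     ["water_in_pot"],
}

def schedule(deps):
    done, order = set(), []
    while len(done) < len(deps):
        # all tasks in 'ready' could run concurrently
        ready = [t for t in deps
                 if t not in done and all(d in done for d in deps[t])]
        order.extend(ready)
        done.update(ready)
    return order

order = schedule(deps)
print(order)
```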
Functional programming
• Influenced by Church's lambda calculus – mathematical theory of functions – if x and y are lambda expressions, so is the application (x y) – if x is a variable and y is a lambda expression, λx.y is a lambda expression • Can describe rules of arithmetic using nothing but lambda expressions
Lambda calculus
• Evaluating: – ((λx.y) z) is reduced to y, with all occurrences of x in y replaced by copies of z – like function application (x is the formal parameter, y is the function body, z is the actual parameter)
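The claim that arithmetic can be built from nothing but lambda expressions can be demonstrated with Church numerals. A hedged Python sketch (names invented; lambdas stand in for lambda-calculus terms):

```python
# Church numerals: the number n is 'apply f, n times'.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):                       # decode by counting applications of f
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
print(to_int(plus(two)(two)))        # 2 + 2, built entirely from lambdas
```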
Lisp
• Original functional language (John McCarthy, ~1960) • A list is a pair (car . cdr) – car = head of list, cdr = tail of list – longer lists possible, e.g. (a . (b . (c . nil))) – abbreviation: write this as (a b c) • Programs are represented by lists, too
Lisp
• Items are atoms or lists – list components are either atoms or other lists – the empty list: (), or "nil" • Basic functions: – (car (a . b)) = a, (cdr (a . b)) = b – (cons a b) = (a . b) – (atom x): true if x is an atom
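The cons-cell model is easy to sketch in another language. Below is an illustrative Python version (invented helper names, pairs standing in for cons cells, `None` for nil):

```python
# Lisp cons cells modelled as Python pairs; nil is None.
nil = None
def cons(a, b): return (a, b)
def car(p): return p[0]     # head of the list
def cdr(p): return p[1]     # tail of the list

lst = cons(1, cons(2, cons(3, nil)))   # the list (1 2 3)
print(car(lst), car(cdr(lst)))
```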
Lisp
• Booleans: – true represented by atom T, false by () or NIL • Arithmetic: – prefix notation used, e.g. (+ 1 2) = 3 • Control: – COND function for choosing alternatives – recursion instead of loops
Lisp
• Example: program to sum the numbers in a list (def sum (lambda (x) (cond ((null x) 0) (t (+ (car x) (sum (cdr x))))))) – LISP = "Lots of Irritating Superfluous Parentheses"
Lisp
• Important ideas – use of lists to represent data structures – use of lists to represent programs, too – use of recursive functions – garbage collection – functions as first-class objects: (def name lambda-expression)
Lisp
• Problems with Lisp: – inefficient interpreted implementations – too different!
– untyped, so no error-checking at compile time • Non-functional features added later – prog, setq, goto
FP
• John Backus' Turing Award paper – ‘Can programming be liberated from the von Neumann style?’ (Communications of the ACM, 1978) – explicit discussion using a language called FP
Functional languages
• Later developments: – efficient implementations: combinators, tail recursion, better garbage collection – pattern matching notation: fun sum [] = 0 | sum (x::xs) = x + sum xs – lazy evaluation (operations on infinite lists)
Functional languages
• Other concepts: – ‘curried’ functions (named after Haskell Curry): partial application – higher-order functions: functions that operate on functions – strong typing, type inferencing – monads: sequencing in a pure functional style, I/O using streams
Haskell
• Named after Haskell Curry (again) • An example of a modern functional language – lazy evaluation – strong typing – type inferencing – polymorphism
Haskell
• Data types – basic types (Int, Char etc.): 123, 'x' – lists: [t] is a list of type t: [1, 2, 3], [1..5], [2..], [x*x | x <- [1..5]]; [1,2] is the same as (1:(2:[])) – tuples: (t1,t2,t3) is a tuple of (possibly different) types: (1, "this", 5.0) has type (Int, [Char], Float); (1,2) has type (Int, Int)
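Haskell's list comprehension notation was influential; for comparison, `[ x*x | x <- [1..5] ]` has a direct Python analogue:

```python
# Python list comprehension equivalent of Haskell's [ x*x | x <- [1..5] ]
squares = [x * x for x in range(1, 6)]
print(squares)
```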
Prolog
• Uses a theorem-proving approach to programming • Programs consist of: – a list of starting points (facts) – a list of rules giving relationships between facts – a goal
Prolog
• No explicit sequencing is needed – the interpreter attempts to construct a path to the goal using the rules and facts provided – if it reaches a dead end, it will backtrack and try a different sequence • No variables are needed – values are bound using unification
Prolog
• Example facts: male(albert).
male(edward).
female(alice).
female(victoria).
parents(edward,victoria,albert).
parents(alice,victoria,albert).
Prolog
• Example rules: sibling(X,Y) :- parents(X,A,B), parents(Y,A,B).
sister(X,Y) :- sibling(X,Y), female(Y).
• Example goal: ?- sister(edward,S).
– evaluation proceeds by trying to match this goal using the available rules and facts – variables are unified with matching values
Prolog
• Example evaluation: – sister(edward,S) matches sister(X,Y) where X=edward and Y=S, so...
– new goals: sibling(edward,S) and female(S) – sibling(edward,S) creates new goals: parents(edward,X,Y) and parents(S,X,Y)
Prolog
• Example evaluation: – parents(edward,X,Y) matches parents(edward,victoria,albert), so X=victoria and Y=albert – parents(S,victoria,albert) matches S=edward – female(edward) fails, so backtrack: undo the last match (S=edward) and try S=alice
Prolog
• Pattern matching used for lists: append([],X,X).
append([A|B],C,[A|D]) :- append(B,C,D).
• Example: append([1,2],[3,4],X) – X = [1|D] where append([2],[3,4],D) – X = [1,2|D'] where append([],[3,4],D') – X = [1,2,3,4] • Also: append(X,Y,[1,2,3,4]).
Prolog
• Data structures: – built-in support for integers and integer arithmetic – lists (similar to Haskell) – facts as tree structures, e.g. parent(edward,victoria,albert)
Prolog
• Evaluation sequence: – try to match rules sequentially – bind variables to values as part of this process – if a rule fails, backtrack (undo the last successful match) and try the next rule
Prolog
• Once a variable has been unified, it keeps the same value throughout the current application of the current rule – the same name may be bound to different values in different rules (names are local) – the same name may have a different value in a different instance of the same rule • Exception: the anonymous variable ‘_’
Prolog
• Prevent backtracking using a ‘cut’ – symbolised by ‘!’ – if X is matched successfully in ‘X, !, Y’, failures in Y will not backtrack past the cut to search for another value for X parents(adam,0) :- !.
parents(eve,0) :- !.
parents(X,2).
Prolog
• More cut examples: factorial(N,F) :- N < 0, !, fail.
factorial(0,1) :- !.
factorial(N,F) :- N2 is N-1, factorial(N2,F2), F is N*F2.
– ‘cut and fail’ in first rule prevents negative arguments – cut in second rule prevents attempts to use rule 3 to evaluate factorial 0
Prolog
• Input & output: – special predicates (e.g. write/1) which always succeed – I/O operation performed as a hidden side effect • Storing state information – use asserta/1 to add a rule – use retract/1 to retract a rule
Prolog
• State example: can_go(Place):- here(X), connect(X, Place).
can_go(Place) :- write('You can''t get there from here.'), nl, fail.
move(Place) :- retract(here(X)), asserta(here(Place)).
here(kitchen).
connect(kitchen,hall).
move(hall).
Complexity
• Languages have evolved to be able to cope with problems of increasing complexity – systems of upwards of 100KLOC are common in real life – early languages had problems with systems of this size
Complexity
• The coupling problem – the more one piece of software knows about another, the higher the coupling – can't make changes without affecting other parts of the system that are coupled to it – primary goal is to minimise or abolish coupling between components – analogy: standard connectors in hardware
Complexity
• Spaghetti programming – unstructured, use of goto statements – makes it hard to partition software • Can’t make assumptions about the properties of any block of code – can jump into the middle of any block from somewhere else
Complexity
• Structured programming – program composed of structured blocks – one entry point, one exit point – can reason about code within a block • Structured blocks decouple flow-of control issues – but not flow-of-data issues
Complexity
• Global variables – can be accessed from anywhere, so hard to reason about values of variables • Local variables – only accessible within a single block, so can reason about them – static variables retain their values from one execution of a block to the next
Complexity
• Subroutines (procedures, functions) – allows program to be decomposed into functional modules – can pass parameters as a way of avoiding using global variables for communication – reduces data coupling if global variables avoided
Complexity
• Separate compilation – allows code reuse through subroutine libraries – can compiler check that an external subroutine is being called correctly?
– requires cooperation from linker...
Complexity
• Fortran – separately compiled subroutines – no link-time checking – static & global variables only (so no recursion) – unstructured coding (e.g. no structured if statement)
Complexity
• Algol 60/68, Pascal – no separate compilation (non-standard language extensions needed) – procedures are nested within main program, so call checking is possible – structured programming possible – local / global variables, no static variables – recursion possible
Complexity
• C – local, global, static variables – no linkage checking – use of common ‘header files’ to allow compiler checking (but not foolproof)
Complexity
• Modules – related groups of procedures, variables etc.
– can selectively export some of the contents but can hide others – can create abstract data types – examples: Modula-2, Oberon, Ada (and also O-O languages)
Complexity
• Modules have a public interface – allows compile-time checking of calls – implementation is hidden (and can be changed) – interface specification forms a contract between a module and its users – inter-module coupling is reduced
Complexity
• Object oriented languages – modules become data types!
– inheritance, polymorphism • Problems with inheritance errors – incorrect signature when overloading operations (only C# fixes this one so far)
Complexity
• Modules/classes should be client-neutral – should not make any assumptions about their users – should allow code reuse • Exception handling – decouples error detection from error recovery
Complexity
• The goal: ‘software components’ – use like hardware components – buy off the shelf, ready tested, and assemble into a complete system • Generic programming – Ada generics, C++ templates – allows algorithms to be decoupled from the data types they operate on
Errors
• Error detection – at compile time (errors within a module) – at link time (errors resulting from mismatched calls) – at run time – at instantiation time (e.g. C++ templates) – at derivation time (errors in inheritance, inability to override/implement operations)
Errors
• Compile time errors – well known how to detect errors, but what errors does the language allow to be detected?
– strong typing allows type mismatches to be detected – preconditions and postconditions (e.g. Eiffel) allow for better static verification – variable-length parameter lists (e.g. C) reduce checkability
Errors
• Link-time errors – do separately-compiled modules include parameter information?
– C doesn’t do this – C++ used ‘name mangling’ to add type information to function names for use with standard linkers
Errors
• Run-time errors: – e.g. array bounds checking – use of exceptions to report errors and recover from them – run-time checks decrease program efficiency (extra code means more space and time) – leaving checks out in production code is like leaving your parachute behind once you have your pilot’s licence
Errors
• Instantiation time errors: – happens when trying to reuse code written much earlier – code is generated automatically with generic parameters plugged in – if parameters don’t match the way the code uses them, the compilation fails – error messages usually relate to generic code, not what you wrote – Ada generics have precise contracts to avoid this problem
Errors
• Derivation time errors: – happens when deriving new classes in object-oriented languages – happens long after original class was written – you override an operation but the old one gets called (possible in C++, Ada) – you think you’re overriding but you get it wrong and add a new function (only C# handles this one)
Errors
• Concurrency – allows the use of ‘active objects’ executing independently – can partition program across multiple processors and/or machines – use RPC, CORBA, Ada partitions, ...
– occam and transputers
Errors
• Imperative languages have problems with concurrency – problem must be partitioned by hand (the programmer must identify concurrency) – lack of referential transparency means synchronisation is needed when updating shared variables – synchronisation is easy to omit or get wrong
Errors
• Other problems with concurrency: – errors can be time-dependent – hard to reproduce reliably – hard to diagnose – new failure modes (e.g. deadlock)
Summary
• Languages evolve at the whim of their designers – usually in the direction of increased expressiveness and safety, but not always – some features turn out to be bad, but only in hindsight – there is a tension between ease of use and safety (e.g. strong vs. weak typing) – programmers can be resistant to new ideas