Syntax - University of Alberta

Syntax:
If words “are more like humans than machines”,
Let’s party!
What is syntax?
• Syntax is, as you know, the process that
governs the way in which words are
combined.
• But to understand it, we need to start by
understanding functions
The nature of computation
• Syntax is a form of computation
• Computation is essentially a mapping: a → b
• In the simplest ‘computers’ (finite automata), the mappings are
deterministic, from state to state
– State a → State b
• In more complex machines, we get non-deterministic mappings
depending on context (memory): the same state may map onto
any one of several states due to memory, and that memory
may be under control of the machine itself
• Alan Turing (1936): Very simple machines can compute
anything computable
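• (A minimal sketch, not from the lecture, of a deterministic finite automaton in Python: computation here is nothing but a mapping from the current state and input symbol onto the next state. The states and symbols are invented for illustration.)

# Each (state, symbol) pair maps deterministically onto exactly one next state
TRANSITIONS = {
    ("start", "a"): "middle",
    ("middle", "b"): "end",
}

def run_automaton(symbols, state="start"):
    # Step through the input, mapping state onto state at each symbol
    for symbol in symbols:
        state = TRANSITIONS.get((state, symbol), "reject")
    return state

print(run_automaton(["a", "b"]))  # -> 'end'
print(run_automaton(["b"]))       # -> 'reject'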
Functions
• A function is just a mapping from a specific input to a
specific output
– The input and the output don't have to be numbers
– NameProf(x) takes in the number of a course and maps it
onto the name of the person teaching that course
• So: NameProf(357) → Westbury
– RazeTheHouse(x), which takes a house as input, and
returns that house destroyed as output
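• (The same idea sketched in Python; only the 357 → Westbury pair comes from the slide, so the rest of the lookup is hypothetical.)

# A function is just a mapping from a specific input to a specific output
COURSE_TO_PROF = {357: "Westbury"}

def name_prof(course_number):
    # NameProf(x): course number in, instructor's name out
    return COURSE_TO_PROF[course_number]

print(name_prof(357))  # -> 'Westbury'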
Primitive Functions
• A primitive is a lowest-level function: one that can't
be defined in terms of any other
• Let’s consider an old favorite: ‘+’
• If we want to define a non-primitive function
‘AddOne’ we can:
AddOne(x) = x + 1
• We haven’t added new functionality: we’ve just
re-named what we had in a way that is convenient
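• (The same definition sketched in Python, taking ‘+’ as our primitive.)

def add_one(x):
    # AddOne(x) = x + 1: nothing new, just a convenient name for an old primitive
    return x + 1

print(add_one(4))  # -> 5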
Functions of Functions
• Functions can call other functions, including themselves
(recursion)
• Let's define a function AddTwo, which adds two
• We already have AddOne
• We can just say define AddTwo as:
AddTwo(x) = AddOne(AddOne(x))
= AddOne(x + 1)
= (x + 1) + 1
• We haven’t added new functionality: we’ve just
named something in a way that is convenient
Functions of Functions of Functions
• Let's say we want to define a function AddThree, which
adds three
• We already have AddTwo, and AddOne
• We can just say define AddThree as:
AddThree(x) = AddTwo(AddOne(x))
• At some point we may get tired of this game: We are
wasting time and energy trying to name all these silly little
functions, AddOne, AddTwo, AddThree… Will it never
end?
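• (A sketch of the same game in Python: each new function is defined only by calling the ones we already have.)

def add_one(x):
    return x + 1

def add_two(x):
    # AddTwo(x) = AddOne(AddOne(x))
    return add_one(add_one(x))

def add_three(x):
    # AddThree(x) = AddTwo(AddOne(x))
    return add_two(add_one(x))

print(add_two(4))    # -> 6
print(add_three(4))  # -> 7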
Generalizing functions
• A more general solution would add ANY number to any
input
• But we already know how to do that, since we have
addition as a primitive: AddN(x, n) = x + n
• Notice the difference we had to introduce: we had to
add a second input or parameter
• Why? Because the way AddOne was defined had a
constant in it
• We just said "let that constant be a variable", and
so we got a much more powerful function that
eliminated the need for thousands of other more
specialized functions: all the AddOne, AddSixteen,
AddSeventy etc.
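• (The parameterized version, sketched in Python: one function with an extra argument replaces the whole family.)

def add_n(x, n):
    # AddN(x, n) = x + n: the constant has become a variable
    return x + n

print(add_n(4, 1))   # what AddOne did
print(add_n(4, 16))  # what AddSixteen would have done
print(add_n(4, 70))  # what AddSeventy would have done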
The magic of parameters
• By adding one variable we got rid of an infinite
number of functions, collapsing them all into a
single function with two arguments
• What we noticed, in essence, is that all cases of
addition were similar: they could all be computed
in the same way we were computing our primitive,
‘+’
• Parameterization can be traded off against
computation
Hey, what about language?
• This is the kind of functional collapse that
Chomsky wants to do
• He wants to show that many things that appear to
be different are minor variations of the same
function, just in the same way that AddOne and
AddThreeHundred are minor variations of the
same function
• He wants to do it in the same kind of way we did:
by saying, look, you have N functions here that
are really just 1 function, plus an extra parameter
How little can we get by with?
• The question becomes: What is the simplest representation
of the computation that is sentence-making?
• This breaks down into the related questions:
– What are the most primitive functions?
– What are their parameters?
• If we can identify a few primitive universal functions and
some universal parameters, we may find deep underlying
similarities between languages that appear on the surface
to be different (as multiplication might appear different
from addition at first sight)
What syntax is not
• One possible way syntax might work would be Markov
chaining: i.e. probabilistic word chaining
– Calculate the likelihood that one word follows another
(transition probability), and then only select from those
words that actually have a probability > 0 of following a
given word
– A frequentist approach
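• (A minimal sketch, in Python, of this frequentist word-chaining idea, the very approach the next slides argue against; the tiny corpus is invented for illustration.)

from collections import Counter, defaultdict

corpus = "the dog bit me . the dog ran . the cat ran".split()

# Count how often each word follows each other word
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def followers(word):
    # Transition probabilities: only words with probability > 0 of following `word`
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(followers("the"))  # {'dog': 0.666..., 'cat': 0.333...}
print(followers("dog"))  # {'bit': 0.5, 'ran': 0.5}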
Two arguments against chaining
• Chomsky's initial claim to fame is that he claimed
to have proven that there is no possible way that
word-chaining devices could account for syntax
– Not everyone is convinced, but everyone does agree
that simple word-chaining devices won't work
• Chomsky basically had two main arguments
against them:
– i.) Zero probability transitions
– ii.) Relational dependencies
i.) Zero probability transitions
• We can produce and understand transitions that have zero
probability (= have never been encountered before)
– i.e. 'colorless green' and 'sleep furiously' had probably
never been uttered before Chomsky wrote them, but we can
all agree that they are grammatical; therefore grammar
cannot be only transitions
– This means we can’t be chaining on words
– It also indicates the autonomy of syntax from semantics
• We can judge grammaticality of sentences
independently of their meaning
ii.) Relational dependencies
• Some sentences contain relational dependencies of a kind
that simply cannot be captured by transition probabilities
• For example, consider: "If I show you this sentence, then
you will understand the problem”
– there is a long-distance dependency from 'if' to 'then' that can
(provably) not be captured by a particular kind of transition-calculating
device called a finite state machine
– In normal language, we can say that the problem is simply that
transition devices don't have a memory, so they can't 'force' a later
transition to match an earlier one.
– An aside: There are ways to make transition devices deal with
these problems, but they require all sorts of very clunky machinery
(requiring hugely redundant encoding) that seems very implausible
ii.) Relational dependencies
• The problem gets even more complicated because we can
embed long-distance dependencies
– Consider: "If either I show you this sentence or I explain the
problem clearly, then you will understand what Chomsky's point
was.”
– Now we have a sentence we can all understand, but we have a
second dependency: the 'if' has to first close up the 'either' clause
and also remember that it still needs a 'then'.
• There is not necessarily a simple lexical marker: I can also
say "If I show you this sentence or I explain the problem
clearly, you will understand what Chomsky's point was." Now
there is no 'either' or 'then' to trigger the memory
– Listen to language, you'll see the point: such long-distance
dependencies are not at all rare, but occur in many sentences
and from a very early age.
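• (A sketch in Python of why memory matters: a stack can check that nested 'if…then' and 'either…or' pairs close properly, which is exactly what a memoryless transition device cannot do. It only handles the cases with explicit lexical markers, and the word pairs are illustrative.)

PAIRS = {"if": "then", "either": "or"}

def dependencies_close_properly(words):
    stack = []                            # the 'memory' a finite state machine lacks
    for w in words:
        if w in PAIRS:
            stack.append(PAIRS[w])        # opener: remember what must come later
        elif w in PAIRS.values():
            if not stack or stack.pop() != w:
                return False              # closer with no matching opener
    return not stack                      # every opener must eventually be closed

sent = "if either I show you this or I explain it then you will understand".split()
print(dependencies_close_properly(sent))                               # True
print(dependencies_close_properly("either it rains then .".split()))  # False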
ii.) Relational dependencies
• There is a well-known grammatically-correct sentence that
ends with 5 prepositions closing 4 embeddings, said by a
young child to his father:
"Daddy, what did you bring up that book that I
don't want to be read to out of up for?"
• By the time he gets to "read" the child has to remember the
following dependencies:
– i.) 'to be read' requires 'to'
– ii.) 'that book that' requires 'out of'
– iii.) 'bring' requires 'up'
– iv.) 'what' requires 'for'
And he does….!
Sentences aren’t beads on a string
• Chomsky's solution was one that many take for
granted now: it was to suggest that sentences are
not flat lists of words, but have a tree structure,
and that it is not the individual words, but
parts of the tree that are the units of language
– i.e. syntactical constraints are not at the single word
level but at the role level, where a role may be played
by a multiword string or a single word
– Each element that can fill a role is called a constituent
A constituent
• An example is an NP (noun phrase), which is
defined in Chomsky's original tree notation as
(det) A* N
• This just means that it contains an optional
determiner (like 'a', 'the', 'some', 'many')
plus any number of adjectives (including 0) plus a
noun.
• 'dog' is a noun phrase
• So is 'A big hairy rabid frightening nasty dog'
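• (A sketch in Python of the (det) A* N pattern: an optional determiner, any number of adjectives, then a noun. The tiny lexicon is invented for illustration.)

import re

CATEGORY = {"a": "det", "the": "det", "big": "A", "hairy": "A", "rabid": "A",
            "frightening": "A", "nasty": "A", "dog": "N"}

def is_noun_phrase(words):
    # Translate each word into its category, then match the pattern (det) A* N
    tags = "".join({"det": "d", "A": "a", "N": "n"}[CATEGORY[w]] for w in words)
    return re.fullmatch(r"d?a*n", tags) is not None

print(is_noun_phrase(["dog"]))                                             # True
print(is_noun_phrase("a big hairy rabid frightening nasty dog".split()))  # True
print(is_noun_phrase(["big", "a", "dog"]))                                 # False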
So what?
• When our units are defined at the
constituent level, instead of the word level,
we can easily understand how we can re-use
parts in different places as in 'A big hairy
rabid frightening dog bit me' and 'I gave the
big hairy rabid frightening dog a steak’
– it also impacts on the dependency problem,
because we can have trees that constitute a
'memory' for the whole sentence
So what?
• We can have functions (= rules) like:
• S → Either S or S
• S → If S then S
• This kind of self-referentiality, in which an object (here, a
sentence) is defined in terms of itself, is recursion
• Recursion allows for very tightly defined functions, which
simplify complex calculations by defining them in terms of
simpler cases.
A classic example: Factorial
Factorial(x):
If x = 1 → 1
Otherwise → x * Factorial(x - 1)
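(The same recursive definition written out in Python.)

def factorial(x):
    if x == 1:
        return 1                    # base case
    return x * factorial(x - 1)     # the function calls itself on a simpler case

print(factorial(5))  # -> 120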
Calling each other.
• With recursion in language you can also calculate a very complex
output with very simple rules
S → Either S or S
S → If S then S
• With these two rules we can get sentences like:
"If either my big hairy frightening dog is rabid or my unrepaired
car brakes are faulty, then either I will be going to the scary
grey hospital this afternoon or I will be going mad.”
• This seems to match our ‘mentalese’: ‘A big hairy rabid
frightening dog’ is certainly a dog, and we want to be able to
move our attention around from the dog to the brake and hospital
without being ‘thrown off’ by the number of adjectives or
qualifying clauses attached to those things in the sentence.
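• (A sketch in Python of how these two recursive rules generate arbitrarily complex sentences from very simple parts; the base, non-recursive sentences are invented for illustration.)

import random

BASE = ["my dog is rabid", "my brakes are faulty", "I will go to the hospital"]

def sentence(depth):
    # S -> Either S or S | If S then S | (a simple base sentence)
    if depth == 0 or random.random() < 0.5:
        return random.choice(BASE)
    if random.random() < 0.5:
        return "either " + sentence(depth - 1) + " or " + sentence(depth - 1)
    return "if " + sentence(depth - 1) + " then " + sentence(depth - 1)

print(sentence(3))  # prints one randomly built nested sentence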
Example
• "Tonight's program will discuss stress, exercise,
and sex with Celtic forward Scott Wedman, Dr.
Ruth Westheimer, and Dick Cavett".
• This can be VP → VP NP PP
– VP (verb phrase) = ‘will discuss’
– NP (noun phrase) = ‘stress, exercise, and sex’
– PP (prepositional phrase) → P NP
• P = ‘with’
• NP = ‘Celtic forward Scott Wedman, Dr. Ruth
Westheimer, and Dick Cavett’
Example
• "Tonight's program will discuss stress, exercise, and sex
with Celtic forward Scott Wedman, Dr. Ruth Westheimer,
and Dick Cavett".
– This can also be VP → VP NP
– VP = ‘will discuss’
– NP → N PP [‘…sex with Dick Cavett…’]
• N = ‘stress, exercise, and sex’
• PP → P NP
– P = ‘with’
– NP = ‘Celtic forward Scott Wedman, Dr. Ruth
Westheimer, and Dick Cavett’
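• (The two analyses written out in Python as nested brackets, to make the attachment difference visible; the labels follow the slides.)

# High attachment: the PP modifies the verb phrase ('discuss ... WITH the guests')
parse_1 = ("VP", ("VP", "will discuss"),
                 ("NP", "stress, exercise, and sex"),
                 ("PP", "with",
                  ("NP", "Celtic forward Scott Wedman, Dr. Ruth Westheimer, and Dick Cavett")))

# Low attachment: the PP modifies the noun ('sex WITH the guests')
parse_2 = ("VP", ("VP", "will discuss"),
                 ("NP", ("N", "stress, exercise, and sex"),
                        ("PP", "with",
                         ("NP", "Celtic forward Scott Wedman, Dr. Ruth Westheimer, and Dick Cavett"))))

print(parse_1)
print(parse_2)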
How do we know what is what?
• Each part of speech is defined by the role it plays
– so a noun is anything that can go in the NP slot
• There are two main principles for understanding slots:
– i.) The head determines the meaning
– ii.) Slots determine what roles each element in a
sentence can play
i.) The head determines the meaning
• ‘Fox in socks’ is about a fox, not about socks
• ‘Flying to Rio before the taxman catches him’ is
about flying, not about catching
• There are hard rules in every language which
determine which component plays the head role
• We saw one English rule above: NP → N PP
– so, ‘sex with Dick Cavett’ is about a specific kind of
sex, not about a specific attribute of Dick Cavett.
– ‘with Dick Cavett’ is also a slot, called a modifier
ii.) The choreographing of roles
• Slots determine what roles each element in a
sentence can play
– "Ruth Westheimer discussed sex with Dick Cavett"
choreographs three things: the discusser (Ruth), the
object (sex), and the recipient (Cavett)
• Each one of these roles is called an argument to
make clear that they are being fed into a function; that
function is determined by the tree structure
– every end-point (branch) of the tree has to be filled, so
the number of branches = the number of arguments.
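• (A sketch in Python of the role-as-argument idea: each slot in the tree corresponds to one argument of the verb's function. The function name and role names are illustrative.)

def discuss(discusser, topic, recipient):
    # Three branches in the tree, three arguments to fill
    return discusser + " discussed " + topic + " with " + recipient

print(discuss("Ruth Westheimer", "sex", "Dick Cavett"))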
So what?
• When we start to think of things in terms of
trees with arguments, then we can start to
see some deep regularities in language
• For example, NP and VP turn out to be very
similar in their abstract structure…
• Tune in next time…