Analysing Word Concreteness and Abstractness in Dictionary Definitions

Graham Clark, Stevan Harnad, Les Carr
Intelligence, Agents, Multimedia Group, Department of Electronics and Computer Science, University of Southampton

1. Introduction

The Symbol Grounding Problem (Harnad 1990, Harnad 2002) indicates that vocabulary must be grounded in the real, physical world for words to have meaning in one's mind. But once some words have been grounded in this way, how can they develop into a full vocabulary? Dictionaries that define all their words using a controlled vocabulary (every word used in a definition comes from a specified subset of the dictionary) could give some idea of how new words can be effectively grounded using a small set of pre-grounded terms. In this investigation, two corpora were used: the Longman Dictionary of Contemporary English (LDOCE; Longman 1997) and the Cambridge International Dictionary of English (CIDE; Cambridge 1995). A Web-based survey was conducted to categorise the words in the two controlled vocabularies as “concrete” or “abstract”. Concrete words are those that refer to things that can be seen, felt or touched, for example “tree”, “bird” or “flower”. Abstract words are those that refer to things and properties of things that are more general or conceptual, such as “goodness”, “truth” or “abstractness”.

2. Survey and Definition Recursion

In the survey, participants were given twenty pairs of words and asked to indicate which word in each pair was the more abstract by moving a slider towards it, as shown in Figure 1 below. Each word was scored against several others, and from these results each controlled vocabulary was split into concrete and abstract subsets.
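The scoring step could be sketched roughly as follows. This is a minimal illustration, not the procedure used in the study: the slider encoding (a value from -1.0 to 1.0), the averaging rule, the zero threshold and the function names `abstractness_scores` and `split_vocabulary` are all assumptions, and the ratings are invented for the example.

```python
from collections import defaultdict

def abstractness_scores(ratings):
    """Average each word's score over all the pairs it appeared in.

    `ratings` holds (left_word, right_word, slider) tuples. ASSUMPTION:
    slider lies in [-1.0, 1.0]; negative values mean the left word was
    judged more abstract, positive values the right word.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for left, right, slider in ratings:
        totals[left] -= slider   # slider pulled left raises the left word's score
        totals[right] += slider  # slider pulled right raises the right word's score
        counts[left] += 1
        counts[right] += 1
    return {word: totals[word] / counts[word] for word in totals}

def split_vocabulary(scores, threshold=0.0):
    """Split words into (concrete, abstract) sets around a hypothetical threshold."""
    concrete = {w for w, s in scores.items() if s <= threshold}
    abstract = {w for w, s in scores.items() if s > threshold}
    return concrete, abstract

# Invented ratings, for illustration only.
ratings = [
    ("tree", "truth", 0.8),
    ("goodness", "bird", -0.6),
    ("flower", "truth", 0.5),
    ("tree", "goodness", 0.7),
]
scores = abstractness_scores(ratings)
concrete, abstract = split_vocabulary(scores)
```

With these invented ratings, “tree”, “bird” and “flower” fall into the concrete set and “truth” and “goodness” into the abstract set.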

Fig 1: Example screenshot of concreteness/abstractness survey

In order to effectively ground words whose referents have not been experienced in the physical world, one can use combinations of pre-grounded words. For example, if “horse”, “stripes” and the relevant logical operators are already known, “zebra” can be acquired: “a horse with stripes”. Definitions can be recursed through in a tree structure, showing the words that can be effectively grounded from a given starting set. The much-simplified diagram of a “recursion tree” below (Figure 2) shows a path through dictionary definitions, starting at the word “chair”.
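Recursing through definitions can be sketched as a breadth-first expansion. The function name `recursion_levels`, the toy definitions and the choice to list each word only at the first level where it appears (which keeps circular definitions from looping forever) are assumptions made for illustration; the trees built in the study may have been constructed differently.

```python
def recursion_levels(start, definitions, max_depth=9):
    """Expand definitions level by level, starting from `start`.

    Returns a list of levels: level 0 is [start], and each later level
    holds the words first reached at that depth. ASSUMPTION: a word is
    listed only once, at the shallowest level where it occurs.
    """
    seen = {start}
    levels = [[start]]
    while len(levels) <= max_depth:
        next_level = []
        for word in levels[-1]:
            for term in definitions.get(word, []):
                if term not in seen:
                    seen.add(term)
                    next_level.append(term)
        if not next_level:
            break  # nothing new to expand
        levels.append(next_level)
    return levels

# Toy definitions invented for the example, not taken from either dictionary.
definitions = {
    "zebra": ["horse", "with", "stripes"],
    "horse": ["large", "animal"],
    "stripes": ["long", "narrow", "mark"],
}
levels = recursion_levels("zebra", definitions)
```

Here level 1 holds the words of the definition of “zebra” and level 2 holds the new words introduced by defining “horse” and “stripes”.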

3. Parts of Speech

Figures 3-6 below show the part-of-speech “make-up” of the concrete and abstract words from the controlled vocabularies of both corpora. The majority of concrete words are nouns; their referents can easily be pointed out physically to someone, and hence grounded in the real world. Abstract words cover a much wider range of parts of speech, so more of them would have to be “effectively grounded” through internal processes, perhaps similar to the definition recursion described previously.

Fig 2: Example definition recursion tree starting at “chair”

Fig 3: CIDE Concrete Words
Fig 4: CIDE Abstract Words
Fig 5: LDOCE Concrete Words
Fig 6: LDOCE Abstract Words
(Legend for Figs 3-6: adjective, adverb, conjunction, determiner, interjection, noun, predeterminer, preposition, verb, irregular, prefix, suffix)

Fig 7: Proportion of concrete and abstract words in CIDE and LDOCE, respectively (chart labels: concrete, 825; concrete, 798; abstract, 1258; abstract, 1496)

4. Concreteness and Abstractness in Recursion

Five concrete and five abstract words were taken from each dictionary, and recursive definition trees were built. Figures 8-11 show that many more abstract words than concrete words are used in definitions. Each point on the graphs represents the mean number of abstract, concrete or unknown words at each level of the tree. Unknown words are those that are not present in the controlled vocabulary, or that do not exactly match a headword. All words in the corpora were stemmed; this greatly reduced the count of unknown words.

The mean number of words at each tree level has been scaled to take into account the smaller proportion of concrete words to abstract.

Fig 8: Mean number of words per tree level for CIDE, starting with concrete words
Fig 9: Mean number of words per tree level for CIDE, starting with abstract words
Fig 10: Mean number of words per tree level for LDOCE, starting with concrete words
Fig 11: Mean number of words per tree level for LDOCE, starting with abstract words
(Figs 8-11: x-axis, Recursion Tree Level, 1 to 9; y-axis, 0 to 250; series: Abstract, Concrete, Unknown)
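The per-level means described in this section could be computed along the following lines. This is a sketch under stated assumptions: each tree is represented as a list of levels (each a list of already-stemmed words), trees shorter than the deepest tree contribute zero at the missing levels, and the names `classify` and `mean_counts_per_level` are hypothetical. The scaling step is left out because the text does not say how it was done.

```python
from collections import Counter

def classify(word, concrete, abstract):
    """Label a word as 'concrete', 'abstract' or 'unknown'."""
    if word in concrete:
        return "concrete"
    if word in abstract:
        return "abstract"
    return "unknown"  # not in the controlled vocabulary

def mean_counts_per_level(trees, concrete, abstract):
    """Mean number of abstract, concrete and unknown words at each level.

    `trees` is a list of trees; each tree is a list of levels; each level
    is a list of words. ASSUMPTION: a tree with no words at a given level
    counts as zero there, so every mean is taken over all trees.
    """
    depth = max(len(tree) for tree in trees)
    means = []
    for level in range(depth):
        totals = Counter()
        for tree in trees:
            if level < len(tree):
                totals.update(classify(w, concrete, abstract) for w in tree[level])
        means.append({cat: totals[cat] / len(trees)
                      for cat in ("abstract", "concrete", "unknown")})
    return means

# Invented vocabularies and trees, for illustration only.
concrete = {"horse", "chair"}
abstract = {"with", "large", "thing"}
trees = [
    [["zebra"], ["horse", "with", "stripes"]],
    [["chair"], ["thing", "with", "legs"], ["large"]],
]
means = mean_counts_per_level(trees, concrete, abstract)
```

Each entry of `means` corresponds to one point per series on graphs such as Figures 8-11, before any scaling is applied.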

5. Definition Length

The number of words in a definition (the definition length) is an indication of how many terms must be pre-grounded in order for it to be understood. Figures 12 and 13 show frequency distribution graphs of the definition length for the CIDE and the LDOCE. The frequencies have been scaled to take into account the smaller proportion of concrete words to abstract.

Fig 12: Definition Length Frequency Distribution for CIDE
Fig 13: Definition Length Frequency Distribution for LDOCE
(Figs 12 and 13: x-axis, Definition Length, 0 to 40; y-axis, 0 to 240; series: abstract, concrete)
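A definition-length distribution of this kind could be tabulated as in the sketch below. The whitespace tokenisation, the function names `length_distribution` and `relative_frequencies`, and the toy definitions are assumptions. Converting counts to relative frequencies is only one plausible way to make groups of different sizes comparable; the scaling actually used is not specified in the text.

```python
from collections import Counter

def length_distribution(definitions, words):
    """Count how many of `words` have a definition of each length.

    ASSUMPTION: definition length is the number of whitespace-separated
    tokens in the definition text.
    """
    return Counter(len(definitions[w].split()) for w in words if w in definitions)

def relative_frequencies(dist):
    """Convert counts to proportions of the group (a hypothetical scaling)."""
    total = sum(dist.values())
    return {length: n / total for length, n in sorted(dist.items())}

# Toy definitions invented for the example, not taken from either dictionary.
definitions = {
    "tree": "a tall plant with a trunk",
    "bird": "an animal with wings",
    "truth": "the quality of being true",
    "goodness": "the quality of being good",
}
concrete_rel = relative_frequencies(length_distribution(definitions, ["tree", "bird"]))
abstract_rel = relative_frequencies(length_distribution(definitions, ["truth", "goodness"]))
```

Plotting `concrete_rel` and `abstract_rel` against definition length would give two curves on a shared scale, regardless of how many words each group contains.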

6. References

Cambridge (1995). Cambridge International Dictionary of English, CIDE+ edition (electronic version), Cambridge University Press.

Harnad (1990). The symbol grounding problem. Physica D, 42, 335-346.

Harnad (2002). Symbol grounding and the origin of language. In Scheutz, M. (Ed.) Computationalism: New Directions. MIT Press, 143-158.

Longman (1997). Longman Dictionary of Contemporary English (LDOCE), 3rd edition (electronic version), Addison Wesley Longman.