
Semantics, Syndication and Social
Networks: Mechanisms for Future
Structured Information Spaces
Hamish Cunningham (University of Sheffield)
Werner Haas (Joanneum Research)
Ant Miller (BBC)
Libby Miller (University of Bristol)
Ralph Traphoener (Empolis / Bertelsmann)
Paul Warren (British Telecom)
What’s the difference between
Mother Theresa and Tony Bliar?
http://gate.ac.uk/
http://nlp.shef.ac.uk/
Hamish Cunningham
Dept. Computer Science, University of Sheffield
Why semantic metadata?
1. Different types of metadata allow different types of search
(but also incur different costs and have different limits)
• full text: "find me Nevsky in Bulgaria"
• taxonomy / thesaurus / semantic annotation / ontology: "find me
churches in Eastern Europe"
• E.g. BBC's INFAX taxonomic system: 66% of searches would fail if
only full text
2. The web promotes diversity but also fragmentation; there's
too much of it; less and less impact for curated data
• In the face of this, cultural memory institutions need:
• Syndication and mediation (to pool outlets and multiply impact); this
means presentation-independent, multipurpose content
• Users as assistants (to cut the cost of metadata); this can mean shared
conceptualisations of content
• How do we get there?
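The contrast in point 1 of the "Why semantic metadata?" slide can be sketched in a few lines of Python. This is only an illustration: the three documents, their annotations and the small taxonomy are invented here, and the function names are hypothetical. It shows why a full-text query for "churches in Eastern Europe" finds nothing, while a query over semantic annotations expanded through a taxonomy does.

```python
# Toy collection; documents, annotations and taxonomy are invented for illustration.
DOCS = {
    "d1": "The Alexander Nevsky cathedral stands in Sofia, Bulgaria.",
    "d2": "A wooden church in Maramures, Romania.",
    "d3": "A football stadium in Sofia, Bulgaria.",
}

# Hypothetical semantic annotations: document id -> set of concept labels.
ANNOTATIONS = {
    "d1": {"Cathedral", "Bulgaria"},
    "d2": {"Church", "Romania"},
    "d3": {"Stadium", "Bulgaria"},
}

# Hypothetical taxonomy: concept -> narrower concepts.
NARROWER = {
    "Church": {"Cathedral"},
    "Eastern Europe": {"Bulgaria", "Romania"},
}


def full_text_search(terms):
    """Return ids of documents whose text contains every query term."""
    return sorted(
        doc_id for doc_id, text in DOCS.items()
        if all(term.lower() in text.lower() for term in terms)
    )


def expand(concept):
    """Return the concept plus every concept below it in the taxonomy."""
    found = {concept}
    for child in NARROWER.get(concept, ()):
        found |= expand(child)
    return found


def concept_search(concepts):
    """Return ids of documents annotated with each concept or a narrower one."""
    return sorted(
        doc_id for doc_id, labels in ANNOTATIONS.items()
        if all(labels & expand(concept) for concept in concepts)
    )


print(full_text_search(["Nevsky", "Bulgaria"]))          # ['d1']
print(full_text_search(["churches", "Eastern Europe"]))  # []
print(concept_search(["Church", "Eastern Europe"]))      # ['d1', 'd2']
```

The cost mentioned on the slide is visible even here: the concept search only works because someone built `NARROWER` and `ANNOTATIONS` by hand.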
The semantic web and why you
can't have it (yet)
• The semantic web is about a semantic layer for
interoperability, machine-readability, inference – ideal
for semantic libraries?
• Problems:
1. Construction and maintenance of shared
taxonomies, terminologies & ontologies is expensive
2. Annotation of content relative to them is very expensive
3. How does a machine tell the difference between
"Mother Theresa is a Saint" and "Tony Blair is a
Saint"? (Beyond the shallow and the general we get
into typical AI problems, the contextual and shifting
nature of meaning, etc.)
Four promising directions
1. Use recommender systems to make the users into
curators’ assistants (who tells Google which page is
important? other web users do, by linking; also
Amazon)
2. Allow curators and users to DIY simple specific
ontologies and KBs (targeted adjuncts to general
models like CIDOC)
3. Use Information Extraction (IE) to populate semantic
models
4. Ride the next wave of social software and on-line
communities (Wikis, Blogs, OSN, file sharing / P2P,
RSS/ATOM)
IT context: the Knowledge Economy
and Human Language
Gartner, December 2002:
• taxonomic and hierarchical knowledge mapping and indexing
will be prevalent in almost all information-rich applications
• through 2012 more than 95% of human-to-computer
information input will involve textual language
A contradiction:
• to deal with the information deluge we need formal knowledge
in semantics-based systems
• our archived history is in informal and ambiguous natural
language
The challenge: to reconcile these two phenomena
HLT: Closing the Loop
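[Figure: a loop between Human Language and Formal Knowledge (ontologies and instance bases). OIE and (A)IE lead from Human Language to Formal Knowledge; CLIE leads from Controlled Language to Formal Knowledge; (M)NLG leads from Formal Knowledge back to Human Language. Labels alongside: Semantic Web; Semantic Grid; Semantic Web Services.]
KEY
MNLG: Multilingual Natural Language Generation
OIE: Ontology-aware Information Extraction
AIE: Adaptive IE
CLIE: Controlled Language IE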
Information Extraction
• Information Extraction (IE) pulls facts and
structured information from the content of large
text collections.
• Contrast IE and Information Retrieval
• NLP history: from NLU to IE
• Progress driven by quantitative measures
• MUC: Message Understanding Conferences
• ACE: Automatic Content Extraction
• General Architecture for Text Engineering
(GATE): http://gate.ac.uk/
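The slide does not name the quantitative measures, but evaluations in the MUC tradition score a system's output against a human-annotated gold standard using precision, recall and F-measure. A minimal sketch, assuming exact matching of (type, text) pairs; the function name and the sample annotations are invented for illustration:

```python
def precision_recall_f1(predicted, gold):
    """Score a set of predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    # Precision: share of the system's answers that are right.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of the right answers that the system found.
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented annotations: (entity type, text span).
gold = {
    ("Person", "A. Smith"),
    ("Organisation", "Example Corp"),
    ("Date", "Tuesday"),
}
predicted = {
    ("Person", "A. Smith"),
    ("Date", "Tuesday"),
    ("Organisation", "Example"),   # wrong span boundary
    ("Location", "Smith"),         # spurious
}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Real evaluations often give partial credit for overlapping spans; exact matching is the simplest choice and is used here only to keep the sketch short.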
IE Example
“The shiny red rocket was fired on Tuesday. It is the
brainchild of Dr. Big Head. Dr. Head is a staff scientist at
We Build Rockets Inc.”
• NE: "rocket", "Tuesday", "Dr. Head", "We Build
Rockets"
• CO: "it" = rocket; "Dr. Head" = "Dr. Big Head"
• TE: the rocket is "shiny red" and Head's
"brainchild".
• TR: Dr. Head works for We Build Rockets Inc.
• ST: rocket launch event with various participants
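The five labels follow the MUC task names: named entity recognition (NE), coreference resolution (CO), template elements (TE), template relations (TR) and scenario templates (ST). One way to picture what each layer adds is to write the analysis of the rocket text as data. The sketch below is an assumption-laden illustration, not the output format of any particular IE system: the class names, the entity kinds and the `same_person` heuristic are all invented here.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity with all mentions that refer to it (NE + CO)."""
    kind: str
    mentions: list
    attributes: dict = field(default_factory=dict)  # TE: descriptive facts


@dataclass
class Relation:
    """TR: a typed link between two entities."""
    name: str
    subject: Entity
    obj: Entity


@dataclass
class Event:
    """ST: a scenario template tying entities into one event."""
    kind: str
    participants: dict


def same_person(a, b):
    """Crude coreference test: same title and same final token."""
    a_tokens, b_tokens = a.split(), b.split()
    return a_tokens[0] == b_tokens[0] and a_tokens[-1] == b_tokens[-1]


# NE + CO: mentions grouped by the entity they refer to.
rocket = Entity("Artefact", ["rocket", "It"])
head = Entity("Person", ["Dr. Big Head", "Dr. Head"])
company = Entity("Organisation", ["We Build Rockets Inc."])
tuesday = Entity("Date", ["Tuesday"])

# TE: attributes of a single entity.
rocket.attributes["description"] = "shiny red"
rocket.attributes["brainchild_of"] = head.mentions[0]

# TR: a relation between two entities.
works_for = Relation("works_for", head, company)

# ST: the event that ties everything together.
launch = Event(
    "rocket_launch",
    {"vehicle": rocket, "date": tuesday, "designer": head},
)

print(same_person("Dr. Head", "Dr. Big Head"))  # True
print(works_for.subject.mentions[0], "->", works_for.obj.mentions[0])
```

Each layer builds on the previous one: relations and events are only as good as the entity and coreference decisions beneath them.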
Ontology-based IE
XYZ was established on 03 November 1978
in London. It opened a plant in Bulgaria
in …
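Ontology & KB
[Figure: graph linking an ontology to a knowledge base built from the text. Ontology classes: Company, Location, City, Country. KB instances: XYZ, London, UK, Bulgaria, "03/11/1978". Edge labels: type, HQ, partOf, establOn.]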
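The knowledge base on this slide can be pictured as a set of (subject, predicate, object) triples. The exact edges are not recoverable from the transcript, so the triples below are an assumed reading of the diagram labels, and the helper functions are hypothetical. The sketch shows how facts extracted from the sentence become queryable once they are tied to an ontology.

```python
# Assumed reading of the slide's graph as (subject, predicate, object) triples.
TRIPLES = {
    ("XYZ", "type", "Company"),
    ("XYZ", "HQ", "London"),
    ("XYZ", "establOn", "03/11/1978"),
    ("London", "type", "City"),
    ("London", "partOf", "UK"),
    ("UK", "type", "Country"),
    ("Bulgaria", "type", "Country"),
}


def objects(subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def instances_of(cls):
    """Return every subject whose type is the given class."""
    return {s for s, p, o in TRIPLES if p == "type" and o == cls}


def country_of_hq(company):
    """Follow HQ, then partOf: a two-step lookup over the KB."""
    return {
        country
        for city in objects(company, "HQ")
        for country in objects(city, "partOf")
    }


print(country_of_hq("XYZ"))               # {'UK'}
print(sorted(instances_of("Country")))    # ['Bulgaria', 'UK']
```

The text never states that XYZ is headquartered in the UK; that answer comes from combining the extracted fact (XYZ, HQ, London) with background knowledge already in the KB (London, partOf, UK).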
A Necessary Trade-Off
Domain specificity vs. task complexity:
[Figure: plot of domain specificity (general to domain specific) against task complexity (simple to complex: bag-of-words, entities, relations, events), marking the region of acceptable accuracy.]
Open information, defended communities
• Trend 1: seconds out, round 5: file sharing is about to go social
• Trend 2: the living room is about to be computerised
• What will happen when all your living room devices fold into a single PC?
• Bill Gates hopes you'll be running Windoze, but Consumer Electronics
firms bet on Linux & stable hardware (no viruses, no crashes, cheap, ...)
• What if these two trends combine? Ubiquitous on-line
communities centred on shared content, with a model of trust
• What if memory institutions provide means of organising,
explaining, interlinking the cross-over between modern popular
culture and the curated memory?
• Important because DRM is the beginning of the end of
civilisation as we know it (controls how you consume media you
buy; has the potential to be linked with censorship and with
invasive behaviour logging)
• you can't make digital objects behave like physical objects - unless you
totally control the hardware and the operating system
• if someone has control, then we may end up finding that someone has
given the contract for preserving our culture to Halliburton
Memory is not a luxury
• C21st: all the C20th mistakes but bigger & better?
• If you don’t know where you’ve been, how can
you know where you’re going?
• Libraries, museums, archives: ammunition in the
war on ignorance (more dangerous than
“terror”?)
• Ammunition is useless if you can’t find it: new
technology must make our history accessible to
all, for all our futures
Summary
• Cultural memory can benefit from semantic
metadata, presentation-independence and
repurposing
• Semantic web technology:
– no: it won’t make machines intelligent
– perhaps: simple specific models can work
• Four ways to cross the AI bridge: DIY models;
recommenders; IE; OSN + P2P
• This talk: http://gate.ac.uk/talks/ecdl-sept-2004.ppt
• Related projects:
• More: http://gate.ac.uk/