Centre National de la Recherche
Scientifique
The EUFORBIA project (IAP 26505)
GENERAL ASPECTS
Gian Piero ZARRI
CNRS
44, rue de l’Amiral Mouchez
75014 PARIS
France
E-mail: [email protected]
EUFORBIA Open Meeting, Leatherhead 13/09/02
Slide 1
OUTLINE
• General principles;
• The EUFORBIA labels;
• The EUFORBIA ontology;
• The NKRL filtering techniques;
• The Milan Model;
• Conclusion.
ESPRIT
29159
EUFORBIA
Open Meeting, Leatherhead 13/09/02
Slide 2
General Principles (1)
The Partners:
• Centre National de la Recherche Scientifique
(CNRS), co-ordinator, France (subcontractor: Maison
des Sciences de l’Homme, France);
• AXON, Instituto de Informação Normativa Avançada,
Portugal;
• Department of Computer Science of the University of
Milan, Italy;
• PIRA International, New Media Department, UK.
Slide 3
General Principles (2)
THE EUFORBIA PROJECT:
characterised by a multi-strategy
approach (two conceptual models,
NKRL and the Milan Model) that
pursues a single objective (the
creation of advanced filtering
techniques) and makes use of
co-ordinated tools.
Slide 4
General Principles (3)
Objectives of the project (from the original proposal):
…contribute to the production and use of new
generations of Internet filtering systems, more powerful
and flexible than the existing ones, and easier to adapt
to the cultural, political or religious differences… (these
systems)… :
• …should support a computer-effective description of
the semantic contents of Web sites that could be
simultaneously i) very precise and complete in the
description of the issues at stake in a given site; ii)
neutral as much as possible with respect to any
specific doctrine, ideology or value system;
Slide 5
General Principles (4)
• …should provide the users — both the individual
consumers or institutional users — with software
tools able to make use directly of the neutral
descriptions above to set up filtering policies and
filtering schemata according to the most different
cultural, political, religious etc. options.
A prototype, running software system …
will be realised by the consortium under the
control of an “EUFORBIA user group”, active
during the whole life span of the project.
Slide 6
General Principles (5)
In short, the very idea:
We assume that an in-depth
description of the ‘semantic content’ of
Internet sites – which is impossible to
obtain with the traditional approaches –
should allow the implementation of
more sophisticated filtering strategies.
A two-year project: 1/1/2001 – 31/12/2002
Slide 7
The EUFORBIA labels (1)
The ‘very precise and complete’ and
‘neutral’ descriptions of the contents of
the sites are obtained by adding, during
the construction or at the moment of a
major restructuring of these sites,
EUFORBIA labels which make use of a
high-level knowledge representation
language, NKRL (Narrative Knowledge
Representation Language).
Slide 8
The EUFORBIA labels (2)
The ‘protocol of description’ adopted consists of
identifying three standard Sections within
an EUFORBIA label:
– a description of the aims of the examined
Web site (this section can be considered
the only truly mandatory one);
– a description of some characteristics of the
site that could be interesting to record;
– a list of the sub-sections with a short NKRL
description of the main characteristics of each
of them.
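The three-Section layout can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class name `LabelSketch`, its fields, and the choice to store NKRL structures as plain strings are all hypothetical and are not taken from the EUFORBIA software. It illustrates one point from the slide, namely that the 'aims' Section is the only mandatory one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical container for the three Sections of a label.
// Names are illustrative; NKRL structures are kept as plain strings here.
class LabelSketch {
    final String siteUrl;
    final List<String> aims;            // mandatory Section
    final List<String> characteristics; // optional Section
    final List<String> subSections;     // optional Section

    LabelSketch(String siteUrl, List<String> aims,
                List<String> characteristics, List<String> subSections) {
        // Assumption: a label without an 'aims' Section is rejected outright.
        if (aims == null || aims.isEmpty()) {
            throw new IllegalArgumentException("the 'aims' Section is mandatory");
        }
        this.siteUrl = siteUrl;
        this.aims = copyOrEmpty(aims);
        this.characteristics = copyOrEmpty(characteristics);
        this.subSections = copyOrEmpty(subSections);
    }

    // Defensive, read-only copy; a missing optional Section becomes an empty list.
    private static List<String> copyOrEmpty(List<String> source) {
        if (source == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(new ArrayList<>(source));
    }
}
```

A label built with only an 'aims' entry is accepted and exposes empty lists for the other two Sections; a label with no 'aims' entry throws.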
Slide 9
The EUFORBIA labels (3)
Fiesta Online — http://www.fiesta.com.uk

c10) (ENUM c11 c12 c13)
(the description consists of three parts, i.e., Sections)

c11) OWN  SUBJ   fiesta_on_line_internet_site: (http://www.fiesta.com.uk)
          OBJ    property_
          TOPIC  (SPECIF dedicated_to (SPECIF internet_posting
                 (SPECIF porno_image heterosexual_)))
          MODAL  (SPECIF photo_gallery clickable_colour_photo)
(the ‘aims’ Section: site is devoted to the posting of porno, heterosexual colour photos)

c12) (COORD c14 c15)
(the ‘characteristics’ Section includes two items)

c14) EXIST  SUBJ  over_18/21_warning: (fiesta_on_line_internet_site)
(the site is labelled with the usual ‘warning’ for adult use)

c15) OWN  SUBJ   fiesta_on_line_internet_site
          OBJ    property_
          TOPIC  (COORD1 free_site
                 (SPECIF internet_edition (SPECIF fiesta_adult_magazine)))
(the site is a free one and is the on-line version of the Fiesta adult magazine)
Slide 10
The EUFORBIA labels (4)
c13) (COORD c16 c17 c18 c19 …)
(the ‘sub-sections’ Section includes several items)

c16) OWN  SUBJ   (SPECIF internet_site_section fiesta_on_line_internet_site)
          OBJ    property_
          TOPIC  (SPECIF labelled_as picture_gallery_1 readers_wives_1
                 one_for_the_ladies_1 i_confess_1 shop_1 link_1)
(the sub-sections have different labels)

c17) OWN  SUBJ   (SPECIF readers_wives_1 (SPECIF internet_site_section
                 fiesta_on_line_internet_site))
          OBJ    property_
          TOPIC  (SPECIF dedicated_to (SPECIF internet_posting (SPECIF
                 exhibitionist_porno_image (SPECIF woman_1
                 (SPECIF cardinality_ (SPECIF more_than 50))))))
          MODAL  (SPECIF photo_gallery (SPECIF clickable_colour_photo
                 (SPECIF sent_by (SPECIF individual_person_1
                 (SPECIF cardinality_ several_)))))
(the sub-section ‘wives of the readers’ embodies the exhibitionist porno images of more than 50 women)

c18) BEHAVE  SUBJ   woman_1
             MODAL  housewife_
(the women are housewives)

c19) BEHAVE  SUBJ   individual_person_1
             MODAL  (SPECIF reader_ fiesta_magazine)
(the senders are readers of the Fiesta magazine)
Slide 11
The EUFORBIA labels (5)
The software module for the creation of
the EUFORBIA labels has been implemented
through a collaboration between the CNRS
(France) and AXON (Portugal) EUFORBIA
teams, taking inspiration from an analogous
module built in another NKRL-based
project, CONCERTO (Esprit 29159).
Like all the EUFORBIA software, this
module is written in Java (JDK 1.3.1).
Slide 12
The EUFORBIA ontology (1)
An important point related to the construction of
the EUFORBIA labels concerns the setting up of an
EUFORBIA ontology, common to the NKRL and
Milan Model components of the project. This has
been obtained by adapting the existing NKRL
ontology (H_CLASS) to include terms pertaining to
the ‘pornography’, ‘violence’ and ‘racism’
domains.
This is a collaborative endeavour involving PIRA
International and CNRS.
Slide 13
The EUFORBIA ontology (2)
Two steps:
• In the first (‘lexical level’), PIRA
International collected, e.g., about 600
terms in the pornography domain, organised
according to Vickery’s ‘facets’ methodology.
• In the second (‘conceptual level’), CNRS
grouped under a single conceptual label
terms that, although lexically different,
refer to the same ‘concept’,
e.g., ‘prostitute’ and ‘whore’.
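The conceptual-level grouping amounts to a many-to-one mapping from lexical terms to concept labels. The following is a minimal sketch of that idea only: the class `ConceptLexicon`, its methods, and the concept label used in the usage note are hypothetical and do not come from the EUFORBIA ontology.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lexicon: several lexical terms map to one conceptual label.
class ConceptLexicon {
    private final Map<String, String> conceptOfTerm = new HashMap<>();

    // Registers every term as a lexical variant of the given concept label.
    void register(String concept, String... terms) {
        for (String term : terms) {
            conceptOfTerm.put(normalise(term), concept);
        }
    }

    // Returns the concept label for a term, or null when the term is unknown.
    String conceptFor(String term) {
        return conceptOfTerm.get(normalise(term));
    }

    // Assumption: terms are compared case-insensitively, ignoring outer spaces.
    private static String normalise(String term) {
        return term.trim().toLowerCase(Locale.ROOT);
    }
}
```

For instance, after `register("prostitute_", "prostitute", "whore")`, both terms resolve to the same (assumed) label `prostitute_`, so a filtering rule written against the concept covers every lexical variant.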
Slide 14
The EUFORBIA ontology (3)
Sexual acts
Sexual deviation
perversion
coitus
intercourse
copulation
fucking
bonking
humping
shagging
screwing
other, the
having it off
poke
lay
get laid
fooling around
anal sex
anal intercourse
coitus in ano
buggery
sodomy
bumming
browning
roger
old dirt road
back scuttle
missionary
dog fashion
knee trembler
Slide 15
The EUFORBIA ontology (4)
Slide 16
The NKRL filtering techniques (1)
NKRL filtering techniques: based on the
‘search patterns’ approach
Search patterns: formal NKRL
structures corresponding, in a sense,
to natural language queries
Search patterns supply the general
framework of information to be searched for,
by filtering or unification, within a knowledge
base of NKRL data structures (here, of
EUFORBIA labels).
Slide 17
The NKRL filtering techniques (2)
The FUM (Filtering/Unification Module),
developed in the CONCERTO project, allows the
direct unification of an NKRL search pattern with
the knowledge base of NKRL structures (of
EUFORBIA labels).
This module already includes a first, simple level of inferencing.
The unification is executed taking into account the fact that a
‘generic concept’ in the search pattern can unify with one of its
‘specific concepts’ (or an instance) in the target NKRL structure.
‘Generic’ and ‘specific’ refer to the organisation of the NKRL
(EUFORBIA) ontology.
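The generic/specific test can be illustrated with a walk up a concept hierarchy. This is a minimal sketch under stated assumptions, not the FUM code: the class `ConceptHierarchy`, its methods, and the single-parent representation of the ontology are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a tiny concept hierarchy with a subsumption test.
// Assumption: each concept or instance has at most one direct parent,
// and the parent links contain no cycles.
class ConceptHierarchy {
    // Maps each concept (or instance) to its direct, more generic parent.
    private final Map<String, String> parentOf = new HashMap<>();

    void addChild(String parent, String child) {
        parentOf.put(child, parent);
    }

    // True when 'generic' is 'specific' itself or one of its ancestors.
    boolean subsumes(String generic, String specific) {
        String current = specific;
        while (current != null) {
            if (current.equals(generic)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }
}
```

With this sketch, a pattern term such as `visual_content` would unify with `graphical_image` in a label provided the hierarchy places the latter under the former; the reverse direction fails, which matches the asymmetry described above.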
Slide 18
The NKRL filtering techniques (3)
Using the FUM module in EUFORBIA: a rule stating
that the viewing of sites including racist symbolism will not be
accepted will prevent the loading of the Stormfront White Pride
site, on the basis of the unification of its left-hand side:

(?w IS-NKRL-OCCURRENCE
    :predicate EXIST
    :SUBJ (SPECIF visual_content racist_symbol)
    :location of the SUBJ: internet_site)

with information included in the ‘characteristics’ section of the
EUFORBIA label associated with the site:

c24) EXIST  SUBJ  (SPECIF graphical_image racist_symbol
                  (SPECIF cardinality_ few_)):
                  (stormfront_internet_site)
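The unification in this example can be sketched in a few lines. The sketch below is deliberately simplified and every name in it is hypothetical: it flattens the SPECIF lists into plain term lists (ignoring nesting, which the real module may well respect), and it assumes that `graphical_image` sits under `visual_content` and that `stormfront_internet_site` is an instance of `internet_site`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of pattern/occurrence unification; not the FUM code.
class PatternMatchSketch {
    // child -> direct parent, for an assumed fragment of the ontology
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("graphical_image", "visual_content");          // assumed link
        PARENT.put("stormfront_internet_site", "internet_site");  // assumed instance link
    }

    // True when 'generic' equals 'specific' or is one of its ancestors.
    static boolean subsumes(String generic, String specific) {
        for (String c = specific; c != null; c = PARENT.get(c)) {
            if (c.equals(generic)) {
                return true;
            }
        }
        return false;
    }

    // Simplified occurrence: predicate, flattened SUBJ terms, location of the SUBJ.
    static final class Occurrence {
        final String predicate;
        final List<String> subjTerms;
        final String location;

        Occurrence(String predicate, List<String> subjTerms, String location) {
            this.predicate = predicate;
            this.subjTerms = subjTerms;
            this.location = location;
        }
    }

    // The pattern unifies when the predicates agree, the pattern location
    // subsumes the target location, and every SUBJ term of the pattern
    // subsumes at least one SUBJ term of the target.
    static boolean unifies(Occurrence pattern, Occurrence target) {
        if (!pattern.predicate.equals(target.predicate)) {
            return false;
        }
        if (!subsumes(pattern.location, target.location)) {
            return false;
        }
        for (String wanted : pattern.subjTerms) {
            boolean found = false;
            for (String actual : target.subjTerms) {
                if (subsumes(wanted, actual)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, a pattern built from `EXIST`, the terms `visual_content` and `racist_symbol`, and the location `internet_site` unifies with an occurrence built from c24, so the rule fires and the site is blocked.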
Slide 19
The Milan Model (1)
• The Milan Model is a content-based access
control mechanism, initially defined for Digital
Libraries (DLs), and well suited for the Web
environment.
• It generalises and makes more flexible the
approaches based on the PICS standard by
organising the important notions of the domain into
a hierarchy of concepts instead of a set of simple
keywords.
• It manages filtering policies with respect to user
characteristics.
Slide 20
The Milan Model (2)
Main features:
• content-based filtering of Web documents;
• flexible specification of filtering policies, based on
the qualification of users rather than on user
identities;
• positive and negative privileges at different
granularity levels;
• exception management and policy propagation;
• support for PICS content labels (in particular,
ICRA/RSACi content labels).
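Several of these features (qualification-based policies, positive and negative privileges, propagation and exceptions) can be illustrated together. The sketch below is not the Milan Model itself: the class `PolicySketch`, the concept names in the usage note, and in particular the conflict-resolution choices (the most specific rule wins, a negative rule wins a tie, and access is denied when no rule applies) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical policy evaluator over a concept hierarchy; not the Milan Model code.
class PolicySketch {
    static final class Rule {
        final String qualification; // user qualification the rule applies to
        final String concept;       // concept the privilege is attached to
        final boolean allow;        // true = positive, false = negative privilege

        Rule(String qualification, String concept, boolean allow) {
            this.qualification = qualification;
            this.concept = concept;
            this.allow = allow;
        }
    }

    private final Map<String, String> parentOf = new HashMap<>();
    private final List<Rule> rules = new ArrayList<>();

    void addChild(String parent, String child) {
        parentOf.put(child, parent);
    }

    void addRule(String qualification, String concept, boolean allow) {
        rules.add(new Rule(qualification, concept, allow));
    }

    // Number of steps from 'concept' up to 'ancestor', or -1 if unrelated.
    private int distance(String ancestor, String concept) {
        int steps = 0;
        for (String c = concept; c != null; c = parentOf.get(c)) {
            if (c.equals(ancestor)) {
                return steps;
            }
            steps++;
        }
        return -1;
    }

    // Rules propagate to more specific concepts; the closest applicable rule
    // acts as an exception to more general ones (assumed resolution strategy).
    boolean allowed(Set<String> userQualifications, String concept) {
        Rule best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Rule rule : rules) {
            if (!userQualifications.contains(rule.qualification)) {
                continue;
            }
            int d = distance(rule.concept, concept);
            if (d < 0) {
                continue;
            }
            // Assumption: on equal specificity, a negative rule takes precedence.
            if (d < bestDistance || (d == bestDistance && !rule.allow)) {
                best = rule;
                bestDistance = d;
            }
        }
        return best != null && best.allow; // assumed default: deny
    }
}
```

As a usage example with invented concept names: a negative rule for users qualified as `minor` on `violence_` propagates to every concept below it, while a positive rule on the narrower `historical_violence` acts as an exception for that subtree only.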
Slide 21
The Milan Model (3)
• An extended version of the Milan Model has
been specified according to the
requirements of the EUFORBIA framework.
• The Extended Milan Model makes use of the
EUFORBIA Conceptual Hierarchy and
applies filtering rules to the set of concepts
contained in the NKRL EUFORBIA labels.
• The Extended Milan Model has been
implemented in a prototype system, using
Oracle 8.1.7 DBMS and Java.
Slide 22
Conclusion
Main results obtained so far
(September 2002):
• A full new implementation of the Annotation
Manager (for the EUFORBIA labels);
• A full version of the EUFORBIA ontology
(including ‘pornography’, ‘violence’ and
‘racism’);
• A new Java version of the Milan Model;
• A full version of the EUFORBIA/NKRL filtering
environment.
Slide 23