Webdamlog - Serge Abiteboul

Knowledge out
there on the Web
Serge Abiteboul
2014
Abiteboul - EDBT keynote, Athens
1
Personal knowledge out there on the Web
• Knowledge out there on the Web:
Video of the talk at the Royal Society
Organization
1. The context
2. The personal information management system
1. The concept of Pims
2. Pims are coming
3. Advantages
3. From information to knowledge
4. The Webdamlog language
1. The language in brief
2. Probabilities
3. Access control
5. Conclusion: some research issues
1. The context
Data explosion
• data: pictures, music, movies, reports, email, tweets, contacts, schedules…
• social interactions: opinions, annotations, recommendation…
• metadata: on photos, documents, music…
• ontologies: Alice’s ontology and mapping with other ontologies
• web localizations: friends account on FB, twitter, lists of blogs…
• security: credentials on various systems
• data in various organizations
– jobs, schools, insurances, banks, taxes, medical, retirement…
• data in various vendors
– amazon, retailers, netflix, applestore…
• data that software or hardware sensors capture
– with or without our knowledge
– web navigation, phone use, geolocation, "quantified self" measurements,
contactless card readings, surveillance camera pictures, …
Data dispersion
• Laptop, desktop, smartphone, tablet, car computer
• Residential boxes (tvbox), NAS, electronic vaults…
• Mail, address book, agenda, todo-lists
• Facebook, LinkedIn, Picasa, YouTube, Twitter
• Svn, Google docs, Dropbox
• Government services
• Business services
• Also machines and systems from
– family, friends, associations, work
• Systems even unknown to the user
– third party cookies
Data heterogeneity
Type: text, relational, HTML, XML, pdf…
Terminology/structure/ontology
Systems: MS, Linux, iOS, Android
Distribution
Security protocols
Quality: incomplete / inconsistent information
Bad news
• Limited functionalities because of the silos
– Difficult to do global search, synchronization, task
sequencing over distinct systems…
• Loss of control over the data
– Difficult to control privacy
– Leaks of private information
• Loss of freedom
– Vendor lock-in
Growing resentment
• Against companies
– Intrusive marketing, cryptic personalization and
business decisions (e.g., on pricing), and automated
customer service with no real channel for customers'
voices
– Creepy "big data" inferences
• Against governments
– NSA and its European counterparts
• Asymmetry between what these systems know
about a person and what the person actually
knows
Future alternatives (for normal people)
1. Continue with this increasing mess
– Use a shrink to overcome frustration
2. Regroup all your data on the same platform
– Google, Apple, Facebook, …, a newcomer
– Use a shrink to overcome resentment
3. Study 2 years to become a geek
– Geeks know how to manage their information
– Use a shrink to survive the experience
4. And, of course,
there is the Pims’ way
2. The personal information
management system
2.1 Introduction
The Pims
• Personal information management system
• What is a successful Web service today
– Some great software
– Some machines on which it runs
(and a business model)
• Separate the two facets
– Some company provides the software
– It runs on your machine
with another business model
The Pims (1)
• The Pims runs software
– The user chooses the code to deploy on the server.
– The software is open source, a requirement for security.
• With the user's data
– All the user’s personal information
• On the user’s server(s)
– The user owns it or pays for a hosted server
– The server may be a physical or a virtual machine
– It may be physically located at the user’s home (e.g., a tvbox) or not
– It may run on a single machine or be distributed among several
machines
– The server is in the cloud, i.e., it can be reached from everywhere: a personal cloud
The Pims: the 2 main issues
• Security
– Enforced by the Pims: guaranteed by the contract the user
has with the Pims
• Reasonably small piece of code; possible to verify it
– Enforced by the services running on it: open source so that
we don’t need to trust the providers of these systems
– A higher level of security than now
• The management
– Should be epsilon-work
– Should require little competence
– A company can be paid to do it (in the cloud)
2. The personal information
management system
2.2 This is arriving
It is becoming possible
• System administration is easier
– Abstraction technologies for servers
– Virtualization and configuration management tools.
• Open source is very active
– Open source technology more and more available
• Price of machines is going down
– A low-cost hosted server is as cheap as 5€/month
– Paying is no longer a barrier for a majority of people
Indeed I am sure you have friends already doing it
Many people are working on it
• Many systems & projects
– Lifestreams, Stuff-I’ve-Seen, Haystack, MyLifeBits,
Connections, Seetrieve, Personal Dataspaces, or
deskWeb.
– YunoHost, Amahi, arkOS, ownCloud or Cozy Cloud
• Some on particular aspects
– Mailpile for mail
– Lima for a Dropbox-like service, but at home.
– Personal NAS (network-attached storage), e.g.,
Synology
– Personal data store SAMI of Samsung...
• Many more
Data disclosure movement
• Smart Disclosure in the US
• MiData in the UK
• MesInfos in France
Several large companies (network operators, banks,
retailers, insurers…) have agreed to share with a
panel of customers the personal data that they have
about them
Big companies are interested
(1) Pre-digital companies
• E.g., hotels or banks
• Disintermediated from their customers by pure
Internet players such as Google, Amazon,
Booking.com, Mint.
• In Pims, they can rebuild direct interaction
• The playing field is neutral
– Unlike on the Internet where they have less data
• They can offer new services without
compromising privacy
Big companies are interested
(2) Home appliances companies
• Many boxes deployed at home or in
datacenters
– Internet access provider "boxes", NAS servers,
"smart" meters provided by energy vendors,
home automation systems, "digital lockers"…
• Personal data spaces dedicated to specific
usage
• Could evolve to become more generic
• Control of private Internet of objects
2. The personal information
management system
2.3 Advantages
Advantages
• User control over their data
– Who has access to what, under what rules, to do what
• User empowerment
– They choose freely services & they can leave a service
• Participation in a more “neutral” Web
– With the "network effects", the main platforms are
accumulating data/customers and distorting
competition
– The Pims bring back fairness on the Web
– Good practices are encouraged, e.g., interoperability,
portability
Advantages – New functionalities
• Single identity/login
• Semantic global search with (personal) ontology
• Synchronization/backups across services
• Access control management across services
• Task sequencing across services
• Exchange of information between “friends”
• Connected objects control, a hub for the IoT
• Personal big data analysis
3. From information to
knowledge
(aka let’s move a tad more technical)
Machines prefer knowledge
• Integration of data & information sources
– It is easier to integrate knowledge than information
• Collaboration between services & devices
– It is easier for services to collaborate using
knowledge than with information
• Problem solving based on knowledge inference
Humans as well
• The users of the system are human beings
– They want support for managing information
– But they are not geeks
– They don’t want to program
• To facilitate the interactions between humans and
machines, we should use declarative languages!
It all started with datalog
• Popular in the 90’s
• Some followers in 00’s
– A., Afrati, Atzeni, Calì, Greco, Gottlob, Milo, Saccà, Ullman…
• Recent revival
– 2010 Oege de Moor’s workshop @Oxford: Datalog 2.0
– 2010 Joe Hellerstein’s keynote @PODS: Datalog Redux: Experience and Conjecture
– 2014 Frank Neven’s keynote @ICDT: Remaining CALM in declarative networking
• Now featuring: Webdamlog
Requirement 1: Distribution
• Different machines
• Different users
• We use the notion of principal here
– family@alice(Bob)
– agenda@Alice-iPhone(…)
– friends@Alice-FaceBook(…)
• A principal comes with identity and privileges
Requirement 2: Privacy
• Control of who sees what in a distributed
environment
• Access control
• It should be clear from the first part of the talk that
this is a most important issue
Tutorial on privacy by Nicolas
Anciaux, Benjamin Nguyen, Iulian Sandu Popa
– Today at 2:00
The more I see, the less I know for sure.
John Lennon
Requirement 3: Probabilities
• We have to deal with negation
– Elvis was not French
• With negations, come contradictions
– Elvis Presley died in 1977; The King is alive
• There are different points of view
– Elvis’s music is the best; it stinks
• Measure uncertainty with probabilities
So, what is the goal
• A datalog-style language with
distribution
access control
probabilities
We are lucky, there is such a language:
Webdamlog
4. The Webdamlog language
(aka let’s be serious)
4.1 Webdamlog in brief
Facts and rules
Facts are of the form R@p(a1,…,an)
– p is a principal, i.e., Serge, Serge’s-iPhone, Facebook/Serge, [email protected]
Rules are of the form
$R@$P($U) :- $R1@$P1($U1), ..., $Rn@$Pn($Un)
– $R, $Ri are relation terms
– $P, $Pi are peer terms
– $U, $Ui are tuples of terms
– Safety condition
– Also negations: ignored here
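The fact and rule forms above can be illustrated with a small Python sketch. It is not the Webdamlog implementation: the Atom class and the match function are hypothetical names introduced here, and the only convention taken from the slides is that terms starting with "$" are variables, which may stand for relations and peers as well as for values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One atom R@p(u1, ..., un); strings starting with '$' are variables."""
    relation: str
    peer: str
    args: tuple


def is_var(term):
    return isinstance(term, str) and term.startswith("$")


def match(atom, fact, binding=None):
    """Extend `binding` so that `atom` equals the ground `fact`, or return None."""
    binding = dict(binding or {})
    pattern = (atom.relation, atom.peer) + atom.args
    ground = (fact.relation, fact.peer) + fact.args
    if len(pattern) != len(ground):
        return None
    for term, value in zip(pattern, ground):
        if is_var(term):
            # A variable must keep the same value everywhere it occurs.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


fact = Atom("family", "alice", ("Bob",))      # family@alice(Bob)
pattern = Atom("$R", "$P", ("$x",))           # $R@$P($x)
print(match(pattern, fact))
```

Because relation and peer names are matched like any other term, a single rule can range over several relations and peers, which is what distinguishes these rules from plain datalog.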
The semantics of rules
Classification based on locality and nature of
head predicates (intensional or extensional)
• Local rule at my-laptop: all predicates in the body of
the rule are from my-laptop
– Local with local intensional head: datalog
– Local with local extensional head: database update
– Local with non-local extensional head: messaging between peers
– Local with non-local intensional head: view definition
• Non-local: general delegation
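The classification above can be written as a small decision function. This is only a sketch: the function name classify and its parameters are assumptions made for illustration, and a rule is reduced to the peer where it is stored, the peer and nature of its head, and the peers of its body atoms.

```python
def classify(site, head_peer, head_is_intensional, body_peers):
    """Return the kind of a rule stored at `site`, following the classification."""
    # A rule is local when every body atom lives at the peer holding the rule.
    if any(peer != site for peer in body_peers):
        return "general delegation"
    if head_peer == site:
        return "datalog" if head_is_intensional else "database update"
    return "view definition" if head_is_intensional else "messaging between peers"


print(classify("my-laptop", "my-laptop", True, ["my-laptop"]))
print(classify("my-laptop", "gossip-site", True, ["my-laptop", "my-laptop"]))
print(classify("my-laptop", "gossip-site", True, ["my-laptop", "alice-iphone"]))
```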
Local rules with local head
Intensional local head
– datalog
[at my-iphone]
fof@my-iphone($x,$y) :- friend@my-iphone($x,$y)
fof@my-iphone($x,$y) :- friend@my-iphone($x,$z),
fof@my-iphone($z,$y)
Extensional local head
– database updates
[at my-iphone]
believe@my-iphone("Alice", $loc) :- tell@my-iphone($p, "Alice", $loc), friend@my-iphone($p)
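The two fof rules are ordinary recursive datalog, so they can be evaluated bottom-up until nothing new is derived. The sketch below uses naive evaluation over Python sets; the function name and the sample friend pairs are made up for illustration, and a real engine would more likely use semi-naive evaluation.

```python
def friends_of_friends(friend):
    """Naive bottom-up evaluation of the two fof rules to a fixpoint."""
    fof = set(friend)                      # fof(x, y) :- friend(x, y)
    while True:
        derived = {(x, y)
                   for (x, z) in friend
                   for (z2, y) in fof
                   if z == z2}             # fof(x, y) :- friend(x, z), fof(z, y)
        if derived <= fof:                 # nothing new: fixpoint reached
            return fof
        fof |= derived


friend = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
print(sorted(friends_of_friends(friend)))
```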
Local rules & non-local extensional head
Messaging between peers
$message@$peer($name, "Happy birthday!") :- today@my-iphone($date),
birthday@my-iphone($name, $message, $peer, $date)
Example
– today@my-iphone(3/25)
– birthday@my-iphone("Manon", "sendmail", "gmail.com", 3/25)
– sendmail@gmail.com("Manon", "Happy birthday!")
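The birthday rule can be read as a query whose result decides both which relation and which peer receive the derived fact. The following sketch evaluates it over in-memory sets; the function name, the tuple encoding of facts and the second contact ("Leo") are assumptions added for illustration.

```python
def birthday_messages(today, birthday):
    """Evaluate the birthday rule at my-iphone.

    today    -- set of dates, the facts today@my-iphone(date)
    birthday -- set of (name, message, peer, date) tuples
    Returns derived facts as (relation, peer, args); relation and peer
    are taken from the data, as $message and $peer are in the rule head.
    """
    return {(message, peer, (name, "Happy birthday!"))
            for (name, message, peer, date) in birthday
            if date in today}


today = {"3/25"}
birthday = {("Manon", "sendmail", "gmail.com", "3/25"),
            ("Leo", "sendmail", "gmail.com", "7/02")}   # hypothetical extra contact
for relation, peer, args in birthday_messages(today, birthday):
    print(f"{relation}@{peer}{args}")
```

Only the contact whose date matches today produces a fact, and that fact is addressed to the peer named in the birthday tuple rather than to my-iphone.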
Local rules & non-local intensional head
View definition
boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event),
boys@my-iphone($boy, $event)
• Semantics of boyMeetsGirl@gossip-site is a join of relations
girls and boys from my-iphone
• Defines a view at some other peer
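The join behind this view can be sketched in a few lines of Python. The function name and the sample tuples are illustrative assumptions; the sketch computes the view contents once and does not model how the view is kept up to date at gossip-site.

```python
def boy_meets_girl(girls, boys):
    """Join girls(girl, event) with boys(boy, event) on the shared event."""
    return {(girl, boy)
            for (girl, event) in girls
            for (boy, boy_event) in boys
            if event == boy_event}


girls = {("Alice", "Julia's birthday")}
boys = {("Bob", "Julia's birthday"), ("Tom", "concert")}   # hypothetical facts
print(boy_meets_girl(girls, boys))
```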
Non-local rules
General delegation
(at my-iphone): boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event),
boys@alice-iphone($boy, $event)
Example: girls@my-iphone("Alice", "Julia's birthday")
– my-iphone installs the following rule at alice-iphone
boyMeetsGirl@gossip-site("Alice", $boy) :- boys@alice-iphone($boy, "Julia's birthday")
Useful to distribute work and exchange knowledge
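One way to picture delegation is as partial evaluation: the peer evaluates the body atoms it holds, substitutes the bindings it found, and ships what is left of the rule to the next peer. The sketch below follows that reading under simplifying assumptions that are not from the talk: atoms are (relation, peer, args) tuples, local atoms come first in the body, and all helper names are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("$")


def substitute(atom, binding):
    """Replace bound variables in an atom (relation, peer, args)."""
    relation, peer, args = atom
    return (binding.get(relation, relation), binding.get(peer, peer),
            tuple(binding.get(a, a) for a in args))


def match(atom, fact, binding):
    """Return `binding` extended so that `atom` equals `fact`, or None."""
    binding = dict(binding)
    if len(atom[2]) != len(fact[2]):
        return None
    pattern = (atom[0], atom[1]) + atom[2]
    ground = (fact[0], fact[1]) + fact[2]
    for term, value in zip(pattern, ground):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def delegate(site, head, body, facts):
    """Evaluate the leading body atoms held at `site`.

    Returns a list of (target_peer, (head, remaining_body)) pairs: the
    residual rules to install at the peer of the first remaining atom.
    """
    bindings, rest = [{}], list(body)
    while rest and rest[0][1] == site:
        atom = rest.pop(0)
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(substitute(atom, b), f, b)) is not None]
    if not rest:
        return []                      # the rule was fully local
    return [(substitute(rest[0], b)[1],
             (substitute(head, b), [substitute(a, b) for a in rest]))
            for b in bindings]


head = ("boyMeetsGirl", "gossip-site", ("$girl", "$boy"))
body = [("girls", "my-iphone", ("$girl", "$event")),
        ("boys", "alice-iphone", ("$boy", "$event"))]
facts = [("girls", "my-iphone", ("Alice", "Julia's birthday"))]
for target, rule in delegate("my-iphone", head, body, facts):
    print(target, rule)
```

On the example from the slide, the single residual rule is addressed to alice-iphone, with $girl and $event already replaced by "Alice" and "Julia's birthday".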
The thesis
The Web should turn into a distributed
knowledge base where peers share facts and
rules, and collaborate
The language Webdamlog is a first step towards
that goal
Missing
– Probabilities
– Access control
4. The Webdamlog language
4.2 Probabilities
Advertisement
Deduction with Contradictions in
Datalog
S.A., Daniel Deutch and Victor
Vianu
– Tomorrow, 11:00
4. The Webdamlog language
4.3 Access control
Requirements
Data access: users would like to control who can
read and modify their information
Data dissemination: users would like to control
how their data are transferred from one
participant to another
Application control: users would like to control
which applications can run on their behalf, and
what information these applications can
access.
The general picture
• Coarse grain for extensional relations
– read access to the relation
• Fine grain for intensional relations
– read access to tuple t requires read access to the
tuples that lead to deriving t
• Delegation controlled in a sandbox
• Focus on read privilege here
Read: default
• Extensional relations
– if you have read privilege to the relation
• Intensional relations
– if you have read privilege to the relation &
– if you can read all the tuples that have been used
to create this fact – provenance of the fact
Fine grain access control
[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f)
[at Sue] album@Alice($p,$f) :- photo@Sue($p,$f)
– album@Alice is intensional
– Both Bob and Sue contribute to it
– Peter who has read privilege to album@Alice and
photo@Bob only does not see the photos of Sue
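The album example can be checked with a small provenance filter. This is a sketch under assumptions: each derived fact is taken to have a single derivation, relations are named by "relation@peer" strings, and the function name and photo file names are invented for illustration.

```python
def visible_facts(relation, provenance, readable):
    """Facts of an intensional relation that a reader may see.

    relation   -- the intensional relation, e.g. "album@Alice"
    provenance -- maps each derived fact to the set of relations used to derive it
    readable   -- relations the reader has read privilege on
    """
    if relation not in readable:
        return set()
    # A fact is visible only if every relation in its provenance is readable.
    return {fact for fact, sources in provenance.items() if sources <= readable}


provenance = {
    ("beach.jpg", "Bob"): {"photo@Bob"},     # contributed by the rule at Bob
    ("party.jpg", "Sue"): {"photo@Sue"},     # contributed by the rule at Sue
}
peter = {"album@Alice", "photo@Bob"}
print(visible_facts("album@Alice", provenance, peter))
```

Peter gets the photo derived from photo@Bob and not the one derived from photo@Sue, although both are facts of the same relation album@Alice.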
Paranoiac access control
[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f),
friends@Bob($f)
– Issue: you can read Bob’s photos only if you have
read privilege on friends@Bob that Bob wants to
keep private
Declassification
[at Bob] photo@Alice($p,$f) :- photo@Bob($p,$f),
[ hide friends@Bob($f) ]
– Hide: blocks the provenance from friends@Bob
– Bob declassifies this data just for the evaluation of
this rule
– You can declassify only tuples you own ↦ grant
privilege
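The effect of hide on the read check can be sketched by dropping hidden atoms from the provenance a reader must be able to read. The encoding of a rule body as (relation, hidden) pairs and both function names are assumptions for illustration; the ownership condition on declassification is not modeled.

```python
def required_provenance(body):
    """Relations a reader must be able to read, given (relation, hidden) body atoms."""
    return {relation for relation, hidden in body if not hidden}


def can_read(head_relation, body, readable):
    """Default read rule: privilege on the head and on all non-hidden provenance."""
    return head_relation in readable and required_provenance(body) <= readable


paranoid = [("photo@Bob", False), ("friends@Bob", False)]
declassified = [("photo@Bob", False), ("friends@Bob", True)]   # [ hide friends@Bob ]
reader = {"album@Alice", "photo@Alice", "photo@Bob"}           # no right on friends@Bob

print(can_read("album@Alice", paranoid, reader))
print(can_read("photo@Alice", declassified, reader))
```

With the paranoid rule the reader is blocked by friends@Bob; with the declassified rule the same reader can see the derived photos while friends@Bob itself stays private.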
Issues with non-local rules
[at Bob]
message@Sue("I hate you") :- date@Alice($d)
aliceSecret@Bob($x) :- date@Alice($d), secret@Alice($x)
Ignoring access rights, by delegation, this results in
running
[at Alice]
message@Sue("I hate you") :- date@Alice($d)
aliceSecret@Bob($x) :- date@Alice($d), secret@Alice($x)
Default solution: sandbox
We run the rule at Alice in a sandbox
• We use the access rights of Bob
So the second rule does not succeed in sending
secrets
• The message specifies that this is done at
Bob’s request
So this requires authentication/signatures
Alternative: delegation without sandbox
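The sandbox idea can be sketched as a filter applied at Alice before a delegated rule is allowed to fire. The sketch makes several assumptions that are not in the talk: a rule is reduced to its head and the list of relations in its body, Bob is assumed to have read privilege on date@Alice only, and authentication of the request is left out.

```python
def sandboxed_heads(rules, requester, read_privileges):
    """Heads of delegated rules that may fire under the requester's access rights.

    rules           -- list of (head_relation, [body_relations])
    requester       -- peer on whose behalf the rules run
    read_privileges -- maps each peer to the relations it may read
    """
    allowed = read_privileges.get(requester, set())
    return [head for head, body in rules if set(body) <= allowed]


rules = [("message@Sue", ["date@Alice"]),
         ("aliceSecret@Bob", ["date@Alice", "secret@Alice"])]
read_privileges = {"Bob": {"date@Alice"}}    # assumption: Bob cannot read secret@Alice
print(sandboxed_heads(rules, "Bob", read_privileges))
```

The second rule is rejected because it reads secret@Alice, which Bob has no right to read. The first rule still fires, which is why the resulting message has to state that it is sent at Bob's request.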
5. Conclusion:
some research issues
Explaining
• Users want to understand the information
they see, the answers they are given
– In their professional/social life
• Difficulties
– Reasoning with large number of facts
– Information is often probabilistic and not public
– Requires knowing how the information was
obtained (its provenance)
Serendipity
• You may hear by chance a
song that is going to totally
obsess you
• A librarian may suggest your
reading an article that will
transform your research
This is serendipity
• A perfect search engine
• A perfect recommendation
system
• A perfect computer assistant
Such systems are boring
They lack serendipity
Design programs that would introduce
serendipity in our lives
Hypermnesia
Exceptionally exact or vivid memory,
especially as associated with
certain mental illnesses
For a user: We cannot live knowing
that any word, any move will leave
a trace
For the ecosystem: We cannot store all
the data we produce – lack of
storage resources
Forgetting is Key to a Healthy Mind
Scientific American
Image: Aaron Goodman
A main issue is to select the information we
choose to keep
Babel of human-machine-interaction
• Each time a user interacts with a data source,
does he have to use the ontology of that
source?
• No!
• Instead of a user adapting to the ontologies of
the N systems he uses each day
• We want the N systems to adapt to the user’s
ontology
Religion…science…machines
• Knowledge used to be determined by religion
• Knowledge used to be determined scientifically
• Knowledge will now be determined by machines?
• Decisions are increasingly made by machines
– Stock market (automatic trading)
– Fully automated factory
– Fully automated metros
– Death penalty (killer drones)…
to the digital world!
• We will soon be living in a world surrounded
by machines that
– acquire knowledge and decide for us
• What will we do with that technology?
• Will we become smarter?
• Will we become master or slave of the new
technology?
Σας ευχαριστώ (Thank you)