Data Science Research Center (DSRC)
High Performance Distributed Computing
Henri Bal
Vrije Universiteit Amsterdam
Outline
1. Development of the field
2. Highlights VU-HPDC group
3. Links to data science cycle
4. Conclusions
Developments
• Multiple types of data explosions:
– Big data: huge processing/transportation demands
  (LOFAR: ~15 PB/year; SKA: >300 PB/year, exascale processing)
– Complex heterogeneous data
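To put the yearly volumes above in perspective, they can be converted into an average sustained data rate. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal petabytes (1 PB = 10^15 bytes) and a perfectly even spread over a 365-day year, and the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def sustained_rate_gbit_per_s(petabytes_per_year: float) -> float:
    """Average rate in Gbit/s needed to move the given yearly volume."""
    bytes_per_year = petabytes_per_year * 1e15  # assumption: decimal petabytes
    return bytes_per_year * 8 / SECONDS_PER_YEAR / 1e9


lofar_rate = sustained_rate_gbit_per_s(15)   # LOFAR: ~15 PB/year
ska_rate = sustained_rate_gbit_per_s(300)    # SKA: lower bound of >300 PB/year

print(f"LOFAR: {lofar_rate:.1f} Gbit/s, SKA: {ska_rate:.1f} Gbit/s")
```

Under these assumptions LOFAR averages roughly 3.8 Gbit/s and SKA at least 76 Gbit/s, around the clock; real peak rates would be higher than the average.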
Developments
• Infrastructure explosion
– High complexity: heterogeneous systems with a diversity of processors, systems, and networks
VU HPDC GROUP
• Bridge the gap between demanding applications and complex infrastructure
• Distributed programming systems for
– Clusters, grids, clouds
– Accelerators (GPUs)
– Heterogeneous systems ("Jungles")
– Clouds & mobile devices
• Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, ...
Highlights VU-HPDC group
• Solved Awari 2002
• 3rd Prize: ISWC 2008
• AAAI-VC 2007
• 1st Prize: SCALE 2008
• DACH 2008 - BS
• 1st Prize: SCALE 2010
• DACH 2008 - FT
• EYR 2011: Sustainability award
Links to data science cycle
[Figure: the data science cycle]
• Stages: Store and process; Analyze and model; Understand and decide
• Fields shown: Visual Analytics; Perception, Cognition; Decision Theory; Distributed Processing; Reasoning; Knowledge Representation; Large Scale Databases; Software Eng.; System / Network Eng.; Multimedia Retrieval; Modeling and Simulation; Information Retrieval; Machine Learning
• Callout: Distributed reasoning
Reasoning – Semantic Web
• Make the Web smarter by injecting meaning so that machines can "understand" it
– Initial idea by Tim Berners-Lee in 2001
• Has now attracted the interest of big IT companies
Google Example
Distributed Reasoning
• WebPIE: web-scale distributed reasoner
doing full materialization
• QueryPIE: distributed reasoning with
backward-chaining + pre-materialization of
schema-triples
• DynamiTE: maintains materialization after updates (additions & removals)
• Challenge: real-time incremental reasoning on web scale, combining new (streaming) data & existing historic data
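The difference between full materialization and backward-chaining with pre-materialized schema triples can be illustrated with a toy example. The sketch below is not the WebPIE or QueryPIE code: it runs on a single machine, uses only two RDFS-style subclass rules, and all function names and the `ex:` data are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Forward chaining: apply both rules until no new triple is derived."""
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == SUBCLASS:    # rule 1: subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:      # rule 2: instances inherit superclasses
                    new.add((s, TYPE, o2))
        if new <= closure:           # fixpoint reached
            return closure
        closure |= new


def schema_closure(triples: Set[Triple]) -> Set[Triple]:
    """Pre-materialize only the schema triples (here: subClassOf)."""
    return materialize({t for t in triples if t[1] == SUBCLASS})


def has_type(instance: str, cls: str,
             triples: Set[Triple], schema: Set[Triple]) -> bool:
    """Backward chaining: answer one query without materializing instance data."""
    asserted = {o for (s, p, o) in triples if s == instance and p == TYPE}
    return cls in asserted or any((c, SUBCLASS, cls) in schema for c in asserted)


data = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
}

full = materialize(data)        # everything derived up front
schema = schema_closure(data)   # only the (small) schema is derived up front
print(("ex:tom", TYPE, "ex:Animal") in full)
print(has_type("ex:tom", "ex:Animal", data, schema))
```

Both approaches give the same answer. The trade-off is where the work happens: full materialization pays at load time and after every update, while backward-chaining pays per query but only pre-computes the schema, which is usually far smaller than the instance data.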
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen
COMMIT/
Glasswing: MapReduce on Accelerators
• Use accelerators as a mainstream feature
• Massive out-of-core data sets
• Scale vertically & horizontally
• Code portability using OpenCL
• Maintain MapReduce abstraction
With: Ismail El Helw, Rutger Hofman
Glasswing Pipeline
• Overlaps computation, communication &
disk access
• Supports multiple buffering levels
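The idea of overlapping stages with a configurable buffering level can be sketched with threads connected by bounded queues. This is only a structural illustration in Python, not Glasswing's OpenCL-based implementation; the stage functions, `run_pipeline`, and the `buffers` parameter are assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(work, inbox, outbox):
    """Apply `work` to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(work(item))


def run_pipeline(chunks, stages, buffers=2):
    """Chain the stages with bounded queues.

    `buffers` is the number of slots between two stages
    (2 corresponds to double buffering).
    """
    queues = [queue.Queue(maxsize=buffers) for _ in range(len(stages) + 1)]

    def feed():
        for chunk in chunks:
            queues[0].put(chunk)
        queues[0].put(SENTINEL)

    threads = [threading.Thread(target=feed)]
    for i, work in enumerate(stages):
        threads.append(
            threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()

    results = []
    while True:  # drain the last queue so upstream stages never block forever
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for disk access, computation and communication.
def read_chunk(i):
    return list(range(i * 4, i * 4 + 4))


def compute(values):
    return sum(v * v for v in values)


def send(value):
    return value


pipeline_results = run_pipeline(range(3), [read_chunk, compute, send], buffers=2)
print(pipeline_results)
```

While one chunk is being computed, the next can already be read and the previous one sent. A larger `buffers` value lets a fast stage run further ahead of a slow one, at the cost of more memory. In CPython, threads overlap I/O waits rather than CPU-bound work, so this shows the structure, not the performance.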
Evaluation of Glasswing
• Glasswing uses CPU, memory & disk
resources more efficiently than Hadoop
• Compute-bound applications benefit
dramatically from GPUs
• Better scalability than Hadoop
• Runs on a variety of accelerators
• E.g. k-means clustering:
– 8.5× (1 node) vs. 15.5× (64 nodes) vs. 107× (GPU node)
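For readers unfamiliar with how k-means maps onto the MapReduce abstraction, one iteration can be sketched as follows. This is a plain single-process Python illustration, not Glasswing's API: the function names are hypothetical, and the "shuffle" is just a dictionary grouping.

```python
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map: emit (index of nearest centroid, point)."""
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return dists.index(min(dists)), point


def kmeans_reduce(points):
    """Reduce: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_iteration(points, centroids):
    """One MapReduce round: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    # A centroid with no assigned points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)
```

The map step does the distance computations, which are independent per point. That makes k-means compute-bound and data-parallel, the kind of workload the slide says benefits most from GPUs.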