Center for Data-intensive Systems


Database and Data-Intensive Systems
Data-Intensive Systems
• From monolithic architectures to diverse systems
  - Dedicated/specialized systems, column stores
  - Data centers, web architectures, distributed architectures
• From business data to all data
  - Streaming and sensor data, semi-structured and unstructured data
  - Multidimensional data, temporal data, spatio-temporal data
• Examples
  - Clustering of high-dimensional data
  - Tracking and continuous queries for moving objects
  - Mobile service infrastructure
  - Location privacy
  - Spatio-textual search/hyper-local web search
  - Multimedia similarity search
• This is where much of our research “lives.”
Staff
• Ira Assent, associate professor
• Christian S. Jensen, professor
• Vaida Ceikute, Ph.D. student
• Xiaohui Li, visiting Ph.D. student
• NN, Ph.D. student: GEOCROWD – indoor positioning and services infrastructure
• NN, Ph.D. student: GEOCROWD – spatial web objects
• NN, Ph.D. student: eData – Anomaly Detection in e-Science
• NN, Ph.D. student: Streamspin
• NN, Ph.D. student: WallViz
• NN, Ph.D. student: REDUCTION
• NN, Ph.D. student: REDUCTION
Graduate Course Portfolio: dDO
• Data management for moving objects (Q3)
• The course covers selected research advances in the general area of indexing and update and query processing for moving objects.
  - Moving object tracking
  - Specific indexing techniques
  - R-tree based indexing
  - B-tree based indexing
  - Techniques for the efficient handling of frequent updates
  - Techniques for range and k nearest neighbor query processing, including one-time as well as continuous queries
Graduate Course Portfolio: MDDB
• Multidimensional databases (Q4)
  - Selected techniques for the management of multidimensionally represented data
  - Multidimensional data and applications
  - Efficient handling: indexing and associated query processing
  - Data warehouses and data mining
  - Similarity search and query processing
  - Multistep similarity search
  - Indexing multidimensional data
  - Skyline query processing
  - Data mining techniques
  - Subspace clustering
  - Classification
  - Outlier detection
Graduate Course Portfolio: Index
• Indexing of disk-based data (Q1)
  - Indexing techniques for disk-based data for different types of data, as well as their support for queries and updates
  - General overview of indexes and query processing
  - Spatial indexing structures
  - Space partitioning indexing structures
  - Indexes for high-dimensional data
  - Metric approaches
  - Special techniques for complex data types
  - Coming up for the first time this fall
Graduate Course Portfolio: dDB2
• Database management systems (Q2)
• The course aims to give the participants a solid conceptual foundation for making competent use of a database management system.
  - Logical and physical query optimization and query processing
  - Concurrency control techniques
  - Database tuning
  - Central concepts and techniques in relation to supporting temporal and multi-dimensional data
  - Coming up for the first time this fall
Projects
• Streamspin
  - Enable sites that are for mobile services what YouTube is for video
    - Easy mobile service creation and sharing
    - Advanced spatial and social context functionality
    - Be an open, extensible, and scalable service delivery infrastructure
• MOVE
  - Knowledge extraction from massive data about moving objects
    - Cross-cutting activities, showcases, and evaluation
    - Representation of movement data and spatio-temporal databases
    - Analysis of movement and spatio-temporal data mining
• WallViz
  - Collaborative analysis, joint decision making on wall-sized displays
    - scale to massive data collections
    - support ad-hoc queries
    - automatically provide entry points for analysis
Projects (2)
• GEOCROWD
  - Creating a Geospatial Knowledge World:
    - advance the state-of-the-art in collecting, storing, analyzing, processing, reconciling, and publishing user-generated geospatial information on the Web
• REDUCTION
  - Reducing the environmental footprint of fleets of vehicles
    - Optimizing the behavior of drivers
    - Supporting eco-routing of vehicles
    - Enabling transparency in multi-modal transportation
• eData
  - Robust analysis in the context of imperfect data in e-Science
    - Detect and correct anomalies effectively
    - on-line, interactive, lineage-preserving, and semi-automatic
    - Scalable algorithms
How We Typically Work
• We target some real problem that we find interesting.
• We define the problem precisely.
• We develop a solution that is typically a data structure or an algorithm, i.e., a concrete technique.
• To evaluate, we build prototypes.
  - These are built for the purpose of studying the properties of our solutions.
  - We are often interested in performance, e.g., runtime, space usage, communication cost.
• For some solutions we state formal properties that we then prove, e.g., the correctness of a particular technique.
• In brief: isolate and define the problem, construct a solution, then evaluate it.
Example 1: Spatial Web Querying
• Setting
  - Google: ~90 billion queries/month, ~20 billion with local intent.
  - We want to integrate exact locations of websites (for shops, bars, etc.) and users into web querying.
• Queries
  - Results must match the query text and must be near the user.
  - Results of continuous queries must be updated as the user moves.
• Challenges?
  - Support such queries with low computation cost on the server and with little communication between server and client.
• Solution
  - Invent an index that supports both text and location.
  - Use a safe zone to reduce the communication between user and server for continuous queries.
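
As a rough illustration of the two solution ideas, here is a minimal Python sketch. It is not the index or safe-zone technique developed in the research; everything in it is an assumption chosen for brevity. The hypothetical `SpatialTextIndex` is a uniform grid whose cells each hold a small inverted index, so a query prunes by location (only nearby cells) and by text (posting-list intersection). The hypothetical `nearest_with_safe_zone` returns the nearest matching object together with a circular safe zone: if the nearest match is at distance d1 and the next one at d2 or farther, the result cannot change while the user stays within (d2 - d1) / 2 of the query point, so the client need not contact the server inside that radius.

```python
import math
from collections import defaultdict


class SpatialTextIndex:
    """Toy hybrid index (assumed design): a grid whose cells hold inverted indexes."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        # grid cell -> keyword -> ids of objects in that cell with that keyword
        self.cells = defaultdict(lambda: defaultdict(set))
        self.location = {}  # object id -> (x, y)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, oid, x, y, keywords):
        self.location[oid] = (x, y)
        for kw in keywords:
            self.cells[self._cell(x, y)][kw.lower()].add(oid)

    def range_query(self, x, y, radius, keywords):
        """Sorted (distance, id) pairs within radius that contain all keywords."""
        kws = [kw.lower() for kw in keywords]  # assumes at least one keyword
        cx0, cy0 = self._cell(x - radius, y - radius)
        cx1, cy1 = self._cell(x + radius, y + radius)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                postings = self.cells.get((cx, cy))
                if postings is None:
                    continue  # spatial pruning: nothing stored in this cell
                # text pruning: keep only objects listed under every keyword
                ids = set.intersection(*(postings.get(kw, set()) for kw in kws))
                for oid in ids:
                    ox, oy = self.location[oid]
                    dist = math.hypot(ox - x, oy - y)
                    if dist <= radius:
                        hits.append((dist, oid))
        return sorted(hits)


def nearest_with_safe_zone(index, x, y, keywords, max_radius=64.0):
    """Return (nearest matching id, safe-zone radius) for a user at (x, y)."""
    radius = index.cell_size
    hits = index.range_query(x, y, radius, keywords)
    # Grow the search circle until two matches are found or the cap is reached.
    while len(hits) < 2 and radius < max_radius:
        radius *= 2
        hits = index.range_query(x, y, radius, keywords)
    if not hits:
        return None, 0.0  # no match found: the client must ask again
    d1, best = hits[0]
    # With a single hit, every other match lies beyond the searched radius.
    d2 = hits[1][0] if len(hits) > 1 else radius
    return best, (d2 - d1) / 2
```

With shops inserted at (1, 0) and (5, 0), both tagged "coffee", a user at (0, 0) searching for "coffee" gets the first shop and a safe radius of 2.0. The circular zone is deliberately conservative; tighter, non-circular zones would save more communication at the cost of more computation.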
Example 2: Fraud detection
• There are billions of financial transactions per minute.
• How do we uncover fraud?
  - Scalability
  - In time for reaction
  - Manageable results
• Possible solution sketch
  - Identify attributes of suspicious transactions
  - Sort incoming transactions into a tree structure of historic data
  - When processing time is up, output degree of suspicion based on similarity to valid or fraudulent historic data
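
The solution sketch above can be illustrated with a small Python example. This is only one possible reading, under stated assumptions: the tree is a simple kd-tree-style structure over numeric transaction attributes, every node stores how many historic fraudulent and valid transactions fall beneath it, and the time limit is modeled as a number of descent steps (`budget`) rather than a wall-clock deadline. The names `Node`, `build`, and `suspicion` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    fraud: int                      # historic fraudulent transactions below this node
    valid: int                      # historic valid transactions below this node
    dim: int = 0                    # attribute used to split (internal nodes only)
    split: float = 0.0              # split value on that attribute
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, labels, depth=0):
    """Build the tree from historic transactions; label 1 = fraud, 0 = valid."""
    fraud = sum(labels)
    node = Node(fraud=fraud, valid=len(labels) - fraud)
    if len(points) <= 1:
        return node  # leaf
    dim = depth % len(points[0])    # cycle through the attributes
    order = sorted(range(len(points)), key=lambda i: points[i][dim])
    mid = len(order) // 2
    node.dim, node.split = dim, points[order[mid]][dim]
    lo, hi = order[:mid], order[mid:]
    node.left = build([points[i] for i in lo], [labels[i] for i in lo], depth + 1)
    node.right = build([points[i] for i in hi], [labels[i] for i in hi], depth + 1)
    return node


def suspicion(root, tx, budget):
    """Descend at most `budget` levels, then report the fraud share seen so far."""
    node = root
    for _ in range(budget):
        child = node.left if tx[node.dim] < node.split else node.right
        if child is None:
            break                   # reached a leaf before the budget ran out
        node = child
    total = node.fraud + node.valid
    return node.fraud / total if total else 0.0
```

An answer is available at any point: with a budget of 0 the score is just the overall fraud share in the historic data, and each further step refines it using transactions that are more similar to the incoming one.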
Interested?
• Come talk to us!
• We currently have M.Sc. and Ph.D. thesis openings.