CDT PROJECTS 2013-14
John Keane, Software Systems Group
[email protected]
1. Data Analytics / Big Data
2. Parallel & Distributed Systems
3. Decision Support Systems
HAPPY TO DISCUSS
Big Data Analytics (IBM funded)
With Nenadic
CHALLENGE
• Investigate:
– Applications: characteristics and predictability
– Data Analytics / Machine Learning algorithms (relatively simple so far)
– Software: Map-Reduce, Hadoop
– Hardware: various platforms
Bio-medical data analytics
With Nenadic, Zeng, Stivaros (Consultant, RMCH)
• Adverse drug event detection (EU funded)
– Bayesian/Fuzzy association rules algorithms
CHALLENGE
– Compare/contrast accuracy of prediction
• Clinical Outcome Mining (Christie Hospital)
– Data/text-based clinical records – better diagnose and predict
CHALLENGE
– Illness staging; multi-modal data; changes over time;
• Decision Support for Radiology (NIHR-funded)
– Decision aid to assist better description of scans
CHALLENGE
– Usability; Integration with existing tools; Link to literature
Itemset Mining Algorithms {baby nappies}->{beer}
• Colossal itemsets:
- Very high dimensional datasets
- Run-time increases exponentially as average row length increases;
• Minimal unique itemsets (MUI); SUDA: Special Unique Detection
- “risky” records, those likely to be linked, e.g. 16 years old + widow
- Records of most concern have many, small MUIs
- SUDA s/w used by ONS, UK; licensed by Singaporean govt;
- Algorithm used by UN/World Bank International Household Survey
CHALLENGES:
• Data structure to represent itemsets during search process
• Search space pruning
• Algorithm: bottom-up; top-down; hybrid
• Parallelism
Eco-service composition (EU funded)
with Mehandjiev, MBS
• Aims to determine conditions for achieving eco-friendly,
resilient and optimal service compositions on a
distributed cloud infrastructure
• Two service optimisation approaches deployed:
1. Global: analyses end-to-end interaction between
services
2. Local: computes local optimisation by creating
dynamic service chains between service
provider/consumer
CHALLENGE
• Energy-efficient load balance and scheduling
HPC + Finance (EU funded, UK Government)
• High Frequency Trading
– Flash crashes (dramatic, sudden drops in share price): describe/predict
– Working paper: High Frequency Trading and Mini Flash Crashes
http://arxiv.org/abs/1211.6667
• HPCFinance
• New models of risk analysis (diverse data integration)
• Role of HPC in Finance and comparison of technologies
• Trade-off: accuracy, speed, cost comparison: Cloud; GPGPUs,
FPGA (Maxeler box)
CHALLENGES:
Data engineering;
Analytics;
Algorithms;
High performance;
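
As a rough illustration of the "describe" half of the flash-crash task, the sketch below flags ticks where the price has fallen sharply from the highest price seen in a short preceding window. The function name, the 1.5-second window, the 0.8% threshold, and the tick data are illustrative assumptions, not definitions taken from these slides.

```python
def find_sudden_drops(ticks, max_window=1.5, min_drop=0.008):
    """Flag sudden price drops in a time-ordered tick series.

    ticks: list of (time_in_seconds, price) tuples, sorted by time.
    Returns (peak_time, tick_time, fractional_drop) for every tick whose
    price is at least `min_drop` below the highest price seen within the
    preceding `max_window` seconds. Overlapping events are not merged.
    """
    events = []
    start = 0
    for end in range(1, len(ticks)):
        tick_time, price = ticks[end]
        # Slide the window start forward so it spans at most max_window.
        while tick_time - ticks[start][0] > max_window:
            start += 1
        peak_time, peak_price = max(ticks[start:end + 1], key=lambda t: t[1])
        drop = (peak_price - price) / peak_price
        if drop >= min_drop:
            events.append((peak_time, tick_time, drop))
    return events


# Invented tick data: a 1% fall within one second, then a recovery.
ticks = [
    (0.0, 100.0), (0.5, 99.9), (1.0, 99.0),
    (1.2, 98.9), (3.0, 99.8), (5.0, 100.0),
]
events = find_sudden_drops(ticks)
```

Real tick streams are far larger than this, so rescanning the window for every tick would be too slow in practice; that is one place where the high-performance challenge appears.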
Preference Elicitation from Pairwise Comparison
with Mikhailov, MBS; Siraj, COMSATS IIT, Pakistan
• Decision making is complex in presence of uncertainty and
insufficient knowledge.
• Aim to estimate preference using pairwise comparison: PC
used when unable to assign scores to available options;
judgements provided may be inconsistent
• Work has proposed consistency measures and prioritization
measures where revision not allowed.
• PriEsT tool now has sensitivity analysis -> best solution.
• CHALLENGES
– Evolutionary approach to multi-criteria DSS
– Work on preference elicitation model and tool
– Group decision making
– Bridge PriEsT and R (popular data mining tool) via XMCDA
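
To make the pairwise comparison idea concrete, here is a minimal sketch of one common textbook approach: priorities from the geometric mean of each row of the comparison matrix, plus a simple inconsistency score. It is not necessarily what PriEsT implements; the function names and example matrices are assumptions. Entry `a[i][j]` states how strongly option i is preferred to option j, with `a[j][i] = 1 / a[i][j]`.

```python
import math


def geometric_mean_priorities(matrix):
    """Estimate a priority vector as the normalised row geometric means."""
    n = len(matrix)
    means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def inconsistency(matrix):
    """Largest |log(a[i][j] * a[j][k] / a[i][k])| over all triples.

    Zero when the judgements are perfectly consistent, i.e. when
    a[i][j] * a[j][k] == a[i][k] for every i, j, k.
    """
    n = len(matrix)
    return max(
        abs(math.log(matrix[i][j] * matrix[j][k] / matrix[i][k]))
        for i in range(n) for j in range(n) for k in range(n)
    )


# Consistent judgements derived from weights 4 : 2 : 1.
consistent = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

# Cyclic judgements: A over B, B over C, yet C over A.
cyclic = [
    [1.0, 2.0, 0.5],
    [0.5, 1.0, 2.0],
    [2.0, 0.5, 1.0],
]

weights = geometric_mean_priorities(consistent)
cyclic_weights = geometric_mean_priorities(cyclic)
```

For the consistent matrix the priorities recover the 4 : 2 : 1 weights. For the cyclic matrix all three options receive equal priority while the inconsistency score is well above zero, which shows why a consistency measure is needed alongside the priority vector.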