CDT PROJECTS 2013-14
John Keane, Software Systems Group
1. Data Analytics / Big Data
2. Parallel & Distributed Systems
3. Decision Support Systems
HAPPY TO DISCUSS

Big Data Analytics (IBM funded), with Nenadic
CHALLENGE
• Investigate:
  – Applications: characteristics and predictability
  – Data analytic / machine learning algorithms (relatively simple so far)
  – Software: MapReduce, Hadoop
  – Hardware: various platforms

Bio-medical data analytics, with Nenadic, Zeng, Stivaros (Consultant, RMCH)
• Adverse drug event detection (EU funded)
  – Bayesian/fuzzy association rule algorithms
  CHALLENGE: compare/contrast accuracy of prediction
• Clinical Outcome Mining (Christie Hospital)
  – Data/text-based clinical records, to better diagnose and predict
  CHALLENGE: illness staging; multi-modal data; changes over time
• Decision Support for Radiology (NIHR funded)
  – Decision aid to assist better description of scans
  CHALLENGE: usability; integration with existing tools; link to literature

Itemset Mining Algorithms ({baby nappies} -> {beer})
• Colossal itemsets:
  – Very high-dimensional datasets
  – Run-time increases exponentially as average row length increases
• Minimal unique itemsets (MUIs)
  – SUDA (Special Uniques Detection Algorithm)
  – "Risky" records are those likely to be linked, e.g. 16 years old + widow
  – Records of most concern have many small MUIs
  – SUDA software used by ONS (UK); licensed by the Singaporean government
  – Algorithm used by the UN/World Bank International Household Survey
CHALLENGES:
• Data structure to represent itemsets during the search process
• Search-space pruning
• Algorithm: bottom-up; top-down; hybrid
• Parallelism

Eco-service composition (EU funded), with Mehandjiev, MBS
• Aims to determine conditions for achieving eco-friendly, resilient and optimal service compositions on a distributed cloud infrastructure
• Two service optimisation approaches deployed:
  1. Global: analyses end-to-end interaction between services
  2. Local: computes a local optimisation by creating dynamic service chains between service provider and consumer
CHALLENGE
• Energy-efficient load balancing and scheduling

HPC + Finance (EU funded, UK Government)
• High Frequency Trading
  – Flash crashes (dramatic, sudden drops in share price): describe/predict
  – Working paper: High Frequency Trading and Mini Flash Crashes, http://arxiv.org/abs/1211.6667
• HPCFinance
  – New models of risk analysis (diverse data integration)
  – Role of HPC in finance and comparison of technologies
  – Trade-off between accuracy, speed and cost: comparison of Cloud, GPGPUs and FPGAs (Maxeler box)
CHALLENGES: data engineering; analytics; algorithms; high performance

Preference Elicitation from Pairwise Comparison, with Mikhailov, MBS; Siraj, COMSATS IIT, Pakistan
• Decision making is complex in the presence of uncertainty and insufficient knowledge.
• Aim: estimate preferences using pairwise comparison (PC). PC is used when scores cannot be assigned directly to the available options; the judgements provided may be inconsistent.
• Work has proposed consistency measures, and prioritisation measures for cases where revising the judgements is not allowed.
• The PriEsT tool now includes sensitivity analysis to identify the best solution.
• CHALLENGES
  – Evolutionary approach to multi-criteria DSS
  – Work on preference elicitation model and tool
  – Group decision making
  – Bridge PriEsT and R (popular data mining tool) via XMCDA
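The pairwise comparison item above can be illustrated with a standard prioritisation method. The sketch below is a minimal Python example, assuming the common row geometric mean method for estimating priorities and Saaty's consistency ratio as the consistency measure. It is not the PriEsT tool's own method, and the function names `geometric_mean_priorities` and `consistency_ratio` are hypothetical.

```python
import math
from typing import List

# Saaty's random consistency indices for reciprocal matrices of order n.
# This sketch only covers n <= 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def geometric_mean_priorities(a: List[List[float]]) -> List[float]:
    """Estimate a priority vector from a reciprocal pairwise comparison matrix.

    a[i][j] says how strongly option i is preferred over option j, with
    a[j][i] == 1 / a[i][j]. Each priority is the geometric mean of its row,
    normalised so that the priorities sum to 1.
    """
    n = len(a)
    row_means = [math.exp(sum(math.log(x) for x in row) / n) for row in a]
    total = sum(row_means)
    return [m / total for m in row_means]


def consistency_ratio(a: List[List[float]], w: List[float]) -> float:
    """Saaty-style consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1).

    lambda_max is approximated as the mean of (A w)_i / w_i; this is exact when
    w is the principal eigenvector of A and an estimate otherwise.
    """
    n = len(a)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Hypothetical judgements over three options: A is 2x as good as B, 4x as good as C.
    judgements = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
    w = geometric_mean_priorities(judgements)
    print([round(x, 3) for x in w], consistency_ratio(judgements, w))
```

A CR below about 0.1 is conventionally treated as acceptably consistent; a larger value suggests the decision maker's judgements contradict each other (for example, A > B, B > C but C > A), which is exactly the situation where revising judgements, or a prioritisation method that tolerates inconsistency, becomes relevant.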
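The minimal unique itemsets (MUI) item under Itemset Mining Algorithms can be made concrete with a small brute-force search. The sketch below assumes each record is a set of (attribute, value) items; an itemset is unique if exactly one record contains it, and minimal if no proper subset is also unique. This is a naive illustration of the definition, not the SUDA algorithm itself, and the name `minimal_unique_itemsets` and the `max_size` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, str]  # an (attribute, value) pair


def minimal_unique_itemsets(records: List[Dict[str, str]],
                            max_size: int) -> Dict[int, Set[FrozenSet[Item]]]:
    """Brute-force search for the MUIs of each record, up to max_size items.

    Returns a mapping from record index to that record's set of MUIs.
    """
    item_rows = [frozenset(r.items()) for r in records]
    result: Dict[int, Set[FrozenSet[Item]]] = {i: set() for i in range(len(records))}
    # Searching bottom-up by size guarantees smaller MUIs are found first.
    for size in range(1, max_size + 1):
        for i, row in enumerate(item_rows):
            for combo in combinations(sorted(row), size):
                candidate = frozenset(combo)
                # A superset of an already-found MUI cannot be minimal.
                if any(mui <= candidate for mui in result[i]):
                    continue
                # Support = number of records containing every item in candidate.
                support = sum(1 for other in item_rows if candidate <= other)
                if support == 1:
                    result[i].add(candidate)
    return result


if __name__ == "__main__":
    # Hypothetical microdata: record 0 is the only 16-year-old widow.
    people = [
        {"age": "16", "marital": "widowed", "region": "north"},
        {"age": "16", "marital": "single", "region": "north"},
        {"age": "40", "marital": "widowed", "region": "north"},
    ]
    for idx, muis in minimal_unique_itemsets(people, 3).items():
        print(idx, [sorted(m) for m in muis])
```

The exhaustive subset enumeration is exponential in row length, which is why the challenges listed on that slide (itemset data structures, search-space pruning, bottom-up versus top-down search, parallelism) matter for realistic datasets.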