CERCS - Creating System Solutions for Future Technologies

Center for Experimental Research in Computer Systems
Spring 2007 IAB Meeting
Karsten Schwan, Calton Pu, Douglas Blough, Sudhakar Yalamanchili
IUCR CERCS: NSF Industry/University Cooperative Research Center
Mission
Lead the innovation of new information and computing technologies, to construct the
interactive information systems of the future, and to create the intellectual capital that
can advance these technologies and fuel future advances.
[Figure: Enterprise, Embedded, Scientific, and Grid systems; remote access to the
information system; information anytime, anywhere: Timeliness! Quality! Security!
Robustness!]
Strategic Thrusts - Highlights
– Scientific/Technical Computing -- Dynamic Data Management and
• GT: IHPCL Laboratory (e.g., new cluster machines, including substantial Intel
donations for education and for multicore computing initiative)
• DOE:ORNL, Sandia: High Performance I/O initiative; involvement with startups
• Cisco (MPI and IB QoS); Dell; HP/Intel (Gelato, Itanium donations); IBM (VMM power
management, Cell SUR grant), RNet communication processor design
• News: Multicore Focus (HP, IBM, Intel); Ongoing I/O Initiative, Virtualization in HPC
(ORNL, Sandia, UNM)
Strategic Thrusts - Highlights
– Enterprise Computing -- Autonomic/Adaptive and Service-Oriented
Systems:
• IBM, Intel, TCS (autonomic and critical enterprise systems, dynamic content
distribution/event-based systems, SOA, virtualization – hypervisor scaleout, I/O
virtualization, trusted passages, metering, power management, failure diagnosis
and fault containment)
• HP (deployment and management, system monitoring, risk-based control, stream
data mining)
• Worldspan (runtime behavior detection and QoI, virtualization)
• LogicBlox (Dynamic code generation for efficient data access)
• Delta, Raytheon (policy/performance robustness, runtime behavior modeling)
• Cisco, Intel (network data services, heterogeneous multicore, IB network
virtualization)
• News/Outreach: IBM SUR (joint with Ohio State), OSU industry partners, OSU
IAB meeting, exploring new links: Benchmark, Earthlink, McKesson, NSF CRI
[Figure labels: High Performance Computing (real-time decision tools, optimization,
cluster computing); Real-Time Information Processing; Real-Time Information Transport
(capture, display, transport, filter, transform); Scalable Robust Services; FAA flight
data; gate readers; wide-area transport; operational flight displays; passenger paging
and response; airport LAN; visualization; storage; databases; real-time situation
assessment; baggage displays; crew and equipment status; baggage status]
Strategic Thrusts - Highlights
– Embedded Systems/Architecture
• Boeing (testing, software correctness)
• Intel (in-vehicle computing and lightweight methods for system
virtualization, system-level power management, computer architecture)
• Motorola (middleware for pervasive and mobile applications)
• Federal: pervasive applications (transportation, robotics), upcoming
Cyberphysical Systems program
• IBM, Intel (network processors and heterogeneous multicore)
• Sony (gaming applications)
• News: Korea program in Embedded Systems, Samsung educational
program, Robotics Center liaison, Sensor (OSU) and MobiEMU
testbeds
[Figure labels: throughput, response times; image quality, end-to-end delay, jitter,
loss rate]
CERCS Personnel
• Faculty
– Mustaque Ahamad, Mostafa Ammar, Doug Blough, Constantinos
Dovrolis, Greg Eisenhauer, Richard Fujimoto, Ada Gavrilovska,
Alexander Gray, Mary Jean Harrold, Hsien-Hsin Lee, Wenke Lee,
Ling Liu, Gabriel Loh, Pete Manolios, Alex Orso, Henry Owen,
Santosh Pande, Milos Prvulovic, Calton Pu, Kishore
Ramachandran, Jay Ramanathan (Ohio State), Rajiv Ramnath
(Ohio State), George Riley, David Schimmel, Karsten Schwan, Olin
Shivers, Matthew Wolf, Hongyuan Zha, Sudhakar Yalamanchili, Ellen
Zegura
• Research Staff
– Steve Ferenci, David Hilley
– Supported by DARPA, DOE, NSF, (CoC), (ECE)
• Associated Faculty/Researchers
– David Bader, Tucker Balch (Robotics), Patrick Bridges (UNM),
Robert Butera, Steve DeWeerth, Irfan Essa, Phil Hutto, Byron Jeff
(Clayton State), Scott Klasky (ORNL), Kang Li, Sung Kyu Lim,
Arthur Maccabe (UNM), Vincent Mooney, Jeff Nichols (ORNL),
Krishna Palem, Kalyan Perumalla (ORNL), Jeff Vetter (ORNL),
Patrick Widener (UNM)
Industrial Relations
• IUCR CERCS Center
– Contributors (GT): Boeing, Cisco, Delta, DOE, HP, IBM, Intel,
LogicBlox, TCS, Worldspan
– Industry Workshops and Industrial Advisory Board
• Joint initiatives - e.g., TIE grant with UFL, expansion to Ohio
State (joint curriculum/facility efforts), planned expansion to
UNM
• Internship Program
– Amazon, AT&T, Cisco, Delta, (DoCoMo), DOE, Google, HP, IBM,
Intel, Microsoft, Motorola, NetApp, Radisys, TCS, VMware,
Worldspan
• Evolving relationships:
– AT&T, DoCoMo, Microsoft, Motorola, NetApp, Netronome, Raytheon,
RNet, VMware, Xilinx
Overview - Current Industry Engagements
• One slide per ongoing project
• Federal projects (e.g., joint work with DOE ORNL) are
elided, so few HPC efforts are described
• Same order: HPC, Enterprise, Embedded
HPC: Cisco - InfiniBand-based Research
Gavrilovska, Schwan, Wolf
• Mechanisms for delivering end-to-end QoS levels in challenging settings:
– Multi-core nature of future HPC nodes
– I/O limitations in high-performance infrastructures
– End-to-end virtualized environments
• Two main efforts currently under investigation:
– Data virtualization: 'Datatap' mechanism on top of the low-level IB verbs
interface, providing (1) data extraction from the IB infrastructure, (2) middleware
mechanisms to support dynamic extensions for service-oriented
applications, (3) dynamic, resource-aware routing and data distribution to
meet application QoS requirements.
– Platform virtualization: (1) integrate x86-based virtualization solutions into
InfiniBand settings, (2) develop mechanisms for end-to-end QoS for VM-to-VM
interactions through improved and dynamic resource management and
scheduling mechanisms.
• Additional effort: RNet, a high-end NIC for science applications
Enterprise: Elba Project – HP Labs
Calton Pu
• Apply code generation techniques to automate large system deployment,
measurement, evaluation, and management
• Collaborative work (1 faculty: Pu; 1 industry: Sahai; 6 PhD students, 4 MS
students, 1 undergraduate)
• 8 published papers in 2 years, several more in the pipeline
[Figure: Automated System Mgmt cycle: (1) Design, (2) Code Generation,
(3) Deployment, (4) Evaluation & Analysis, (5) Reconfiguration]
Current work (step 4 above): Evaluation of 3-tier benchmark (RUBiS) using
generated scripts (millions of lines of deployment, measurement, and analysis
scripts)
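The idea of generating deployment and measurement scripts from a single system description can be illustrated with a small sketch. This is not the Elba tooling: the spec layout, the host names, and the `install`, `start`, and `collect-metrics` commands are all hypothetical placeholders chosen for illustration.

```python
from string import Template

# Hypothetical declarative description of a 3-tier deployment (assumption).
SPEC = {
    "web": {"hosts": ["web1"], "package": "apache"},
    "app": {"hosts": ["app1", "app2"], "package": "tomcat"},
    "db": {"hosts": ["db1"], "package": "mysql"},
}

# Placeholder command lines; a real generator would emit site-specific scripts.
DEPLOY = Template("ssh $host 'install $package && start $package'")
MEASURE = Template("ssh $host 'collect-metrics --tier $tier --interval $interval'")


def generate_scripts(spec, interval=5):
    """Return {script_name: script_text} with one deploy and one measure script per host."""
    scripts = {}
    for tier, cfg in spec.items():
        for host in cfg["hosts"]:
            scripts[f"deploy_{host}.sh"] = DEPLOY.substitute(
                host=host, package=cfg["package"]
            )
            scripts[f"measure_{host}.sh"] = MEASURE.substitute(
                host=host, tier=tier, interval=interval
            )
    return scripts
```

Changing the spec (for example, adding a host to a tier) and regenerating keeps the deployment and measurement scripts consistent with each other, which is the appeal of generation over hand-written scripts.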
Enterprise:
Robust Delivery of Quality Data - Worldspan
Karsten Schwan, Mohamed Mansour, Jay Lofstead
Problem: Complex GDS with potentially unanticipated behaviors
Example: Variable search times due to caching effects
Solution: Runtime behavior detection, model construction, and mitigation
Specific approach: Mitigation via request reordering
[Figure labels: Client 1, Client 2, Client 3; message queue; Clearinghouse (CW);
GDS; Airlines]
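One possible reading of "runtime behavior detection, model construction, and mitigation via request reordering" is sketched below. The slide does not say which model or ordering policy is used, so the per-class moving average, the `key` field, and the shortest-expected-first ordering are assumptions made for illustration.

```python
class BehaviorModel:
    """Tracks an exponentially weighted moving average of observed
    search time per request class (the class key is hypothetical)."""

    def __init__(self, alpha=0.5, default=1.0):
        self.alpha = alpha        # weight given to the newest observation
        self.default = default    # estimate used for classes never seen
        self.estimate = {}

    def observe(self, key, seconds):
        old = self.estimate.get(key)
        if old is None:
            self.estimate[key] = seconds
        else:
            self.estimate[key] = self.alpha * seconds + (1 - self.alpha) * old

    def expected(self, key):
        return self.estimate.get(key, self.default)


def reorder(pending, model):
    """Order queued requests by expected search time, shortest first.
    sorted() is stable, so requests with equal estimates keep arrival order."""
    return sorted(pending, key=lambda req: model.expected(req["key"]))
```

A production policy would also need to bound how long a slow request can be deferred; this sketch omits that.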
Enterprise:
Runtime Behavior Diagnosis – Delta Air Lines
Sandip Agarwala, Mohamed Mansour, Karsten Schwan
• Investigation of multiple enterprise architectures
– Revenue Pipeline, delta.com, DNS
• Path detection in complex systems
• Autonomic workflows
• Monitoring and management in SOA systems (proposed
work)
Enterprise:
Collaboration with IBM Research
Ling Liu
• Distributed Systems and Software
– Dynamic Content Dissemination: Architectures and
Optimizations
– Collaborators: Arun Iyengar, Fred Douglis, Isabelle Rouvellou
• Event Streams and Security
– Sensor Stream Processing and Optimization (e.g., load
shedding, load balancing, motion adaptive indexing)
– Event Stream Mining
– Collaborators: Philip Yu, Bugra Gedik, Rong Chang
• Service Oriented Computing
– Secure publish-subscribe systems, Secure Event Dissemination
– Collaborators: Arun Iyengar, Liang Jie Zhang
Enterprise/HPC:
High Productivity Computing
Sudha Yalamanchili with LogicBlox Inc.
[Figure labels: Memory; Inputs, Outputs; Application; Kernel; Run-time; CPU, Cache,
Local Memory; ACC, FIFO, DMA; Network (e.g., HyperTransport)]
• Stream computing programming model
– Kernels expressed in a declarative programming language
– Custom hardware for accelerating data intensive kernels
– Explicit interaction model: non-coherent shared memory
• Focus on applications such as retail forecasting and
data analytics
Enterprise/HPC:
Databus: Runtime Rule Generation - LogicBlox
Greg Eisenhauer
• Fine-grain retail data analysis (what-if calculations)
• Rule-based declarative language
• Compile down to interpreted "FactBus", Rule objects, and variable objects.
• Apply rules top to bottom, fall back on failure.
• Use dynamic code generation (DCG) for "just-in-time" compilation
• Initial results: speedups of 3x. Additional improvements anticipated.
[Figure labels: Rules (apply BinOp, apply BinOp, apply Lookup, apply Iterator);
FactBus Variables (bool, double, int, String); Test; Write; DCG subset; DB]
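The "apply rules top to bottom, fall back on failure" behavior can be sketched as follows. This is only one interpretation of the slide: the rule functions, the `RuleFailed` exception, and the fact names are hypothetical, and the actual FactBus objects are not described in the source.

```python
class RuleFailed(Exception):
    """Raised by a rule that cannot be applied to the given facts."""


def price_times_units(facts):
    # Hypothetical rule: needs both a price and a unit count.
    if "price" not in facts or "units" not in facts:
        raise RuleFailed("missing price or units")
    return facts["price"] * facts["units"]


def zero_revenue(facts):
    # Hypothetical fallback rule that always succeeds.
    return 0.0


def apply_rules(rules, facts):
    """Try each rule in order; return the first result, falling back to the
    next rule whenever one fails."""
    for rule in rules:
        try:
            return rule(facts)
        except RuleFailed:
            continue
    raise RuleFailed("no rule applied")
```

An interpreter like this pays a dispatch cost on every rule application, which is the kind of overhead that just-in-time compilation of the rule chain aims to remove.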
Enterprise/HPC:
Scalable Hypervisors - Intel
Karsten Schwan
Enterprise:
Power Management in Virtualized Systems
Ripal Nathuji, Karsten Schwan
IBM + Intel
Coordinate virtualized system management:
• Enable VM management independence
• Decouple virtual and physical resources for management
• Introduce "soft" scaling for flexible management
Heterogeneity-aware Allocation Policy
Leverage heterogeneity in:
• Performance capabilities
• Power efficiency of resources
• Power management support
[Figure labels: VM 1, VM 2, VM 3, Dom0; Application; OS; PM Policy; VPM Channel;
VPM Mechanisms; VMM; Platform HW]
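A heterogeneity-aware allocation policy could, for example, prefer the most power-efficient platform that still has room for a VM. The sketch below assumes a greedy policy and invented units (`capacity`, `watts_per_unit`); the slide does not specify the actual policy or its inputs.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    capacity: float        # available performance units (hypothetical)
    watts_per_unit: float  # power cost per performance unit (hypothetical)


def allocate(vm_demands, platforms):
    """Greedy placement: largest demands first, each on the most
    power-efficient platform that still has enough free capacity.
    VMs that fit nowhere are mapped to None."""
    free = {p.name: p.capacity for p in platforms}
    by_efficiency = sorted(platforms, key=lambda p: p.watts_per_unit)
    placement = {}
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        placement[vm] = None
        for p in by_efficiency:
            if free[p.name] >= demand:
                free[p.name] -= demand
                placement[vm] = p.name
                break
    return placement
```

Greedy placement is simple but not optimal; it is used here only to show how performance capability and power efficiency can both enter the decision.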
Enterprise:
Trusted Passages on Virtualized Platforms – Intel/NSF
Mustaque Ahamad, Greg Eisenhauer, Wenke Lee, Karsten Schwan
[Figure labels: Overlay node1 (Host1) and Overlay node2 (Host2), each with Service VM,
Guest VM1, Guest VM2, App., Trust Controller, FE, BE, Hypervisor, NIC; network;
Trusted passage]
Run trusted services across untrusted platforms:
• Trust models and trust controller mechanisms for evolving node trust
• Virtual machine monitoring and introspection to support trust controllers
• Data Interception and Redirection as Remedial Measures
Embedded/Enterprise:
Aristotle Research Group (Mary Jean Harrold)
Testing Evolving Software (TCS) (with Alex Orso)
Problem: Changes
• require rapid modification and testing for quick release
• causing released software to have many defects
Research Question: How can we test well (to gain confidence in changes before
release of changed software)?
MaTRIX: Computes conditions test cases must satisfy to test changes well
Fault Propagation for Safety (Boeing)
Problem: Critical avionics systems
• now use integrated modular avionics
• making fault analysis for the entire system difficult
Research Question: How can we perform fault analysis at the system-model level
and make this information accessible to developers?
FauPA: Propagates injected faults forward to determine impact; traces faulty
components backward to find root cause
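Forward propagation and backward tracing on a system model can be pictured as reachability over a component dependency graph. The sketch below is an assumption about how such an analysis might look, not a description of FauPA; the graph encoding and component names are invented for illustration.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start (excluding start itself),
    using breadth-first search over an adjacency-list graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def reverse(graph):
    """Build the graph with every edge flipped."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def impact(graph, injected_fault):
    """Forward: components a fault in injected_fault could affect."""
    return reachable(graph, injected_fault)


def root_cause_candidates(graph, faulty_component):
    """Backward: upstream components that could have caused the fault."""
    return reachable(reverse(graph), faulty_component)


# Hypothetical system model: an edge A -> B means B consumes A's output.
MODEL = {
    "sensor": ["nav"],
    "nav": ["autopilot", "display"],
    "power": ["display"],
}
```

For the hypothetical model above, `impact(MODEL, "sensor")` covers `nav`, `autopilot`, and `display`, while `root_cause_candidates(MODEL, "display")` covers `nav`, `power`, and `sensor`.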
Embedded:
Correct Software Assemblies - Boeing