Developing Reliable Complex Software Systems in a Research Environment
Christopher Mueller and Andrew Lumsdaine
Open Systems Laboratory
Indiana University
Research Software Projects
Software is an essential research tool
Many projects use custom software
 • Data gathering and processing
 • Simulation
 • Analysis and visualization
 • Algorithm/protocol development
 • Glue for 3rd party applications
Many research areas are developing common application frameworks
Software is often developed by a combination of grad students, undergrads, PIs, collaborators, and consultants using little or no process.
Evolution of an Application
• First application written in Fortran (F77)
• “A Model for Baconian Dynamics” *
• Tom ports to C
• Adds command line parameters, makefile
• “An Application Framework for Baconian Dynamics”
• Jenny ports to F90
• Extends model
• “An Extended Model of Baconian Dynamics”
• Brad ports to C++
• Models system using objects
• “An Object Oriented System for Dynamical Baconian Systems”
• Jeremy (consultant) rewrites existing versions as a C++ version
• Advanced template and object patterns lead to fast and extensible code that is indecipherable by scientists
• Maria implements model in Java for the Grid
• Implements original model
• “A Scalable, Grid Enabled Toolkit for Baconian Systems”
• Baconian Dynamics predicts summer blockbusters
• Everyone wants a copy of the software
* Baconian Dynamics predicts the success of a movie using models based on the cast’s Bacon Numbers, i.e., how many degrees of separation lie between the actors and a movie starring Kevin Bacon
A Closer Look
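[Figure: version tree of the implementations (F77 versions 1-2, C versions 1-3, F90 versions 1-4, C++ versions 1-2, a further C++ version, and a Java version), with markers for the version used for each paper and the versions that advanced science]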

There were 6 major versions
 • 13 actual implementations
 • 5 languages
2 major versions advanced the science
4 major versions were simply software projects
All versions re-implemented basic features
The implementations used for the papers were not always used for the next major version
Research Software Crisis!
Problem: Research software applications are
difficult to develop and are costing researchers
time and money.
Solution: Separate Research and Development
and use a development model derived from
industrial software development.
Software Development
Business Modeling
Requirements
Design
Implement
Test
Deploy
Maintain
Software development is an iterative process that consists of three main phases: Business Modeling, Application Development, and Maintenance
Business Modeling
Goal: Understand the main roles and procedures used in a research program

Identify roles
 • Researcher
 • Support staff
 • Developer
 • PI
Identify projects
Identify common workflows
 • Data processing pipelines
 • Experimental protocols
Identify commonly used data
 • Instrument data
 • Reference data collections
 • Parameter files
Identify physical resources
 • Instruments
 • Reagents
Identify computational resources
 • Commercial software tools (e.g., Excel, Spotfire, ChemDraw, etc.)
 • In-house software
 • Web resources
Requirements and Design
Goal: Understand and agree upon the main features for the application and each iteration
Requirements will change as the project evolves based on user feedback
Initial requirements should include only features that are needed by users, not features that might be needed in the future
The design should be fairly coarse-grained but identify all the major components
Components that use unfamiliar technology should be prototyped
Implementation and Testing
Goal: Implement the current iteration’s features
This is where code is written
Unit tests are fine-grained tests that cover one or two low-level features
As the code is written, it is versioned. This makes it possible to revert to older versions.
For in-house software, testing is generally performed by the user and developer
Short iterations and direct contact between developers and users facilitate bug fixes
For scientific software, testing must include validation, that is, confirming that the code generates correct results
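The difference between a unit test and a validation test can be sketched in Python. This is only an illustration: `trapezoid_integrate` is a hypothetical stand-in for a piece of research code, and the function name, test cases, and tolerance are assumptions rather than anything from the talk.

```python
import math
import unittest


def trapezoid_integrate(f, a, b, n=1000):
    """Hypothetical research routine: integrate f over [a, b] with n trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class UnitTests(unittest.TestCase):
    """Fine-grained tests: each covers one low-level feature."""

    def test_rejects_zero_intervals(self):
        with self.assertRaises(ValueError):
            trapezoid_integrate(math.sin, 0.0, 1.0, n=0)

    def test_zero_width_interval(self):
        self.assertEqual(trapezoid_integrate(math.sin, 2.0, 2.0), 0.0)


class ValidationTests(unittest.TestCase):
    """Validation: compare the code's output with a result known from theory."""

    def test_sine_over_half_period(self):
        # The exact integral of sin(x) over [0, pi] is 2.
        result = trapezoid_integrate(math.sin, 0.0, math.pi)
        self.assertAlmostEqual(result, 2.0, places=5)
```

Saved to a file, the tests can be run with `python -m unittest <file>`. The unit tests check that the code behaves as written, while the validation test checks that it produces a scientifically correct answer.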
Deployment and Maintenance
Goal: Deliver the application to the users and continue to support it
Deployment consists of two steps
 • Staging
   • Application is installed in a production ‘sandbox’
   • Users test application
 • Deployment
   • Application is installed on the users’ machines
After deployment, the development process is repeated until the application is “complete” and enters maintenance mode
 • “Complete” is agreed upon by the developers and users
 • No application is ever really complete, which leads us to…
Maintenance accounts for roughly 60% of software costs (time and money)
 • This is good! It means the application is being used and improved
Software Tools
Business Modeling
Diagram software (Visio, etc.), spreadsheets, word processors
Rapid prototyping tools (VB, Python)
Requirements
Design
Implement
Interpreted languages (Java, VB, Python)
Libraries/Components (numerical, plotting, instrument
communication)
Compiled languages (C/C++/Fortran)
Integrated Development Environments (IDEs)
Test
Debuggers
Bug/Feature tracking system
Deploy
Packaging Systems
Automated build system (nightly)
Maintain
Roles









(bold roles are essential)
End User
 • Anyone who uses the software
Project Manager
 • Coordinates development efforts, resolves conflicts, ensures project is moving along
 • Note: This is the hardest job to fill
Lead Developer (Architect, Sr. Software Engineer)
 • Experienced member of the team, understands technologies and is able to advise other developers
 • Same responsibilities as developer
Developer
 • Responsible for all aspects of a portion of the application (requirements, design, implementation, testing)
Web developer
 • Similar to a developer, but with a skill set targeted at designing and implementing Web sites and applications
Database Administrator (DBA)
 • Maintains and optimizes the database and helps developers design database applications
Technical Writer
 • Develops tutorials and user manuals
Quality Assurance
 • On projects that are released to a wide audience, a separate QA team is responsible for testing
System administrator
 • Responsible for maintaining the computers, software licenses, file systems, and security policies
Keys to Success


Process is necessary but not sufficient
Developer/User Interaction
 • The more levels of communication required, the higher the chance that requirements will be miscommunicated
Neutral management
 • The project manager’s role is to keep things moving smoothly without getting in the way
Put experienced developers in lead roles
 • You would never make an undergraduate a lead scientist
Implement what’s needed, not what might be needed
 • This keeps developers and users focused on the current problems
Small, incremental deliverables
 • This ensures the application evolves based on users’ needs and that requirements have a chance to be adjusted
Mutual respect
 • The hierarchy and reward systems for software and science are different
 • Scientists should treat developers as colleagues, not as servants
 • Developers should respect the ideals and institutions of science
 • Developers should be willing to understand the scientific field they are supporting
Benefits


Software Quality is improved
 • Applications are not single-user prototypes
 • Features are available to all researchers
 • Developers are not distracted by classes, papers, etc.
Research Process is improved
 • Researchers can focus on research
 • Development is not a bottleneck
 • Reproducibility and Traceability
   • Reproduce old experiments, trace the data/process that led to a result
 • Easier to integrate new/visiting researchers
 • Tools can be shared with a larger community
High-end software becomes possible
 • Parallel and high-performance implementations
 • Well designed user interfaces
 • Visualization
 • Databases
 • Data mining
 • Web applications/services
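One way to picture traceability is a small provenance record stored alongside each result. The Python sketch below is an assumption-laden illustration, not a tool described in the talk: the function names, the record fields, and the sample data, parameters, and version string are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(data_bytes, parameters, code_version):
    """Build a record tying a result to its inputs (all fields are illustrative)."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "parameters": parameters,
        "code_version": code_version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def same_inputs(record_a, record_b):
    """True if two runs used identical data, parameters, and code version."""
    keys = ("data_sha256", "parameters", "code_version")
    return all(record_a[k] == record_b[k] for k in keys)


# Hypothetical run: raw instrument data, a parameter file, and a version tag.
record = provenance_record(b"1.0,2.0,3.0\n", {"threshold": 0.5}, "v1.2.0")
print(json.dumps(record, indent=2, sort_keys=True))
```

Keeping such a record next to every output file makes it possible to check later whether an old experiment is being rerun with the same data, parameters, and code.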
Implementing a Software Process


Step 1
 • Train research staff in basic software processes
 • Incorporate basic tools into the research environment
   • Version control
   • Unit tests/validation
   • Bug/Feature Tracking
   • Standard locations for deployed applications and data
 • Assign development roles to research staff
   • Make sure to separate “research” work and “development” work
Step 2
 • Build a full time development staff as the projects grow
 • Initial staff should include a lead developer and a project manager
   • Use project manager to coordinate research projects, too
   • A full time developer also helps track ‘institutional knowledge’ as students come and go
 • Additional staff can be added on a consulting, part time, or full time basis as needed
Step 3
 • Get back to doing what you love: science!
Costs and Funding



Good software is not cheap
Personnel Costs
 • Lead developer: $70-100k
   • Expect $80k to keep a good developer around
 • Developer: $40-100k, same as above (contract: $30-200/hr)
 • Project Manager: $70-96k
 • System administrator: $50-70k
 • Database administrator: $70-110k
 • Note that TCO (total cost of ownership) is 1.5-2.5x base salary
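The TCO note can be turned into a rough budget estimate. The Python sketch below applies the 1.5-2.5x multiplier to the salary ranges listed above for a lead developer and a project manager. Pairing the low multiplier with the low salary and the high multiplier with the high salary is an assumption made here to bracket the range, and the function name is hypothetical.

```python
def total_cost_range(salary_low, salary_high, factor_low=1.5, factor_high=2.5):
    """Return (min, max) yearly cost for a base-salary range and TCO multipliers."""
    return salary_low * factor_low, salary_high * factor_high


# Base salary ranges in dollars, taken from the list above.
staff = {
    "Lead developer": (70_000, 100_000),
    "Project manager": (70_000, 96_000),
}

team_low = team_high = 0.0
for role, (low, high) in staff.items():
    cost_low, cost_high = total_cost_range(low, high)
    team_low += cost_low
    team_high += cost_high
    print(f"{role}: ${cost_low:,.0f} to ${cost_high:,.0f} per year")

print(f"Team total: ${team_low:,.0f} to ${team_high:,.0f} per year")
```

Under these assumptions, a two-person core team comes to roughly $210k to $490k per year, which is the scale of cost a grant budget would need to cover.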
Funding
 • Share resources with collaborators, department
 • Take advantage of university support services
   • Systems, HPC, visualization, consulting
   • Classes! (e.g., Software Carpentry)
 • Write development costs and infrastructure directly into grants
 • Look for software infrastructure grants
 • Lobby!
Conclusions

Developing software is a complex process
Separating research and development can help improve the quality of research software
Training can help understand and manage complexity
Existing staff can do this to some extent, but outside help is needed as projects expand
The funding climate needs to change to fully support this
Software should be considered essential research equipment, on par with microscopes, mass spectrometers, and supercomputers
References




“The Mythical Man-Month”, Frederick P. Brooks, Jr.
“Peopleware: Productive Projects and Teams”, Tom DeMarco & Timothy Lister
“Software Project Survival Guide”, Steve McConnell
“Facts and Fallacies of Software Engineering”, Robert L. Glass
Questions?