Developing Reliable Complex Software Systems in a Research Environment
Christopher Mueller and Andrew Lumsdaine
Open Systems Laboratory
Indiana University
Research Software Projects
Software is an essential research tool
Many projects use custom software
Data gathering and processing
Simulation
Analysis and visualization
Algorithm/protocol development
Glue for 3rd party applications
Many research areas are developing common
application frameworks
Software is often developed by a combination of
grad students, undergrads, PIs, collaborators, and
consultants using little or no process.
Evolution of an Application
• First application written in Fortran (F77)
• “A Model for Baconian Dynamics” *
• Tom ports to C
• Adds command line parameters, makefile
• “An Application Framework for Baconian Dynamics”
• Jenny ports to F90
• Extends model
• “An Extended Model of Baconian Dynamics”
• Brad ports to C++
• Models system using objects
• “An Object Oriented System for Dynamical Baconian Systems”
• Jeremy (consultant) rewrites the existing versions as a single C++ version
• Advanced template and object patterns lead to fast and extensible code that is indecipherable by scientists
• Maria implements model in Java for the Grid
• Implements original model
• “A Scalable, Grid Enabled Toolkit for Baconian Systems”
• Baconian Dynamics predicts summer blockbusters
• Everyone wants a copy of the software
* Baconian Dynamics predicts the success of a movie using models based on the cast’s Bacon Numbers, i.e., how many
degrees of separation lie between the actors and a movie starring Kevin Bacon
A Closer Look
[Figure: version tree of the implementations (F77 1-2; C 1-3; F90 1-4; C++ 1-2; C++; Java), marking the version used for each paper and the versions that advanced science]
There were 6 major versions
13 actual implementations
5 Languages
2 major versions advanced the science
4 major versions were simply software
projects
All versions re-implemented basic features
The implementations used for the papers
were not always used for the next major
version
Research Software Crisis!
Problem: Research software applications are
difficult to develop and are costing researchers
time and money.
Solution: Separate Research and Development
and use a development model derived from
industrial software development.
Software Development
Business Modeling
Requirements
Design
Implement
Test
Deploy
Maintain
Software development is an iterative process
that consists of three main phases:
Business Modeling, Application Development, and
Maintenance
Business Modeling
Goal: Understand the main roles and procedures used in a research program
Identify Roles
Researcher
Support staff
Developer
PI
Identify projects
Identify common workflows
Data processing pipelines
Experimental protocols
Identify commonly used data
Instrument data
Reference data collections
Parameter files
Identify physical resources
Instruments
Reagents
Identify computational resources
Commercial software tools (e.g., Excel, Spotfire,
ChemDraw)
In-house software
Web resources
Requirements and Design
Goal: Understand and agree upon the main features for the application and each iteration
Requirements will change as the project evolves
based on user feedback
Initial requirements should include only features
that are needed by users, not features that might
be needed in the future
The design should be fairly coarse-grained, but it should identify all
the major components
Components that use unfamiliar technology
should be prototyped
Implementation and Testing
Goal: Implement the current iteration’s features
This is where code is written
Unit tests are fine-grained tests that cover one or
two low level features
As the code is written, it is versioned. This
makes it possible to revert to older versions.
For in-house software, testing is generally
performed by the user and developer
Short iterations and direct contact between
developers and users facilitate bug fixes
For scientific software, testing must include
validation, that is, confirming that the code
generates correct results
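As a rough illustration of the difference between a unit test and a validation test, here is a minimal Python sketch. The `trapezoid` integrator and the test names are hypothetical stand-ins chosen for this example; they are not taken from the talk. The unit tests each check one low-level behavior, while the validation test compares the numerical output against a result known analytically.

```python
import math


def trapezoid(f, a, b, n):
    """Integrate f over [a, b] using n equal-width trapezoids (illustrative only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_constant_function():
    # Unit test: the area under f(x) = 2 on [0, 3] is exactly 6.
    assert abs(trapezoid(lambda x: 2.0, 0.0, 3.0, 4) - 6.0) < 1e-12


def test_rejects_zero_intervals():
    # Unit test: invalid input is reported instead of silently accepted.
    try:
        trapezoid(math.sin, 0.0, 1.0, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for n = 0")


def validate_sine_integral():
    # Validation: the integral of sin(x) over [0, pi] is known to be 2,
    # so the code's scientific result is checked against that answer.
    result = trapezoid(math.sin, 0.0, math.pi, 1000)
    assert abs(result - 2.0) < 1e-4
```

Functions written this way can be called directly or collected by a test runner; the tolerance in the validation check is an assumption that should be set from the accuracy the science actually requires.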
Deployment and Maintenance
Goal: Deliver the application to the users and continue to support it
Deployment consists of two steps
Staging
Application is installed in a production ‘sandbox’
Users test application
Deployment
Application is installed on the users’ machines
After deployment, the development process is
repeated until the application is “complete” and
enters maintenance mode
“Complete” is agreed upon by the developers and
users
No application is ever really complete, which leads
us to…
Maintenance accounts for roughly 60% of
software costs (time and money)
This is good! It means the application is being
used and improved
Software Tools
Business Modeling
Diagram software (Visio, etc.), spreadsheets, word processors
Rapid prototyping tools (VB, Python)
Requirements
Design
Implement
Interpreted languages (Java, VB, Python)
Libraries/Components (numerical, plotting, instrument
communication)
Compiled languages (C/C++/Fortran)
Integrated Development Environments (IDEs)
Test
Debuggers
Bug/Feature tracking system
Deploy
Packaging Systems
Automated build system (nightly)
Maintain
Roles
(bold roles are essential)
End User
Anyone who uses the software
Project Manager
Coordinates development efforts, resolves conflicts, ensures project is moving along
Note: This is the hardest job to fill
Lead Developer (Architect, Sr. Software Engineer)
Experienced member of the team, understands technologies and is able to advise other developers
Same responsibilities as developer
Developer
Responsible for all aspects of a portion of the application (requirements, design, implementation,
testing)
Web developer
Similar to a developer, but with a skill set targeted at designing and implementing Web sites and
applications
Database Administrator (DBA)
Maintains and optimizes the database and helps developers design database applications
Technical Writer
Develops tutorials and user manuals
Quality Assurance
On projects that are released to a wide audience, a separate QA team is responsible for testing
System administrator
Responsible for maintaining the computers, software licenses, file systems, and security policies
Keys to Success
Process is necessary but not sufficient
Developer/User Interaction
This ensures the application evolves based on users’ needs and that requirements have a chance to be adjusted
The more levels of communication required, the higher the chance that requirements will be miscommunicated
Small, incremental deliverables
This keeps developers and users focused on the current problems
Implement what’s needed, not what might be needed
Neutral management
The project manager’s role is to keep things moving smoothly without getting in the way
Put experienced developers in lead roles
You would never make an undergraduate a lead scientist
Mutual respect
The hierarchy and reward systems for software and science are different.
Scientists should treat developers as colleagues, not as servants
Developers should respect the ideals and institutions of science
Developers should be willing to understand the scientific field they are supporting
Benefits
Software Quality is improved
Applications are not single-user prototypes
Features are available to all researchers
Developers are not distracted by classes, papers, etc
Research Process is improved
Researchers can focus on research
Development is not a bottleneck
Reproducibility and Traceability
Reproduce old experiments, trace the data/process that led to a result
Easier to integrate new/visiting researchers
Tools can be shared with a larger community
High-end software becomes possible
Parallel and high-performance implementations
Well designed user interfaces
Visualization
Databases
Data mining
Web applications/services
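The reproducibility and traceability benefit listed above can be made concrete with a small provenance record stored next to each result. The following Python sketch is only one possible approach; the function names, the record fields, and the version string are assumptions made for illustration and do not come from the talk.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(input_bytes, parameters, code_version):
    """Describe what produced a result: input data, parameters, and code version."""
    return {
        # A hash identifies the exact input data without storing a copy of it.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "parameters": parameters,
        # Hypothetical version label, e.g. a tag from the version control system.
        "code_version": code_version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def same_experiment(a, b):
    """Two records describe the same experiment if data, parameters, and code match."""
    keys = ("input_sha256", "parameters", "code_version")
    return all(a[k] == b[k] for k in keys)


if __name__ == "__main__":
    record = provenance_record(b"1.0,2.0\n", {"threshold": 0.5}, "v1.2.0")
    print(json.dumps(record, indent=2, sort_keys=True))
```

Saving such a record with every output file lets a later reader trace a result back to its inputs and rerun the same version of the code against them.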
Implementing a Software Process
Step 1
Train research staff about basic software processes
Incorporate basic tools into the research environment
Version control
Unit tests/validation
Bug/Feature Tracking
Standard locations for deployed applications and data
Assign development roles to research staff
Make sure to separate “research” work and “development” work
Step 2
Build a full time development staff as the projects grow
Initial staff should include a lead developer and a project manager
Use project manager to coordinate research projects, too
A full time developer also helps track ‘institutional knowledge’ as students come and go
Additional staff can be added on a consulting, part time, or full time basis as needed
Step 3
Get back to doing what you love: science!
Costs and Funding
Good software is not cheap
Personnel Costs
Lead developer: $70-100k
expect $80k to keep a good developer around
Developer: $40-100k, same as above (contract: $30-200/hr)
Project Manager: $70-96k
System administrator: $50-70k
Database administrator: $70-110k
Note that total cost of ownership (TCO) is 1.5-2.5x base salary
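To see what the 1.5-2.5x multiplier means in practice, the arithmetic can be sketched in a few lines of Python. The salary figures are the ones listed above; combining a lead developer and a project manager into one budget is an assumption made here for illustration, and `tco_range` is a hypothetical helper name.

```python
def tco_range(base_low, base_high, low_factor=1.5, high_factor=2.5):
    """Return the (lowest, highest) yearly cost for a base salary range."""
    return base_low * low_factor, base_high * high_factor


# Salary ranges taken from the list above, in dollars per year.
lead_developer = (70_000, 100_000)
project_manager = (70_000, 96_000)

# Assumed scenario: an initial staff of one lead developer and one project manager.
staff_low = lead_developer[0] + project_manager[0]
staff_high = lead_developer[1] + project_manager[1]

low, high = tco_range(staff_low, staff_high)
print(f"Base salaries: ${staff_low:,} to ${staff_high:,}")
print(f"Estimated TCO: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, combined base salaries of $140,000 to $196,000 correspond to a yearly total cost of roughly $210,000 to $490,000, which is the number a grant budget would need to cover.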
Funding
Share resources with collaborators, department
Take advantage of university support services
Systems, HPC, visualization, consulting
Classes! (e.g., Software Carpentry)
Write development costs and infrastructure directly into grants
Look for software infrastructure grants
Lobby!
Conclusions
Developing software is a complex process
Separating research and development can help
improve the quality of research software
Training can help staff understand and manage
complexity
Existing staff can do this to some extent, but outside
help is needed as projects expand
The funding climate needs to change to fully
support this
Software should be considered essential research
equipment, on par with microscopes, mass
spectrometers, and supercomputers
Questions?