Transcript EC26_Classic_Disasters - Software Engineering II
University of Southern California
Center for Systems and Software Engineering
Software Classic Disasters
CS 577b Software Engineering II Supannika Koolmanojwong April 4, 2011
Outline
• IT Project Management: Infamous Failures, Classic Mistakes, and Best Practices
• Recovering IT in a Disaster: Lessons from Hurricane Katrina
• Top 10 Worst Practices
© 2011 USC-CSSE 04/04/2011
IT Project Management: Infamous Failures, Classic Mistakes, and Best Practices
R. Ryan Nelson, MIS Quarterly Executive, Vol. 6, No. 2, June 2007
• Retrospectives conducted as project postmortems or post-implementation reviews
• 99 retrospectives conducted in 74 organizations over the past 7 years
• “Insanity: doing the same thing over and over again and expecting different results.” - Albert Einstein
10 of the most infamous IT project failures
• Large magnitude
• Over $100 million
• One-half come from the public sector
  – wasted taxpayer dollars
  – lost services
• The other half come from the private sector
  – billions of dollars in added costs
  – lost revenues
  – lost jobs
1. Internal Revenue Service (IRS), 1999
• PROJECT:
  – Business Systems Modernization
  – Launched in 1999 to upgrade the agency’s IT infrastructure and more than 100 business applications
• $8 billion modernization project with a team of vendors
• A complex project that overwhelmed the management capabilities of both vendor and client
• The most expensive systems development “fiasco” in history, with delays costing the U.S. Treasury tens of billions of dollars per year
• The ability to collect revenue, conduct audits, and go after tax evaders was severely compromised
2. Federal Aviation Administration, 1996
• PROJECT: Advanced Automation System (AAS); the FAA’s effort to modernize the nation’s air traffic control system
• Estimated to cost $2.5 billion ($1.5 billion was wasted)
• Numerous delays and cost overruns, which were blamed on both the FAA and the primary contractor, IBM
• Technical complexity of the effort, bad resource estimation, ineffective requirements control
• “For example, they wanted the system to have only 3 seconds of downtime a year. But to get the data to prove that requirement had been met would have taken about 10 years” (later changed to 5 minutes of downtime)
• Instead of admitting the problem, IBM turned AAS into a research project
• The project collapsed
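To put the downtime requirement in numbers, the Python sketch below converts between yearly downtime and an availability percentage. It is an illustration only, not part of the original slides: the function names are hypothetical, and a 365-day year is assumed.

```python
# Illustrative sketch: relate "seconds of downtime per year" to availability.
# Assumption: a 365-day year (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def availability_percent(downtime_seconds_per_year: float) -> float:
    """Availability (in percent) implied by a yearly downtime budget."""
    return 100.0 * (1.0 - downtime_seconds_per_year / SECONDS_PER_YEAR)


def downtime_seconds(availability_pct: float) -> float:
    """Yearly downtime budget (in seconds) implied by an availability target."""
    return SECONDS_PER_YEAR * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # The two requirements mentioned on the slide.
    for label, seconds in [("3 seconds", 3), ("5 minutes", 5 * 60)]:
        print(f"{label} per year -> {availability_percent(seconds):.7f}% available")
```

Under these assumptions, a 3-second yearly budget sits slightly above 99.99999% availability, while the relaxed 5-minute budget is roughly 99.999%, which shows how much the revised requirement eased the target.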
3. Federal Bureau of Investigation, 2004
• PROJECT: “Trilogy”; four-year, $500M overhaul of the FBI’s antiquated computer system
• Ill-defined requirements that changed dramatically after 9/11 (agency mission switched from criminal to intelligence focus)
• $170 million project was abandoned altogether
• The bureau found 400 problems with early versions of the troubled software but never told the contractor
• The bureau went ahead with a $17 million testing program even though the software would have to be scrapped
4. McDonald’s, 2001
• PROJECT: “Innovate”; digital network for creating a real-time enterprise
• Planned to spend $1 billion over five years
• Objective: to better serve customers by using information and communications technologies to monitor the quality of products and services
• Executives in company headquarters would have been able to see how soda dispensers and frying machines in every store were performing, at any moment
• Would need $1 billion for infrastructure, and $zillions to maintain and upgrade
• After two years and $170M, the fast food giant threw in the towel
5. Denver International Airport, 1994
• PROJECT: Baggage-handling system
• It took 10 years and at least $600 million to figure out that big muscles, not computers, can best move baggage
• The baggage system, designed and built by BAE Automated Systems Inc., launched, chewed up, and spit out bags so often that it became known as the “baggage system from hell”
6. AMR Corp., Budget Rent A Car Corp., Hilton Hotels Corp., Marriott International Inc., 1992
• PROJECT: “Confirm”; reservation system for hotel and rental car bookings
• After four years and $125 million in development, it became clear that Confirm would miss its deadline by as much as two years
• Was supposed to be a leading-edge, comprehensive travel industry reservation program combining airline, rental car, and hotel information
• Major problems surfaced when Hilton tested the system; an 18-month delay followed, and the problems could not be resolved
7. Bank of America, 1988
• PROJECT: “MasterNet”; trust accounting system
• Hardware problems caused the Bank of America (BofA) to lose control of several billion dollars of trust accounts
• All the money was eventually found in the system, but all 255 people in the entire Trust Department were fired, as all the depositors withdrew their money
• This is a classic case study on the need for risk assessment, including people, process, and technology-related risk
• BofA spent $60M to fix the $20M project before deciding to abandon it altogether. BofA fell from being the largest bank in the world to No. 29
• CRACK stakeholder problems, bad modular design, and a focus on competing with competitors while not being ready for transition
8. Kmart, 2000
• PROJECT: IT systems modernization
• $1.4 billion IT modernization effort aimed at linking its sales, marketing, supply, and logistics systems
• 18 months later, cash-strapped Kmart cut back on modernization, writing off the $130 million it had already invested in IT
• Four months later, it declared bankruptcy
• Mistakes ranged from failing to allocate enough money and manpower to not clearly establishing the IT project's relationship to the organization's business
9. London Stock Exchange, 1993
• PROJECT: “Taurus”; paperless share settlement system
• Cost £800 million; original budget £6 million
• Abandoned after 10 years of development
• Used Vista, by Vista Concepts (US), for database management. Although very good for online real-time processing, it could not handle distributed data processing or batch processing
• LSE tried to modify Vista by rewriting almost 60% of it, which led to hidden bugs and long delays
• Grew from a settlement-only system to become a full “share registration and transfer system”
10. Nike, 2000
• PROJECT: Integrated enterprise software
• $400 million spent installing ERP, CRM, and SCM, the full complement of analyst-blessed integrated enterprise software
• Caused a major inventory glitch: overproduced some shoe models and underproduced others
• Profits dropped by $100 million
Classic Mistakes
• Behind schedule? Add more people!
• Want to speed up development? Cut testing!
• A new version of the OS becomes available during the project? Time for an upgrade!
• Key contributor aggravating the rest of the team? Wait until the end of the project to fire him!
Classic Mistakes: People
• Undermined motivation
  – lowers productivity and quality
• Individual capabilities of the team members or the working relationships
• Failure to take action to deal with a problem employee
• Adding people to a late project
  – pouring gasoline on a fire
Classic Mistakes: Process
• BDUF – Big Design Up Front
• Underestimation and overly optimistic schedules lead to under-scoping the project, undermining effective planning, and shortchanging requirements determination and/or quality assurance
  – Poor estimation also puts excessive pressure on team members, leading to lower morale and productivity
• Insufficient risk management
• Contractor failure - outsourcing and offshoring
Classic Mistakes: Product
• Requirements gold-plating
  – e.g., the FAA’s modernization effort, where the goal was 99.99999% reliability, which is referred to as “the seven nines”
• Feature creep
  – the average project experiences about a 25% change in requirements over its lifetime
• Developer gold-plating - new technology that is not required in the product
• Research-oriented development
• Silver-bullet syndrome
• Overestimated savings from new tools or methods
• Switching tools in the middle of a project
A Meta-Retrospective of 99 IT Projects
• Process mistakes (45%), people mistakes (43%), product mistakes (8%), or technology mistakes (4%)
  – project managers should be experts in managing processes and people
• Scope creep didn’t make the top ten mistakes
  – as long as the project manager pays attention to it
• Contractor failure has been climbing in frequency in recent years
• If the project managers had focused their attention on better estimation and scheduling, stakeholder management, and risk management, they could have significantly improved the success of the majority of the projects studied
Avoid classic mistakes through best practices
1. Avoiding Poor Estimating and/or Scheduling
  – Cost overrun: 180% in 1994, 43% in 2003
  – Schedule overrun: 63% in 2000, 82% in 2007
  – Cone of uncertainty: multiply the “most likely” single-point estimate by an optimistic factor and by a pessimistic factor
    • lower bound - optimistic estimate
    • upper bound - pessimistic estimate
  – Capital One
    • 100% cushion at the beginning of the feasibility phase
    • 75% cushion in the definition phase
    • 50% cushion in design
    • 25% cushion at the beginning of construction
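The cone-of-uncertainty bounds and the Capital One cushions can be sketched as simple arithmetic. The Python below is an illustration under stated assumptions: the cushion is read as a percentage added on top of the single-point estimate, the optimistic and pessimistic factors are supplied by the caller (the slide gives no values), and all names are hypothetical.

```python
# Illustrative sketch of phase-based estimate cushions and estimate ranges.
# Assumption: a cushion of X% means "most likely estimate plus X%".
PHASE_CUSHION = {
    "feasibility": 1.00,   # 100% cushion at the beginning of feasibility
    "definition": 0.75,    # 75% cushion in the definition phase
    "design": 0.50,        # 50% cushion in design
    "construction": 0.25,  # 25% cushion at the beginning of construction
}


def cushioned_estimate(most_likely: float, phase: str) -> float:
    """Most likely estimate padded with the cushion for the given phase."""
    return most_likely * (1.0 + PHASE_CUSHION[phase])


def estimate_range(most_likely: float,
                   optimistic_factor: float,
                   pessimistic_factor: float) -> tuple[float, float]:
    """Lower and upper bounds from caller-supplied cone-of-uncertainty factors."""
    return (most_likely * optimistic_factor, most_likely * pessimistic_factor)


if __name__ == "__main__":
    # Hypothetical 100 person-month estimate, re-quoted at each phase.
    for phase in PHASE_CUSHION:
        print(phase, cushioned_estimate(100, phase))
    # Hypothetical factors, chosen only to show the calculation.
    print(estimate_range(100, 0.5, 2.0))
```

The point of the shrinking cushion is that the quoted number should tighten as the project moves through its phases and more is known.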
Avoiding Poor Estimating and/or Scheduling
• Valuable approaches to improving project estimation and scheduling
  – Timebox development
    • shorter, smaller projects are easier to estimate
  – Creating a work breakdown structure
    • to help size and scope projects
  – Retrospectives
    • to capture actual size, effort, and time data for use in making future project estimates
  – A project management office to maintain a repository of project data over time
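One simple way retrospective data could feed future estimates is sketched below in Python. This is an assumption-laden illustration, not a method from the slides: it supposes the repository stores estimated and actual effort per project and scales a new raw estimate by the historical actual-to-estimated ratio. The record type, field names, and figures are hypothetical.

```python
# Illustrative sketch: calibrate a new estimate using retrospective data.
# Assumption: past estimating bias carries over to the next project.
from dataclasses import dataclass


@dataclass
class Retrospective:
    project: str
    estimated_effort: float  # person-months estimated up front
    actual_effort: float     # person-months actually spent


def calibration_factor(history: list[Retrospective]) -> float:
    """Ratio of total actual effort to total estimated effort."""
    total_estimated = sum(r.estimated_effort for r in history)
    total_actual = sum(r.actual_effort for r in history)
    if total_estimated <= 0:
        raise ValueError("history must contain positive estimates")
    return total_actual / total_estimated


def calibrated_estimate(raw_estimate: float,
                        history: list[Retrospective]) -> float:
    """Raw estimate scaled by the historical overrun ratio."""
    return raw_estimate * calibration_factor(history)


if __name__ == "__main__":
    # Hypothetical repository entries.
    history = [Retrospective("billing", 10, 15), Retrospective("portal", 20, 30)]
    print(calibrated_estimate(40, history))
```

A real repository would likely also record size and schedule data, as the slide suggests, and would segment projects by type before reusing their ratios.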
Avoiding Ineffective Stakeholder Management
• Ineffective stakeholder management is the second biggest cause of project failure
• Have to know
  – who has influence over others
  – who has direct control of resources
  – stakeholder level of interest
  – stakeholder degree of support/resistance
Avoiding Insufficient Risk Management
• Risk identification, analysis, prioritization, risk-management planning, resolution, and monitoring
• Methods/tools
  – a prioritized risk assessment table
  – a top-10 risks list
  – interim retrospectives
  – appointing a risk officer
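A prioritized risk table and top-10 list can be sketched as follows in Python. The slide does not say how risks are ranked; this sketch assumes the common risk-exposure measure (probability of loss times size of loss). The class, field names, and example risks are hypothetical.

```python
# Illustrative sketch of a prioritized risk table and top-N risk list.
# Assumption: risks are ranked by exposure = probability * loss.
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # chance the risk occurs, from 0.0 to 1.0
    loss: float         # relative size of the loss, e.g. on a 1-10 scale

    @property
    def exposure(self) -> float:
        """Risk exposure used as the ranking key."""
        return self.probability * self.loss


def top_risks(risks: list[Risk], limit: int = 10) -> list[Risk]:
    """Risks sorted from highest to lowest exposure, cut to `limit` items."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)[:limit]


if __name__ == "__main__":
    # Hypothetical risk register entries.
    register = [
        Risk("Optimistic schedule", 0.9, 8),
        Risk("Key contractor fails", 0.3, 10),
        Risk("Requirements creep", 0.5, 4),
    ]
    for risk in top_risks(register):
        print(f"{risk.name}: exposure {risk.exposure:.1f}")
```

Re-running the ranking at each interim retrospective keeps the list current as probabilities and losses change.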
Avoiding Insufficient Planning
• Ensure the following
  – Clear roles and responsibilities
  – Resource allocation
  – Schedule / timeline
  – Follow project policies, plans, and procedures
Avoiding Shortchanging Quality Assurance
• When a project falls behind schedule, the first two areas that often get cut are testing and training
• Teams cut corners by eliminating test planning, eliminating design and code reviews, and performing only minimal testing
• Suggestions:
  – agile development, joint application design sessions, automated testing tools, and daily build-and-smoke tests
Avoiding Weak Personnel and/or Team Issues
• Get the right people assigned to the project from the beginning
• Between 1999 and 2006, the retrospectives reported an increasing number of problems with distributed, inter-organizational, and multi-national teams
  – reduction in face-to-face team meetings, time zone barriers, and language and cultural issues
Avoiding Insufficient Project Sponsorship
• Not only getting top management support, but also identifying the right sponsor
• From the beginning!
Outline
• IT Project Management: Infamous Failures, Classic Mistakes, and Best Practices
• Recovering IT in a Disaster: Lessons from Hurricane Katrina
• Top 10 Worst Practices
Hurricane Katrina
Recovering IT in a Disaster: Lessons from Hurricane Katrina
Iris Junglas, Blake Ives, MIS Quarterly Executive, Vol. 6, No. 1, March 2007
• August 29, 2005 - Hurricane Katrina destroyed a data center and communications infrastructure at the Pascagoula and Gulfport, Mississippi, operations of the Ship Systems sector of Northrop Grumman Corporation
• Also put a second data center out of commission in a shipyard near New Orleans
• 20,000 employees in ship construction
• Caused over US$1 billion in damage for the company
• Brought two of the nation’s largest shipyards to a standstill
Recovering IT in a Disaster
• How to adapt when the business continuity plan and the public infrastructure prove inadequate
• Reexamine our processes for preparing disaster plans
• Processes for assessing preparedness and response after a disaster or a near-disaster
Northrop Grumman Corporation
• Products: electronics, aerospace, and shipbuilding
• Customers: government and commercial customers worldwide
• Major business:
  – Ship construction - large military vessels
  – Revenue: US$5.7 billion in 2005
  – Customers: DoD and Navy
  – 12,900 employees in Mississippi
  – 7,100 employees in New Orleans
Preparation for Hurricane
• Hurricanes are nothing new to the shipbuilding industry
  – September 2004 - Hurricane Ivan
  – July 2005 - Hurricane Dennis
• A bigger one is heading in
  – August 2005
    • 11 people dead, over US$1 billion in damage in Florida
Preparation for Hurricane
• Data
  – Data backups were sent to Iron Mountain (information management services)
  – Second backup in Dallas
• Servers
  – powered off
  – wrapped in plastic
• New backup generator in a secure location
• Only one extranet kept alive (crucial to the Navy and DoD)
• Human
  – Left the area
The storm smashed
• NGC facilities were in the storm’s path
• Communication failed
• Extensive damage to the shipyards and nearby communities
• Emergency command center set up at the Dallas office, with a newly assembled emergency team
Damages
• Collect digital images of damages
• At Mississippi, lost
  – 1,500 PCs, 200 servers, 300 printers, 600 data input devices, and hundreds of two-way radios
  – communications closets, routers, switches, fiber and copper cables and wires
  – LAN / WAN / MAN no longer worked
• At New Orleans
  – Infrastructure was still there
  – AC systems were not working, so servers shut down automatically
• A week after the storm, communication lines went down again because cars were driving over them
First things first
• Not about restoring computer systems, but restoring human resources
• But most of the 20,000 employees were out of contact
• Tools
  – Press releases
  – Corporate web site (67,000 hits in the weeks after the storm)
  – Toll-free call-in number
• Payroll through Wal-Mart and Western Union
Restoring IT infrastructure
• Electronic communication was nonexistent because the public communication infrastructure was down
• Communication through BlackBerry could be used intermittently
• Two-way radios, walkie-talkies
• Key members used satellite phones
  – Required line-of-sight access to satellites
• Later on, wireless communication was used
Building new data center
• Hardware acquisition
• Incompatibilities between software and the new hardware environment
• Inaccessible or difficult-to-find system documentation, e.g. license keys, server names, addressing schemes, login IDs
Restoring data and applications
• Some firms found that their backup data was partially unreadable
• For NGC, 2 backups: Iron Mountain and Dallas
• Lost some data on desktops or local machines
• Two weeks after Katrina, NGC had a new data center; essential systems were up and running
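Partially unreadable backups are the kind of problem a routine verification pass can surface before a disaster. The Python sketch below is one hedged illustration, not a procedure from the slides or from NGC: it assumes backups are plain files on disk, records a SHA-256 checksum per file when the backup is taken, and later reports files that are missing, unreadable, or changed. All function names are hypothetical.

```python
# Illustrative sketch: detect missing or corrupt files in a backup copy.
# Assumption: the backup is a directory tree of ordinary files.
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {str(p.relative_to(root)): sha256_of(p) for p in files}


def verify_backup(manifest: dict[str, str], backup_root: Path) -> list[str]:
    """Return a list of problems found; an empty list means the copy matches."""
    problems = []
    for relative_path, expected in manifest.items():
        candidate = backup_root / relative_path
        if not candidate.is_file():
            problems.append(f"missing: {relative_path}")
            continue
        try:
            actual = sha256_of(candidate)
        except OSError:
            problems.append(f"unreadable: {relative_path}")
            continue
        if actual != expected:
            problems.append(f"corrupt: {relative_path}")
    return problems
```

A typical use would be to call `build_manifest` on the source data when the backup is written and `verify_backup` against each backup copy on a schedule, so an unreadable copy is found while a second copy still exists.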
Disaster preparedness
• Common mistake: organizations prepare only for disasters specific to their domain
  – financial institutions prepare for IT failures
  – hospitals for pandemics
  – airlines for technical failures and sabotage
• An alternative approach: consider a broader spectrum of generic disaster types
  – economic, information, physical, human resource, reputation, psychopathic, and natural disasters
• Identify common characteristics of each disaster category, then construct the plan
IT disaster preparedness framework
• COBIT (Control Objectives for Information and Related Technology)
  – For operational IT and business managers
  – Focuses on three core elements of IT governance: IT as an asset, IT-related risks, and IT control structures
• ITIL (IT Infrastructure Library)
  – Focus is to improve the efficiency and effectiveness of IT services delivered to customers within the enterprise
  – De facto standard for IT service management
• Both frameworks provide generic objectives and measurements, and guidelines for establishing IT disaster preparedness
• Both frameworks emphasize developing an IT continuity plan, identifying and allocating critical resources, executing a business impact analysis, and maintaining, testing, and training on the plan
Lessons Learned
1. Keep Data and Data Centers Out of Harm’s Way
2. Don’t Assume the Public Infrastructure Will Be Available
3. Plan for Civil Unrest
4. Assume Some People Will Not Be Available
5. Leverage Your Suppliers as Critical Team Members
Lessons Learned (continued)
6. Expect the Unexpected
7. Get Prepared – Crisis portfolio
8. Establish a Strong Leadership Position
9. Empower Decision Makers on the Team
10. Exploit Fresh-Start Opportunities
Outline
• IT Project Management: Infamous Failures, Classic Mistakes, and Best Practices
• Recovering IT in a Disaster: Lessons from Hurricane Katrina
• Top 10 Worst Practices
Worst Practices
Capers Jones, "Our Worst Current Development Practices," IEEE Software, vol. 13, no. 2, pp. 102-104, Mar. 1996
• Project failures
  – terminated because of cost or schedule overrun
  – experienced schedule or cost overruns in excess of 50 percent of initial estimates
  – resulted in client lawsuits for contractual noncompliance
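The three failure criteria above can be read as a simple classification rule. The Python sketch below encodes that reading for illustration only: it treats the criteria as alternatives (any one is enough), and the function and parameter names are hypothetical.

```python
# Illustrative sketch: classify a project against the three failure criteria.
# Assumption: meeting any single criterion counts as a failure.
def overrun_fraction(initial: float, actual: float) -> float:
    """Overrun as a fraction of the initial estimate (0.5 means 50% over)."""
    if initial <= 0:
        raise ValueError("initial estimate must be positive")
    return (actual - initial) / initial


def is_project_failure(initial_cost: float, actual_cost: float,
                       initial_months: float, actual_months: float,
                       terminated_for_overrun: bool = False,
                       client_lawsuit: bool = False,
                       threshold: float = 0.5) -> bool:
    """True if the project was terminated, sued over, or overran by more than 50%."""
    return (
        terminated_for_overrun
        or client_lawsuit
        or overrun_fraction(initial_cost, actual_cost) > threshold
        or overrun_fraction(initial_months, actual_months) > threshold
    )
```

For example, a hypothetical project estimated at 100 cost units and 12 months that finishes at 140 units and 15 months would not count as a failure under this rule, while the same project finishing at 151 units would.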
Worst Practice #1
No historical software-measurement data
• Lack of historical data leaves stakeholders blind to the realities of software development
• Need to check on schedule, cost, progress, performance
Worst Practice #2
Rejection of accurate estimates
• The absence of accurate estimates is the root cause of the rest of the worst practices, including:
  – inability to perform return-on-investment calculations
  – susceptibility to false claims by tool and method vendors
  – software contracts that are ambiguous and difficult to monitor
Worst Practice #3 & 4
Failure to use automated estimating tools and automated planning tools
• 50 commercial software-cost estimating tools
  – Checkpoint, COCOMO, Estimacs, Price-S, or Slim
• 100 project-planning tools on the market
  – Microsoft Project, Primavera, Project Manager’s Workbench, or Timeline
• Combining estimating and planning tools leads to accurate and realistic outcomes that are not easily overridden by clients or executives
Worst Practices
• 5 & 6 - Excessive, irrational schedule pressure and creep in users’ requirements
• 7 & 8 - Failure to monitor progress and to perform risk management
  – “90 percent completion”
• 9 & 10 - Failure to use design reviews and code inspections