EC26_Classic_Disasters - Software Engineering II

University of Southern California

Center for Systems and Software Engineering

Software Classic Disasters

CS 577b Software Engineering II
Supannika Koolmanojwong
April 4, 2011

Outline

• IT Project Management: Infamous Failures, Classic Mistakes, and Best Practices
• Recovering IT in a Disaster: Lessons from Hurricane Katrina
• Top 10 Worst Practices

© 2011 USC-CSSE 04/04/2011

IT Project Management: Infamous Failures, Classic Mistakes, and Best Practices

R. Ryan Nelson, MIS Quarterly Executive, Vol. 6, No. 2, June 2007

• Based on retrospectives (project postmortems or post-implementation reviews)
• 99 retrospectives conducted in 74 organizations over the past 7 years

“Insanity: doing the same thing over and over again and expecting different results.” — Albert Einstein

10 of the most infamous IT project failures

• Large magnitude
• Over $100 million
• One-half come from the public sector
  – wasted taxpayer dollars
  – lost services
• The other half come from the private sector
  – billions of dollars in added costs
  – lost revenues
  – lost jobs

1. Internal Revenue Service (IRS), 1999

PROJECT: Business Systems Modernization

• Launched in 1999 to upgrade the agency’s IT infrastructure and more than 100 business applications
• $8 billion modernization project with a team of vendors
• A complex project that overwhelmed the management capabilities of both vendor and client
• The most expensive systems development “fiasco” in history, with delays costing the U.S. Treasury tens of billions of dollars per year
• Ability to collect revenue, conduct audits, and go after tax evaders was severely compromised

2. Federal Aviation Administration, 1996

PROJECT: Advanced Automation System (AAS); FAA’s effort to modernize the nation’s air traffic control system.

• Estimated to cost $2.5 billion ($1.5 billion was wasted)
• Numerous delays and cost overruns, which were blamed on both the FAA and the primary contractor, IBM
• Technical complexity of the effort, bad resource estimation, ineffective requirements control
• "For example, they wanted the system to have only 3 seconds of downtime a year. But to get the data to prove that requirement had been met would have taken about 10 years” (later changed to 5 minutes of downtime)
• Instead of admitting the problem, IBM turned AAS into a research project
• The project collapsed
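The downtime requirement can be put in perspective with a little arithmetic. The sketch below converts a yearly downtime budget into an availability percentage, using only the two figures quoted on this slide (3 seconds and 5 minutes). The function names and the 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def availability_percent(downtime_seconds_per_year: float) -> float:
    """Return availability (in percent) for a given yearly downtime budget."""
    return 100.0 * (1.0 - downtime_seconds_per_year / SECONDS_PER_YEAR)


def downtime_seconds(nines: int) -> float:
    """Return the yearly downtime (in seconds) allowed by N nines of availability."""
    return SECONDS_PER_YEAR * 10.0 ** (-nines)


if __name__ == "__main__":
    print(f"3 s/year    -> {availability_percent(3):.7f}% available")
    print(f"5 min/year  -> {availability_percent(5 * 60):.5f}% available")
    print(f"seven nines -> {downtime_seconds(7):.2f} s of downtime per year")
```

Three seconds a year corresponds to roughly seven nines of availability, while five minutes a year corresponds to roughly five nines.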

3. Federal Bureau of Investigation, 2004

PROJECT: “Trilogy;” Four-year, $500M overhaul of the FBI’s antiquated computer system.

• Ill-defined requirements, which changed dramatically after 9/11 (agency mission switched from criminal to intelligence focus)
• $170 million project was abandoned altogether
• The bureau identified 400 problems with early versions of the troubled software, but never told the contractor
• The bureau went ahead with a $17 million testing program even though the software would have to be scrapped

4. McDonald's, 2001

PROJECT: “Innovate;” Digital network for creating a real-time enterprise

• Planned to spend $1 billion over five years
• Objective: to better serve customers by using information and communications technologies to monitor the quality of products and services
• Executives in company headquarters would have been able to see how soda dispensers and frying machines in every store were performing, at any moment
• Would need $1 billion for infrastructure, and $zillions to maintain and upgrade
• After two years and $170M, the fast food giant threw in the towel

5. Denver International Airport, 1994

PROJECT: Baggage-handling system.

• It took 10 years and at least $600 million to figure out that big muscles, not computers, can best move baggage
• The baggage system, designed and built by BAE Automated Systems Inc., launched, chewed up, and spit out bags so often that it became known as the “baggage system from hell”

6. AMR Corp., Budget Rent A Car Corp., Hilton Hotels Corp., Marriott International Inc, 1992

PROJECT: “Confirm;” Reservation system for hotel and rental car bookings

The project was cancelled after four years and $125 million in development, when it became clear that Confirm would miss its deadline by as much as two years.

Was supposed to be a leading-edge, comprehensive travel industry reservation program combining airline, rental car, and hotel information.

Major problems surfaced when Hilton tested the system; an 18-month delay followed, and the problems could not be resolved.

7. Bank of America, 1988

PROJECT: “MasterNet;” Trust accounting system.

• Hardware problems caused the Bank of America (BofA) to lose control of several billion dollars of trust accounts.
• All the money was eventually found in the system, but all 255 people in the entire Trust Department were fired, as all the depositors withdrew their money.
• This is a classic case study on the need for risk assessment, including people, process, and technology-related risk.
• BofA spent $60M to fix the $20M project before deciding to abandon it altogether.
• BofA fell from being the largest bank in the world to No. 29.

• Problems with CRACK stakeholders, bad modular design, and a focus on competing with competitors when the system was not ready for transition

8. Kmart, 2000

PROJECT: IT systems modernization

• $1.4 billion IT modernization effort aimed at linking its sales, marketing, supply, and logistics systems
• 18 months later, cash-strapped Kmart cut back on modernization, writing off the $130 million it had already invested in IT
• Four months later, it declared bankruptcy
• Mistakes ranged from failing to allocate enough money and manpower to not clearly establishing the IT project's relationship to the organization's business

9. London Stock Exchange, 1993

PROJECT: “Taurus;” Paperless share settlement system.

• Cost £800 million against an original budget of £6 million
• Abandoned after 10 years of development
• Used a package by Vista Concepts, US, for database management. Although very good for online real-time processing, it could not handle distributed data processing or batch processing
• LSE tried to modify Vista by rewriting almost 60% of it, which led to hidden bugs and long delays
• Grew from a settlement-only system into a full “share registration and transfer system”

10. Nike, 2000

PROJECT: Integrated enterprise software

• $400 million spent installing ERP, CRM, and SCM, the full complement of analyst-blessed integrated enterprise software
• Caused a major inventory glitch: overproduced some shoe models and underproduced others
• Profits dropped by $100 million

Classic Mistakes

• Behind schedule? Add more people!
• Want to speed up development? Cut testing!
• A new version of the OS becomes available during the project? Time for an upgrade!
• Key contributor aggravating the rest of the team? Wait until the end of the project to fire him!

Classic Mistakes: People

• Undermined motivation, which hurts productivity and quality
• Individual capabilities of the team members or the working relationships among them
• Failure to take action to deal with a problem employee
• Adding people to a late project, which is like pouring gasoline on a fire
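One commonly cited reason why adding people to a late project backfires (often discussed under the name Brooks's law) is that the number of pairwise communication paths grows quadratically with team size. The slide does not give this formula; the sketch below is an illustration under that assumption, and the function name is hypothetical.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs of people in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    for size in (5, 10, 20):
        print(f"{size} people -> {communication_paths(size)} communication paths")
```

Doubling a team from 10 to 20 people roughly quadruples the number of paths that have to be kept in sync, before any new hire has been trained.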

Classic Mistakes: Process

• BDUF – Big Design Up Front
• Underestimating the project and setting overly optimistic schedules, which leads to under-scoping it, undermining effective planning, and shortchanging requirements determination and/or quality assurance
  – Poor estimation also puts excessive pressure on team members, leading to lower morale and productivity
• Insufficient risk management
• Contractor failure: outsourcing and offshoring

Classic Mistakes: Product

• Requirements gold-plating
  – e.g. the FAA’s modernization effort, where the goal was 99.99999% reliability, which is referred to as “the seven nines”
• Feature creep
  – The average project experiences about a 25% change in requirements over its lifetime
• Developer gold-plating: using new technology that is not required in the product
• Research-oriented development
• Silver-bullet syndrome
• Overestimated savings from new tools or methods
• Switching tools in the middle of a project

A Meta-Retrospective of 99 IT Projects

• Process mistakes (45%), people mistakes (43%), product mistakes (8%), and technology mistakes (4%)
  – Project managers should be experts in managing processes and people
• Scope creep didn’t make the top ten mistakes
  – As long as the project manager pays attention to it
• Contractor failure has been climbing in frequency in recent years
• If the project managers had focused their attention on better estimation and scheduling, stakeholder management, and risk management, they could have significantly improved the success of the majority of the projects studied

Avoid classic mistakes through best practices

1. Avoiding Poor Estimating and/or Scheduling

• Overruns
  – Cost overrun: 180% in 1994, 43% in 2003
  – Schedule overrun: 63% in 2000, 82% in 2007
• Cone of uncertainty: multiply the “most likely” single-point estimate by the optimistic factor to get the lower bound (optimistic estimate) and by the pessimistic factor to get the upper bound (pessimistic estimate)
• Capital One
  – 100% cushion at the beginning of the feasibility phase
  – 75% cushion in the definition phase
  – 50% cushion in design
  – 25% cushion at the beginning of construction
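The cushion and range calculations on this slide can be sketched in a few lines. The cushion percentages are the ones listed for Capital One; everything else is an assumption made for illustration: the function names, the reading of "cushion" as a percentage added on top of the most likely estimate, and the optimistic and pessimistic factor values, which the slide does not give.

```python
# Cushion per phase, as listed on the slide (Capital One example).
CUSHION_BY_PHASE = {
    "feasibility": 1.00,
    "definition": 0.75,
    "design": 0.50,
    "construction": 0.25,
}


def cushioned_estimate(most_likely: float, phase: str) -> float:
    """Add the phase-specific cushion to a 'most likely' single-point estimate."""
    return most_likely * (1.0 + CUSHION_BY_PHASE[phase])


def estimate_range(most_likely: float, optimistic_factor: float,
                   pessimistic_factor: float) -> tuple[float, float]:
    """Return (lower bound, upper bound) around a single-point estimate."""
    return most_likely * optimistic_factor, most_likely * pessimistic_factor


if __name__ == "__main__":
    for phase in CUSHION_BY_PHASE:
        print(f"{phase}: {cushioned_estimate(100.0, phase)}")
    # Hypothetical factors, chosen only to show the calculation.
    print(estimate_range(100.0, 0.5, 2.0))
```

The cushion shrinks as the project moves through its phases because more is known at each step, which is the idea behind the cone of uncertainty.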

Avoiding Poor Estimating and/or Scheduling

Valuable approaches to improving project estimation and scheduling:

• Timebox development: shorter, smaller projects are easier to estimate
• Creating a work breakdown structure to help size and scope projects
• Retrospectives to capture actual size, effort, and time data for use in making future project estimates
• A project management office to maintain a repository of project data over time
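As one possible illustration of how retrospective data could feed future estimates, the sketch below scales a new estimate by the average actual-to-estimated ratio of past projects. This is an assumed, deliberately simple calibration scheme, not a method described on the slides; the function names and the sample figures are hypothetical.

```python
from statistics import mean


def overrun_ratio(history: list[tuple[float, float]]) -> float:
    """Mean ratio of actual to estimated effort across past projects.

    Each history entry is an (estimated, actual) pair in the same unit.
    """
    if not history:
        raise ValueError("history must contain at least one project")
    return mean(actual / estimated for estimated, actual in history)


def calibrated_estimate(raw_estimate: float,
                        history: list[tuple[float, float]]) -> float:
    """Scale a new estimate by the overrun observed on past projects."""
    return raw_estimate * overrun_ratio(history)


if __name__ == "__main__":
    # Hypothetical (estimated, actual) effort in person-months.
    past_projects = [(10.0, 15.0), (20.0, 24.0), (8.0, 12.0)]
    print(calibrated_estimate(30.0, past_projects))
```

A repository maintained by a project management office would supply the `history` data over time.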

Avoiding Ineffective Stakeholder Management

• Ineffective stakeholder management is the second biggest cause of project failure
• Have to know
  – who has influence over others
  – who has direct control of resources
  – stakeholder level of interest
  – stakeholder degree of support/resistance

Avoiding Insufficient Risk Management

• Risk identification, analysis, prioritization, risk-management planning, resolution, and monitoring
• Methods/tools
  – a prioritized risk assessment table
  – a top-10 risks list
  – interim retrospectives
  – appointing a risk officer
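A prioritized risk table and a top-10 list can be sketched as follows. The ranking uses risk exposure (probability of occurrence times size of loss), a widely used prioritization measure that the slide itself does not spell out, so treat it as an assumption. The class, function names, and sample risks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # chance of occurring, between 0 and 1
    loss: float         # impact if it occurs, in any consistent unit

    @property
    def exposure(self) -> float:
        """Risk exposure: probability multiplied by loss."""
        return self.probability * self.loss


def top_risks(risks: list[Risk], limit: int = 10) -> list[Risk]:
    """Return the highest-exposure risks first, at most `limit` of them."""
    return sorted(risks, key=lambda risk: risk.exposure, reverse=True)[:limit]


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    register = [
        Risk("Contractor failure", 0.2, 500.0),
        Risk("Key staff turnover", 0.5, 100.0),
        Risk("Unstable requirements", 0.6, 200.0),
    ]
    for risk in top_risks(register):
        print(f"{risk.name}: exposure {risk.exposure:.1f}")
```

Re-running the ranking at each interim retrospective keeps the list current as probabilities and losses change.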

Avoiding Insufficient Planning

Ensure the following:

  – Clear roles and responsibilities
  – Resource allocation
  – Schedule / timeline
  – Follow project policies, plans, and procedures

Avoiding Shortchanging Quality Assurance

• When a project falls behind schedule, the first two areas that often get cut are testing and training
• Teams cut corners by eliminating test planning, eliminating design and code reviews, and performing only minimal testing
• Suggestions: agile development, joint application design sessions, automated testing tools, and daily build-and-smoke tests

Avoiding Weak Personnel and/or Team Issues

• Get the right people assigned to the project from the beginning
• Between 1999 and 2006, the retrospectives reported an increasing number of problems with distributed, inter-organizational, and multi-national teams
  – reduction in face-to-face team meetings, time zone barriers, and language and cultural issues

Avoiding Insufficient Project Sponsorship

• Not only getting top management support, but identifying the right sponsor
• From the beginning!

Hurricane Katrina

Recovering IT in a Disaster: Lessons from Hurricane Katrina

Iris Junglas, Blake Ives, MIS Quarterly Executive, Vol. 6, No. 1, March 2007

• August 29, 2005: Hurricane Katrina destroyed a data center and communications infrastructure at the Pascagoula and Gulfport, Mississippi, operations of the Ship Systems sector of Northrop Grumman Corporation
• Also put a second data center out of commission in a shipyard near New Orleans
• 20,000 employees in ship construction
• Caused over US$1 billion in damage for the company
• Brought two of the nation’s largest shipyards to a standstill

Recovering IT in a Disaster

• How to adapt when the business continuity plan and the public infrastructure prove inadequate
• Reexamine our processes for preparing disaster plans
• Processes for assessing preparedness and response after a disaster or a near-disaster

Northrop Grumman Corporation

• Products: electronics, aerospace, and shipbuilding
• Customers: government and commercial customers worldwide
• Major business: ship construction (large military vessels)
  – Revenue: US$5.7 billion in 2005
  – Customers: DoD and Navy
  – 12,900 employees in Mississippi
  – 7,100 employees in New Orleans

Preparation for Hurricane

• Hurricanes are nothing new to the shipbuilding industry
  – September 2004: Hurricane Ivan
  – July 2005: Hurricane Dennis
• A bigger one was heading in: August 2005
  – 11 people dead, over US$1 billion in damage in Florida

Preparation for Hurricane

• Data
  – Data backups were sent to Iron Mountain (information management services)
  – Second backup in Dallas
• Servers
  – powered off
  – wrapped in plastic
• New backup generator in a secure location
• Only one extranet kept alive (crucial to the Navy and DoD)
• People
  – Left the area

The storm smashed

• NGC facilities were in the storm’s path
• Communication failed
• Extensive damage to the shipyards and nearby communities
• Emergency command center set up at the Dallas office, with a newly assembled emergency team

Damages

• Collected digital images of the damage
• In Mississippi, lost
  – 1,500 PCs, 200 servers, 300 printers, 600 data input devices, and hundreds of two-way radios
  – communications closets, routers, switches, fiber and copper cables and wires
  – LAN / WAN / MAN no longer worked
• In New Orleans
  – Infrastructure was still in place
  – AC systems were not working, so servers shut down automatically
• A week after the storm, communication lines went down again because cars were driving over them

First things first

• Not about restoring computer systems, but restoring human resources
• But most of the 20,000 employees were out of contact
• Tools
  – Press releases
  – Corporate web site (67,000 hits in the weeks after the storm)
  – Toll-free call-in number
  – Payroll through Wal-Mart and Western Union

Restoring IT infrastructure

• Electronic communication was nonexistent because the public communication infrastructure was down
• Communication through BlackBerry devices worked only intermittently
• Two-way radios, walkie-talkies
• Key members used satellite phones
  – Required line-of-sight access to satellites
• Later on, wireless communication was used

Building new data center

• Hardware acquisition
• Incompatibilities between software and the new hardware environment
• System documentation inaccessible or difficult to find, e.g. license keys, server names, addressing schemes, login IDs

Restoring data and applications

• Some firms found that their backup data was partially unreadable
• NGC had 2 backups: Iron Mountain and Dallas
• Lost some data on desktops or local machines
• Two weeks after Katrina, NGC had a new data center; essential systems were up and running
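Unreadable backups are usually discovered only at restore time. One way to catch them earlier is to check periodically that each backup file can be read in full and still matches its source. The sketch below does this with SHA-256 checksums. It is an illustration only, not the procedure NGC used, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(original: Path, backup: Path) -> bool:
    """True if the backup can be read in full and matches the original."""
    try:
        return sha256_of(original) == sha256_of(backup)
    except OSError:
        # A missing or unreadable file counts as a failed check.
        return False
```

In practice the original's checksum would be recorded at backup time, so that the check still works after the original is gone.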

Disaster preparedness

• Common mistake: organizations prepare only for disasters specific to their domain
  – financial institutions prepare for IT failures
  – hospitals for pandemics
  – airlines for technical failures and sabotage
• An alternative approach: consider a broader spectrum of disaster types, such as the generic disaster categories
  – economic, information, physical, human resource, reputation, psychopathic, and natural disasters
• Identify common characteristics of each disaster category, then construct the plan

IT disaster preparedness framework

• Provide generic objectives and measurements, and guidelines for establishing IT disaster preparedness
• Emphasize developing an IT continuity plan, identifying and allocating critical resources, executing a business impact analysis, and maintaining, testing, and training on the plan

COBIT (Control Objectives for Information and Related Technology)
  – For operational IT and business managers
  – Focuses on three core elements of IT governance: IT as an asset, IT-related risks, and IT control structures

ITIL (IT Infrastructure Library)
  – Focus is to improve the efficiency and effectiveness of IT services delivered to customers within the enterprise
  – De facto standard for IT service management

Lessons Learned

1. Keep Data and Data Centers Out of Harm’s Way
2. Don’t Assume the Public Infrastructure Will Be Available
3. Plan for Civil Unrest
4. Assume Some People Will Not Be Available
5. Leverage Your Suppliers as Critical Team Members

Lessons Learned

6. Expect the Unexpected
7. Get Prepared – Crisis portfolio
8. Establish a Strong Leadership Position
9. Empower Decision Makers on the Team
10. Exploit Fresh-Start Opportunities

Worst Practices

Capers Jones, "Our Worst Current Development Practices," IEEE Software, vol. 13, no. 2, pp. 102-104, Mar. 1996

• Project failures
  – terminated because of cost or schedule overrun
  – experienced schedule or cost overruns in excess of 50 percent of initial estimates
  – resulted in client lawsuits for contractual noncompliance
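The three failure criteria above translate directly into a small check. The 50 percent threshold comes from the slide. The function names, the parameter layout, and the reading of the three criteria as alternatives (any one is enough) are assumptions made for illustration.

```python
def overrun_percent(estimate: float, actual: float) -> float:
    """Overrun as a percentage of the initial estimate."""
    return 100.0 * (actual - estimate) / estimate


def is_failure(est_cost: float, actual_cost: float,
               est_months: float, actual_months: float,
               terminated: bool = False, lawsuit: bool = False,
               threshold: float = 50.0) -> bool:
    """Apply the slide's failure criteria to one project."""
    return (
        terminated
        or lawsuit
        or overrun_percent(est_cost, actual_cost) > threshold
        or overrun_percent(est_months, actual_months) > threshold
    )


if __name__ == "__main__":
    # Hypothetical project: 60% over budget, on schedule.
    print(is_failure(100.0, 160.0, 12.0, 12.0))
```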

Worst Practice #1

No historical software-measurement data

• Lack of historical data leaves stakeholders blind to the realities of software development
• Need to check on schedule, cost, progress, and performance

Worst Practice #2

Rejection of accurate estimates

The lack of accurate estimates is the root cause of the rest of the worst practices, including:

• inability to perform return-on-investment calculations
• susceptibility to false claims by tool and method vendors
• software contracts that are ambiguous and difficult to monitor

Worst Practices #3 & 4

Failure to use automated estimating tools and automated planning tools.

• 50 commercial software-cost estimating tools
  – Checkpoint, COCOMO, Estimacs, Price-S, or Slim
• 100 project-planning tools on the market
  – Microsoft Project, Primavera, Project Manager’s Workbench, or Timeline
• Combining estimating and planning tools leads to accurate and realistic outcomes that are not easily overridden by clients or executives

Worst Practices

• 5 & 6 - Excessive, irrational schedule pressure and creep in users’ requirements
• 7 & 8 - Failure to monitor progress and to perform risk management
  – “90 percent completion”
• 9 & 10 - Failure to use design reviews and code inspections
