Creating the Dev/Test/PM/Ops Supertribe: From Visible Ops To DevOps

Gene Kim, CISA, TOCICO Jonah
Velocity Conference
June 15, 2011
@RealGeneKim, [email protected]
Where Did The High Performers Come
From?
Higher Performing IT Organizations Are More Stable, Nimble, Compliant And Secure
• High performers maintain a posture of compliance
– Fewest number of repeat audit findings
– One-third the amount of audit preparation effort
• High performers find and fix security breaches faster
– 5 times more likely to detect breaches by automated control
– 5 times less likely to have breaches result in a loss event
• When high performers implement changes…
– 14 times more changes
– One-half the change failure rate
– One-quarter the first fix failure rate
– 10x faster MTTR for Sev 1 outages
• When high performers manage IT resources…
– One-third the amount of unplanned work
– 8 times more projects and IT services
– 6 times more applications
Source: IT Process Institute, 2008
Common Traits of High Performers
Culture of…
• Change management
– Integration of IT operations/security via problem/change management
– Processes that serve both organizational needs and business objectives
– Highest rate of effective change
• Causality
– Highest service levels (MTTR, MTBF)
– Highest first fix rate (unneeded rework)
• Compliance and continual reduction of operational variance
– Production configurations
– Highest level of pre-production staffing
– Effective pre-production controls
– Effective pairing of preventive and detective controls
Source: IT Process Institute
Visible Ops: Playbook of High
Performers
• The IT Process Institute has been
studying high-performing
organizations since 1999
– What is common to all the high
performers?
– What is different between them
and average and low performers?
– How did they become great?
• Answers have been codified in
the Visible Ops Methodology
• The “Visible Ops Handbook” is
now available from the ITPI
www.ITPI.org
2007: Three Controls Predict 60% Of
Performance
• To what extent does an organization define,
monitor and enforce the following?
– Standardized configuration strategy
– Process discipline
– Controlled access to production systems
Source: IT Process Institute, 2008
The Darkest Moment In My
Journey
Tough Love From Ari Balogh
Why Was I So Unsatisfied With The
State Of IT Practice?
• IT operations work continued to be viewed as tactical
• Information security and compliance programs were sucking all the
air out of the room (due to scoping problems)
• The activation energy for successful improvement programs was
still too high
• IT operations issues were overshadowed by development
– Issues are amplified 10x in production: outages, findings, lawsuits
– Technical debt builds up over time
– IT operations is often the constraint in the organization
• Linkage of IT performance to business performance not obvious
enough
• “Why doesn’t the business care? I found the pump handle!”
Seeing The Bigger Problem
Operations Sees…
• Fragile applications are prone to failure
• Long time required to figure out “which
bit got flipped”
• Detective control is a salesperson
• Too much time required to restore service
• Too much firefighting and unplanned work
• Planned project work cannot complete
• Frustrated customers leave
• Market share goes down
• Business misses Wall Street commitments
• Business makes even larger promises to
Wall Street
Dev Sees…
• More urgent, date-driven projects
put into the queue
• Even more fragile code put into
production
• More releases have increasingly
“turbulent installs”
• Release cycles lengthen to amortize
“cost of deployments”
• Failing bigger deployments more
difficult to diagnose
• Most senior and constrained IT ops
resources have less time to fix
underlying process problems
• Ever increasing backlog of
infrastructure projects that could fix
root cause and reduce costs
• Ever increasing amount of tension
between IT Ops and Development
These aren’t IT Operations problems…
These are business problems!
The Dreaded Disease
IT Operations Constipatus (noun)
Occurs when IT Operations
creates fatal blockages in project
flow. Creates blinding pain in
Dev organization.
Blockage worsens with chronic
break/fix and
security/compliance work, and
when technical debt is never paid
off.
Causes host to lose energy,
become unable to achieve
organizational goals. Dangerous to CEOs.
Photo credit: http://www.flickr.com/photos/keenepubliclibrary/2435790649/
DevOps Can Break A
Core Chronic Conflict In IT *
• Every IT organization is pressured to
simultaneously:
– Respond more quickly to urgent business needs
– Provide stable, secure and predictable IT service
Words often used to describe ITIL process owners:
“hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not aligned
with the business, immature, shrill, perpetually focused on irrelevant technical
minutiae…”
Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written
extensively on the theory and practice of identifying and resolving core, chronic conflicts.
Framed This Way, Help Can Come
From A Surprising Place
• The VP Application Development will often have the following
complaints:
– IT Operations is the bottleneck
– We complete the code, but it takes too long for IT Operations to
get the code into production
– Environments are never available when we need them
– Releases often cause chaos and disruption to all the other
production services
– Turbulent installs have become the norm: 30 min installs take 3
days
– Due to slow OS upgrades, applications delayed by 2 quarters
– We are always late getting features to market
A Reframed IT Operations Problem
Statement
• Increase flow from Dev to Production
– Increase throughput
– Decrease WIP
• Our goal is to create a system of operations that allows
– Planned work to move quickly to production
– Service to be restored quickly when things go wrong
• How does this relate to Visible Ops?
– We focused much on “unplanned work”
– What’s happening to all the planned work?
– At any given time, what should IT Ops be working on?
– Now we are focusing on the flow of planned work
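One way to see why increasing throughput and decreasing WIP both speed up the flow of planned work is Little's Law from queueing theory: average cycle time equals average WIP divided by average throughput. The slides do not cite this law, so treat it as an assumption brought in for illustration. The function name and all numbers below are hypothetical.

```python
def average_cycle_time(wip: float, throughput_per_week: float) -> float:
    """Little's Law: average cycle time (weeks) = average WIP / average throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


# Hypothetical numbers, for illustration only.
before = average_cycle_time(wip=40, throughput_per_week=5)          # 40 items in flight
after_less_wip = average_cycle_time(wip=20, throughput_per_week=5)  # halve the WIP
after_both = average_cycle_time(wip=20, throughput_per_week=10)     # and double throughput

print(before, after_less_wip, after_both)  # 8.0 4.0 2.0
```

In this sketch, halving WIP alone halves the average time a change waits before reaching production, without anyone working faster.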
What These Breakthroughs Look
Like
Goal #1: Decrease Cycle Time Of
Releases
• Create determinism in the release process
• Move packaging responsibility to development
• Release early and often
• Decrease cycle time
– Reduce deployment times from 6 hours to 45 minutes
– Refactor deployment process that had 1300+ steps spanning 4 weeks
• Never again “fix forward,” instead “roll back,” escalating any deviation from plan to Dev
• Verify for all handoffs (e.g., correctness, accuracy, timeliness, etc…)
• Ensure environments are properly built before deployment begins
• Control code and environments down the preproduction runways
• Hold Dev, QA, Int, and Staging owners accountable for integrity
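The "roll back, never fix forward" rule can be sketched as a small deployment runner. This is a minimal illustration under stated assumptions, not a tool described in the talk: `Step`, `deploy`, `DeploymentRolledBack` and the two example steps are hypothetical names. On any deviation, the completed steps are undone in reverse order and the failure is raised so it can be escalated to Dev.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]  # performs the step; raises on any deviation from plan
    undo: Callable[[], None]   # reverses the step


class DeploymentRolledBack(Exception):
    """Raised after a rollback so the deviation is escalated, not patched in place."""


def deploy(steps: List[Step]) -> List[str]:
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
        except Exception as err:
            # No "fix forward": undo the completed steps in reverse order.
            for finished in reversed(done):
                finished.undo()
            raise DeploymentRolledBack(
                f"step {step.name!r} deviated from plan: {err}"
            ) from err
        done.append(step)
    return [s.name for s in done]


# Hypothetical two-step plan in which the second step fails.
log: List[str] = []


def failing_migration() -> None:
    raise RuntimeError("schema mismatch")


plan = [
    Step("copy files", lambda: log.append("copy"), lambda: log.append("undo copy")),
    Step("migrate db", failing_migration, lambda: log.append("undo migrate")),
]

try:
    deploy(plan)
except DeploymentRolledBack:
    log.append("escalate to Dev")

print(log)  # ['copy', 'undo copy', 'escalate to Dev']
```

Only steps that completed are undone; the step that failed is assumed to leave nothing behind, which a real deployment process would have to verify.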
Goal #2: Increase Production Rigor
• Define what work is and where work can come from
• Protect the integrity of the work queue (e.g., are checks being
written that won’t clear?)
• To preserve and increase throughput, elevate preventive projects
and maintenance tasks
• Document all work, changes and outcomes so that it is repeatable
• Ops builds Agile standardized deployment stories, to be completed
after Dev sprints are complete
• Maintain adequate situational awareness so that incidents can
be quickly detected and corrected
• Standardize unplanned work and escalations
• Always seek to eradicate unplanned work and increase
throughput
Lean Principle: “Better -> Faster -> Cheaper”
The Prescriptive DevOps Cookbook
• Capture and codify how to start and finish successful
DevOps transformations
– Create isomorphic mapping between plant floors and IT
shops
– Co-authoring with Patrick Debois, Mike Orzen, John Willis
– Describe in detail how to replicate the transformations
described in “When IT Fails: The Novel”
• Goals
– How does IT Operations become a dependable partner?
– How does Dev become a dependable partner?
– How do Dev and Ops work together to solve business
problems (and Infosec, too)?
The Prescriptive DevOps Cookbook
• I am seeking fellow travelers who want to capture
and codify the best known methods, patterns/antipatterns, recipes and case studies of how to
implement successful DevOps-style transformations.
The Theory of Constraints Approach To
Visible Ops
• Dr. Goldratt wrote The Goal in
1984, describing Alex’s challenge
to fix his plant’s cost and due date
issues within 90 days
• Some tenets that went against
common wisdom:
– Every flow of work has a
constraint/bottleneck
– Any improvement not made at the
bottleneck is merely an illusion
– Fallacy of cost accounting as
operational management tool
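The first two tenets can be illustrated with a toy model. Assume, purely for illustration, that work passes through a strictly serial flow of stages, each with a fixed weekly capacity. The throughput of the whole flow is then capped by the slowest stage. The stage names and capacities below are hypothetical.

```python
from typing import Dict, Tuple


def bottleneck(capacity: Dict[str, float]) -> Tuple[str, float]:
    """Return the lowest-capacity stage; in a serial flow it caps overall throughput."""
    stage = min(capacity, key=lambda name: capacity[name])
    return stage, capacity[stage]


# Hypothetical capacities, in changes per week.
flow = {"dev": 20.0, "qa": 15.0, "ops_deploy": 4.0}

base = bottleneck(flow)                                # ops_deploy caps the flow at 4.0
faster_dev = bottleneck({**flow, "dev": 40.0})         # doubling Dev changes nothing
faster_ops = bottleneck({**flow, "ops_deploy": 12.0})  # improving the constraint helps

print(base, faster_dev, faster_ops)
```

Doubling Dev capacity leaves the flow at 4 changes per week, while tripling the capacity of the constrained stage triples what reaches production.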
When IT Fails: The Novel
Day 1
• Steve Masters, CEO
• Dick Landry, CFO
• Parts Unlimited
$4B revenue/year
When IT Fails: The Novel
Day 2
• Bill Palmer, VP IT Operations (promoted)
– Wes Davis, Director, Distributed Systems
– Patty McKee, Director, IT Service Support Services
• The payroll outage
– All salaried employees will get paid, but not the hourlies
– CISO put in tokenization application in the factories,
breaking database query that uses SSN
– IT Ops thought it was a SAN firmware upgrade failure
– All HR apps go down
– CFO is on front page of news, apologizing to community
When IT Fails: The Novel
Day 4
• Chris Allers, VP Application Development
• Sarah Moulton, SVP Retail Products
• “We can deploy by next week by cutting some
corners, but IT Ops is in the way… again…”
• “Bill, your team lacks a sense of urgency. We
must go. We’ve already bought the newspaper
ads – they’re bought, paid for and being
printed…”
When IT Fails: The Novel
Day 3
• Nancy Mailer, Chief Audit Executive
• John Pesche, CISO
• IT Operations has 980 IT general control
deficiencies on critical financial systems,
potentially dooming financial statement to having
a footnote. Needs management response in 1
week.
• Bill grapples with who to put on the project. 1 yr
of work, just to fix issues, even without Phoenix.
The Goal For IT: Day 10
• The Deployment
• Database conversion, the point of no return,
taking 1000x longer.
• In store POS won’t come up by Sat 8am,
maybe by next Tuesday
• Emptying shopping cart shows last successful
order credit card #
Call To Action
• If you’re interested in reviewing early versions
of “When IT Fails: The Novel,” email me.
• If you’re interested in helping build or review
the DevOps Cookbook, email me.
• I’m [email protected]
• Thank you for allowing me to join your tribe!
Resources
• From the IT Process Institute www.itpi.org
– Both Visible Ops Handbooks
– ITPI IT Controls Performance Study
• “Lean IT” by Orzen and Bell
– Winner of the Shingo Prize 2011
• “Inspired: How To Create Products That
Customers Love” by Cagan
• “Continuous Delivery: Reliable Software
Releases through Build, Test, and
Deployment Automation” by Humble,
Farley
• Follow Gene Kim
– @RealGeneKim
– mailto:[email protected]
– http://realgenekim.me/blog
About Gene Kim
• I’ve spent the last 12 years studying high performing IT
organizations, trying to understand:
– What do they have in common?
– What is present in successful transformations, absent in
unsuccessful transformations?
– How do we lower the activation energy required to create the
transformations?
• Founder and former CTO of Tripwire, Inc.
• Co-author of Visible Ops Handbook, Security Visible Ops
Handbook
• Active researcher
– Co-founder of IT Process Institute
– Committee member of Institute of Internal Auditors
– Leader of PCI Security Standards Council Scoping SIG