Transcript Slide 1

Planning and Auditing Your
Firm’s Capacity Planning Efforts
By Ron “The Hammer” Kaminski
[email protected]
[email protected]
Foreign speaker rules
• Please feel free to stop me to ask any
questions
– Raise your hand or clap if I am going too fast
or if my Mississippi accent becomes
impossible for y’all to understand
• This is not rude, and I will not take it that
way
– The paper and all slides will be furnished to
my hosts
Introduction
• Over the past 20 years, I’ve started and expanded capacity planning
groups at dozens of firms, my most recent is now 15 months old
– You learn things in that process
– CMG is the place to share this information
– I look forward to your presentation on this topic in a few years!
• Today’s goal is to give you “planning and audit points” that you can use to
review how you do capacity planning, and maybe persuade you that other
methods might be more productive, or at least worth a shot!
• There will also be “How to” information, that may have you adding some
“to do” items to your list
• If you have a question, ask it!
– I like nothing better than surfing off on a tangent that helps the class
• Story Times!
• New risks
© Ron Kaminski 2010, All Rights Reserved
Introduction
• In the next few hours, we will cover
– Defining your mission
– Picking the right vendor partners
– Going “Extra-Product”
– Avoiding the “IT Mindset Traps”
– The politics of capacity planning in organizations, the
key factor in your eventual success, or failure
– Reporting, what you should and surprisingly should
not do
– Classic capacity planning question descriptions and
proper answering techniques
Introduction
• In the next few hours, we will cover
– How clouds and “software as a service” will still need
capacity tracking and planning tools, and what new kinds
you will need
– Modeling when all of the cards are stacked against you, or
“Tricks of the trade”
– Goals to work towards
– An audit list to compare to your systems
• Capacity planning done well can change the fortunes of
a company and help all of our careers. Come sharpen
your methods and learn tricks that will make you part
of your firm’s future productive assets, and not an
expense to be controlled
Ron’s Rules
• You can ask anything, at any time
– Sometimes the answer is coming up soon in the examples, and in that
case I’ll tell you so
• Quick Survey
– Does anyone here already have…
• A network queuing theory based modeling package?
• Regular, automated process and workload pathology detection?
• Fast web reporting of resource consumption by business useful workloads?
• By the end of this talk, I hope that you will realize that workload
characterized views of consumption, web accessible, over business
useful time spans are a must have part of the best run IT shops
– Let’s see why…
Defining your mission
• Every site has their own “Hot button!” issues
– “We are buying a new $23 million computer room every 6 months!”
• Attack server sprawl with data, not words
– “I don’t know why we hired a capacity planner, we just…”
– “Our critical applications are slowing down!”
• Use relative response times and historical information to show why
– Chargeback used to be a big draw but it has really faded away in the
post .com world
• It shows you when you are talking to an old vendor
– The ITIL push and reality when facing outsourcing or “ZOG”
• ITIL takes a back seat to cost control, at least in the states
– “We need better reporting!”
• Be careful to be holistic in what you deliver, cover everything that they can
buy, historically and ideally with business cycle peaks
• When you start hearing terms like “focus on business priority” and “really look
at travel expenses” realize that cost cutting is in your future and report in ways
that enable them to cut power and machines
Defining your mission
• You might think that all that variation would lead to very
different solutions, and you’d be wrong!
– All effective capacity planning systems are based on having:
• Efficient data collection, regrouping, reduction and storage
• Effective graphical reporting of business meaningful spans of time
• Components of workload response time that lead to diagnosis
• Solving the desire for answers to “What if…?” questions
• Problematic consumption diagnosis, reporting and ticketing
– Some capacity planning product “features” marketed by vendors to
the naïve are actually seldom used in the real world, and for good
reasons
• Linear Trending, when what you really need is business cycle discovery and
planning
– The retail cycle at grocery chains and web payment system vendors
• Real Time Monitors, when you might want to go home or on vacation some day.
Remember, problems happen 24 X 7, and humans won’t be watching “twitch
monitors” that consistently. - The mission control room story
• Top 10 is often used to focus a newbie on peak consumption, which may all be valid
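To make the linear trending point concrete, here is a minimal sketch (not taken from any vendor product) that compares a straight-line forecast with a forecast that respects the business cycle. The monthly figures, the 12-month cycle length and both function names are assumptions made up for illustration.

```python
# Hypothetical monthly peak CPU % for two years; months 11 and 12 are the
# business peak (think holiday season at a retailer).
history = [30] * 10 + [55, 70] + [33] * 10 + [60, 77]


def linear_forecast(values, index):
    """Least-squares straight line through the history, evaluated at `index`."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return mean_y + slope * (index - mean_x)


def cycle_forecast(values, cycle_length, index):
    """Same point in the previous cycle, scaled by cycle-over-cycle growth."""
    last = values[-cycle_length:]
    prior = values[-2 * cycle_length:-cycle_length]
    growth = sum(last) / sum(prior)
    return values[index - cycle_length] * growth


december_year3 = 35  # zero-based month index, one cycle past the history
trend = linear_forecast(history, december_year3)
cycle = cycle_forecast(history, 12, december_year3)
print(f"linear trend: {trend:.1f}%   cycle-aware: {cycle:.1f}%")
```

With this made-up data the straight line lands well below the peak that the cycle-aware forecast expects, which is exactly the month you would be buying for.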
Defining your mission
• Who is doing the reporting?
– Vendor supplied reports
• Tend to be single metric
• Often don’t include contextual information
• Are often “generate on demand” and therefore any useful span of
time takes beyond the allowable attention span
• Often have serious contextual clarity problems
– Workloads change colors as
» the number present changes
» You switch machines
» Use black outlines that swamp the colors for small workloads
– The “I’m only using vendor reports this time” and hit count story
• Can take unimaginable resources to produce
– Set yourself a consumption budget and manage to it
– You want to trade more bonds? Stop looking at it!
• May focus on reporting “right now” data rather than long term useful
decision support information
• Seldom contain “disturbance to the status quo” notation capabilities
Defining your mission
• Who is doing the reporting?
– Write your own reports
• Can be anything that you dream up (and can deliver the code for)
• There are multiple “free” languages and infrastructure to pick
from
– We’ve used Perl, PHP, Java and a whole lot more
• Can be tailored for your firm’s decision maker’s specific needs
• Can use “generate ahead” and other techniques to speed web
reporting
• Writing your own can also have “down sides”
– Staff turnover and the “Who is going to maintain this ___?” issues
– Some staff are not gifted visual communicators
– If the information used changes formats, (and over time they all do)
someone is going to have to maintain that stuff
Defining your mission
• What do you want to present?
– Workload characterized subdivisions of consumption over
time?
– Long term historical context for decision makers over
multiple natural business cycles?
– Information subdivided into audience specific groupings
for ease of use by subgroups
– Integration into your firm’s
• CMDB
• Ticketing systems
• Software development life cycle
– Totals over time
• The spark lines counter-argument
Why sparklines of totals can be really useful
• These are sparklines of total CPU used, Average CPU used and the average CPU
used by all nodes in that O/S
• Is there one in particular that draws your eyes to it, that makes you want to
probe deeper?
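If your tools cannot draw sparklines yet, a rough text version is easy to sketch. The example below is an illustration only: the data is invented, and the `step_change` rule (compare the average of the next few samples to the previous few) is just one assumed way to flag the kind of step up that draws your eye.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a series as one row of Unicode block characters."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat series
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)


def step_change(values, window=4, ratio=1.5):
    """Index of the largest jump between adjacent window averages.

    Returns None when no jump reaches `ratio` times the earlier average.
    """
    best_index, best_ratio = None, ratio
    for i in range(window, len(values) - window + 1):
        before = sum(values[i - window:i]) / window
        after = sum(values[i:i + window]) / window
        if before > 0 and after / before >= best_ratio:
            best_index, best_ratio = i, after / before
    return best_index


# Hypothetical weekly total CPU % for one node
cpu = [20, 22, 21, 19, 20, 21, 45, 47, 46, 44, 45, 46]
print(sparkline(cpu), "step at sample", step_change(cpu))
```

A flag like this only tells you where to click; the workload view behind the sparkline still has to explain why.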
Why sparklines of totals can be really useful
• If you are like me, ustca102 has you wondering, “What made it step up like that?”
• On our system, clicking on the tiny sparkline brings up a “zoomed in” image, which
really gets you wondering:
• Clicking on that graphic brings up our normal web reporting system:
Why sparklines of totals can be really useful
Why sparklines of totals can be really useful
• OK, sometimes totals are useful
– Sometimes they can draw your eye to issues
– They can quickly dispel rumors that “All of our
machines are maxed out!”
• For example, our applications specialists were
consistently maintaining that all of their machines were
barely big enough to make month end, and they would
argue mightily whenever we might suggest that there
was room for consolidation
• I brought the chart on the next slide to the next
meeting, and suddenly their tune changed…
Why sparklines of totals can be really useful
Why sparklines of totals can be really useful
• What happened after the meeting?
– In the next 9 months, using extremely conservative criteria, we
• Virtualized 230 machines ($1,521,000)
• Retired 55 machines ($390,553)
– “Oh! You can just turn that off!”, or, “See steam come out of the operations
folk’s ears” stories
• Planned 10 machines ($40,000)
• Potential 28 machines ($112,000)
– We then plan on going back over with slightly less conservative criteria
and finding a couple million more
– We will also be doing more “application stacking” where it makes
more sense
• Sort of makes capacity planning tools look cheap, doesn’t it?
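For anyone who wants the totals, the arithmetic is sketched below. It assumes each dollar figure on the slide is the total for its category rather than a per-machine amount; the dictionary layout and names are mine.

```python
# (machines, dollars) per category, copied from the slide
savings = {
    "Virtualized": (230, 1_521_000),
    "Retired": (55, 390_553),
    "Planned": (10, 40_000),
    "Potential": (28, 112_000),
}

done = ["Virtualized", "Retired"]
pipeline = ["Planned", "Potential"]


def subtotal(categories):
    """Sum machines and dollars over the named categories."""
    machines = sum(savings[c][0] for c in categories)
    dollars = sum(savings[c][1] for c in categories)
    return machines, dollars


done_machines, done_dollars = subtotal(done)
plan_machines, plan_dollars = subtotal(pipeline)
print(f"completed: {done_machines} machines, ${done_dollars:,}")
print(f"pipeline:  {plan_machines} machines, ${plan_dollars:,}")
print(f"total:     {done_machines + plan_machines} machines, "
      f"${done_dollars + plan_dollars:,}")
```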
Why sparklines of totals can be really useful
• A DBA pal of mine asked for a review of memory on a box, asking for an
increase to add caching and improve performance
– I didn’t really detect a memory shortage:
Why sparklines of totals can be really useful
• Still, people don’t usually mention issues unless there is an
underlying cause. So, as a capacity planner, you have to
always look deeper and always check all of the following:
– CPU
– Disk I/O
– Memory
– Network
– Response time for key workloads
• If you don’t always check everything, something can
sneak by
– Here is what I found when I followed the “always check
everything” rule
• When I looked at CPU, I saw:
Why sparklines of totals can be really useful
Update!
• They’ve since added 2
more CPUs and the issue
continues unabated
– Some issues are not based
in physics and data!
New, new update,
Just for St. Louis!
New, new update,
Just for India!
In the end, someone looked at
what was running, and decided
most was waste!
Look at what happened after
Feb 22nd!
Why sparklines of totals can be really useful
• Now you see several reasons why longer term
sparklines can be pretty useful
– Do you currently have ways to generate them?
– If not, do you want to get ways to generate them?
– Don’t you all think that your vendor ought to provide
them, in group and zoomed in formats?
• So let’s start asking them to…
• Do you also see why you should always check
everything and then sit back and ask yourself:
– “If I had asked that question and then got this response,
what would I ask next?”
Defining your mission
• Anticipate the “next questions” and always answer
them before being asked
– The unanswered “next question” can be a huge time
waster
• often a stall technique used by the politically astute
– It raises temporary doubt in your findings, and builds their case
for swift purchase, before you answer their question
• often a way for the old guard to show that they still are the “top
dogs” to management
– Impatient or frightened management might run off and buy
something!
• The undeclared war between Project Managers and
Capacity Planners
• The “project manager weasel who never lost” story
Defining your mission
– If you are going to shoot down someone’s
hypothesis that lack of CPU was the cause of a
problem, you’d better find out what really caused
the problem before the meeting!
– Your goal:
• One meeting or phone call per issue!
– They may say “We just want a quick and dirty answer”
but they never really do!
– Always cover at least:
• CPU
• Memory
• Disk I/O
• Workload response time changes
• For web-centric systems, network distances and loads
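One way to make “always cover at least” hard to skip is to have your review code refuse to run on a partial data set. The sketch below assumes made-up metric names, positive baseline values and a 20% tolerance; none of these come from a specific product.

```python
REQUIRED = ("cpu", "memory", "disk_io", "network", "response_time")


def review(baseline, current, tolerance=0.20):
    """Return {resource: relative change} for resources that moved > tolerance.

    Raises ValueError when any required resource is missing, so nothing
    can sneak by unchecked. Baseline values are assumed to be positive.
    """
    missing = [r for r in REQUIRED if r not in baseline or r not in current]
    if missing:
        raise ValueError(f"unchecked resources: {missing}")
    findings = {}
    for resource in REQUIRED:
        change = (current[resource] - baseline[resource]) / baseline[resource]
        if abs(change) > tolerance:
            findings[resource] = round(change, 2)
    return findings


# Hypothetical case: the request was about memory, the data says otherwise
baseline = {"cpu": 40, "memory": 60, "disk_io": 200, "network": 50,
            "response_time": 1.0}
current = {"cpu": 80, "memory": 62, "disk_io": 210, "network": 52,
           "response_time": 2.5}
print(review(baseline, current))
```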
Defining your mission
• Cultural differences are real and might affect your
workload choices
– Some cultures avoid direct blame or information that would
cause someone to “lose face”
– Any workloads are better than none
– The “No personal pronouns” story
• Be consistent!
– Always use the same groupings on all similar nodes
• Use the same colors if you can!
– Reduce the burden on your audience
– Multiply the value of your workload creation efforts
– Use consistent precedence order to decide where to put a
process that meets the criteria to be in several different
workloads
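A consistent precedence order can be as simple as an ordered rule list where the first match wins. The rules, field names and patterns below are hypothetical; the point is only that the same list, in the same order, is applied on every similar node.

```python
import fnmatch

# Hypothetical rules, highest precedence first: (workload, field, glob pattern)
RULES = [
    ("backup", "command", "*backup*"),
    ("database", "user", "oracle"),
    ("web", "command", "httpd*"),
]


def classify(process, rules=RULES, default="other"):
    """Assign a process to the first workload whose rule it matches."""
    for workload, field, pattern in rules:
        if fnmatch.fnmatchcase(process.get(field, ""), pattern):
            return workload
    return default


# This process meets the criteria for both "backup" and "database";
# precedence order decides, and it decides the same way every time.
print(classify({"user": "oracle", "command": "rman_backup"}))
```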
Defining your mission
• Whatever you decide:
– Track your own tools usage!
• There are multiple great freeware web usage reports that will tell you if
folks are using or snoozing your data (We use Webalizer:
http://www.mrunix.net/webalizer/ )
• Unviewed information is wasted time and effort
– Use speed tests
• If there are multiple ways to do something (CSV files versus a Performance
database) code for both and have a race
– Will your web users want the slower one?
– The capacity planning reporting challenge story
– Don’t settle, always seek new audiences and better reports
• Add new functions
– Sadly, there is no shortage of bad vendor reporting on expensive
infrastructure
» Anyone here ever seen a great graphical historical display in business
useful terms of SAN information or LAN usage by segment?
– Your firm may have business specific information that might be really useful to
decision makers if overlaid on or graphically reported near IT resource
consumption
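The “code for both and have a race” idea can be sketched in a few lines. This example builds the same invented data as CSV text and as an in-memory SQLite table, then times the same question against each; the data shape, node names and function names are assumptions, and the winner on your real data may well differ.

```python
import csv
import io
import sqlite3
import timeit

# Hypothetical samples: (node, hour, cpu percent)
ROWS = [("node%02d" % (i % 20), i % 24, float(i % 100)) for i in range(5000)]


def build_csv(rows):
    """Write the rows as CSV text held in memory."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def build_db(rows):
    """Load the rows into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cpu (node TEXT, hour INTEGER, pct REAL)")
    db.executemany("INSERT INTO cpu VALUES (?, ?, ?)", rows)
    return db


def total_from_csv(text, node):
    reader = csv.reader(io.StringIO(text))
    return sum(float(pct) for n, _hour, pct in reader if n == node)


def total_from_db(db, node):
    query = "SELECT COALESCE(SUM(pct), 0) FROM cpu WHERE node = ?"
    return db.execute(query, (node,)).fetchone()[0]


def race(repeats=20):
    """Time both approaches answering the same question."""
    text, db = build_csv(ROWS), build_db(ROWS)
    return {
        "csv": timeit.timeit(lambda: total_from_csv(text, "node07"), number=repeats),
        "db": timeit.timeit(lambda: total_from_db(db, "node07"), number=repeats),
    }


print(race())
```

Check that both paths return the same answer before you compare the times; a fast wrong report is not a winner.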
Our site’s web usage:
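If you do not have a usage reporting package installed yet, a rough count of report hits can be pulled from the web server’s access log. This sketch assumes Common Log Format style lines; the sample lines, report paths and function names are invented for illustration and are not how Webalizer itself works.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format style line
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def hits_per_report(lines):
    """Count successful requests per report path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "200":
            counts[match.group("path").split("?")[0]] += 1
    return counts


def unviewed(known_reports, counts):
    """Reports that nobody successfully opened."""
    return sorted(r for r in known_reports if counts[r] == 0)


sample = [
    '10.0.0.5 - - [01/Mar/2010:09:00:00 -0600] "GET /reports/cpu_by_workload.html HTTP/1.1" 200 5120',
    '10.0.0.6 - - [01/Mar/2010:09:05:00 -0600] "GET /reports/cpu_by_workload.html?node=a HTTP/1.1" 200 5120',
    '10.0.0.7 - - [01/Mar/2010:09:07:00 -0600] "GET /reports/memory.html HTTP/1.1" 404 312',
]
known = ["/reports/cpu_by_workload.html", "/reports/memory.html",
         "/reports/san_usage.html"]
counts = hits_per_report(sample)
print(counts.most_common(), unviewed(known, counts))
```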
Our shared long term mission
• When you innovate and come up with new report ideas, share
them at CMG!
– Or at least send me examples in mail and I’ll do it for you!
– Share code in this or other user groups that make sense
• We should all work together in user groups, public forums, on the
web, etc., to push all of our vendor partners to address these needs
– The more they do for us, the less we carry the “home brew code”
weight
• We should also all work to reduce the volume, impact and long
term storage requirements of our solutions
– I have yet to encounter a vendor that isn’t carrying around a lot of
extra metrics in the bowels of their systems that will never be used
• We should have a CMG sponsored “help wanted” section for
capacity planning specialist positions in the various countries
Picking the right vendor partners
• I believe that all capacity planning efforts should have tools
that include:
– Efficient resource usage and process consumption collectors
– Network queuing theory based “what if…?” modeling based on
workloads, not total consumption
• The bulge trap
– Efficient, speedy web-based historical consumption data display
• Ideally your chosen vendor would
– support most or all of your differing operating systems and
devices
– have ample training and consultants available, there is nothing
better than a co-pilot when you are starting out
– participate in and support CMG!
Picking the right vendor partners
• In the not too distant future, the best vendors should be:
– Offering efficient “low impact” “cloud deployable wrappers”
that run with your applications in a cloud
– “We don’t have to worry, it’s in a cloud” is nonsensical
• Are you going to generate fake transactions and time them?
• When you get a long time back, or significant variance, are you
going to have enough information to know why? I think that in
time people will realize this need, and want it in their contracts
• Don’t you want to know the overhead of encryption and
decryption in the process, and its response time effects?
• Stupidity is infinitely scalable, as long as you aren’t getting the
bill
– If nobody cares to make their code efficient, because they just send it
to the cloud, how good is that code going to be?
– Will it be running on the same machine as you tested?
– Will it impact your users?
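For reference, timing fake transactions looks roughly like the sketch below. The `transaction` callable stands in for whatever request you would send to the cloud service, and the “three times the median” outlier rule is an assumption. Notice what the output does not contain: any hint of why a slow sample was slow, which is the gap the questions above are pointing at.

```python
import statistics
import time


def time_transaction(transaction, samples=5):
    """Run a callable several times and record each wall-clock duration."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        transaction()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, slow_factor=3.0):
    """Median, worst case, and samples slower than slow_factor x median."""
    median = statistics.median(durations)
    return {
        "median": median,
        "worst": max(durations),
        "outliers": [d for d in durations if d > slow_factor * median],
    }


def fake_transaction():
    """Placeholder for a real request to the hosted application."""
    sum(range(10_000))


print(summarize(time_transaction(fake_transaction)))
```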
Picking the right vendor partners
• In the not too distant future, the best vendors should be:
– Offering efficient “low impact” “cloud deployable wrappers”
that run with your applications in a cloud (continued)
• The internet will continue to grow exponentially
– So those clouds could get mighty full, mighty quick
– How do you want to find out that it is too full?
» Do you want your customers telling you?
» Or do you want your own reports based on scientifically accurately
collected consumption data?
• Social media sites are becoming valuable business tools
– Businesses “tweet” and have Facebook pages!
– Do you think that a free application originally designed to let 14 year olds
share photos is designed for high performance business needs?
– How will you be sure?
Picking the right vendor partners
• In the not too distant future, the best vendors should be:
– Thinking about SaaS user tools as well. Sure, SaaS vendors
maintain the code and pay if it is a hog, but are they:
• running maintenance activities like backups and virus scans that slow
things down right during prime time for Australia in your globally
distributed firm?
• suffering from office hours peaks of consumption that impact your
users’ response times?
• Taking outages to horizontally scale that might impact your firm’s
ability to ship product?
– Without your own data, you will never know
• What responsibility do you have to your firm’s users?
• Why is this network queuing theory based modeling stuff
so important?
– Let’s understand what it means and then see an example…
Modeling Norms
• Most modeling packages assume
a Poisson or Chi-squared
distribution of the arrival rate of
transactions
• Some simpler, yet often quite
elegant systems like Dr. Neil
Gunther’s PDQ modeling just use
a quadratic and forget the tails
– They aren’t all that different
despite what we modeling
junkies might say!
• Don’t focus on the distribution
selected, focus on whether they
use queuing theory models and
give you relative response times
[Figure: two curves compared, labeled Quadratic and Poisson]
Why network queuing theory based modeling?
• These concepts are also
often illustrated with simple
queue graphics like the one
at the right
• An important implied
assumption is that all
requests are served, none
are lost
• Response time is the sum of
Queuing Time plus Service
Time
[Figure: simple queue diagram. Arriving Transactions enter a queue (Queuing Time), are served (Service Time) and leave as Completed Transactions; Response Time spans both Queuing Time and Service Time]
Why network queuing theory based modeling?
• Methods do differ, but queues for interactive workloads are usually
computed based on load percentage using a formula like:
Q = U/(1-U)
– where:
– Q = Expected Queue
– U = Utilization
• Response time is the sum of Queuing Time plus Service Time
[Figure: “Expected Queues”, Q = U/(1-U) plotted against % Utilization from 0% to 90%]
[Figure: “Expected Response Time”, total response time shown as CPU Service Time plus Expected Queues, plotted against % Utilization]
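The formula is small enough to try for yourself. The sketch below assumes a single queue in front of a single server and a service time of 1.0, so the results read as relative response times; the function names are mine, and real modeling packages use more elaborate methods.

```python
def expected_queue(utilization):
    """Q = U / (1 - U) for utilization given as a fraction from 0 up to 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return utilization / (1 - utilization)


def relative_response_time(utilization, service_time=1.0):
    """Service time plus queuing time, with queuing time = Q x service time."""
    return service_time * (1 + expected_queue(utilization))


for pct in (10, 50, 75, 90):
    u = pct / 100
    print(f"{pct:>3}% busy: queue {expected_queue(u):5.2f}, "
          f"relative response {relative_response_time(u):5.2f}")
```

The shape is what matters: going from 50% to 75% busy costs far less than going from 75% to 90%.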
Why network queuing theory based modeling?
• So, as a workload competes for resources throughout a day, its
response time is likely to vary
• Computed relative response times show us both the variations and
the reason
• The Y Axis metric does not matter!
– Just pick a basis, the ratio is the important part!
[Figure: “Response Time Components”, relative response times by hour of day from 0:00 to 23:00, split into CPU Service Time and Expected CPU Queueing Time, with CPU Utilization from 0% to 100% on the second axis]
Why network queuing theory based modeling?
• A workload’s typical
transaction is likely to rely
on several resources
• Imagine a workload running
on a machine with four
CPUs, six disks and some
network IO on one card
• Note that when
technologies differ, service
times can differ
[Figure: Workload Transaction Response Time]
Why network queuing theory based modeling?
• Now do you see where a graph like this can come from?
• If the warehouse folks are complaining about response times at 3:00 AM,
should you upgrade the CPU?
– When do you suspect that the backups are running?
– Would a CPU upgrade help daytime response?
• But it also might make demand for I/Os faster and really slow down the
warehouse at 3:00 AM too, so you’d better address the I/O issue!
[Figure: “Relative Response Time Components”, relative response time by hour of day from 0:00 to 23:00, stacked as CPU Service, CPU Wait, IO Service, IO Wait, Network Service and Network Wait]
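A graph like that can be approximated with very little code. The sketch below is a deliberate simplification: it treats CPU, I/O and network as one queue each (the example machine on the earlier slide had four CPUs and six disks), and the service times and 3:00 AM utilizations are invented. Wait time per resource uses the same U/(1-U) relationship shown earlier.

```python
def response_breakdown(demands, utilizations):
    """Per-resource service and wait time for one typical transaction.

    demands:      {resource: service time per transaction, any consistent unit}
    utilizations: {resource: fraction busy, from 0 up to 1}
    """
    breakdown = {}
    for resource, service in demands.items():
        u = utilizations[resource]
        breakdown[resource] = {"service": service, "wait": service * u / (1 - u)}
    return breakdown


def total_response(breakdown):
    """Response time is the sum of every service and wait component."""
    return sum(part["service"] + part["wait"] for part in breakdown.values())


def dominant_wait(breakdown):
    """The resource contributing the most waiting time."""
    return max(breakdown, key=lambda r: breakdown[r]["wait"])


# Hypothetical 3:00 AM picture while backups hammer the disks
demands = {"cpu": 1.0, "io": 2.0, "network": 0.5}
at_3am = {"cpu": 0.2, "io": 0.8, "network": 0.5}
picture = response_breakdown(demands, at_3am)
print(f"relative response {total_response(picture):.2f}, "
      f"mostly waiting on {dominant_wait(picture)}")
```

With these numbers nearly all of the delay is I/O wait, so a CPU upgrade would not help the warehouse at 3:00 AM.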
Picking the right vendor partners
• In my experience, network queuing theory based tools
move folks quickest to actionable answers
– Once you understand relative response times, most issues are
quick and easy to diagnose
• If a new vendor harps on linear “trending” graphics and
projections, don’t expect them to be around for very long
• If a monitoring or other product vendor keeps adding “and
you can use this for capacity studies” it is probably because
the salesperson heard that you were looking for capacity
planning tools!
– Stick with network queuing theory based packages and you
won’t go wrong!
– Dozens of “And we can do capacity planning too!” stories
Ron Goes Off on VMware
• VMware is not a capacity solution
• VMware is a “symptom” of no capacity
management
Ron Goes Off on VMware
• VMware is the single biggest indictment of the poor
way most firms have done capacity planning in the
Windows space
– The lack of workload characterized views of consumption
is why folks bought a server for each functional part
– “We don’t want to stack multiple applications on one
server!” So we VMware them!
• …which is just stacking with the added joy of paying for not only
extra copies of the OS and tools, but $900+ for VMware as well
• And in the end, the code is running on the same box!
– VMware’s “so called” capacity planning tool is proof that
they never attended a CMG!
• It is as near useless as any marketed tool that I have ever seen, but
at least it is expensive…
Going “Extra-Product”
• Once you get used to your vendor’s product, if you are like me,
you’ll start wishing for more functions tailored to your specific
needs
– In the old days, a grey haired expert would whip out a spreadsheet or
other mathematical package and start creating some “home-brew”
solution
– I use Perl and GD:Graphics, PHP, JavaScript and anything else that I can
think of; you can use what makes sense to you
– Check out old CMG papers, they are laced with great ideas
• In other words, don’t feel limited to what your vendor does “out of
the box”
– Find buddies that use the same vendor and start sharing ideas and
code
– Things that you will see later in this presentation are shared among
dozens of firms and they wouldn’t live without them
– You don’t have to agree 100%, take what fits best and leave the rest
Going “Extra-Product”
• There is a whole group of us running many of the extensions
that we’ve developed over time
– Some of our extensions have made it into some products, but
nowhere near enough of them!
• We probably get 50% of our firm’s benefit from the tools from
our own extensions
• We regularly meet with the vendors and implore them to add
the features that we like
• Having more singing from the same hymnal might just get
through to them!
• Come join us! The best ideas might be in your head! Share!
Avoiding the “IT Mindset Traps”
• Capacity planners come in several flavors, because people from
several different camps end up in this role
– Scientists - Scientifically minded users of network queuing theory tools
and simulation models that want to subdivide consumption into
different behavioral groups and analyze them
– Application specialists – application subject matter experts who
“know the application”, are trusted by management, and care deeply
about its success. They often come from the application side of the
firm
– Old Timers – They know everybody, have worked on everything and
have connections and favors to call in to get things done. They often
come from the operations side of the firm
• Each of these can be successful, but some are more prone to
certain behaviors that can limit your capacity planning effectiveness
and raise the costs of doing it
• Let’s look at the typical pros, cons and peccadilloes of each
The Scientists
• The Scientist capacity planner
– loves to get data from everywhere and everything that they can
– Willingly tackles huge tasks as long as there is a possible learning
benefit
– Will constantly tweak the automation to be able to get yet more data
– Will go “extra product” and build tools for specific functions without
fear, because they are used to building things from scratch and being
successful
• Pros
– No fear, they view no problem as intractable and are sure that if they
can get real data into a scientifically designed framework, business
useful learning will result
– No agenda, all applications and systems are equally important to
them, they will not lobby for one application to get resources instead
of another, preferring instead a rising tide that raises all boats
– Willing to try new methods and tools in search of solutions
The Scientists
• Cons
– Scientists can be viewed as “remote” or “doesn’t know the business”
by some in management, particularly application development
– They may want some really expensive and/or tricky software, and on
every machine, and these tools produce copious amounts of data that
needs to be processed, graphed and stored
– The volume of tools and special case software that they accumulate
over time can be hard to support by others
– Good ones are relatively rare, ones that can teach/mentor others are
extremely scarce
• Mindset Traps
– Scientists can go off on tangents, they really need a manager who can
• Help them get the most productive subset of tools working first
• translate their outputs into terms understandable to the business
• help keep them focused on what the business deems most valuable
– Their pursuit of the “one scientifically superior way” left unchecked
can lead to ongoing high costs
The Application Specialist
• The application specialist in the capacity planning role
– Will often drop everything else to don their fire-fighter jacket and
“save the firm” by working on emergencies
– Will rely strictly on simple O/S tools and minimal data, often just totals
because “that was all we needed when we started this thing, and look
how far we’ve come”
– Seldom tracks historical consumption data over time, or if they do,
seldom presents it in a format that is easily understood by others
• Pros
– They really do know the application, the folks who are powerful, and
they have a lot of chips at the bargaining table when it comes time to
get things negotiated
– Their application specific knowledge can really come in handy when
strange behaviors are noticed
– Their continuing drive to make an application succeed and the lengths
that they go to are often very favorably viewed by non-technical
management
The Application Specialist
• Cons
– EGO!
• Our conference rooms are named after comic book super heroes!
The Application Specialist
• Cons
– Their self confidence can lead to large egos, they dismiss opposing
views of how to address issues other than “the way that we’ve always
done it”
– Their extreme willingness to join in every fire-fight eats a lot of time
and delays the deployment of tools and systems (like long term
historical consumption tracking) that would help others understand
and make better decisions
– Tend to enjoy being the “go to guy” and thus seldom share the basis
for their decisions
• This is sometimes covering up the fact that the basis for their decisions is gut
feel, not data
– They will commit in public forums where management is present to
supporting the scientists to get some application specific technical
need, and then fail to do so in a timely manner, if ever
– They really know their silo, but they are very uncomfortable when
asked to go outside of it
The Application Specialist
• Mindset traps
– These folks’ career successes have been built on “thinking on their
feet” as issues occur, so they seldom take the time to build data
collection and reporting structures that lead to well informed
decisions
• “When you need to know something, just ask me.”
• They may even resist or delay deployment of capacity planning systems, calling
them “costly, unnecessary and not our application’s highest priority”
• They will resist changes to their sacred “architectures” from the 1980s
– They can be initially really interested in capacity planning information
about their application, and use it to point out the positive impacts of
their past decisions and successes
• …but don’t expect them to mention immense over capacity
– Often their interest stops immediately at the edge of their application
• When there are issues larger than one application, they view it as their duty to
“defend their application’s turf” and will move to segregate the environments
into “us” and “them” groupings that need not share any infrastructure
– They think that “The vendor will tell us when to…”
The Old Timers
• The old timers in the capacity planning role
– Are a calming presence in meetings
– Have stories of a time when we faced something similar
– Have the best jokes
– Know and address the VPs as “Phil” and “Sandy”
– Have capacity tracking systems that tend to the super-inclusive, when
asked, they alone can root out data about darn near anything, but
they have to be asked
• Pros
– They have the trust and respect of nearly everyone, because everyone
has worked successfully with them over time
– When they need tools or space to get or keep their data, they just go
ask “Phil” or “Sandy”
– Are among the few to have worked on many of the systems, not just
one or two, and so they understand deeply the inter-reliance of many
of the systems and how an issue in one can manifest elsewhere
The Old Timers
• Cons
– Old timers are often tired of learning. They seldom want to embrace
radical new methods when they are retiring in a few years
– Old timers are survivalists, or they wouldn’t be old timers. They have a
great political sense of when “not to rock the boat” and “who not to
mess with” that can prevent or delay the introduction of useful new
information
• Mindset Traps
– They approach capacity planning like they approached most of the IT
issues that they’ve faced in their long careers
• “Let’s start with a database with thousands of metrics! You never know what
will come in handy”, so they resist deleting any of them while disk
can still be purchased
– Their reporting systems evolved over a long time, hence can be
hopeless for someone new to decipher or change
• They can be based on large tables of numbers that only a select few can
successfully use
Avoiding the “IT Mindset Traps”
• So what do we do?
– How do we get the “pros” of each type and minimize the downsides?
• You must build a “matrix-ed” team containing some of each type
– The team concept must have support from the highest levels
– It must have priority from each of their respective management
– They must be charged with:
• enabling the scientists to integrate new tools into the environment
• getting graphical reporting working that management can understand
• maintaining just enough information to provide long term historical context for
decisions, but no more
– Sometimes, you’ll have to bring in outside expertise, and the only way
that will succeed is to have “friends in high places”
• It is critical to put this under an excellent manager
– Each of the three types has useful and less useful behavior patterns
– You need a manager that all can respect, who doesn’t try to be the
expert, rather one who coaches each to be part of a well functioning
whole
The politics of capacity planning
in organizations
• Organizational politics are often the key factor in your capacity
planning group’s eventual success or failure
• Long experience has taught many of us the importance of
– Friends in high places
• Try to get the capacity planning issue instigated by a knowledgeable VP or at
least a director
• Often a major initial stumbling block is even getting permission to install
collectors on production systems, much less the physics of actually doing it,
and there is nothing better than having their boss’s boss say, “Yes, you
must do this, it is a priority”
– Determining and rating the skills and power balances in your
organization, usually by O/S
– Managerial chaos can be a severe issue
– Diagnosing and surmounting the barriers to success
• Describing the type
• Their common barriers and techniques to surmount them
Identifying and surmounting barriers
• Barrier: The “not invented here” über-geek
– Identification clues
• Often are early members of a firm
• Usually position themselves as masters of several related technologies, but
can be rather sparse on details
• The younger the firm, the more often you find them, internet firms in high
growth areas are full of them
• They are convinced that “If we didn’t need it then, we don’t need it now!”
– Their typical barrier methods
• “This is not an organizational priority”
• “This collector code is not proven on our sensitive production systems”
– Techniques to surmount their barriers
• Friends in high places compel them
• Share credit for successes with them to their management
• Involve them in the model setup, ideally model alongside them, letting them
suggest probable growth steps
Identifying and surmounting barriers
• Barrier: “The high priests of the old tool set”
– Identification clues
• They like “twitch monitoring” and often have built an extensive installation of
them with impressive sounding names like “The war room” or “mission
control”
– Whenever you enter it during non-emergencies, notice how few people are actually
using the displays
• They prefer current “totals” like total CPU because they’ve never had
consumption by business identifiable sub-groupings
• They react to brief workload peaks by demanding upgrades
– Their typical barrier methods
• Stalling. They ask streams of technical questions, and each answer that you
give prompts another
• Requests to integrate: new capacity tools must feed information to their “war
room”
– Techniques to surmount their barriers
• Ask them to put long term, workload characterized consumption on their
displays
• Have them tasked to help address pathologies automatically detected (that
their monitors did not seem to surface)
Identifying and surmounting barriers
• Barrier: The application architects
– Identification clues
• They rigorously defend their current multi-node spread as vital for
– The organization
– Uptime
– Scalability
• 90% of their machines will be empty or nearly so
• The architecture was set in stone a decade ago, and is designed to solve the
issues of that time: minuscule PCs
– Their typical barrier methods
• Lecturing you on how their way is the “only way”
– “Don’t you realize that these are business critical systems?” is used to justify
all manner of excessive purchasing
– They will lecture you on availability and scalability at the drop of a hat
– Techniques to surmount their barriers
• Show them the serious speedups possible by collapsing application layers onto
fewer machines and removing network time from chatty applications
• Ask them for estimates on just how much more their application will need to
scale, given that it is 7 years old and already in use firm wide?
Identifying and surmounting barriers
• Barrier: The entrenched fire fighting squad
– Identification clues
• They offer to work with you, but not today as there is an emergency
• They position themselves as “the experts” in an application
• They are hyper-sensitive to any changes in the environment, they view them as
“dangerous”
• “Our conference rooms are named after comic book super heroes!” revisited, when you
fly in to interview, everyone is fighting a fire
– Their typical barrier methods
• They position themselves as “must have” team members and then are never available
• Beware their commitments to make data or specifics available, they will often be “too
busy” later to do it in a timely manner if at all
– Techniques to surmount their barriers
• Agree to work with them as valued members of the team, then ignore them in your plans
as they will always be too busy to help anyway
• Never trust them to come through with a key item, always plan for another way to get
what they promise that does not involve them
• Over time, train them that many of the “time consuming fires” that they fight are simple
pile ups of multiple pathologies that won’t bite if addressed in a timely manner
Identifying and surmounting barriers
• Barrier: The overwhelmed, outsource-able and scared
– Identification clues
• They have single functions, often somewhat amorphous, and difficult to tag a
dollar value on
• They are not in politically savvy management’s structures
– Their typical barrier methods
• They stall, seemingly frightened to take on any task without exact instructions
from their management
• They view tasks related to capacity planning as “Not their priority”
• They view all new functions as threats
• They seem to ignore all information not generated by their own function
– Techniques to surmount their barriers
• These are politically weak people in politically weak areas, stay away from
them so as not to have to rely on them
• If forced to work with them, work with their manager to emphasize that
capacity planning is an important priority that they cannot stall
• Help the good ones get out of that group
Identifying and surmounting barriers
• Barrier: “This is a database server only” DBAs
– Identification clues
• They claim that “In order to save the firm database license money, we are
concentrating the databases from multiple applications on just a few servers”
and “nothing else can run on these servers”
– Their typical barrier methods
• Outright refusal to try collapsing micro-applications onto database servers
• Claim remaining capacity on the 1/3 used database server is “for growth” but
are real hard to pin down for specifics, usually because there aren’t any
– Techniques to surmount their barriers
• Try to get them to allow/install only a certain small percentage of application
code on their machines due to “a network emergency”. That seems tiny and
reasonable.
– Use a number like 10% to 20%. They don’t need to know that that was all of the
applications that you ever dreamed of doing.
• Show them how your automated process pathology code works, to ease their
fears about rogue applications eating their machines alive and harming other
applications
• Praise them to their boss as “innovative and balanced problem solvers”
Identifying and surmounting barriers
• Barrier: Lying, manipulative project leaders
– Identification clues
• You are originally asked to model 400 users from a sample of 30. Later
they say, “Oh no! We meant 1000 users!”
– Their typical barrier methods
• Some project leaders view themselves as risk minimizers. Sadly, they
often feel that 60% excess hardware is a proper sized “cushion”, so
they inflate their usage estimate 60% to make the modelers justify
excess hardware for them
• They took 3 extra months to get all these whacky features in, way past
their deadline, but now time is an emergency and they need their
results immediately or they just need to buy hardware right away
because they have no time to test properly
– Techniques to surmount their barriers
• Speed. You can model this stuff far faster than they can get a load test
to work without half of those whacky features blowing up
• Ask more people for how many users really are going to be there
Identifying and surmounting barriers
• Barrier: Enthusiastic but “We went to Load Runner
Class and we absolutely have to run huge
saturation load tests” drones
– Identification clues
• They don’t understand that mesa tests and modeling are all that is
needed. Even if you can get a decent mesa test out of them, they
still want to do a saturated load test anyway
• They REALLY BELIEVE two seemingly counterintuitive things:
1. Your operations group must run out and buy exactly the machine
and memory that they dreamed up from dubious research for
their tests
2. They do not have to run against realistic data volumes with similar
indexes and size as intended production. They will NEVER create a
statistically relevant data source. They will frankly state: “It is
impossible!”
Identifying and surmounting barriers
• Barrier: Enthusiastic but “We went to Load Runner
Class and we absolutely have to run huge
saturation load tests” drones
– Their typical barrier methods
• No matter how many times you say not to, they will always strive
to ramp up users at the start and ramp down afterward. Get ready
to lose your first and last measurement periods
• If you can get a realistic transaction mix from them, they will still
strive to run them too fast
– The 30 second contract review, 8 hours a day story
– Techniques to surmount their barriers
• Always question their user think times, then adjust your model to
deal with the silliness that you uncover. Maybe 20% of the samples
that I get have realistic transaction arrival rates, so beware
• Be consistent, over a series of tests you will wear them down, or
get them fired :-)
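The think time check above is plain arithmetic, so it is worth scripting. A minimal sketch, assuming a closed test where each simulated user submits a transaction, waits for the response, then “thinks” before the next one. The function name and the sample numbers are made up for illustration, they are not from any real test:

```python
def transactions_per_hour(users: int, think_time_s: float, response_time_s: float) -> float:
    """Arrival rate implied by a closed load test script.

    Each user spends one response time plus one think time per
    transaction, so throughput = users / (response + think).
    """
    cycle_s = think_time_s + response_time_s
    if cycle_s <= 0:
        raise ValueError("think time plus response time must be positive")
    return users * 3600.0 / cycle_s


# Hypothetical numbers: 20 scripted users with 2 second responses.
scripted = transactions_per_hour(20, think_time_s=30, response_time_s=2)
realistic = transactions_per_hour(20, think_time_s=300, response_time_s=2)
print(f"scripted:  {scripted:.0f} transactions/hour")
print(f"realistic: {realistic:.0f} transactions/hour")
print(f"the script overstates the load by {scripted / realistic:.1f}x")
```

If the scripted rate is several times what real users could ever produce, scale the sample down in your model before trusting any result from it.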
A mail message to a new fleet of “Load
Runner” enthused contractor drones
The purpose of load tests can be manifold, to test functionality, capacity, and “feel”. Modeling
based on a sample does the same things and more, and usually much faster and cheaper.
If you choose to run a load test, be sure to run a “realistic transaction mix” with the expected
blend of all commands, not just one kind. If you are limited to simulating a subset of intended
loads by physics (we don’t recommend simulating above 20 users per load running PC for accuracy)
we can then take that load and model much higher ones and any alternate hardware that you
might dream of.
We have these caveats to improve accuracy:
1. Perform the tests on real, not virtual, servers for measurement accuracy
2. Run a proper “mesa test” for sampling which includes:
A. Make sure that the CPP group has a collector on your intended test machine days before the test
B. Start your test precisely on an hour boundary
C. Do not, repeat, DO NOT “ramp up” or “ramp down” users. Just start and go, 20 users per load
runner box will not overwhelm anything. Ramping is not required for models, indeed it is wrong to
do it.
D. Stop precisely on an hour boundary
E. Send mail to us telling us
I. How many users you simulated
II. The precise timings
III. How many more users we should add in the models
IV. Anything else pertinent
A mail message to a new fleet of “Load
Runner” enthused contractor drones
3. The purpose of the test is to produce a flat topped “mesa” of usage that depicts your
users acting normally. A graph of CPU consumption should look like a rectangle with a
flat steady top, nowhere near saturated. We then take that sample of happy users
unconstrained and model what hardware is needed for more happy unconstrained
users.
4. Do a “practice run” several days before your real test to flush out issues and tell us so
we can see how well you followed mesa instructions
5. DO NOT do any of the following, which will waste your time, ruin the data and cause
rework
A. DO NOT “ramp up” or “ramp down” usage at the start or end of your tests. It just makes us throw out that
data
B. DO NOT try to “saturate the machine”. The models will find that saturation load, don’t waste your time.
Concentrate on producing an unsaturated load of happy users getting great response times
C. DO NOT try to simulate hundreds of users from one PC with one network card. It will fail or worse, produce
incorrect data leading to massive errors
D. DO NOT create loads with unrealistically fast “think times”. If the user is likely to do a transaction, then wait
5 minutes reading it or processing it, then set the inter-transaction wait time to 5 minutes, not 30 seconds.
Remember, your goal is to be realistic, not to have high unrealistic loads.
Mesa tests may seem odd at first, but in time you will learn to love mesa tests and their time and
cost savings to projects. After a few of them, you’ll never load test the old way ever again.
Questions? Please ask, or invite us to your team meetings for a confab!
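The mesa rules in this mail can be checked automatically after the practice run. A rough sketch only: the function name and every threshold below (how flat counts as “flat”, how far from saturation counts as “nowhere near”) are assumptions picked for illustration, so tune them to your own collectors and interval lengths:

```python
from statistics import mean, pstdev


def check_mesa(cpu_pct, saturation_pct=100.0, max_peak_fraction=0.7, max_cv=0.10):
    """Return a list of problems found in per-interval CPU samples from a test.

    cpu_pct           : CPU consumption per measurement interval, in percent
    saturation_pct    : total available CPU (e.g. 400.0 for a 4 CPU box)
    max_peak_fraction : assumed ceiling for "nowhere near saturated"
    max_cv            : assumed limit on std-dev / mean for a "flat" top
    """
    if len(cpu_pct) < 3:
        return ["need at least three intervals to judge the shape"]

    interior = cpu_pct[1:-1]
    plateau = mean(interior)
    if plateau <= 0:
        return ["no load measured"]

    problems = []
    # A ramp shows up as a first or last interval well below the plateau.
    if cpu_pct[0] < plateau * (1 - max_cv):
        problems.append("first interval is low: looks like a ramp up")
    if cpu_pct[-1] < plateau * (1 - max_cv):
        problems.append("last interval is low: looks like a ramp down")
    # The top of the mesa should be flat.
    if pstdev(interior) / plateau > max_cv:
        problems.append("top is not flat")
    # The load should stay well clear of saturation.
    if max(cpu_pct) > saturation_pct * max_peak_fraction:
        problems.append("peak is too close to saturation")
    return problems
```

An empty list means the sample looks like a usable mesa. Anything else is worth mailing back to the testers along with the graph before they schedule the real test.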
The politics of capacity planning
in organizations
• How to win friends and influence people in the operations group
– Set up “being on the capacity planning team” as an aspirational goal, a
promotion path, for the operations folks
– Try to find an operations or O/S expert at the top of their game and
get them assigned to the capacity planning effort
• These are often the best acolytes and really take well to capacity planning
– As the operations staff start to use the capacity planning reporting and
pathology detection systems
• Praise their efforts and successes to management
• Coach their failures privately
– Get them (and their management) to realize that keeping process
pathology counts down reduces emergencies and call-outs, and
greatly contributes to system stability
– Train them on the tools so they start to use them and build new skills
• If the only users of the capacity planning reports are on the capacity
planning team, you are doing something wrong!
The politics of capacity planning
in organizations
• How to win friends and influence people in the application
development group
– In addition to the barriers presented previously, you may also
encounter
• The earnest improver, who takes the time to learn about new technologies
and tries to integrate their benefits into their software development lifecycle
• The non-technical manager, who may never understand all of the math and
formulas, but who will be far better at the political skill required for success
• External vendors whose future profits hinge on success
– Try to become an asset to each of these groups
• make sure that they see you as a willing partner in their success
• work late on their models
• help them succeed and get the resources that they need when they need
them
– Send mail when you work early, late or on the weekends (and CC your
boss of course), it shows that you are really trying to help
The politics of capacity planning
in organizations
• How to win over and influence your boss
– There are several types of bosses
• The experienced true believer
• The unbeliever
• The unconvinced cost counter
– There are techniques to deal with each
• Your goal is to convert the last two into the first one!
– Keeping all happy will involve deploying collectors, generating
workload characterized historical consumption web pages and
“What if..?” models of future consumption
• The key is to survive long enough to
– get proper network queuing theory model based software
purchased in sufficient quantity to make a difference
– get some applications leadership on your side
– keep the last two from canning you before you start to get
meaningful results on a large scale
The experienced true believer
• Usually you have worked with or for this boss before, so they
already know
– How expensive the tools can be, so they are not shocked
– What a reasonable time for results is
– How to help enable your success
– What battles to fight, and what battles to avoid
• My last 4 gigs have been for someone who I had either consulted
for or worked for
– Delivering results delivers career options for you!
• Characteristics of the experienced believer
– Patience
– Helps get the software quickly
– Helps break through organizational politics to get your collectors
quickly deployed
– Projects confidence in meetings with other management
The unbeliever
• These folks (often with a development background) are distrustful
of fancy methods like network queuing theory
– This is often based on an insecurity, they don’t understand complex
tools and thus distrust them
– Have made their career by betting on simple solutions and
extrapolating linearly
– Are often in their position due to management turmoil
• In several gigs I’ve had non-believers in the management structure
above me
• Characteristics of the non-believer
– Initial open contempt of scientific capacity planning methods
– Demand results before they help you get collectors in place to answer
it with a historical basis
– Often will throw CPU and memory at disk I/O slowness
– Can be turned, but wow, it sure takes patience!
The unconvinced cost counter
• These can be great bosses in time, because like scientists, they demand
proof before supporting you, but once they have it, they will be true
believers
• They either have no experience with sophisticated capacity planning, or
have had running the group forced on them by higher ups who have
• Characteristics of the unconvinced cost counter
– Repeated references early in the process to how much your group and your
software costs, and lots of implying that savings results had better surpass
that soon
– Caution early on, so they will spend the time with other departments getting
them to go along with you
– Thrive on informational updates, so show steady progress
• You don’t have to be perfect, just constantly getting better
– You’ll know when they switch to true believers when
• They start buying you more licenses!
• They stop complaining about costs
• The “We need to show results!” to “Do you need more licenses?”
conversion
Reporting
• There are a lot of tragically bad business graphics and
especially capacity planning reports out there. Issues
include:
– Graphics that distort the viewer’s perceptions
• Quasi-3d
• Black outlines around bar charts
• Non-calendar displays of long spans of time
• No color consistency
– Foolish consistency may be the hobgoblin of little minds, but it is also the
key to getting management to use your site for decision making (don’t
pay attention to “little minds” and “management” appearing in the same
sentence…)
• Lots of chrome, little content
– Tufte: “Question every pixel. Basically, any pixel that isn’t conveying new
data, get rid of it!”
Reporting
• Other issues that limit effectiveness
– Multi-page reports that nobody ever reads
• If your answer is so complex that it requires that much evidence, start
over on a new one
• They paid $10,000! It has to hit the desk with a thud!
• The “same thud” lives on!
– Relying on the untrained user to wade in and find the answers
themselves
• Some you can train, most no
• If any correlation of graphics requiring memory is needed, forget it
– Ron’s Position:
• Non-web presentations in general are useless relics of a bygone age.
Most of your readers’ data comes in hyperlinked form, so get with it or
be left behind
– Web reports of all nodes in the firm
• Most users really appreciate ways to see only their span of control
Reporting
• There are also some “Must haves”
– Automated context that graphically
highlights when something is out of the
ordinary (managers love this stuff)
– Automated business and hardware
context, ideally driven by your CMDB,
that include
• Hardware and software specifics
• Business Purpose
• Business owner
• Primary and backup technical contacts
• Ideally a text description of its business
function
• Other helpful links
The Zen of Great Reporting
• Seek minimalism in all parts of it
• Reduce graphic clutter
• Reduce user perceived complexity
– Workload color consistency is a simple “must-have”
• Reduce user choices and actions
– If the user needs to know 4 things to make a decision, they had
better be close on the same web page
• Add extra information that lets the user more fully
understand odd behaviors and situations
– Sorting it by date is nice too
• Don’t restrict yourself to measured quantities
– Workload response time detail is one of the most powerful
graphics that you can use
Reporting Examples
Reporting Examples (UNIX)
Reporting Examples (Windows)
Reporting Examples (Windows)
Tangent, Multiple Memory Leaks
• Here is an example of
a rather severe
repeating set of
memory leaks
– See the saw-toothing
memory?
– See the climbing
Commit Bytes in a
different sequence?
Reporting Examples (Windows)
Tangent, Multiple Memory Leaks
• When you dig
deeper, you can
see memory totals
by process owner
– People often want
to “blame
someone”
– Alas sometimes
the “Someone” is
harder to pin
down by just
username
Reporting Examples (Windows)
Tangent, Multiple Memory Leaks
• When you dig deeper,
you can see the
individual process
names leaking
– In time you’ll find the
best way to keep
them unique, we use
process start
date/time and PID
– You can show these
to the Fake_Name
vendor and then it is
hard for Fake_Name
to deny a memory
leak
– I believe that java is
Finnish for “memory
leak”
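Here is one way the “keep them unique” idea could look in code. A sketch under stated assumptions: the sample tuple layout, the function name and all three thresholds are invented for illustration, and this is not the homebrew code behind these slides. PIDs get reused, so each process instance is keyed by name, PID and start time:

```python
from collections import defaultdict


def leak_suspects(samples, min_points=6, min_growth_fraction=0.9, min_growth_mb=50.0):
    """Flag process instances whose memory almost never stops growing.

    samples: iterable of (name, pid, start_time, sample_time, memory_mb).
    An instance is keyed by (name, pid, start_time) so a reused PID
    never merges two different processes.
    The three thresholds are illustrative assumptions, not tuned values.
    """
    series = defaultdict(list)
    for name, pid, start_time, sample_time, memory_mb in samples:
        series[(name, pid, start_time)].append((sample_time, memory_mb))

    suspects = []
    for key, points in series.items():
        points.sort()  # order by sample time
        mem = [m for _, m in points]
        if len(mem) < min_points:
            continue
        steps = [b - a for a, b in zip(mem, mem[1:])]
        growing = sum(1 for s in steps if s > 0) / len(steps)
        total_growth = mem[-1] - mem[0]
        if growing >= min_growth_fraction and total_growth >= min_growth_mb:
            suspects.append((key, total_growth))
    # Biggest leakers first.
    return sorted(suspects, key=lambda item: item[1], reverse=True)
```

A process that crashes and restarts gets a new start time, so each tooth of the saw is judged on its own rather than averaged away.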
Reporting Examples (Windows)
Tangent, Multiple Memory Leaks
• Well it is hard to
deny a leak, but
some Fake_Name
vendor might
want raw data,
so…
– Since you
already have it,
put out some
csv files to be
easily mailed to
the vendor,
eliminating one
of their stall
tactics
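Putting out the csv files takes only a few lines. A minimal sketch using Python’s standard csv module; the column names and the sample row are made up, so use whatever your collector actually stores:

```python
import csv
import io


def leak_samples_to_csv(samples, out):
    """Write (name, pid, start_time, sample_time, memory_mb) rows as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["process", "pid", "start_time", "sample_time", "memory_mb"])
    for row in samples:
        writer.writerow(row)


# Demonstration with one invented row, written to an in-memory buffer.
buffer = io.StringIO()
leak_samples_to_csv(
    [("app.exe", 4242, "2010-03-01 02:00", "2010-03-01 03:00", 512.0)],
    buffer,
)
print(buffer.getvalue())
```

For a real mailing, pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.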
Reporting Examples (Windows)
Tangent, Multiple Memory Leaks
• The right way to convey the message
– We detected the issue, and sent mail to the application owner, stating
• The exact processes with the issue
• They can expect to keep crashing every day or so until they get the vendor to
fix it
• Offers to help with data or technical calls
• We get no response at all
• Three weeks later, we get a request to add memory to the
machine…
– The owner “Can’t get the vendor to respond quickly” and wants to
reduce outage counts in the meantime
• Don’t get mad…
– Stay positive and helpful in tone, they are just trying to help their
users have fewer outages…
– but continue to urge them to turn up the heat on their vendors, but do
it in a nice way…
Reporting Examples
Reporting Examples
New! Reporting Examples Windows
New! Reporting Examples UNIX
Reporting Examples
Classic capacity planning question descriptions
and proper answering techniques
• Capacity issues are usually an emergency to someone
• Roughly 93% of the requests for upgrades are nonsensical if you have any
historical workload based resource consumption information
– So you have to say no in a way that makes the evidence clear
• What to expect when you say no:
– The 5 stages of grief (also called the Kübler-Ross model)
http://en.wikipedia.org/wiki/K%C3%BCbler-Ross_model
• Denial
• Anger
• Bargaining
• Depression
• Acceptance
• Always give them a way to succeed along with your “no”, remember that they may
still have a real problem!
– “No, you don’t need CPU or memory, but you are doing 5500 I/Os a second to your slow,
locally attached C drive”
• Can you turn down logging?
• Can you send those I/Os to fast SAN or RAM drives?
• Can you get help from your DBA pals?
– “No, you don’t need more CPUs, you need to fix those looping processes.”
Classic capacity planning question descriptions
and proper answering techniques
• Here is the pattern for this next section:
– Real quotes from the users (disguised, slightly)
– The evidence
– The answer
– What happened
• I want some interaction on these, if you did it better, speak up!
Share! That is what CMG is for!
• These graphs used in the examples are all homebrew perl and
GD:Graphics, and they are used at several firms
– Yes I will share the code if you want it, but sheesh, you can do better!
• You are going to want some form of screen graphics capture tool
– I use freeware ZScreen, downloadable from many sources, it is fabulous
Classic capacity planning question descriptions
and proper answering techniques
• User quote
– “We are keeping these machines
rather heavily loaded.” but they
won’t tell you why
• The evidence
Classic capacity planning question descriptions
and proper answering techniques
• The answer
– It turns out that this
application was on
three nodes, two
heavily used and one
lightly used
• They wanted a review
of each
– Is ustca027 too
empty?
– Is ustwa007 too full?
– Is ustca031 too full?
• Let’s use Relative
Response Time by
hour to answer them
Is ustwa007 too full?
Is ustca031 too full?
Classic capacity planning question descriptions
and proper answering techniques
• What happened
– The users are initially shocked to see that the capacity planners, whom
they view as machine stealers for VMs, are recommending that they get
more hardware!
– Once they started to understand relative response time graphs, they
became quite sophisticated at moving workloads around
– You’ll know that you’ve converted them when they e-mail you asking if
their IO_Wait could be solved if they split them over more drives or
better RAID choices
• The morals of the story
– Any vendor can show totals
– Favor vendors that show workload characterized historical views of
consumption
– Favor vendors that can show you workload relative response times, so
that your answers make sense to the business
Classic capacity planning question descriptions
and proper answering techniques
• We started getting warnings from our
automated checks:
10/03/23 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used
up to 392.920% of an available 400% from 2010/03/23 at 0200 until 2300.
10/03/26 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used
up to 394.572% of an available 400% from 2010/03/26 at 0000 until 2300.
10/03/27 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used
up to 396.000% of an available 400% from 2010/03/27 at 0000 until 2300.
10/03/28 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used
up to 392.920% of an available 400% from 2010/03/23 at 0300 until 2300.
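A check that emits warnings shaped like the ones above does not need much code. This is a sketch, not the checker that produced them: the function name, the input layout (hourly CPU percent per node, where 100% is one busy CPU) and the 95% threshold are all assumptions:

```python
def cpu_saturation_warning(node, os_name, day, hourly_cpu_pct, n_cpus, threshold=0.95):
    """Return a warning string if any hour's CPU use nears the node's capacity.

    day            : date string in YYYY/MM/DD form
    hourly_cpu_pct : dict mapping hour (0-23) to CPU percent used in that hour,
                     where 100% means one fully busy CPU
    threshold      : assumed fraction of capacity that counts as saturated
    """
    available = 100.0 * n_cpus
    hot_hours = sorted(h for h, pct in hourly_cpu_pct.items()
                       if pct >= available * threshold)
    if not hot_hours:
        return None
    peak = max(hourly_cpu_pct[h] for h in hot_hours)
    return (f"{day[2:]} CPU_SATURATION_WARNING: {os_name} node {node} used "
            f"up to {peak:.3f}% of an available {available:.0f}% "
            f"from {day} at {hot_hours[0]:02d}00 until {hot_hours[-1]:02d}00.")


# Invented data: a 4 CPU node that is quiet until 0200, then pinned all day.
hourly = {h: (392.92 if h >= 2 else 120.0) for h in range(24)}
print(cpu_saturation_warning("in04sqp001", "Windows2003", "2010/03/23", hourly, n_cpus=4))
```

One line per node per day keeps the mail short enough that people actually read it, and the node name gives you something to hyperlink to the graphs.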
• The evidence (here’s what the sparkline
looked like):
Classic capacity planning question descriptions
and proper answering techniques
• More evidence:
Classic capacity planning question descriptions
and proper answering techniques
• My initial suspicions were “Code improvement opportunities”
so I contacted my DBA pals:
Classic capacity planning question descriptions
and proper answering techniques
• Those CPU graphs with response time increases due
to CPU_Wait when they hit the “knee in the curve”:
Classic capacity planning question descriptions
and proper answering techniques
• The answer from my DBA pals:
Classic capacity planning question descriptions
and proper answering techniques
• What happened (the changes went in on Mar 29th):
Classic capacity planning question descriptions
and proper answering techniques
• What about the charts Ron?
Classic capacity planning question descriptions
and proper answering techniques
• Things to learn from this example:
– Not all code “innovations” work as efficiently as desired
• SQL developed in far flung places for even farther flung places is
especially suspect
• “When the answer is correct, the code is done”, well maybe not…
– Not all innovations will go through a rigid capacity planning
review
• You need either automated warnings or to take the time to scan
thousands of graphs often to detect and correct these
• You need fast graphical evidence to get fast reactions
– You need to go out of your way to be nice to DBAs, they will
save your firm millions if you let them, and if you only ring them
up when there is real evidence of mayhem
• Always ask their boss to praise their efforts, those memos come in
handy at review time
Classic capacity planning question descriptions
and proper answering techniques
• Many of you will
be deploying
virtual terminal
environments to
hundreds of
users
– What if
something
goes a little
wrong?
• The evidence:
Classic capacity planning question descriptions
and proper answering techniques
• The answer:
– We started ticketing suspicious CPU consuming VMware slices on Feb
3rd
– Most of it was Bezier curve screen savers! We banned them
• What happened:
– We got back more than half of our VMware farm!
Classic capacity planning question descriptions
and proper answering techniques
User quote:
I was wondering if we could get the memory increased on our Exchange 2007 CAS
servers USTCAX100 and USTWAX100? Right now both servers are running 4.25GB
and I would like to move them to 8GB. We are seeing performance issues with
those servers and we are noticing that RAM usage is at 80%-90% or higher all of
the time. Users are starting to notice this with Communicator. Due to the fact that
it can’t get a response quick enough from CAS, it is putting an exclamation point
on the communicator alerting them to address book issues. If we are not able to
increase the memory, the only other option would be to add more CAS servers in
the environment to balance the load.
We also are going to be increasing the load on these servers with the 2000 users
we will be adding to the North America environment from the XYZ Co. acquisition
and moving South American users to North America servers.
Please let me know if this is feasible or not?
Classic capacity planning question descriptions
and proper answering techniques
The evidence:
• First, look to see if anything
has gone wrong recently
They might be reacting to a
recent problem, but don’t
stop there
Classic capacity planning question descriptions
and proper answering techniques
The evidence:
• Looking deeper, we don’t see a memory shortage (there is evidence of a slight leak):
paging is very low and CommitBytes isn’t anywhere near CommitLimit, but …
• CPU seems in short supply, and the CPU Wait component of relative response time is
huge
• Their short term performance issue is due to CPU shortage, not memory!
Classic capacity planning question descriptions
and proper answering techniques
The Answer:
• Along with the graphs from
the previous page (and getting
them to address the lsass
loop) we added two virtual
processors to this VMware
slice
• Note that if you disagree with
their solution, give them an
alternative that fixes present
issues
• We may give them more
memory later, when they’ve
earned it
Classic capacity planning question descriptions
and proper answering techniques
What happened:
• The CPU Wait
disappeared immediately
• The user’s immediate
issues were solved
• The users now know that
decisions will be based on
evidence, the results will
be real, and they like it!
• Hardware in use for a
growing application will
grow, but slowly
Classic capacity planning question descriptions
and proper answering techniques
Sometimes your own systems detect
problems, so answer in a way that
provides all required information
Hey folks, there is still one more issue, with the imjpmig
process, the Input Method Editor, which lets you use
Japanese characters. It is looping regularly:
10/01/15 LOOP_PROBLEM: 3444 running imjpmig CPU looped from Jan 15 04:59:54 until Jan 15 23:54:53 and may still be looping.
10/01/16 LOOP_PROBLEM: 3444 running imjpmig CPU looped from Jan 16 00:07:48 until Jan 16 23:54:58 and may still be looping.
10/01/21 LOOP_PROBLEM: 5344 running imjpmig CPU looped from Jan 21 13:59:59 until Jan 21 23:54:58 and may still be looping.
10/01/22 LOOP_PROBLEM: 5344 running imjpmig CPU looped from Jan 22 00:01:27 until Jan 22 23:54:56 and may still be looping.
10/01/23 LOOP_PROBLEM: 5344 running imjpmig CPU looped from Jan 23 00:01:25 until Jan 23 23:54:53 and may still be looping.
I changed the workload to just highlight Input Method
Editor by itself. I also found a bunch of patches available:
http://search.microsoft.com/Results.aspx?q=imjpmig+downloads&mkt=enUS&FORM=QBME1&l=1&refradio=0&qsc0=0
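A message like the one above is easier to write if the warnings are summarized first. Here is a minimal sketch, assuming each LOOP_PROBLEM warning sits on one line in the format quoted in the mail. The function name summarize_loops and the regular expression are illustrative, not part of the pathology detection code shown in this talk.

```python
import re
from collections import defaultdict

# Pattern for the LOOP_PROBLEM lines quoted in the mail (one warning per line).
LOOP_RE = re.compile(
    r"(?P<date>\d\d/\d\d/\d\d) LOOP_PROBLEM: (?P<pid>\d+) running (?P<proc>\S+) "
    r"CPU looped from (?P<start>\w{3} +\d+ [\d:]+) until (?P<end>\w{3} +\d+ [\d:]+)"
)

def summarize_loops(lines):
    """Group loop warnings by (process, pid) and list the days each was flagged."""
    summary = defaultdict(list)
    for line in lines:
        m = LOOP_RE.search(line)
        if m:
            summary[(m["proc"], m["pid"])].append(m["date"])
    return dict(summary)

sample = [
    "10/01/15 LOOP_PROBLEM: 3444 running imjpmig CPU looped from Jan 15 04:59:54 until Jan 15 23:54:53 and may still be looping.",
    "10/01/16 LOOP_PROBLEM: 3444 running imjpmig CPU looped from Jan 16 00:07:48 until Jan 16 23:54:58 and may still be looping.",
    "10/01/21 LOOP_PROBLEM: 5344 running imjpmig CPU looped from Jan 21 13:59:59 until Jan 21 23:54:58 and may still be looping.",
]

for (proc, pid), days in summarize_loops(sample).items():
    print(f"{proc} (PID {pid}) looped on {len(days)} day(s): {', '.join(days)}")
```

Grouping by PID as well as process name shows at a glance that the loop came back under a new PID after a restart.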
Classic capacity planning question descriptions
and proper answering techniques
What happened?
• Eventually they got the fix
migrated to production
and everything worked
fine from then on
– Don’t get discouraged if
folks don’t always do what
you want immediately
– Change controls, priority
conflicts and other issues
may stall the fix
– With enough graphical
evidence, eventually you
will win!
Classic capacity planning question descriptions
and proper answering techniques
• Ron logs in on a Saturday to work on slides for UKCMG (“Again! And
what do you get paid to do this?” asks my dear wife) and sees the
following:
• The evidence (from my pathology detection code’s morning mail)
CPU saturation found:
CPU_SATURATION_WARNING: Windows2000 node ustca337 used up to 99.000% of an available 100% from 2010/03/12 at 0400 until 2300.
CPU_SATURATION_WARNING: Windows2003 node ustwasbx16 used up to 99.000% of an available 100% from 2010/03/12 at 1400 until 2300.
CPU_SATURATION_WARNING: Windows2003 node uktcas06 used up to 99.000% of an available 100% from 2010/03/12 at 0300 until 2300.
CPU_SATURATION_WARNING: Windows2003 node ustca227 used up to 99.000% of an available 100% from 2010/03/12 at 0400 until 2300.
CPU_SATURATION_WARNING: Windows2003 node ustca724 used up to 99.000% of an available 100% from 2010/03/12 at 0400 until 2300.
CPU_SATURATION_WARNING: Windows2003 node ustcas44 used up to 99.000% of an available 100% from 2010/03/12 at 0400 until 2300.
CPU_SATURATION_WARNING: Windows2003 node ustcas54 used up to 99.000% of an available 100% from 2010/03/12 at 0400 until 2300.
CPU_SATURATION_WARNING: Windows2003 node ustca088 used up to 99.000% of an available 100% from 2010/03/12 at 0800 until 2300.
Classic capacity planning question descriptions
and proper answering techniques
• The evidence continued
– Whenever a whole bunch of bad things happen
synchronized over many machines, think global tool
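The "think global tool" rule can be automated. Here is a minimal sketch, assuming each CPU_SATURATION_WARNING is on one line in the format shown in the morning mail. The function name synchronized_days and the min_nodes threshold are hypothetical choices for illustration, not the author's detection code.

```python
import re

# Pattern for the CPU_SATURATION_WARNING lines in the morning mail.
SAT_RE = re.compile(
    r"CPU_SATURATION_WARNING: (?P<os>\S+) node (?P<node>\S+) used up to "
    r"(?P<pct>[\d.]+)% of an available 100% from (?P<day>\d{4}/\d\d/\d\d) "
    r"at (?P<start>\d{4}) until (?P<end>\d{4})"
)

def synchronized_days(lines, min_nodes=5):
    """Return {day: [nodes]} for days on which at least min_nodes nodes saturated.

    min_nodes is an assumed threshold; tune it to the size of your farm.
    """
    by_day = {}
    for line in lines:
        m = SAT_RE.search(line)
        if m:
            by_day.setdefault(m["day"], []).append(m["node"])
    return {day: nodes for day, nodes in by_day.items() if len(nodes) >= min_nodes}

sample_warnings = [
    "CPU_SATURATION_WARNING: Windows2000 node ustca337 used up to 99.000% of an available 100% from 2010/03/12 at 0400 until 2300.",
    "CPU_SATURATION_WARNING: Windows2003 node ustwasbx16 used up to 99.000% of an available 100% from 2010/03/12 at 1400 until 2300.",
    "CPU_SATURATION_WARNING: Windows2003 node uktcas06 used up to 99.000% of an available 100% from 2010/03/12 at 0300 until 2300.",
]

for day, nodes in synchronized_days(sample_warnings, min_nodes=3).items():
    print(f"{day}: {len(nodes)} nodes saturated together, suspect a global tool: {nodes}")
```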
Classic capacity planning question descriptions
and proper answering techniques
→ This is really bad news: a Business Sensitive / Critical
production server doing its normal real sqlservr workload,
with a Tool process going on a CPU binge and causing
excessive response times due to CPU_Wait
Classic capacity planning question descriptions
and proper answering techniques
• The answer
– A new piece of monitoring code was installed BREAKING THE NO NEW
CODE INSTALLS ON A FRIDAY rule!
• What happened
– The code creator had deployed a new script, and he reviewed it after
getting mail about all of the warnings:
• “This was a bug in a script update that I made; we should be seeing this behavior on
most of the attached server list. ______ is pushing out an update to the script now;
once this is done we’ll have to log into each of the affected servers, verify the
looping process is running sqlcheck.vbs, and kill it.”
– We were able to swiftly detect and fix the issue
• How would your site do this?
Classic capacity planning question descriptions
and proper answering techniques
• What we saw:
– We started getting Commit_Bytes approaching Commit_Limit warnings:
10/04/05 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 5 18:00:00 until Apr 5 23:59:00 and may still be.
10/04/06 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be.
10/04/07 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 7 00:00:00 until Apr 7 23:59:00 and may still be.
10/04/09 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 9 00:00:00 until Apr 9 23:59:00 and may still be.
10/04/10 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 10 00:00:00 until Apr 10 23:59:00 and may still be.
10/04/11 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 11 00:00:00 until Apr 11 23:59:00 and may still be.
10/04/12 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 12 00:00:00 until Apr 12 23:59:00 and may still be.
10/04/13 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 13 00:00:00 until Apr 13 23:59:00 and may still be.
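A warning of this kind can come from a very small rule. The sketch below assumes "within 80% of Commit Limit" means committed bytes at or above 80% of the limit; that reading, the function name commit_warning_window and the sample values are all assumptions for illustration, not the author's code.

```python
def commit_warning_window(samples, threshold=0.80):
    """Return (first_label, last_label) of the samples at or above the threshold.

    samples is a list of (time_label, commit_bytes, commit_limit) tuples,
    oldest first. Returns None when no sample crosses the threshold.
    """
    flagged = [
        label
        for label, used, limit in samples
        if limit > 0 and used / limit >= threshold
    ]
    if not flagged:
        return None
    return flagged[0], flagged[-1]

# Hypothetical samples in GB: (time, commit bytes, commit limit).
hourly = [
    ("17:00", 3.0, 4.25),
    ("18:00", 3.5, 4.25),
    ("23:59", 3.9, 4.25),
]

window = commit_warning_window(hourly)
if window:
    print(f"COMMIT_BYTES_PROBLEM from {window[0]} until {window[1]} and may still be.")
```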
Classic capacity planning question descriptions
and proper answering techniques
• We investigated, seeing rising total memory:
Classic capacity planning question descriptions
and proper answering techniques
• The evidence, memory by user:
Classic capacity planning question descriptions
and proper answering techniques
• The evidence, memory by leaking process:
Classic capacity planning question descriptions
and proper answering techniques
• The evidence, for the spreadsheet inclined:
Classic capacity planning question descriptions
and proper answering techniques
• The answer:
– Clearly this application has a jlaunch process (run by the SAPServicePRG user) memory leak
– You have two options:
• Get them to patch/fix the application, or
• Get them to reboot the machine periodically so that you don’t start paging hard and affect performance
– So you notify the project leader:
Hi all,
If you look at memory usage over the last few months on these three servers, you’ll see steady and/or repeating ramps.
http://ustwu002.kcc.com/node_reports/ustca146/memory.html
http://ustwu002.kcc.com/node_reports/ustca147/memory.html
http://ustwu002.kcc.com/node_reports/ustca148/memory.html
This leads eventually to warnings like these:
COMMIT_BYTES_PROBLEM: On ustca146, Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be.
COMMIT_BYTES_PROBLEM: On ustca147, Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be.
COMMIT_BYTES_PROBLEM: On ustca148, Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be.
…and after that, when commit bytes hits commit limit, you can experience rather severe application slowdowns.
In every case, the major rising memory consumer seems to be jlaunch processes run by SAPServicePRG. Most recently:
PID 6160 on ustca146 started Mar 2 20:54:58
PID 3772 on ustca147 started Mar 2 20:54:50
PID 8032 on ustca148 started Mar 2 20:54:56
Could someone take a look at these to see if a fix is possible? If not, could we recycle these jlaunch processes, perhaps weekly, to keep memory usage down? Thanks for looking!
Classic capacity planning question descriptions
and proper answering techniques
• What happened:
Hi Ron,
Thank you for keeping an eye on these servers! You are right, there is a steady growth of memory
usage by the SAP PRG processes on these application servers. This is not a surprise. There are
several known issues regarding memory leaks with the current version of the Java hibernate
libraries being used in the fake_name application and old fake_product. We have worked with the
application vendor, fake_name, to resolve some of the more significant issues that were causing
regular outages. Fake_vendor has not resolved some of the less-severe issues.
There are plans to upgrade the entire application suite and change the underlying application
execution platform from fake_product to new fake product. The application upgrade includes new
libraries for hibernate, and the memory leak issues related to hibernate with fake_product have not
appeared in new fake product. The landscape upgrade is currently scheduled for June. We will go
ahead and schedule a recycle of the old fake product to recycle the Jlaunch processes you
mentioned below. We will schedule regular process recycles until the system is upgraded.
Please let me know if you have any additional questions or concerns.
Thank you!
Classic capacity planning question descriptions
and proper answering techniques
• What happened:
– Memory leaks, key points to remember
• Graphics help get their attention,
• CSV files are there for the whackos who demand the real data
– Sometimes they say that they need it “to prove to the vendor”
» Believe me, the vendor usually knows all too well…
– It is easy to do and nips their evasions in the bud
– Remember the “stall techniques”?
• Sometimes they can’t, or aren’t, going to fix it
– Welcome to big corporations and “priorities”
– Then you need to get them to reboot periodically to get the leaked
memory back
• Do you have the graphs and data quickly available to discover,
document and communicate this?
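Discovering a leak quickly can be as simple as fitting a line to each process's memory samples. Here is a minimal sketch using an ordinary least-squares slope. The function names, the 10 MB per day threshold and the sample values are hypothetical, and a real detector would also need to handle restarts and the repeating ramps mentioned in the mail.

```python
def slope_per_day(samples):
    """Least-squares slope of (day, megabytes) samples, in MB per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx

def looks_like_leak(samples, min_mb_per_day=10.0):
    """Flag a process whose memory grows at least min_mb_per_day (assumed threshold)."""
    return slope_per_day(samples) >= min_mb_per_day

# Hypothetical weekly samples: (day number, resident MB).
jlaunch = [(0, 500), (7, 640), (14, 780), (21, 920)]
steady = [(0, 500), (7, 505), (14, 498), (21, 502)]

print("jlaunch leaking:", looks_like_leak(jlaunch))
print("steady leaking:", looks_like_leak(steady))
```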
We have this really cool way to see all of
the server’s disk space for the last 90 days
Classic capacity planning question descriptions
and proper answering techniques
The evidence:
Subject: Possible disk space issue looming on ustca479
Hi All,
Here is a view of total disk space and disk space used on ustca479:
Perhaps some purge/delete/cleanup is in order?
Ron Kaminski
Classic capacity planning question descriptions
and proper answering techniques
• The answer:
Subject: RE: Possible disk space issue looming on ustca479
Ron,
Thank you for the heads up. The increased disk space utilization is partially
due to enhanced logging that we have enabled over the past few
months. I have cleaned up some old logs and we will continue to monitor
the disk utilization to determine if additional disk space is required.
Thanks,
Matt
Classic capacity planning question descriptions
and proper answering techniques
• What happened:
Well, it was a start!
But alas, note the inexorable rise beginning again after the clean up.
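An "inexorable rise" can be turned into a date. Here is a minimal sketch of a straight-line projection from the first and last samples taken since the most recent cleanup. The function name days_until_full and the usage figures are hypothetical; only the idea of comparing used space to total space comes from the slides.

```python
def days_until_full(samples, capacity_gb):
    """Project days until the disk fills, from (day_number, used_gb) samples.

    Uses only the first and last sample (oldest first). Returns None when
    usage is flat or shrinking, so no fill date can be projected.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("samples must span at least one day")
    growth_per_day = (last_used - first_used) / (last_day - first_day)
    if growth_per_day <= 0:
        return None
    return max(0.0, (capacity_gb - last_used) / growth_per_day)

# Hypothetical usage since the last cleanup, on a 112 GB volume.
usage = [(0, 80.0), (30, 86.0), (60, 92.0)]
print(f"Roughly {days_until_full(usage, 112.0):.0f} days until full at this rate")
```

Only samples taken after the latest purge should be fed in, otherwise the cleanup drop hides the real growth rate.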
An update from Friday…
• Note that the max space has grown
considerably, from 83 to 112 GB
Classic capacity planning question descriptions
and proper answering techniques
• The best way to deal with these is to avoid them
proactively by making great, workload
characterized consumption information available
to all
– Train your firm to use the capacity reporting and
pathology detection systems
• You have automated pathology detection, all the way
through ticketing issues, haven’t you?
– Think graphics, not tables of numbers
– If only a secret club knows the capacity data, you are
making a big mistake
– Train OS support folks to use the “What if…?” models
Break Time!
• Please be back at
What I said about clouds and SAAS last year:
• Say goodbye to your data centers and your privileges folks!
– Cloudy days are coming, and this is good
• Paying people in each firm to worry about OS, backup, security, and staying
current was always expensive, and now it is ridiculous
– Change firms a few times and note how wildly different “It has to be this way!” is
• Our capacity planning needs, and tools, will have to change too
– Instead of vendors selling you software, many will sell the service running
on their cloud
• This is great! Let the vendor maintain their own code!
• They are the naturally cheapest way, the expertise needed is naturally
concentrated
• Having a year more to search for and find issues, I see a few
potential storm clouds in some firms’ sunny plans!
– Let’s dig into why…
Clouds and “Software as a Service”
• Definition: Clouds = Running our stuff on someone
else’s computers, plus whatever else will be needed for
the new demands that will place on us, like:
• Encryption, so we can run sensitive corporate data over the world
wide web safely
– Note that this is done on both sides, the user’s machine and in the cloud.
This may be an unpleasant surprise for firms that have replaced those
expensive desktop processors (and all that excess capacity) with “light
desktops” running virtual machines on shared hardware
• Exhaustive disk cleansing when we delete files or parts of files
• Network lag measuring tools, because there will be slowdowns
and our users will want to direct their wrath
• Increased firm internet firewall bandwidth needed
• Increased firm internet bandwidth needed
What will those loads look like?
[Chart: CPU In House Versus Cloud In House and Cloud In Cloud. Y axis: CPU needed, 0% to 25%. Stacked workloads: Printing, Web java front end, Security scanners, Encrypt/Decrypt, Tools, Application, OS. Bars: Corporate application in house, Cloud application in house, Cloud application in cloud]
And, we are ignoring the network load increase effects!
Moral: Clouds are gray, not green!
Cloud issues
• “We’ll just run everything in someone else’s cloud, so we won’t need
capacity planning any more. It will be the cloud vendor’s problem!”
– Clouds will place new, different, and often resource-intensive demands
on our firm’s computing infrastructure
– Capacity concerns will become very important, and historical records of what
consumed what will be paramount for figuring things out
• Someone is going to have to pay for all of that extra processing and it
won’t be the vendor!
• The “Mushroom Cloud” will be appearing at firms that ignore these risks
Clouds and “Software as a Service”
• Definition: Software as a Service = Letting someone
else run their code on their machines to serve us, but
undoubtedly with our data, plus whatever else will be
needed for the new demands that will place on us
– If there is customer identifiable information, we will need all of
that encrypt/decrypt overhead again
– Disk cleansing will be less of a priority because, unlike in the cloud, no one
can run “disk scrapers”
– Network lag measuring tools, because there will be slowdowns
and our users will want to direct their wrath
– Increased firm internet firewall bandwidth needed
– Increased firm internet bandwidth needed
Other Cloud and SaaS issues
• The key thing to remember is that cloud and SaaS
vendors will have to eventually operate at a profit!
– This will drive them to the same attempts to economize that
your firms are trying now
• Big and cheap IO devices, that are of course much slower
• Virtualization will be a certainty, you will never know what fraction of
what hardware you will be on
• Architectural choices of the firm’s past won’t make sense any more
– What hardware largess do you tolerate now for “Mission critical
applications”?
» Hot spares?
» N+1 copies of data?
– Will your cloud vendor leave enough excess capacity for your
theoretical worst case?
» How will you be sure?
Other Cloud and SaaS issues
– And remember the graph, they have to run it on 2X to
3X+ the hardware for the same loads!
• Unless your firm’s Data Processing division is utterly
ridiculous in their spending (and many are) how can clouds
be cheaper?
• Clouds and SaaS only make sense when the non-hardware savings exceed the hardware and
network costs, or provide other business-useful
opportunities
– Perhaps outsourcing a staff intensive application to a
SaaS vendor is still a really great idea
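The break-even rule on this slide can be written down as back-of-envelope arithmetic. The sketch below is one possible reading, assuming the 2X to 3X hardware multiplier from the graph is eventually paid for by the customer. The function name cloud_net_benefit and every cost figure are hypothetical.

```python
def cloud_net_benefit(non_hardware_savings, in_house_hardware_cost,
                      hardware_multiplier, extra_network_cost):
    """Yearly net benefit of moving an application to a cloud or SaaS vendor.

    hardware_multiplier is how much hardware the same load needs when run
    externally (the slides suggest 2X to 3X+). A positive result means the
    non-hardware savings exceed the extra hardware and network costs.
    """
    extra_hardware_cost = in_house_hardware_cost * (hardware_multiplier - 1)
    return non_hardware_savings - extra_hardware_cost - extra_network_cost

# Hypothetical yearly figures in thousands of dollars.
for multiplier in (2.0, 3.0):
    net = cloud_net_benefit(150.0, 100.0, multiplier, 20.0)
    verdict = "worth a look" if net > 0 else "does not pay"
    print(f"{multiplier:.0f}X hardware: net {net:+.0f}K, {verdict}")
```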
The moral of the story:
• Eventually businesses may evolve into partial cloud and SaaS
users when the overhead of extra processing is smaller than
some fraction (I’ll go out on a limb and say half) of the
average resources needed to run the application and the
security demands are low, and/or the total function cost is
lower
– Quick! Think of a low security function at your firm that you would be
happy to have some greasy haired geek intercept, and put that in a
low security cloud
• I couldn’t think of any as an example!
• Can anyone here?
– Almost all real corporate work will demand far more internal resources
to run externally than to run internally
• Be sure to add those costs to your cloud and SaaS plans!
The story continues…
• Go out and repeat my analysis at your firm on one of your firm’s
attempts to do it
• Publish a paper via CMG or elsewhere where you outline the
specific true costs in consumption and hosting spend
• If your costs come out like mine did, i.e. “This doesn’t make a lot of
sense!”…
– expect a flood of analyst calls from “consulting groups” wanting you to
expound on your “cloud computing experiences”
– expect some wholehearted chuckling and agreement that it is nuts
• I think that many firms are “acting like” consulting groups when they are
in fact trying to gather data to beat down internal pushes to “go cloud” or
to “go SAAS”
• Or they are consulting to potential cloud providers and giving them a less
than rosy view…
• For now, I would proceed very slowly…
Clouds, last words
• I used to live not far from here in Oviedo, FL
– Every summer day a lot of sunlight hitting the
swampy ground would create a lot of hot rising
moist air, so we had clouds and thunderstorms
about 3 PM each day
• In IT journals and analysts sessions, there is a
lot of hot moist hype filled air rising
– Maybe that is why they see clouds!
Last words: Don’t trust the vendor
(or yourselves)
In the interests of time, we are going
to skip some here
• But you all have the slides!
Modeling when all of the cards are
stacked against you
• In a perfect world, when new code is written
– There is a comprehensive test plan to verify functionality
– All issues are corrected prior to the capacity planning tests
– The capacity planning tests are performed on real (non-virtual) hardware with known characteristics
– The testing group are old pros who know how to run a
proper “mesa test” which has…
• An hour of nothing
• An unsaturated hour of a realistic transaction mix, with realistic
think times, on a realistic indexed database, without excessive
application logging or extraneous monitors adding logging loads to
key disks
• Followed by an hour of nothing
– And finally a truthful estimate of expected user loads
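The shape of a mesa test (quiet hour, unsaturated busy hour, quiet hour) can be checked mechanically before any modeling starts. Here is a minimal sketch. The function name check_mesa_test, the 5% quiet limit and the 80% saturation limit are assumed values for illustration, and the sample figures are the IIS_w3wp before, during and after numbers from the spreadsheet slide later in this talk.

```python
def check_mesa_test(before, during, after, capacity,
                    quiet_limit=5.0, saturation=0.8):
    """Return a list of problems with a three-hour mesa test sample.

    before, during and after are average CPU percentages for the workload
    in each hour; capacity is the CPU available (200.0 for two CPUs).
    quiet_limit and saturation are assumed thresholds, not fixed rules.
    """
    problems = []
    if before > quiet_limit:
        problems.append("hour before the test was not quiet")
    if after > quiet_limit:
        problems.append("hour after the test was not quiet")
    if during >= saturation * capacity:
        problems.append("test hour was saturated")
    if during <= max(before, after):
        problems.append("test load does not stand out from the background")
    return problems

# IIS_w3wp CPU from the Contract Browse sample on a 2 CPU VMware node.
for problem in check_mesa_test(38.78, 118.86, 44.18, capacity=200.0):
    print("Mesa test issue:", problem)
```

A sample that fails the quiet-hour checks is not useless; it just needs the noise adjustment described later in this section.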
Modeling when all of the cards are
stacked against you
• In the old days, we would be able to justify …
– A great testing team with experience
– Separate testing hardware (which was also used as a test bed for getting the
“promote to production” process working well)
– Real change control with handoffs
– No deployment to production without a model (my preference) or load test at
projected rates
• In the modern firm…
– a lot of those folks and skills were eliminated or outsourced
• The “Hardware is so cheap, we’ll just buy more!” mantra
• We’ll just run on someone else’s cloud!
– Outsourced code production teams may be “lacking in experience”
– Severe pressures to “Get it on the web now!” lead to some decisions that are
frankly dubious when viewed dispassionately
– If you are lucky enough to have modeling or load testing tools at all, the folks running
them are nowhere near as skilled as your old testing team, but they do have a
shorter deadline! ☺
• So what do you do?
Modeling when all of the cards are
stacked against you
• You make some concessions to reality
• Realities of this decade
– People will be running tests on Virtual machines
• Worse yet, they will be running tests on Virtual machines that do not know
that they are VMs, so their performance numbers will be way goofy at times
– Often you will encounter “shared development servers” and severe
problems getting folks to log off and stop using your test machines for
the period of your mesa test
– Your uptight project leaders are going to be oblivious to proper testing
techniques and/or the realities of physics
– Ron’s theorem:
• The chance of getting a clean unsaturated mesa test sample with a realistic
transaction mix and a realistic arrival rate decreases to 1/ (number of
continents that team communications span)
• So, you are going to get all sorts of messed up tests…
Modeling when all of the cards are
stacked against you
• A general rule is:
– “A careful modeler, who pays attention to calibration issues, can still get useful
information from a sample done on VMware hosted O/Ses”
• So, let’s start with the sample:
– Clearly there was no “quiet hour” of IIS_w3wp before and after our test, so
we have to adjust for the fact that our sample consumption is likely also
running extra processing
– At least the background “hum” of IIS_w3wp seems consistent
Modeling when all of the cards are
stacked against you
– Other issues, detecting when tests will not go as planned
(key “warning trigger words” highlighted)
• All of the following have been said to me by real people in the last
year:
– We are planning to go live next Thursday, and we haven’t run a test yet
while also saying:
» I plan on simulating 10,000 users logging on the system per second
from my laptop
• Tests I’ve run indicate that with modern office web
connections to computer rooms you can accurately
simulate around 20 users per test machine
» We’ve only got 10 machine licenses of the load testing software,
and I think that I can only use 3
• Smart load testing firms have clouds of testing machines (think
dozens or hundreds+) and specialists experienced with the
tools, (modeling from a small test still gives a better answer!)
Modeling when all of the cards are
stacked against you
» I plan on simulating 10,000 users logging on the system
per second from my laptop
• What in blazes are you running? I called a very large
and successful web payment system vendor where a
pal of mine works and he said that their maximum
updates per second ever seen (on cyber Monday no
less) are 416 per second hosted by a co-located
cloud of over 900 machines. You are going to get
10,000 moms per second putting in their address
and personal information on a diaper web site?
• Further research showed that 1,000,000 moms
had registered in one year on the previous site
• That means all of the moms on earth will
re-register in 100 minutes!
• That must be some diaper coupon!
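The sanity check on a claimed arrival rate is plain arithmetic. Here is a short sketch using only the two numbers from this slide, the claimed 10,000 registrations per second and the 1,000,000 registrations seen in a year; the variable names are illustrative.

```python
claimed_rate_per_second = 10_000
previous_year_registrations = 1_000_000
seconds_per_year = 365 * 24 * 3600

# How long the claimed rate takes to repeat a whole year of registrations.
seconds_to_repeat_year = previous_year_registrations / claimed_rate_per_second

# What the previous site actually averaged.
observed_average_rate = previous_year_registrations / seconds_per_year

# How many registrations 100 minutes at the claimed rate would produce.
registrations_in_100_minutes = claimed_rate_per_second * 100 * 60

print(f"A year of registrations would repeat in {seconds_to_repeat_year:.0f} seconds")
print(f"Observed average: {observed_average_rate:.4f} registrations per second")
print(f"100 minutes at the claimed rate: {registrations_in_100_minutes:,} registrations")
```

By this arithmetic the previous year's registrants would all come back in 100 seconds, and 100 minutes at that rate would mean 60,000,000 registrations. Either way the claimed rate is several orders of magnitude beyond anything the old site saw.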
Modeling when all of the cards are
stacked against you
» I’ve never used this load testing software before
• Rational testers start testing loads midway through a
project to get info on what parts are slow so they can fix
them pre-launch, and then test weekly thereafter as
fixes go in to verify
» We are only going to test a few functions of the customer
web site, that will be enough for capacity planning purposes
right?
• A realistic transaction mix, ideally taken from a
production site with real users acting normally is the
“ultimate get” for accuracy. All else is supposition and in
my experience those “untested sections” can
sometimes be really intense
» Since production isn’t there yet, I’m going to have 20 real
people from the team act like users for an hour straight
• Long experience with project teams lets me know
whatever happens, those 20 users will not act like 20
real users, no matter what…
Modeling when all of the cards are
stacked against you
• More common issues with real people testers:
• Since they know the system, they start going
“ninja fast” during the sample period, generating
way more load per user than normal
• “I thought that we were supposed to ‘stress
the system’”
• They know the system, so they type in perfect
addresses, so the verification rules are never
tripped, like they almost certainly would be in
the real world
• After about 15 minutes, they get bored, stop for
coffee or to complain to co-testers about the
project leader, their boss, their stupid firm, the
weather…
Modeling when all of the cards are
stacked against you
• More common issues with real people testers:
• Instead of a “realistic transaction mix” the
programmer pressed into service will check each
and every function (also known as a functionality
test) until every single function is tested, and then
start all over again
• They “can’t be bothered” to create a realistic test
user population’s data volumes, so they add their
mom, sisters, cousins, their kids’ baseball coach, and
that data will not match production data
characteristics
• Will a database of 35 users perform like
1,000,000 distinct moms in a database? Think
about the indexes, caching, etc…
• Whatever they say they will do, the first time that you work
with a new project team your chances of getting a good
“mesa test” hour with realistic transaction mixes and
consistent usage in all 60 minutes is about 0%
Modeling when all of the cards are
stacked against you
• The ever present unrealistic “think time” issue:
– I was still suspicious of the consumption that was attributed to just 20 users of
a contract review system. So I asked some questions:
I’m modeling away, and I have one more question:
Do you feel that this is a realistic rate of users activity, or much more than
they would do in an hour? What I mean is, we simulated 20 users for an
hour, working somewhat continuously. Is that how we expect them to act, or
was this really 20 “ninja users” working much harder than normal?
Let me know.
As it stands, 20 users used 77% of one processor, so 400 will use 1547.6%, or
almost 16 processors just for your web serving, not counting the OS, etc. I
have a feeling that while they were 20 users, they were 20 really busy users,
and that might be a bit distorted. Let me know how you feel or call.
Modeling when all of the cards are
stacked against you
• What I heard back:
– What is the pacing rate for the _____ script? ________ wrote this
and set the pacing to a fixed delay of 30 seconds. Sometimes that
can turn into a simulation of super aggressive users
• So I wrote back:
– OK team, so now the question is: Will your users really do all of that
work every 30 seconds? Don’t they ever get some coffee, take a call,
or something? Maybe they get together to plot a coup against the
manager who works them this hard? I can model it this way if you
believe that this is realistic. Let me know.
• What they replied was key:
– All the tasks are normal for a user accessing the Contract Browse.
They wouldn’t complete all those tasks in 30 seconds, however, it’s
probably over 3-5 minutes.
– So I split the difference and called it 4 minutes per contract, or 1/8th
the load generated
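The think-time adjustment is a simple ratio. Here is a short sketch using the numbers from these slides: a scripted 30 second pacing, a realistic 4 minutes per contract, 20 scripted users, and 77.38% of one processor consumed by the test. The function name think_time_scale is illustrative.

```python
def think_time_scale(scripted_seconds, realistic_seconds):
    """Fraction of the scripted load that a realistic user would generate."""
    return scripted_seconds / realistic_seconds

scale = think_time_scale(30, 4 * 60)           # 30 s script vs 4 min reality
equivalent_real_users = 20 / scale             # the 20 "ninja users" in real users
naive_cpu_400_users = 77.38 * (400 / 20)       # straight extrapolation, % of one CPU
adjusted_cpu_400_users = naive_cpu_400_users * scale

print(f"Scale factor: {scale} (1/{1 / scale:.0f} of the scripted load)")
print(f"20 scripted users behave like {equivalent_real_users:.0f} real users")
print(f"400 users: {naive_cpu_400_users:.1f}% naive, {adjusted_cpu_400_users:.2f}% adjusted")
```

The adjusted figure still includes no correction for background noise in the sample; that is handled separately with the spreadsheet formula.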
Modeling when all of the cards are
stacked against you
• Spreadsheet tricks
– Determine the noise percentage:
  CPU         Before   During   After
  IIS_w3wp     38.78   118.86   44.18
  system        8.74     8.88    9.02

  Users   Average   CPU due to test   Thumbnail modeled CPU
  20      41.48     77.38             1547.6
  Noise percentage (Average / During): 0.34898
– Make scenario plans around their estimate
  Users   IIS_w3wp growth
  160     -34.90   (Scenario plan 20 users)
  200     -18.62
  300      22.07
  400      62.75
  500     103.44
  600     144.13
  700     184.82
Note that the first scenario is labeled 160
users, or 8 times the 20 they told us.
Note also that we actually had to shrink the
CPU used to account for the noise in the
sample to get to 160 and even 200 users
The formula:
((((Test*(SP_users/S_users))/(Test+Noise))-1)*100)
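The spreadsheet formula translates directly into a few lines of code. In this sketch Test is the CPU due to the test (77.38), Noise is the average of the before and after hours (41.48), and S_users is taken to be 160, the think-time-adjusted equivalent of the 20 scripted users. That value of S_users is inferred from the scenario numbers rather than stated on the slide, and the function name workload_growth_pct is illustrative.

```python
def workload_growth_pct(test_cpu, noise_cpu, scenario_users, sample_users):
    """Percent change to apply to the sampled workload for a scenario.

    Mirrors ((((Test*(SP_users/S_users))/(Test+Noise))-1)*100): scale the
    noise-free test CPU to the scenario's users, then compare it with the
    full sampled workload (test plus noise).
    """
    scaled_test = test_cpu * (scenario_users / sample_users)
    return (scaled_test / (test_cpu + noise_cpu) - 1) * 100

before, during, after = 38.78, 118.86, 44.18   # IIS_w3wp CPU from the sample
noise = (before + after) / 2                   # background "hum"
test = during - noise                          # CPU due to the test itself

for users in (160, 200, 300, 400, 500, 600, 700):
    growth = workload_growth_pct(test, noise, users, 160)
    print(f"{users:>4} users: {growth:8.2f}% IIS_w3wp growth")
```

Negative growth for 160 and 200 users is the "shrink" noted on the slide: once the noise is removed, the real load at those user counts is smaller than the raw sample.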
Modeling when all of the cards are
stacked against you
• Then deliver the modeled results, with plans for future
improvements

Contract Browse Capacity Planning Model
Load Test run on inwcq2211 2010-2-18 14:14 – 15:15
We used the load test to model growth scenarios to 400 simultaneous users and beyond on the
existing 2 CPU VMware node and also a 4 CPU VMware node. Other configurations are
possible, just ask.
Adjustments:
1) Since the “think time” was determined to be too low, we adjusted the given 30 seconds
to 4 minutes
2) This was not a pristine test, but it was the best that we could do given the time
constraints. Indeed our calculations show that only 65.1% of the test workload was
due to the test load, the rest was noise that we adjusted for.
3) The test workload was put in the IIS_w3wp workload on inwcq2211.
http://ustwu002.kcc.com/node_reports/inwcq2211/wcpu.html
Summary
While it probably isn’t strictly necessary to go beyond 2 CPUs, we would recommend starting
with a 4 CPU system and watching it. If in time the users stay well below 2 CPUs, you might be
able to reduce it. VMware will take back the extra unused anyway, so it isn’t a costly decision
either way.
2 CPU Model
In the first graph, you see the modeled CPU for all workloads from the model. Note how we had
to adjust for issues, so the sample is bigger than the first two scenarios. You can also see how
we grew IIS_w3wp from the sample 160 to 700 simultaneous users.
The second graph shows user perceived response times for the IIS_w3wp user in the tests, and
clearly CPU_WAIT will be the limiting factor. You can see that it is slightly higher in the 400 user
loads, 14% higher to be precise. Is that bad? That is for you to judge, based on how you felt
about response time in your test. If that was great, then is 14% worse still great?
Please see next page for a 4 CPU model.
4 CPU model
Here is a model of a 4 CPU system in the same scenario. Note again the growth of the IIS_w3wp
workload from 160 to 700 users. Also note how response times barely rise from your sample to
400 users, only 1%. Note also how disk IO_WAIT starts to become a slight factor as you hit 700
users. This means that while CPU_WAIT will rise slightly, IO_WAIT will likely be the binding
limit on a 4 CPU system.
What we learned:
Your team did great for first timers on a modeling test mission! You should be proud of
yourselves! Still, we all learned things from this test that we could do better next time. Next time
we will:
1) Do not “ramp up” to the desired number of users; go flat out from the start.
2) Try to find an empty node to run the test on.
3) Ideally run on a non-VMware segment to reduce VMware related issues.
4) If possible, run your test in a distinct username from other loads
5) Think hard about “Think time” and try to set it more realistically. Remember, we are not
trying to “kill the node”, we want to model happy users getting great service. We can
cause all other states in the model once we get a good sample.
6) Run a “test” test the night before to flush out script hiccups and then have a smooth run
on test day.
7) If you start and end on an even hour, the graphs will line up nicer! :-)
http://ustwu002.kcc.com/node_reports/inwcq2211/wcpu.html
Modeling when all of the cards are
stacked against you
• Now get ready for the end run…
– After getting the modeled results, the project leader
now said that they planned on having 1000 users
• Do we believe them?
– Of course not!
• Then why would they say that?
– Because all project leaders like 150% surplus hardware to cover
up their cruddy code or other all too common mishaps
– So what do we do?
• Model it, and show them how it dies on disk I/O!
• As soon as you start making requests on them, their appetite
for surplus will suddenly and mysteriously disappear…
Modeling when all of the cards are
stacked against you
• So in the end, we did get some useful information to the
project
– We highlighted a probably looming disk I/O issue
– We got them to deploy to 4 CPU instead of a probably too small
2 CPU system
– We started their journey to being more precise forecasters of
hardware
– We detected and deflected an “end run” attempt at hardware
piggery
• The morals of the story:
– Sometimes you have to make do, but make do using as much
science and care as you can
– Expect an excessive amount of CPU_WAIT on VMware based
models and really watch your model’s calibration
Know a bad method when you see it
• Many vendors are in the consolidation business these days
– zLinux
– VMware
– Other flavors
• Very few firms have great workload characterized capacity planning
information
• Many firms use outsourced or “vendor” estimates for sizing
– If you know that a firm doesn’t have proper capacity planning
information, what is a vendor going to use to make their estimates?
• Total CPU by hour?
• “industry estimates”?
– All of these typically grossly oversize machines
• Isn’t that how we got in this “too much hardware” mess to begin with?
• The biggest hour is never…
Know a bad method when you see it
• I have a management pal who says that every estimate
that he has ever received based on total consumption
is at least 2.5 times bigger than it needs to be
– Are your resource intensive maintenance activities
“Synched” across many machines?
• If so, any method that adds up the same hours across all machines
will recommend a lot of CPU
• Or, you could bust up your anti-virus runs over the whole evening
and get back several hundred CPUs…
– Each well meaning analyst who sees an estimate based on
“shaky” logic typically adds a 20% “fudge factor” as a
safety net
• If your estimate has been reviewed in sequence by several
analysts, at some point it is mostly “fudge”
• Stick to real consumption figures sampled from real systems, taken
in hours without excessive maintenance activities
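To see how quickly stacked safety nets take over an estimate, here is a small Python sketch. It assumes that each reviewer multiplies the previous figure by 1.2; that compounding model and the function names are illustrative assumptions, only the 20% figure comes from the slide.

```python
def fudge_share(reviewers, fudge=0.20):
    """Fraction of the final estimate that is padding after N sequential reviews."""
    inflated = (1.0 + fudge) ** reviewers  # final estimate relative to the real need
    return 1.0 - 1.0 / inflated


def reviews_until_mostly_fudge(fudge=0.20):
    """Smallest number of sequential reviews after which padding exceeds half."""
    reviewers = 0
    while fudge_share(reviewers, fudge) <= 0.5:
        reviewers += 1
    return reviewers


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} reviews: {fudge_share(n) * 100:.1f}% of the estimate is fudge")
```

Under that assumption the estimate has more than doubled by the fourth review, so over half of the requested hardware is padding.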
Know a bad method when you see it
• “Canary Machines” is the highly dubious practice of putting
higher loads on a small subset of machines, and then
upgrading the others when the canaries fall dead
– That method guarantees that
• some users will have slowdowns or failures
• you will have ongoing monitoring costs
– If it is a highly dynamic or surge prone environment, will you be
able to get hardware purchased, shipped, installed and working
before the next surge?
• What if the surge lasts weeks?
• When people don’t have proper tools or training that
enable them to see things before they happen, they may
choose to find problems by letting them happen to
someone
– You don’t have to…
Goals to work towards
• Get collections of some kind, on every node and device that
your firm uses. That includes:
– Machines
– Disk arrays
– Routers/switches/network stuff, including firewall servers and the like
– Your boss’s iPhone
– Anything and everything!
• Get graphical ways to display all of that data
– On the web, all of the time, no exceptions
– “We’ve been having disk issues with the vendor, get us graphical
evidence of issues…”
• Plan for needing (and producing) tabular and/or CSV data
– “We want to get a quote for zLinux, could you give us every machine
that we have, CPU for the last 30 days in 15 minute intervals?”
– It comes in real handy when you get a new process pathology idea and
want to test it
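A request like the zLinux quote above boils down to rolling raw samples up into fixed intervals and writing them out as CSV. The Python sketch below shows one way to do that with the standard library; the sample layout, node names and column names are all hypothetical, and a real collector would supply the data.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def to_15_minute_csv(samples):
    """Average (node, timestamp, cpu_percent) samples into 15 minute buckets as CSV text."""
    buckets = defaultdict(list)
    for node, stamp, cpu in samples:
        # Round the timestamp down to the start of its 15 minute interval.
        start = stamp.replace(minute=stamp.minute - stamp.minute % 15,
                              second=0, microsecond=0)
        buckets[(node, start)].append(cpu)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node", "interval_start", "avg_cpu_percent"])
    for (node, start), values in sorted(buckets.items()):
        writer.writerow([node, start.isoformat(), f"{sum(values) / len(values):.2f}"])
    return out.getvalue()


# Hypothetical samples for illustration only.
samples = [
    ("node_a", datetime(2010, 2, 18, 14, 14), 40.0),
    ("node_a", datetime(2010, 2, 18, 14, 1), 30.0),
    ("node_a", datetime(2010, 2, 18, 14, 16), 90.0),
]
csv_text = to_15_minute_csv(samples)
```

Keeping the export as a plain function over stored samples means the same data can feed a vendor quote one day and a new pathology test the next.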
Goals to work towards
• Train multiple people on how to use your capacity planning tools
– That way someone can go on vacation!
– …or leave the firm, and the firm isn’t left in the lurch
• Find bright young folks and make new capacity planners out of
them
– I like to get a young system administrator at the top of their game,
who is maybe starting to get a little bored with OS tasks
• These are often your best protégés
• They also come in real handy when automating the installs of thousands
of collectors!
– Install collectors on 2000 machines? No problem!
• Use their natural technical curiosity to “hook them” and successful
models to “reel them in”
– Drag them to CMG and introduce them to Adam Grummit, Dr. Buzen,
Debbie Sheetz and all of your CMG pals that you call when you are
stuck
• Because they will need them too!
Goals to work towards
• Train project leaders about capacity planning before the last days of
a late project when they “Need to buy hardware right away or we
won’t make our date!”
• Get capacity planning in your firm’s defined software development
lifecycle or find out what development paradigm is in force in your
firm and get modeling and capacity planning in it
• Get a simple rule enforced: “No new hardware without a model or
review”
• Model new projects early, way before they are “almost done”
– The earlier an issue is found, the cheaper it is to fix
– Saving a project leader from disaster a few times is a great way to get
future cooperation
• Realize that sometimes you can’t do it all yourself
– Bring in some great technical capacity planning staff to help
• Freudian slip
– It is a blast when they are used to different presentation styles…
Things Not To Do
• While having lots of data and lots of experience can be quite powerful, it
can also be scary to others without it
• Being seen as a “know it all” harms your effectiveness, by reducing your
effective communications
– The smartest person in a firm that nobody can stand to listen to is not
effective
• Ron’s hard won “tips”
– Take the time to be seen listening
– Never take credit, always share successes
– When someone does some great work to fix a problem that you uncovered,
send a note to their boss (and maybe a few higher in the chain) asking them to
thank the fixer and praising the timeliness and quality of their work
– Avoid personal pronouns when describing a problem
• Don’t say “I noticed…”
• Say instead “The data shows…”
– Always use graphics to deliver bad news; people can argue with words
– Ask your boss and other old-timers to help and offer advice
An audit list
• Does your firm staff for Capacity Planning success?
– Management sponsors high enough in the organization to
compel behavior changes?
– Gurus to design and ensure proper collection, data reduction
and presentation systems
• Ideally there should be at least one per decade or two and perhaps
per major O/S
– Great systems and installation staff to deploy collectors quickly
– Experienced pals at other firms who have already done it?
– Can project leaders and key application technical personnel be
trained and maintained so that they can effectively use the
capacity planning reporting systems independently?
– Is there sufficient continuity in key “outsourced or co-sourced”
support team members to benefit from the firm specific
knowledge available?
An audit list
• Does your firm:
– base all hardware purchase decisions on workload characterized
resource consumption over historically relevant periods and/or
realistic transaction mixes within relevantly sized test
databases?
• Not just on “total CPU” or memory
– have speedy (ideally web delivered and graphical) ways to check
all of the “sacred five” (CPU, DISK IO, Memory, Network and
Response Time for key workloads)?
– have key workload specific “drill down” capabilities when
needed to break down giant workloads into meaningful subgroupings of interest to applications folks?
– have regular, automated pathology detection and notification,
ideally with automatic ticketing to drive down incidents of
needless consumption and chaos?
An audit list
• Does your firm:
– adequately fund an independent testing group with:
• dozens of servers with licenses to run your load generation software to
generate realistic loads?
• trained testing staff who can:
– run proper mesa tests?
– design realistic load scripts with proper think times?
• sufficient disk storage to make realistically sized test data sets for multiple
simultaneous development efforts?
• In a world where more and more web page interactions will not be on PC
browsers, but instead from handhelds, can you generate significant
percentages of handheld browsers in your favorite load generation tool?
– Our future growth is in_____ and what browsers are they likely using?
– make capacity planning a known, taught and enforced part of their
documented software development life cycle that all projects must
follow?
– make outsourced partners follow the same rules?
– invest in staff training, including conferences to keep skills current?
An audit list
• Does your firm’s management:
– enforce proper capacity planning methods on both new
and established applications?
– work with development management to help them prove
the cost and time savings available from proper capacity
and performance management practices?
– buy enough collectors?
– have firm-wide instrumentation standards and
enforcement?
– protect the capacity planners from powerful applications
developers who used to get all the hardware that they
ever dreamed of, and now are angry at the “machine
stealing” capacity planners?
An audit list
• Is your firm’s management prone to the hype storms in IT media?
– Do they still justify SaaS and/or “Cloud Computing” efforts using the false
assumptions that “in firm” hardware costs will be reduced?
– Do they believe disk array vendors that tell them that they no longer need to
worry about
• regular defragmentation?
• disk layout and RAID choices?
– The “Big cache” effect of too much hardware versus the disk vendor’s promises story
– The “Your defragger isn’t!” story
– Do they still believe that virtualization is the only answer? Or, even worse,
required?
• Do they understand the benefits of “stacking” different applications on one machine?
• Do they understand the benefits of “collapsing” an application from multiple nodes to far
fewer or just one?
– Do they think that 24 X 7 “twitch monitoring” of computer systems by staff is
effective and efficient?
– Do they outsource IT systems and blindly trust and expect good results?
• All of these are provably false, yet popular and many are still receiving
venture funding right now…
An audit list
• Does your firm have automated recovery procedures in
place for their capacity planning systems?
– It is a simple fact, machines crash, including the ones that
you will depend on for capacity planning. Make sure that
you have automated and tested recovery procedures for:
• Retrieving all that remotely collected data
• All of your processing of data into workloads
• All of your web report file creation
• All of your pathology detection and ticketing
– Ideally you will test all of this in several distinct parts of the
process so that you can recover swiftly (and as
automatically as possible) when it happens to you
• And it will happen to you…
• maybe when you need to work on slides…
An audit list
• The best audit results your system can get
– When a project team member sends you and all of his
group links for all of their project’s machines on your web
reporting system saying that they review it weekly to find
performance issues and track the effects of changes
– It is even better when they find issues on their own and
use the web reporting system to highlight them!
An actual user’s letter…
Hi Folks,
I try to do checks on the Secret Server health on a weekly basis.
This week there are a few anomalies showing (they may have been temporary and gone now).
It might be worth taking a look at them to ensure there are no underlying problems.
http://ustwu002.kcc.com/node_reports/ustcca038a/memory.html
PAW Ustcca038a memory use seems to have jumped significantly since the 11th April.
It’s not a dangerous levels right now, but it may be worth looking into what is utilizing it, just in case
something is up.
http://ustwu002.kcc.com/node_reports/ustcca038b/wcpu.html
PAW ustcca038b CPU utilization went up significantly on the 14th. Looked like it peaked and started to
go down again, but worth a check.
Thanks.
North Atlantic Data Warehousing Support
Secret Name
Kimberly-Clark Corp.
IT Services: North Atlantic Data Warehousing
Kimberly-Clark Limited (registered in England under No. 308676) registered office address 1 Tower View, Kings Hill, West Malling, Kent ME19 4HA. Kimberly-Clark Europe Limited (registered in England under No. 4060641) and Kimberly-Clark
European Services Limited (registered in England under No. 4071548), registered office address 40 London Road, Reigate, Surrey RH2 9QP.
Summary
• Proper capacity planning isn’t just running a product
that produces screens of output to check off against a
requirements list
– Proper capacity planning data, properly presented, is
useful to all strata of IT and even the end users
• While it does require investment and knowledge, the
rewards can be immense
• Keep coming to CMG to stay current and see past the
hype!
– Get better at this than me and get up here and share how
you do it with everyone else!
– Write a paper for next year’s conference!
The Book Shelf
• The Visual Display of Quantitative Information, by Edward R.
Tufte, published by The Graphics Press
– You will never look at vendor resource consumption graphics
products the same way again
– Join me in the “Ban 3-D and bouncing lines with oversize dots to
represent sampled quantities” movement!
– If you don’t already have it, run from the room right now and go buy
it, it is that good
• Envisioning Information, by Edward R. Tufte, published by The
Graphics Press
– More great ways to think about human perception of numbers
• Handouts from any of Tufte’s lectures
• http://www.edwardtufte.com/tufte/
• The Visual Miscellaneum: A Colorful Guide to the World's
Most Consequential Trivia by David McCandless published by
Collins Design
The Book Shelf
• Learning Perl, by Schwartz & Christiansen, published by
O’Reilly & Associates Inc., the llama book (Learn Perl in a
few cross country flights!)
• Programming Perl, by Wall, Christiansen & Orwant,
published by O’Reilly & Associates Inc., the camel book
(This book is a must have)
• Graphics Programming with Perl, by Verbruggen,
published by Manning Publications, (This book really
helped me a lot when I decided to make my own reporting
system, check out the online sample chapters)
• http://www.cpan.org
The Book Shelf
• Cascading Style Sheets, The Definitive Guide, by Meyer,
published by O’Reilly & Associates Inc., the salmon book
• HTML & XHTML, The Definitive Guide, Musciano & Kennedy,
published by O’Reilly & Associates Inc., the koala book
• Perl Graphics Programming, Wallace, published by O’Reilly &
Associates Inc., the colobus monkey book
• CGI Programming with Perl, Guelich, Gundavaram & Birznieks,
published by O’Reilly & Associates Inc., the mouse book
The Book Shelf
• New Operating Systems on your plate?
– Windows® Internals: Including Windows Server 2008 and
Windows Vista, Fifth Edition, Russinovich, Solomon &
Ionescu, published by Microsoft Press; 5th edition (June 17,
2009)
– Unix Power Tools, Third Edition, Powers, Peek, O’Reilly &
Loukides, published by O’Reilly & Associates Inc., the
power drill book
General Questions?
• Rules:
– No “Which vendor…” questions!
• All vendors do great things, often in different ways, and in ways
that change over time. Effective use comes from deep
understanding of their methods. CMG is a great place to ask the
vendors those questions and keep current!
• That is what the nightly drinking time is for! ;^))
– No “Which client did that…” questions!
• They may be in the audience!
Stump The Modeler!
• I’ve dumped a lot of material on you today, and some
among you have some great questions that may
bring a nuance into focus
– Ask them!
• Or you might see things differently than I do
– Write your own course! I’ll come and see it!
• Make sure to use a lot of graphs!
Give Blood!
• I have, regularly, since I turned 16, every 8
weeks, it is good for you and society!
– http://en.wikipedia.org/wiki/Blood_donation
– http://www.redcross.org/donate/give/
Oh Boy! Legalese
• Any process names, product names, lyrics, trademarks or
commercial products mentioned are the property of their
respective owners
• All opinions expressed are those of the author, not any of the
author’s present or past employers
• Any ideas from this paper implemented by the reader are
done at their own risk. The author and/or his present or past
employers assume no liability or risk arising from activities
suggested in this paper.
• Work safe, and have a good time!
Thank You So Much For Listening!
Write A Paper For CMG!