I’ve been doing quite a bit of work lately on metric collection and reporting.
It has been interesting and, as usual with these types of projects, enlightening. I’ve learned a few new things, learned that some of the things I thought I knew I didn’t, and reinforced some other thoughts.
I suppose I can break this down into several observations and thoughts…
1. People are scared of metrics
If I really sat down and thought back through the years, I could say I’ve seen this particular movie before. Something about this new project, though, really brought it to the forefront. People are nervous and scared about metrics: what they will “say”, who will “use” them, and why.
I tend to see metrics as part of an Evidence Based Management approach. How on earth can you fully understand where you have been, where you are going, and whether your decisions are leading you to the right place if you don’t measure anything at all?
Imagine trying to drive a car without a speedometer, odometer, gas gauge, or…I don’t know something basic like…sight. Imagine just driving by “feeling” like you are some kind of Jedi or something.
The fear, though, is real enough, as discussed at great length on this other blog. Bad managers will probably misuse the information, and this makes people nervous.
2. There is not a secret, magical, list of ITSM metrics out there for you to use
Management wants reports. They want them now. Actually, they want you to have produced the reports yesterday. They truly believe that somewhere on the internet there is, in fact, a list of metrics that will work for every organization.
If we can use the car analogy again…what metrics are there for you in every car?
Speedometer, Odometer, Fuel gauge, Tachometer, some other things to tell you if your headlights are on or if you are signalling to turn left…maybe a few other things but not too much more. So, we have basically like…15 things or so that are “common.”
For IT, here are 17 common metrics (as good as any other list):
- How many Incidents do you have over X time
- What is the mean/median/min/max time to resolve an Incident
- What is your “1st call resolution” rate
- What is the mean/median/min/max time to resolve an Incident that has been escalated (meaning, excluding all those that count as “1st call resolution”)
- What is the Incident breakdown by “Service” (failing that, by “Application”)
- How many Changes do you have over X time
- What is the mean/median/min/max time to implement a Change
- What is the Change breakdown by “Service” (failing that, by “Application”)
- What is the driver of each Change (break/fix, maintenance, enhancement, transformation) – what % of all Changes falls into each?
- What is your “up time” over X time
- What is your “mean time between failure”
- What is the uptime/mtbf breakdown by “Service” (or App…)
- What is the mean/median/min/max number of documented requirements per project (this is “scope”)
- What is the mean/median/min/max number of “changes” to scope per project
- What is the mean/median/min/max % of defects (regardless of “priority”) allowed/accepted per project. (meaning, how many go into production)
- What is the mean/median/min/max amount of time each project takes to complete
- What is the mean/median/min/max cost for each project
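To make a few of the metrics above concrete, here is a minimal sketch of computing resolution-time stats and the “1st call resolution” rate from raw incident records. The field names (“opened”, “resolved”, “escalated”) are assumptions – substitute whatever your ticketing tool actually exports.

```python
from datetime import datetime
from statistics import mean, median

# Made-up sample data standing in for a ticketing-tool export
incidents = [
    {"opened": datetime(2014, 1, 1, 9), "resolved": datetime(2014, 1, 1, 10), "escalated": False},
    {"opened": datetime(2014, 1, 2, 9), "resolved": datetime(2014, 1, 3, 9),  "escalated": True},
    {"opened": datetime(2014, 1, 5, 9), "resolved": datetime(2014, 1, 5, 12), "escalated": False},
]

# Time to resolve each Incident, in hours
hours = [(i["resolved"] - i["opened"]).total_seconds() / 3600 for i in incidents]
print("mean:", mean(hours), "median:", median(hours), "min:", min(hours), "max:", max(hours))

# "1st call resolution" rate: share of Incidents never escalated past the first tier
fcr = sum(1 for i in incidents if not i["escalated"]) / len(incidents)
print("first-call resolution rate:", fcr)
```

The same pattern extends to Changes (swap resolution time for implementation time) and to the per-Service breakdowns, by grouping the records before computing the stats.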
And this list doesn’t cover anything about Requests, or Problems, or how pissed off your customers are (commonly known as Customer Satisfaction).
But still, even using that list isn’t a good idea without first trying to understand “why” you need them…
3. Don’t start with the metrics even if you have a nice list of them…
What is the story you are trying to tell? What is the problem you are trying to solve? Why do you need these metrics in the first place?
The GQM method seems like a good approach to me. Of course that isn’t “easy” either. You still need to work (think) on what your goals actually are and if they are appropriate/useful.
Then you have to figure out what the “right” questions are and then figure out what your metrics are.
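One way to keep yourself honest with GQM is to write the Goal → Questions → Metrics chain down explicitly before collecting anything. Here is a sketch of that as plain data; the goal and question wording is an invented example, not a recommendation.

```python
# A Goal-Question-Metric breakdown as a simple nested structure.
# Every metric must trace back to a question, and every question to the goal.
gqm = {
    "goal": "Reduce the business impact of Incidents",
    "questions": {
        "How quickly are Incidents resolved?": [
            "mean/median time to resolve an Incident",
            "mean/median time to resolve an escalated Incident",
        ],
        "Are the same Incidents recurring?": [
            "# of Repeat Incidents over X time",
            "# of Incidents after a Change implementation",
        ],
    },
}

for question, metrics in gqm["questions"].items():
    print(question)
    for m in metrics:
        print("  ->", m)
```

Anything you are tempted to report that doesn’t hang off a question in a structure like this is a candidate for dropping.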
4. Balance is still required…
“Be careful what you measure…it may be the only thing getting done.” No idea who said this first. I’ll take credit if it can’t be found anywhere else on the internet.
Going back to the first observation/thought – the concern that the metrics will be used for bad purposes or by bad managers – we have to be careful about what we report on.
I believe all metrics can be broken down into at least 3 different categories:
- Performance (Something over time – like # of Incidents resolved over the past 30 days)
- Compliance (SLA, or target/expectation — like # of Incidents resolved within 1 day)
- Quality (examples could be: Customer Sat or % of defects put in production or Mean Time Between Failure or # of Repeat Incidents or # of Incidents after a Change Implementation)
You can also use “Value” as a category if you can determine benefit/cost in some manner. I think it is possible to use something like # of times the service is used as an indication of value (for example, the number of hamburgers sold is an indication – not THE ONLY indication, but an indication – of the value your consumers place on that product).
When producing a report, it should speak to a Goal, answer a question, and not be shown in isolation. I think it is important to show the combination of the 3 (or 4) categories of metrics in any report/presentation.
If you focus too much on 1 category, it will probably be at the cost of one (or more) of the others.
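A balanced report can be as simple as always printing the categories side by side, so no single number stands alone. This is a minimal sketch; every figure in it is made up for illustration.

```python
# One metric per category, shown together so Performance can't be read
# without its Compliance and Quality counterweights.
report = {
    "Performance": {"# Incidents resolved (last 30 days)": 412},
    "Compliance":  {"% resolved within 1 day (SLA)": 0.87},
    "Quality":     {"# Repeat Incidents (last 30 days)": 23},
}

for category, metrics in report.items():
    for name, value in metrics.items():
        print(f"{category:12} {name}: {value}")
```

If a team’s Performance number climbs while the Compliance and Quality lines slide, the report itself surfaces the imbalance.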
5. Stuff I’ve read/found that might be helpful…
- Mr. Finister’s Metrics 101
- A good powerpoint by Mr. Finister
- 7 step CSI and GQM
- GQM breakdown
- Cost/Benefit of Metric Collection for App Development
- Checklist for any project…including one about Metric Collection
- A case of (mis)using metrics
- Efficient Metrics