Ideas on Categorizing Changes and Problems…


Everybody categorizes changes and problems. But what I find is that the categories are typically too narrow and don’t allow for a broader understanding of what is driving your IT shop.

Usually data will show that you do a lot of work in a certain application or service. Or that your changes are only planned 8 days in advance or your emergency changes are 15% of all changes. Or maybe you had 13 problems in Email last year.

But this doesn’t help explain what is driving these numbers.

Change Categories

I am a big fan of capturing not just whether your Change effort was an Emergency, but what the real driver for the change was.

In particular, I believe you can “categorize” your change efforts into just a few buckets:

Break/fix – something is broken and you have to fix it.

Planned Maintenance (PM) – Maybe this is a planned patch, reboot, or even upgrade of your application. This is just to keep things going, perhaps to maintain support with a vendor or close a security hole. There is no “additional benefit” to the client driving this effort; there might be some, but it isn’t the driver. How do you know if it is the driver? Well, how are you selling the effort? Are you selling the change effort as something that will change the way the client works (additional features), or will they even notice? Is it more for your IT support staff or for the client? If it is for your IT staff, it is PM (unless it has already broken, in which case it is Break/Fix).

Enhancement – This one is tricky. Most people in IT will want their work to go here, but in reality it is probably Planned Maintenance. Enhancement is for change efforts that will ‘enhance’ the way the client does work today. It will cause some Business change as a result. If your change doesn’t do this, or doesn’t intend to do this, it isn’t an Enhancement. Some examples:

I have a plain old telephone system. I am switching it all out for IP telephony at some considerable expense. This requires a lot of IT engineering and transition planning. When it is done, the business will be able to make phone calls just like they do today.

So, what kind of change is that?

If you answered Enhancement – you would be wrong. Nothing about how the business functions will be changed because of this effort. They will still make phone calls like they used to. The underlying technology has changed, but that did not “enhance” the business, did it? No, it did not.

Another example: I am using Exchange 2003 and I’m upgrading to Exchange 2010. There are a lot of new features that we may take advantage of in the future. This change though is just getting us to the new infrastructure. There is a lot riding on this because our company lives and dies with email.

What kind of change is this?

Planned Maintenance. Again, it does not change the way the business works. It is a background change for IT. That doesn’t take away from the hard work, the long hours, or the importance of the work. It just categorizes it for what it is.

One more example: When a new person is hired, the hiring manager (or office manager) can now click a single link from the company home page and fill out a small form. This will kick off all the tickets that need to be created for laptop provisioning, phone provisioning, and account provisioning. The master ticket will then be emailed to you. You no longer have to call the Service Desk and track multiple tickets.

Finally, an Enhancement! This takes a service we already provide and makes it better. Crucially, this is going to change the way your client works – not a drastic change, but a change nevertheless.

Transformative – This change is fairly rare. This is really transforming the way the client does business. This isn’t a small step (Enhancement); it is a large step requiring extensive training and probably new or changed business processes.

Example: We are integrating Email, Voice Mail, and Instant Messaging and creating “soft phones” on your laptops. We are also introducing cameras for all laptops and desktops so we can do video conferencing. We are calling this Unified Communications.

This can be a fairly dramatic change to an organization, requiring a lot of training for your customer base. There are also a lot of new features (voice mail accessible in email, the ability to “see” each other) that need to be discussed and explained so we can make the best use of them. One of the biggest might be the “soft phone” on the laptop. Now your work phone travels…so we expect you to pick it up. :-)

Perhaps, though, this isn’t a big enough change for you to qualify as “transformative,” so you may only call it an “enhancement” – that is fine. The point here is that you have to think about it, and you can’t call everything you do “transformative” or an “enhancement” because of the impact it has on IT – the impact has to be on the customer!

Finally, the last type of change I would categorize is:

Legal or M&A – Any type of change that is dictated by legal requirements or by merger and acquisition activity. Typically there is little lead time for this work, so it is actually “interrupt” work. This is probably coming across as an “Emergency” change, but the reason it is an emergency change is very different from, say, Break/Fix. You probably need to know that so you can explain why Emergency changes have gone up or why you can’t get them below a certain point. If your business is buying/merging or is heavily regulated with lots of rule changes, there is only so much an IT department can “predict” – the rest is reactionary.
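The questions above can be turned into a rough decision rule. The sketch below is one hypothetical way to do it: the function `classify_change`, its four yes/no inputs, and the order of the checks (Legal/M&A first, then Break/Fix) are my assumptions, not a standard. The key point it encodes is that anything that doesn’t change how the client works lands in Planned Maintenance, however big the effort is for IT.

```python
from enum import Enum


class ChangeDriver(Enum):
    BREAK_FIX = "Break/Fix"
    PLANNED_MAINTENANCE = "Planned Maintenance"
    ENHANCEMENT = "Enhancement"
    TRANSFORMATIVE = "Transformative"
    LEGAL_MA = "Legal or M&A"


def classify_change(
    mandated_by_legal_or_ma: bool,
    already_broken: bool,
    changes_how_client_works: bool,
    needs_extensive_training: bool,
) -> ChangeDriver:
    """Hypothetical decision rule built from the questions in the text.

    Assumption: Legal/M&A is checked first because it is interrupt work
    dictated from outside IT, whatever the technology involved.
    """
    if mandated_by_legal_or_ma:
        return ChangeDriver.LEGAL_MA
    if already_broken:
        return ChangeDriver.BREAK_FIX
    if not changes_how_client_works:
        # Patches, reboots, upgrades, new phone or mail platforms: the client
        # works the same way afterwards, so this is Planned Maintenance.
        return ChangeDriver.PLANNED_MAINTENANCE
    if needs_extensive_training:
        # A large step with new or changed business processes.
        return ChangeDriver.TRANSFORMATIVE
    return ChangeDriver.ENHANCEMENT
```

Run against the examples: the IP telephony switch and the Exchange upgrade both answer “no” to `changes_how_client_works` and come out as Planned Maintenance, the new-hire form comes out as an Enhancement, and Unified Communications (client-facing plus extensive training) comes out as Transformative.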

Problem Categories

Now with all that in mind, I was thinking of Problems. What are the main categories of Problems?

The list I came up with is:

Hardware – Rather simple one here. A hardware error. I suspect you won’t see many of these but still, they happen from time to time.

Engineering/Configuration – This is an error that was introduced in the engineering or configuration of the system, application, or service. This could be vendor caused or caused by internal engineering. This type of error is typically troublesome, hard to find, and takes a while to fix. If you are lucky, it is something relatively straightforward, like a wrong setting on your Load Balancer that meant it wasn’t actually distributing the load properly.

Administrative/Operational – This is an error caused by the inattention of the operational staff due to a) ineptitude, b) ignorance, or c) understaffing (other priorities). Something like a disk filling up is an Administrative/Operational issue. Patches not being applied in a timely manner, causing a security hole to be exploited, is an Administrative/Operational error. A redundant NIC failure that isn’t noticed, so that when the primary fails the server goes offline, is a mixture of Hardware and Administrative/Operational error.

That is it. All errors I can think of will fall into one of those 3 categories.

So, what do these categories tell you?

Well, for changes it tells you what type of work you are doing – Break/Fix and Planned Maintenance are KTLO (keep the lights on) type work. They are “operational” in nature. They don’t excite the business at all (they do, however, keep it running).

Enhancements and Transformative changes are all about BITA (Business IT Alignment) – what are you doing to help the business grow? Do you think you are aligned with the business? Why, then, are only 10% of all your changes either Enhancements or Transformative?

Legal/M&A is a mixture of KTLO and BITA. You have to do it to keep the lights on, but it isn’t IT driven; it is business driven. This data can show the business that you do jump when needed.
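Once every change carries a driver, the KTLO versus BITA split is simple arithmetic. Here is a minimal sketch, assuming each change record has been reduced to its driver label; the `WORK_TYPE` mapping follows the grouping above (with Legal/M&A as “Mixed”), and the function name is hypothetical.

```python
from collections import Counter

# Grouping from the text: B/F and PM are KTLO, Enhancement and
# Transformative are BITA, Legal/M&A is a mixture of both.
WORK_TYPE = {
    "Break/Fix": "KTLO",
    "Planned Maintenance": "KTLO",
    "Enhancement": "BITA",
    "Transformative": "BITA",
    "Legal or M&A": "Mixed",
}


def work_type_shares(drivers: list[str]) -> dict[str, float]:
    """Percentage of changes falling into KTLO, BITA, and Mixed work."""
    if not drivers:
        return {}
    counts = Counter(WORK_TYPE[d] for d in drivers)
    total = len(drivers)
    return {work_type: 100.0 * n / total for work_type, n in counts.items()}
```

With a made-up year of 100 changes (40 Break/Fix, 45 Planned Maintenance, 8 Enhancement, 2 Transformative, 5 Legal or M&A), this returns 85% KTLO, 10% BITA, and 5% Mixed, which is exactly the kind of number that prompts the alignment question above.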

You can even use these categories to help set urgency (or priority if you are so inclined). What is more important – KTLO or BITA? Where do you put more of your attention, more of your star players? Probably BITA, but that is for you to decide.

On the Problem data, the Hardware category might be interesting but it probably won’t be. What will be interesting to know is if most of your Problems are Engineering related or Operational related. Is it because you are not doing a good enough job in Service Design/Transition or because you are not doing enough in Operations?

They require different organizational responses so it is probably quite helpful to know the answer. Should you invest in a new Testing team, testing manager, testing software or should you spend more on hiring (or outsourcing) Operational efforts? Maybe you have poor processes, or poor operational tools (like an Event tool or alerting tool).
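The same idea works for Problems. The sketch below assumes each Problem record carries a set of categories, because, as the redundant NIC example shows, one Problem can be both Hardware and Administrative/Operational. The `likely_focus` rule of thumb is a hypothetical simplification: it just points at whichever of the two non-hardware buckets is larger.

```python
from collections import Counter
from enum import Enum


class ProblemCategory(Enum):
    HARDWARE = "Hardware"
    ENGINEERING = "Engineering/Configuration"
    OPERATIONAL = "Administrative/Operational"


def category_breakdown(problems: list[set[ProblemCategory]]) -> Counter:
    """Count how many Problems touch each category.

    A Problem may carry several categories, so the counts can add up
    to more than the number of Problems.
    """
    counts: Counter = Counter()
    for categories in problems:
        counts.update(categories)
    return counts


def likely_focus(counts: Counter) -> str:
    """Hypothetical rule of thumb: compare Engineering vs Operational."""
    eng = counts[ProblemCategory.ENGINEERING]
    ops = counts[ProblemCategory.OPERATIONAL]
    if eng > ops:
        return "Service Design/Transition (e.g. testing)"
    if ops > eng:
        return "Operations (e.g. staffing, processes, event/alerting tools)"
    return "No clear signal"
```

In practice you would look at the trend over several quarters rather than a single comparison, but even this crude split tells you which organizational response to start discussing.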

Categorizing your changes and problems in this manner can help you make these decisions in a way that the traditional category methods cannot.



I’ve been doing quite a bit of work lately on metric collection and reporting.

It has been interesting and, as usual with these types of projects, enlightening. I’ve learned a few new things, learned that some of the things I thought I knew I didn’t, and reinforced some other thoughts.

I suppose I can break this down into several observations and thoughts…

1. People are scared of metrics

If I really sat down and thought back through the years, I could say I’ve seen this particular movie before. Something about this new project, though, really brought this to the forefront. People are nervous and scared about metrics: what they will “say,” who will “use” them, and why.

I tend to see metrics as part of an Evidence-Based Management approach. How on earth can you fully understand where you have been, where you are going, and whether your decisions are leading you to the right place if you don’t measure anything at all?

Imagine trying to drive a car without a speedometer, odometer, gas gauge, or…I don’t know something basic like…sight. Imagine just driving by “feeling” like you are some kind of Jedi or something.

The fear, though, is real enough, as discussed at great length on this other blog: bad managers will probably misuse the information, and this makes people nervous.

2. There is not a secret, magical, list of ITSM metrics out there for you to use

Management wants reports. They want them now. Actually, they want you to have produced the reports yesterday. They truly believe that somewhere on the internet there is, in fact, a list of metrics that will work for every organization.

If we can use the car analogy again…what metrics are there for you in every car?

Speedometer, Odometer, Fuel gauge, Tachometer, some other things to tell you if your headlights are on or if you are signalling to turn left…maybe a few other things but not too much more. So, we have basically like…15 things or so that are “common.”

For IT, here are 17 common metrics (as good as any other list):

  1. How many Incidents do you have over X time
  2. What is the mean/median/min/max time to resolve an Incident
  3. What is your “1st call resolution” rate
  4. What is the mean/median/min/max time to resolve an Incident that has been escalated (meaning, excluding all those that count as “1st call resolution”)
  5. What is the Incident breakdown by “Service” (failing that, by “Application”)
  6. How many Changes do you have over X time
  7. What is the mean/median/min/max time to implement a Change
  8. What is the Change breakdown by “Service” (failing that, by “Application”)
  9. What is the driver of each Change (break/fix, maintenance, enhancement, transformation) – what % of all changes of each?
  10. What is your “up time” over X time
  11. What is your “mean time between failure”
  12. What is the uptime/mtbf breakdown by “Service” (or App…)
  13. What is the mean/median/min/max number of documented requirements per project (this is “scope”)
  14. What is the mean/median/min/max number of “changes” to scope per project
  15. What is the mean/median/min/max % of defects (regardless of “priority”) allowed/accepted per project. (meaning, how many go into production)
  16. What is the mean/median/min/max amount of time each project takes to complete
  17. What is the mean/median/min/max cost for each project
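Several items on that list share the same mean/median/min/max summary, so it is worth writing once. Below is a minimal sketch covering metrics 2, 3, 4, 10, and 11, assuming each Incident record has been reduced to its resolution time in hours and a first-call-resolution flag. The `Incident` fields and function names are hypothetical (how your tool records “1st call resolution” will vary), and MTBF uses the usual definition of operating time divided by the number of failures.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Incident:
    hours_to_resolve: float
    resolved_on_first_call: bool  # hypothetical field; varies by ITSM tool


def summarize(values: list[float]) -> dict[str, float]:
    """The mean/median/min/max summary used throughout the list."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def first_call_resolution_rate(incidents: list[Incident]) -> float:
    """Metric 3: share of Incidents resolved on first contact (0.0 to 1.0)."""
    return sum(i.resolved_on_first_call for i in incidents) / len(incidents)


def escalated_resolution_summary(incidents: list[Incident]) -> dict[str, float]:
    """Metric 4: resolution-time summary excluding first-call resolutions."""
    return summarize(
        [i.hours_to_resolve for i in incidents if not i.resolved_on_first_call]
    )


def availability(period_hours: float, downtime_hours: float) -> float:
    """Metric 10: up time as a fraction of the reporting period."""
    return (period_hours - downtime_hours) / period_hours


def mtbf(operating_hours: float, failures: int) -> float:
    """Metric 11: mean time between failures = operating time / failures."""
    return operating_hours / failures
```

For four Incidents taking 0.5, 4, 10, and 1 hours, with the two short ones resolved on the first call, `summarize` gives a mean of 3.875 and a median of 2.5 hours, the first-call resolution rate is 0.5, and the escalated-only mean jumps to 7 hours. That gap is exactly why metric 4 is on the list.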

And this list doesn’t cover anything about Requests, or Problems, or how pissed off your customers are (commonly known as Customer Satisfaction).

But still, even using that list isn’t a good idea without first trying to understand “why” you need them…

3. Don’t start with the metrics even if you have a nice list of them…

What is the story you are trying to tell? What is the problem you are trying to solve? Why do you need these metrics in the first place?

The GQM (Goal-Question-Metric) method seems like a good approach to me. Of course, that isn’t “easy” either. You still need to work (think) on what your goals actually are and whether they are appropriate and useful.

Then you have to figure out what the “right” questions are and then figure out what your metrics are.
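One way to keep yourself honest with GQM is to write the tree down so every metric hangs off a question and every question hangs off a goal. The sketch below is a hypothetical structure, not part of any GQM tooling: `Goal`, `Question`, `metrics_for`, and `unanswered_questions` are illustrative names. It collects the metrics a goal needs (without duplicates) and flags questions that no metric answers yet.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    text: str
    questions: list[Question] = field(default_factory=list)


def metrics_for(goal: Goal) -> list[str]:
    """Metrics the goal needs, in the order the questions ask for them."""
    needed: list[str] = []
    for question in goal.questions:
        for metric in question.metrics:
            if metric not in needed:
                needed.append(metric)
    return needed


def unanswered_questions(goal: Goal) -> list[str]:
    """Questions with no metric attached yet: gaps in the plan."""
    return [q.text for q in goal.questions if not q.metrics]
```

For example, a goal of “Understand what drives our emergency changes” might ask “What share of changes are emergencies?” and “What drives the emergencies?”, both of which point at metrics you already have, plus “Are customers affected?”, which no metric answers yet. The direction matters: you start from the goal and work down, rather than starting from a list of metrics and hoping a goal appears.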

4. Balance is still required…

“Be careful what you measure…it may be the only thing getting done.” No idea who said this first. I’ll take credit if it can’t be found anywhere else on the internet.

Going back to the first observation/thought – the concern that the metrics will be used for bad purposes or by bad managers – we have to be careful about what we report on.

I believe all metrics can be broken down into at least 3 different categories:

  • Performance (Something over time – like # of Incidents resolved over the past 30 days)
  • Compliance (SLA, or target/expectation — like # of Incidents resolved within 1 day)
  • Quality (examples could be: Customer Sat or % of defects put in production or Mean Time Between Failure or # of Repeat Incidents or # of Incidents after a Change Implementation)
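To make the three categories concrete, here is a minimal sketch that computes one metric of each kind from the same set of Incidents, using the examples from the list: Incidents resolved in the past 30 days (Performance), Incidents resolved within 1 day (Compliance), and repeat Incidents (Quality). The `ResolvedIncident` record, its `is_repeat` flag, and the function name are hypothetical; how you detect a repeat Incident depends on your tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ResolvedIncident:
    opened: datetime
    resolved: datetime
    is_repeat: bool  # hypothetical flag: same symptom seen before


def balanced_view(
    incidents: list[ResolvedIncident],
    now: datetime,
    window: timedelta = timedelta(days=30),
    target: timedelta = timedelta(days=1),
) -> dict[str, int]:
    """One Performance, one Compliance, and one Quality number side by side."""
    recent = [i for i in incidents if now - window <= i.resolved <= now]
    return {
        # Performance: how much got done over the window.
        "performance": len(recent),
        # Compliance: how many met the 1-day target.
        "compliance": sum(i.resolved - i.opened <= target for i in recent),
        # Quality: how many were repeats of something already "fixed".
        "quality": sum(i.is_repeat for i in recent),
    }
```

Reporting the three together is the point: a team that pushes the performance number up by closing tickets quickly but superficially will usually see the repeat count climb at the same time.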

You can also use “Value” as a category if you can determine benefit/cost in some manner. I think it is possible to use something like the number of times the service is used as an indication of value (for example, the number of hamburgers sold is an indication (not saying THE ONLY one, but an indication) of the value your consumers place on that product).

When producing a report, each metric should speak to a Goal, answer a question, and not be shown in isolation. I think it is important to show the combination of the 3 (or 4) categories of metrics in any report or presentation.

If you focus too much on one category, it will probably be at the cost of one (or more) of the others.

5. Stuff I’ve read/found that might be helpful…