Another published article…

I have been quiet on here for a while.

I suppose having babies will keep you busy with things other than blogging. Don’t worry, I am still working, thinking, and learning.

Lately I’ve been reading a lot on Systems Thinking, Enterprise Architecture, DevOps, and Scrum.

But the big news right now is that I have been lucky enough (again) to have an opinion piece published by The ITSM Review.

PMP or ITIL v3 Expert – which would be better for your career? 


Demand, Supply, Execution vs Plan, Build, Run

Charles Betz had several wonderful posts on the EMA blog, but since he no longer works there, EMA has seen fit to remove access to his work.

Fortunately for me, one of my many habits is to open up about a billion tabs (much to the annoyance of my wife and to the amusement of my coworkers). I had noticed that EMA was no longer allowing access to Betz’s work, but I happened to have several of his posts still open in some tabs! So, I just copied them and saved them to Evernote.

Sadly for the rest of the world we’ve all lost out on some really good information.

Demand, Supply, Execution

Luckily, one of Charles Betz’s blog entries was also posted at Nimsoft located here.

The basic premise of this work and some of his other writing is that IT does not do a good job of prioritizing all work from a universal standpoint.

Sure, we prioritize Incidents (against other Incidents), and we prioritize projects (against other projects), and maybe we prioritize day to day tasks (again, just against other day to day tasks), but we don’t really prioritize all of our work against all of our other work.

Here is a direct quote from one of the now lost EMA blogs from Charles Betz:

“Jane is a systems administrator in a large enterprise. Or a DBA, or a security architect, or any of a number of similar positions providing shared services.

There are three primary ways that demand for Jane’s services appears:

  1. Project managers come to her boss and ask for a percentage of her time. Once she is designated as a project “resource” she has deliverables: requirements and design assessments, perhaps actual construction of infrastructure. She also finds herself responding to lightweight project workflow for “issues and risks” and “action items.”
  2. She is assigned incidents, service requests, work orders, changes, and the like through various enterprise workflow systems, especially the integrated IT service management system.
  3. She also is tasked by her manager with responding to various “initiatives” that occupy a middle ground between projects and workflow: audits, compliance efforts, capacity assessments, root cause analyses, key system reviews, and more.

On Wednesday, she gets called into a Severity 2 incident involving the organization’s supply chain system. On Thursday, she has a deadline of responding to a security audit finding for the organization’s general ledger system. And on Friday, she has a critical path deliverable due for a strategic enterprise project. Fun life!

There is often no specific prioritization across these tasks beyond “who is screaming loudest.” The Sev 2 incident may command her attention until it is resolved, but is this really the correct priority? What if it’s only a partial outage, and the project manager is ramping up the pressure for her deliverable? Jane may be attempting to do a “little bit of each” – switching her attention across the various tasks competing for her time, a very inefficient way to get work done.

Stories of such overburden pervade the IT industry. Ask yourself: how many people in my company accept both project and service request work (e.g. Incidents, Service Requests, Changes, perhaps “Work Orders” if distinct from Service Requests or Changes)? And are they also assigned to the less formalized initiatives (which we’ll call “continuous improvement”)? Do their line management and their project manager at least have visibility into this aggregate demand and its consequences?”

When I read this, I thought my concept for Change, Project, and Release Management would fit nicely with what he is talking about: a better way to manage all of IT demand from a universal perspective. But that still leaves out Incident, Problem, and other routine tasking.

The traditional Plan Build Run methodology basically only considers projects or efforts of significant size. All other efforts are just not “planned” – they are just done, often behind the scenes or under the covers.

As Charles Betz describes, leaving it up to individual engineers, administrators, or team leads to prioritize and track this other work is often inefficient and sometimes even ineffective.

It seems to make sense that a good way to solve this issue is to track all work and prioritize it against all other work rather than in silos, whether silos by type of work or, as is often the case, silos by department or team.
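As a rough illustration of prioritizing all work against all other work, here is a minimal Python sketch using Jane's week from the quote above. The `WorkItem` class, the 1 to 5 impact and urgency scales, the multiplied score, and the numbers assigned to each item are all my own assumptions for illustration; they do not come from Betz's post or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    kind: str      # "incident", "initiative", "project", ...
    impact: int    # hypothetical scale: 1 (low) to 5 (high)
    urgency: int   # hypothetical scale: 1 (low) to 5 (high)

    @property
    def score(self) -> int:
        # Assumed scoring rule: impact multiplied by urgency.
        return self.impact * self.urgency


def prioritize(items):
    """Rank every item against every other item, regardless of kind."""
    return sorted(items, key=lambda item: item.score, reverse=True)


# Jane's week, with made-up impact and urgency values.
janes_week = [
    WorkItem("Sev 2 incident: supply chain system", "incident", 3, 5),
    WorkItem("Security audit finding: general ledger", "initiative", 4, 4),
    WorkItem("Critical path deliverable: strategic project", "project", 5, 4),
]

ranked = prioritize(janes_week)
for item in ranked:
    print(f"{item.score:>2}  {item.kind:<10}  {item.title}")
```

With these invented numbers the partial-outage incident lands last, which is exactly the kind of outcome "who is screaming loudest" would never produce. The point is not the scoring formula but that all three kinds of work sit in one list.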

Great Idea but…

Can you imagine IT Managers actually wanting to prioritize and track all this work every day? Well, maybe they will want to…but will they actually do it?

Typically IT Managers are not only as reactive as staff but often are the cause of the reactive nature of the environment. How many times have you seen an IT Manager forgo putting in an Incident ticket and instead just call the engineer directly and task her with work? How many times have you seen an IT Director or a CIO skip putting in a Problem Candidate to be reviewed, prioritized and assigned properly and instead just call the IT Manager and tell them they want to see an RCA and some fix actions by tomorrow? Happens all the time, doesn’t it?

So, if we really do want to better manage our resources and prioritize our work effectively, how on earth can we do it?

Consolidate work types

Start by at least consolidating work types together.

As I mentioned before – I think all change requests should be tracked – which would include projects. That is, everything from wanting to patch a server to creating a new service is tracked in a single Change Management tool and managed by a single overarching Change Management process.

What about Incidents, Problems and “tasks” (things like audits, capacity planning, or CSI efforts)? This is more of a challenge because most IT shops:

  • Only use Incidents for “end user” issues
  • Don’t separate Requests from Incidents
  • Even if they do have a separate Request system – it is only used for “end user” requests
  • Do not have any formalized Problem Management process – therefore are not tracking it today
  • Do not have a formalized CSI registry
  • Do not have any particular way to track tasks for activities like “capacity planning” or “audits”

Somehow we need to be able to separate the requests and incidents/interruptions that are internal only from those that are end user/customer based. Hopefully we can use the same ‘tool’ but with a different flag or something that precludes it from any customer SLA or customer reporting.
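To make the "same tool, different flag" idea concrete, here is a small Python sketch. The `Ticket` class, the `customer_facing` field, and the sample tickets are hypothetical; real ITSM tools will have their own fields and reporting filters, so treat this only as a picture of the idea.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    kind: str              # "incident" or "request"
    summary: str
    customer_facing: bool  # hypothetical flag: False means internal-only work


def sla_reportable(tickets):
    """Keep only the tickets that count toward customer SLAs and reports."""
    return [t for t in tickets if t.customer_facing]


# Made-up tickets: everything lives in one system of record.
tickets = [
    Ticket("INC-1", "incident", "Customer cannot log in to portal", True),
    Ticket("INC-2", "incident", "Backup job failed on internal build server", False),
    Ticket("REQ-1", "request", "New hire needs a mailbox", True),
    Ticket("REQ-2", "request", "DBA team asks for more disk on test cluster", False),
]

customer_view = sla_reportable(tickets)
print([t.ticket_id for t in customer_view])
```

All four tickets are still recorded and can be prioritized together; the flag only decides which ones show up in customer SLA numbers.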

Also, more IT shops need to develop a better understanding and implement more processes around Problem and CSI. We can’t prioritize all work efforts if all work efforts are not being first recorded somewhere!

Finally, we (as in the IT community) need to develop some sort of terminology and framework that captures the non Incident, non Problem, non CSI type tasking that happens all the time.

And however we do this…it has to be easy. :-)

No one is going to bother doing any of this if the system of record takes more than about 10 seconds to create a task, a Problem Candidate, or a CSI candidate or more than 5 seconds to find one you’ve already created. I can send an email in that amount of time and I can search my email sent folder fairly quickly.

We have to do better than email.

ITIL for sale?

http://www.itskeptic.org/content/british-government-cabinet-office-selling-itil-and-prince2

 

As is PRINCE2 – the competitor (if you will) to PMI’s PMP certification.

 

You can find the tender here: http://ted.europa.eu/udl?uri=TED:NOTICE:347757-2012:TEXT:EN:HTML

 

What will it mean for ITIL? I’m not sure. 

Bigger question is – what will it mean for ITSM?

Alerts, Events, and Data Collection

Over at IT Skeptic there is some question over the ITIL use of the terms “Event” and “Alert”. I thought this was fairly amusing because I’ve been having this discussion with my customers (and my team) for years.

I have designed, implemented and managed HP OM (OVO or Openview), NNM, OVIS, Netview, T/EC, Concord SystemEdge, Topaz/BAC/BSM (including BPM/RUM), Freshwater SiteScope/Mercury SiteScope/HP SiteScope, and AlarmPoint/xMatters over the past 10 plus years.

When trying to set this ‘stuff’ up, we have to know what is important to capture, what is of some importance to know (informational perhaps), and what we need to know right now so we can take immediate action.

Over the years I’ve come to think of it in these terms:

Data Collection = anything we’ve (either the customer, or us based on our ‘expertise’) determined important enough to capture and record. The primary source for reporting.

Examples:

  • Disk space utilization every 5 minutes (we don’t care what it is, only that we capture/record it)
  • CPU
  • Security Log entries
  • End user emulation transaction times

Events = Any data collected that either has value for immediate action (either automated or manual) or contains information of a ‘proactive’ nature. Provides additional insight into Incident/Problem Management. Can be used by Problem to trend “events” over time.

Examples:

  • Disk space has exceeded a certain threshold (over a period of time (my preference), or occurred once) – Perhaps 50%, or 65%, or 95%
  • A security log entry of a particular type
  • End user emulation transactions have failed – from one location or maybe all locations (over a period of time (my preference), or occurred once)

Alerts = Any event that meets or exceeds defined thresholds that require immediate attention/action by ‘service providers’ (sys admins, DBAs, network engineers, product managers, service managers, service desk). Indicators of Incidents and/or Problems.

Examples:

  • Disk space has exceeded a certain threshold – usually something high like 95% and almost always over a period of time (to avoid the “false positive”)
  • A security log entry of a particular type
  • End user emulation transactions have failed – usually from more than 1 location and for a period of time.

So, I think of it like this: an Alert must first be an Event, which must first be Data that is collected.

  • Data collection > Events > Alerts
  • Lots of things > Some things > Few things

Not all Data collected is worthy to be an Event – I just want to log CPU over time so I can graph it later.

Not all Events are worthy to be an Alert – CPU spiked once on one web server (although if it happens every day, perhaps Problem Management, using reports on Events, can see this and investigate).

All Alerts should create Incident and/or Problem tickets. Something is really messed up (or is soon to be), requiring immediate (or near immediate) action/work.
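Here is a minimal Python sketch of the Data collection > Events > Alerts funnel, using disk space as the example. The 65% and 95% thresholds are taken from the examples above; the "three samples in a row" rule is my own stand-in for "over a period of time", and the function and constant names are hypothetical rather than anything from a real monitoring product.

```python
from typing import List

EVENT_THRESHOLD = 65.0   # percent used; worth noting as an Event
ALERT_THRESHOLD = 95.0   # percent used; needs immediate action
SUSTAINED_SAMPLES = 3    # assumed: three 5-minute samples in a row


def classify(samples: List[float]) -> str:
    """Return the highest tier reached by a series of disk samples.

    Every sample counts as collected data. It becomes an Event or an
    Alert only if the threshold is breached for SUSTAINED_SAMPLES
    consecutive samples, which filters out one-off spikes.
    """
    recent = samples[-SUSTAINED_SAMPLES:]
    sustained = len(recent) == SUSTAINED_SAMPLES
    if sustained and all(s >= ALERT_THRESHOLD for s in recent):
        return "alert"
    if sustained and all(s >= EVENT_THRESHOLD for s in recent):
        return "event"
    return "data"


print(classify([40.0, 42.0, 41.0]))  # just recorded for reporting
print(classify([70.0, 72.0, 71.0]))  # sustained above 65%
print(classify([96.0, 97.0, 98.0]))  # sustained above 95%
print(classify([50.0, 50.0, 99.0]))  # a single spike, not sustained
```

Note that every alert here would also have qualified as an event, and every event started life as collected data, which is the whole point of the funnel. A single spike to 99% stays as plain data in this sketch; if you prefer "occurred once" events, set the sustained count to 1 for the event tier.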

Works for me, anyway.

ITSMF Conference

I just got back from ITSMF Fusion Conference in D.C.

I learned quite a bit and I thought there were several very interesting sessions.

Dana Olson from West Corporation had a good session on implementing Change Management, using WITIL (West ITIL). I thought this was a good use case of adapting ITIL for your particular environment. Some of the terms or concepts may not really work for your culture – she gave the example that a single person having an issue calling the Service Desk gets a “request” to restore the service as opposed to the more ITIL prescribed “incident.”

Andrew White from Nationwide had a brilliant session on Event and Problem Management. His approach to Problem Management was more thought out and richer than many approaches I am familiar with. It was more about continuing to ask “why” and “what next” rather than resting on the first likely cause of a problem. I may write down the fairly extensive example he gave later.

Timothy Rogers had a good session on Continual Service Plans. It was really about making CSI its own separate, distinct service, with its own owner, manager, and such. This is opposed to having CSI as just a part of each individual process. For instance, part of the responsibilities of the Change Management Owner is to do CSI for Change Management; his point was that yes, that stays, but all of CSI should also be ‘owned’ and ‘managed’ more centrally. It is also important to note that CS, in his description, went beyond just CSI for ITSM and was really about running CSI for all services/applications/products being produced/managed/owned by all of IT. The idea here is to have a single CSI registry that is prioritized, planned, and then implemented.

Finally, I thought the ‘paid’ speakers were very good and quite inspirational.

 

Process Owner versus Process Manager

Since this was originally written – I have expanded on this concept and had it published over at The ITSM Review.

You can find my expanded thoughts on this here!

———————–

Over the past two months I’ve been involved in some really good discussions about the differences between process ownership versus process management.

This has been especially important at my company because there hasn’t been a lot of clarity around this nor has there been too much of either ownership or management of processes. As we started out to fix some of the process problems we have here, we needed to first understand the roles and responsibilities of owners and managers.

This is my understanding of the two and explanation of why they are important and different.

So what does it mean to own a process? The owner of a process is responsible for the creation, maintenance and improvement of that process. They are responsible for (at least here) collaborating with the people that will be using/abiding by the process and helping educate them on the process. They are responsible for ensuring the outcomes of the process line up with business objectives. This means measuring the success factors and key performance indicators of the process (compliance comes to mind). Process owners need to ensure the processes are compliant with policies (IT, HR, Legal, other) and work with other processes (for example on taxonomy: what is defined as an “Incident” in one process should mean the same thing in another).

The manager of a process has slightly different responsibilities. They are the executors of the process: the ones moving through the steps defined by the process, or at the very least ensuring that the ‘work’ is being moved through the steps in the proper order/manner as described by the process. They are responsible for managing the inputs/outputs of each step and for the final output of the process matching what is expected as detailed/described by the process. Which is to say, if the process is not defined well or is defined incorrectly, that is not the manager’s fault but the owner’s. The manager is required only to produce the output to the defined expectations. The manager should offer input to the process owner for improvements but is not responsible for updating the process. The process manager should also have critical success factors and KPIs for his/her inputs and outputs and should analyze those reports.

So the owner should be bigger picture, describing how things should move through a process. A manager is tactical, actually moving through the process.

These are of course roles, and don’t necessarily have to be different people. They do require different types of thinking and outputs. Depending on the size, scope, and complexity of the process, it may be wise to have different people play these roles.

Moreover, you should have only one process owner but you may have multiple process managers (perhaps they only manage a portion, or perhaps they manage only certain teams through the process).