To ensure the long-term success of an IT monitoring project, monitoring should be embedded into the existing organization, with clear roles and responsibilities for each stakeholder.
The slide below shows a sample organization with a typical matrix structure, as often seen in enterprises.
Four groups are in focus for the monitoring deployment:
Users Most enterprises earn their money with this group of people. They expect a running application that performs well and supports them in getting their job done. These can be internal users, external users, or both.
Business This group comprises the stakeholders of the business processes. They focus on getting things done to maximize their contribution to the company they work for.
IT Application These people design, code, test and deploy the required applications.
IT Infrastructure This group is responsible for providing and running a reliable IT infrastructure and an architecture for the future, including internal and external cloud platforms.
These definitions are samples, seen at different customers over the last years. Other organizations may have other setups that work very well for them.
The point I want to highlight is how to embed monitoring into such an organization:
Support Level 1 The business support organization answers users’ questions about the applications and processes they work with. Often the focus is on the product catalog, the order process itself, the configuration of products and so on. These questions are business driven rather than technical. Technical questions are rerouted to the IT support desk, which is in charge of all questions regarding IT-related objects. This includes help with program usage, user login issues, performance issues and so on.
Support Level 2 This function is covered by an operations center. This organization keeps IT services up and running and monitors the state of key resources, applications, the network and everything else that makes up a well-performing IT environment. It is in charge of initiating requests to level 3 when problems arise that it can’t handle itself. Regarding monitoring, these people are the power users of the monitoring solution. All acquired data is used here to provide a comprehensive overview of the IT status and to support decisions on how to handle given situations.
Support Level 3 Level 3 consists of several teams contributing to this mission. Developers and system programmers have to collaborate to find solutions for serious problems that arise while operating the application.
System programmers are in charge of fixing problems with hardware and software packages and their configuration.
Developers have to deal with issues originating from self-developed application code. Today, these roles are often combined in so-called DevOps teams. These teams are responsible for performing deep-dive analysis in case of application errors, including log analysis, detailed performance measurement and threshold definition. They also have to continually refine monitoring thresholds to increase alarm accuracy. They install new monitoring tools and integrate them into the existing solution. DevOps teams always keep in mind that any new technology also requires a monitoring review and potentially a new monitoring component.
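To make the threshold work mentioned above a bit more tangible, here is a minimal sketch of how warning and critical thresholds could be derived from historical measurements instead of being guessed. The function names (`percentile`, `suggest_thresholds`) and the 95/99 percentile choice are my own assumptions for illustration, not part of any specific monitoring product.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest-rank method: smallest value with at least pct percent
    # of all samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_thresholds(samples, warn_pct=95, crit_pct=99):
    """Suggest alarm thresholds from historical samples (hypothetical helper)."""
    return {
        "warning": percentile(samples, warn_pct),
        "critical": percentile(samples, crit_pct),
    }


if __name__ == "__main__":
    # Illustrative response times in milliseconds, not real measurements.
    history = [120, 130, 125, 140, 135, 128, 450, 132, 127, 138]
    print(suggest_thresholds(history))
```

Re-running such a calculation on fresh data from time to time is one simple way to keep thresholds aligned with how the application actually behaves.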
Monitoring should be embedded into IT management processes, best described in ITIL. Incident management, problem management and change management are the disciplines in focus. For more details on monitoring and ITIL see the Process Symphony Knowledge Base.
As I described in my blog entry Finding the best Monitoring Solution, monitoring is a process rather than a status. It is a never-ending iteration of requirement review, software upgrade, solution design and execution. That’s why it is so important to embed monitoring into the IT organization’s daily business.
In discussions with customers, the expectations regarding a monitoring system differ considerably across the departments involved. As mentioned in the blog Finding the best Monitoring solution, it is essential to understand these requirements and keep these needs in focus while moving on. The monitoring solution has to support a wide range of requirements.
So let us take a look at generalized requirements without thinking in products. The slide below shows four different work areas that often cause trouble in the daily business of IT departments.
Monitoring This is the core area of a monitoring system: gathering performance and availability data about any resource.
Prediction Fed with time series data from the monitoring component and other sources (like the event management system or log analysis), this system makes it possible to detect anomalies earlier and helps to differentiate between common behavior and unexpected processing.
Event Management Events are consolidated and correlated in one place, which reduces the complexity of SOS situations where multiple components are involved and monitored with different tools. This discipline is tremendously important, because most customers use specialized tools to monitor their IT infrastructure, their IT network, the user experience, the cloud components/resources and so on.
Log Analysis Most IT issues are fixed by consulting the relevant logs. Widely scattered logs hamper this activity. It is very important to collect and consolidate these logs in one place and make it possible to search, scan, sort and analyze these information sources efficiently.
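As an illustration of the Prediction area, the following sketch flags values in a time series that deviate strongly from the recent past. It uses a simple rolling z-score, which is my own choice for the example; real products typically use more sophisticated models. The function name `find_anomalies`, the window size and the limit of three standard deviations are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(series, window=5, z_limit=3.0):
    """Return indices of values that deviate more than z_limit standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: no variation to compare against,
            # so this sketch skips the point.
            continue
        if abs(series[i] - mu) / sigma > z_limit:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Illustrative CPU load values, not real measurements.
    cpu_load = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
    print(find_anomalies(cpu_load))
```

The point of the sketch is the principle: "common behavior" is learned from the data itself, so an unexpected value can be flagged without a fixed, hand-maintained threshold.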
Monitoring is more than sensing the availability and performance of a single IT resource. It is essential to do so, but several other steps have to follow.
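One of these follow-up steps is the event consolidation described under Event Management. A minimal sketch, assuming events from different tools have already been collected in one place: events that arrive close together in time are grouped into one incident. The `Event` fields, the tool names and the 60-second window are hypothetical, and real event management systems correlate on far more than timestamps.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds, e.g. since epoch
    source: str       # hypothetical name of the reporting tool
    resource: str     # affected component
    message: str


def correlate(events, window_seconds=60.0):
    """Group events into incidents. An event joins the current incident
    if it arrives within window_seconds of the previous event."""
    incidents = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if incidents and event.timestamp - incidents[-1][-1].timestamp <= window_seconds:
            incidents[-1].append(event)
        else:
            incidents.append([event])
    return incidents


if __name__ == "__main__":
    events = [
        Event(0, "network-tool", "switch-1", "port down"),
        Event(30, "infra-tool", "db-server", "connection lost"),
        Event(70, "ux-tool", "web-shop", "response time high"),
        Event(500, "infra-tool", "backup-server", "disk 90% full"),
    ]
    for incident in correlate(events):
        print([f"{e.source}:{e.resource}" for e in incident])
```

In this example the first three events end up in one incident, so the operations center sees one situation to handle instead of three unrelated alarms from three tools.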
Most IT monitoring projects fail over time because their value for the business can’t be demonstrated in the long run. IT monitoring is not a one-shot project. It is more a journey that never ends. It has to change as the IT landscape changes. That means it must be as flexible as possible to support the requirements of tomorrow. A dedicated responsibility should be defined that supports the monitoring requirements over time. Today, DevOps departments take over exactly this role, but it helps to have a common monitoring infrastructure on hand, ready to be used and flexible enough to integrate new tools.
This list is a good starting point to review the different choices found in the market. All mentioned products have their strengths and weaknesses in different areas. Each solution – let us call it product – has its own focus areas:
Time series processing
It is important to start with the expectations of the users and stakeholders:
Cloud Services Monitoring
User Response Time
User Stories & Playbooks
Define usage scenarios
User stories can help
Playbooks help prioritize activities
Review the existing solution(s)
Knowledge base and familiarity
Collaborate with the different teams in your organization
Having a system management solution is not a status you reach; it is more like a journey. You can’t buy a solution, install it, implement it and be happy forever. It is more like traveling with good equipment that makes things easier. But the things still have to be done. This leads to the awareness that your organization has to support that function.
System management is a daily business which has to support the organization’s needs and be agile enough to fulfill changing requirements in an elastic environment.
So what is the best solution for you and your organization?
Well, it depends on the outcome of the workshops in which the above questions are discussed and answers are defined. In most cases it is not a single product but a combination of two or more products, connected through interfaces, that forms a very customer-specific solution.