IT Service Management – Traveling To The Cloud

More and more customers are moving to cloud architectures to meet the fluctuating resource requirements of IT. Traditional monitoring approaches that check the availability of a single system or resource instance make only limited sense in this new era. Resources are provisioned and removed on demand and have no long-term lifespan.

It no longer matters whether a named system exists or not; what matters is the service and the pieces that implement it. The number of systems will vary with the workload covered by the service. In some cases the service itself may disappear when it is not permanently required. The key metric is the response time that the service consumers experience. But how can we assure this key performance metric at the highest level without being hit by an unpredicted slowdown or outage?

We need a common monitoring tool that watches the key performance metrics at the resource level and frequently checks the availability of these resources, such as:

  • Disk

  • Memory

  • Network

  • CPU

Application containers are also handled like resources, for example:

  • Java Heap

  • Servlet Container

  • Bean Container

  • Messaging Bus

Resources from database systems, messaging engines and so on are monitored as well. With IBM Monitoring we have a useful and easy-to-handle tool, available on premises and in the cloud.

With the data gathered by the monitoring tool, we can now feed a predictive insight tool. As described in a previous post, monitoring is the enabler for prediction. Prediction is a key success factor in cloud environments. It is essential to understand the long-term behavior of an application in such an environment.

The promise of the cloud is that an application has almost unlimited resources. If we are running short on resources, we simply add more. But how could we detect that the application is behaving suspiciously? Every time we add resources, they are eaten up by the workload. Does this correlate with the number of transactions, the number of users or other metrics? Or is it a misbehaving application?

We need a correlation between different metrics. But are we able to oversee all possible dependencies? Are we aware of all these correlations?

IBM Operations Analytics Predictive Insights will help you in this area. Based on statistical models, it discovers mathematical relationships between metrics. No human intervention is needed to achieve this result. The only requirement is that the metrics are provided as streams at frequent intervals.

After the learning process is finished, the tool sends events on unexpected behavior, covering univariate and multivariate threshold violations.

For example, you have three metrics:

  • Number of requests

  • Response time

  • Number of OS images handling the workload

A rising number of OS images wouldn’t be detected by a simple threshold on a single resource, as used by a traditional monitoring solution.

Neither the response time nor the number of requests shows an anomaly. The correlation between these two data streams also remains inconspicuous. However, adding the number of OS images reveals an anomaly in relation to the other values. This could lead to a situation where all available resources are eaten up (even cloud resources are limited, because we can’t afford an unlimited amount). In this situation our resource monitor would send out an alarm at a much later point in time.
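
To make the idea tangible, here is a minimal Python sketch of such a relationship check. It is not how Predictive Insights works internally; it only assumes that, in normal operation, each OS image serves a roughly constant number of requests. All names and sample values are hypothetical.

```python
from statistics import mean, stdev


def learn_baseline(requests, images):
    """Learn the normal number of requests served per OS image."""
    ratios = [r / i for r, i in zip(requests, images)]
    return mean(ratios), stdev(ratios)


def relation_anomalies(requests, images, baseline, k=3.0):
    """Return the indexes where requests-per-image is more than k sigma off."""
    mu, sigma = baseline
    return [
        idx
        for idx, (r, i) in enumerate(zip(requests, images))
        if abs(r / i - mu) > k * sigma
    ]


# Hypothetical learning period: roughly 100 requests per image.
train_requests = [400, 510, 590, 405, 495, 610]
train_images = [4, 5, 6, 4, 5, 6]
baseline = learn_baseline(train_requests, train_images)

# Live data: the request count stays flat while the image count keeps rising.
live_requests = [500, 505, 498, 502]
live_images = [5, 5, 8, 10]
suspicious = relation_anomalies(live_requests, live_images, baseline)
```

With these sample values the last two intervals are flagged, although neither the request count nor the image count would violate a sensible single-metric threshold on its own.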

For example, first, the OS agent would report high CPU usage. Second, the response time delivered to the end users would reach a predefined limit. The time between the first resource event and the point at which the user’s service level agreement metric (response time) is violated is too short to react.

With IBM Operations Analytics Predictive Insights we gain time to react.

So what is your impression? Have you also identified correlations to watch out for after analyzing the cause of a major outage and how to avoid it?

Follow me on Twitter @DetlefWolf, or drop me a discussion point below to continue the conversation.

In my next blog post I will discuss which values make sense to feed into a prediction tool.

Sybase Monitoring

While we see big changes in the way we deliver monitoring services, the functionality of the tool is still a proof point when talking with subject matter experts.

The IBM Monitoring suite offers broad coverage of many systems, applications and containers. You can find a list of supported agents on the IBM Service Engage website.

The Sybase ASE Agent enables the monitoring of SAP Sybase database servers. The agent fully integrates into the IBM Monitoring infrastructure.

The installation and configuration guide explains how things fit together and how the product is installed.

This agent is bundled with the IBM Application Performance Management suite.

The IBM Tivoli Composite Application Manager Agent for Sybase ASE Reference Guide gives you detailed information about:

  • Attribute groups and attributes
  • Workspaces
  • Situations
  • Take Action commands
  • Event mapping
  • Workspaces workgroups mapped to tasks
  • Upgrading for warehouse summarization
  • Sybase agent data collection

The agent is currently supported on-premise only and still uses the IBM Tivoli Monitoring V6 infrastructure.

WebSphere Monitoring

Over the last few weeks I have seen an increasing number of requests for WebSphere Application Server (WAS) monitoring. This article summarizes the available solutions, created over the last years on top of IBM’s monitoring solution, SmartCloud Application Performance Management (SCAPM).

The SCAPM portfolio comprises almost all of IBM’s monitoring capabilities under the umbrella of ITM. ITCAM for Applications contains the WAS monitoring agents.

The documentation of the WAS monitoring solution may be found on the IBM Knowledge Center.

Additionally, I’ve created two add-ons for WebSphere monitoring. The situation package provides a set of sample monitoring rules, covering the requirements most frequently seen in the field.

To get a comprehensive overview of all WebSphere Application Server instances monitored in your environment, this navigator view might help.

For deep-dive analysis, the data collector can be connected directly to the ITCAM Managing Server to enable transaction debugging and detailed WebSphere environment analysis.

WebSphere monitoring is only one discipline within the SCAPM portfolio. Other areas of application performance management are covered as well, including transaction tracking, HTTP response time measurement and robotic monitoring.

IBM Monitoring goes SaaS

Big changes in the IT market are taking place. We see cloud services all around changing the delivery model of software from product sale to a software as a service model.

IBM also delivers more and more parts of its portfolio in a software as a service model. One of the very first offerings is IBM Monitoring. Based on the IBM Service Engage platform the monitoring infrastructure is delivered to the customers.

But how does it work?

IBM hosts the server components in a SoftLayer® data center. The infrastructure is hidden behind a firewall in combination with a reverse proxy. All customer agents and client devices connect through the HTTPS port (port 443) at the announced service address.

How are the different customers separated from each other?

The user clients are connected to the correct customer specific monitoring environment based on the user credentials given on the login page.

The agents have customer specific credentials in their setup and are generated for each customer exclusively. These agents are provided upon registration for the service and can be downloaded on customer request.

How many agents should a customer have?

Well, there is no minimum number of agents a customer has to request to become eligible for IBM’s monitoring offering. However, there is a maximum number of agents a single instance of this monitoring infrastructure can serve. Depending on the complexity of the monitoring rules you apply we expect a maximum of about 1000 agents per infrastructure instance.

What kind of agents are available?

The following agents are currently available for the SaaS offering:

  • Operating Systems

    • Windows OS

    • Linux OS (RHEL, SLES)

    • AIX

  • Databases

    • DB2 UDB

    • Oracle DB

    • Microsoft SQL Server

    • MongoDB

    • MySQL

    • PostgreSQL

  • Response Time Monitoring

  • Microsoft Active Directory

  • Virtualization Engines

    • KVM

    • System P AIX

  • JEE Container

    • WebSphere Application Server

    • WebSphere Liberty

    • Apache Tomcat

  • Languages and Frameworks

    • Ruby on Rails

    • Node.js

    • PHP

    • Python

There are several other agents planned for release within the next few weeks or months, but I’m not authorized to write about them in detail in this blog. If you want more details, or if you have specific requirements, drop me a message, and I’ll come back to you with more specific information.

How are these agents installed?

The installation procedure is now pretty simple. The following videos show the installation on Linux and Windows. After downloading the appropriate packages for the target OS platform, you can start the installation process. The redesigned installation process on Linux now follows the standard installation conventions of the OS platform (in this case RPM).

The new IBM Monitoring is different from the previous one. The new lightweight infrastructure is available within a few minutes. The agents are easy to install and simple to configure. The monitoring solution comes with a new user interface based on HTML without the need for a Java Runtime Environment. Because of that, the user interface is now also available on tablets and smartphones.

Follow me on Twitter @DetlefWolf, or drop me a discussion point below if you have further questions regarding the new IBM Monitoring.

 

Raising IT Monitoring Acceptance

After I published my blog post “IT Monitoring is out of style?”, several followers started a discussion about how acceptance of IT monitoring could be achieved within system administration groups.

To be clear, system admins do not oppose monitoring in general; they complain about alerts that are too frequent and too unspecific, which keep them from doing their daily business.

This leads to the refusal of such monitoring services. So what can we do to get a commitment from the system admin team?

What do system admins really hate?

  • Alerts that indicate minor issues which could just as well be fixed later during normal business hours, but disturb them in their leisure time

  • Alerts that flip on and off at intervals (bouncing alerts)

  • Alerts that are outside their responsibility

Well, I can imagine plenty of other things that system admins do not like, but remembering my own time as a system programmer, I believe these are the ones that really stand out.
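
Bouncing alerts in particular can be tamed with a simple hold-down rule: a state change is only forwarded after it has been observed for several consecutive samples. The following Python sketch illustrates the idea; the function name, the `hold` parameter and the sample data are assumptions for illustration and do not reflect any product setting.

```python
def debounce(samples, hold=3):
    """Return (index, state) changes that persisted for `hold` samples."""
    reported = "ok"  # assume the resource is healthy at the start
    run_state, run_length = None, 0
    changes = []
    for idx, state in enumerate(samples):
        if state == run_state:
            run_length += 1
        else:
            run_state, run_length = state, 1
        # Forward a change only once it has been stable long enough.
        if run_length >= hold and run_state != reported:
            reported = run_state
            changes.append((idx, reported))
    return changes


# Hypothetical samples: the check flips a few times before it really fails.
raw = ["ok", "bad", "ok", "bad", "ok", "ok", "ok", "bad", "bad", "bad", "bad"]
forwarded = debounce(raw, hold=3)
```

Here the early flips are swallowed and only one alert is forwarded, at the point where the problem has persisted for three samples. The price is a delay of `hold` sampling intervals before a real problem is reported.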

But there are also reasons why they support a monitoring solution. They want to avoid the following situations:

  • Being hit by an outage of a service without an early warning

  • Upset users flooding the support team with calls due to poor response times

You can fill this list with tons of other statements, so feel free to drop me your top reasons in the comment section.

What has really changed in IT departments over the last years is the service orientation. Formerly, we watched system health rather than service health. Today we focus on service health, and this offers a new approach to increasing the acceptance of IT monitoring solutions.

 

End-To-End-Measurement

A business partner, currently implementing a monitoring as a service model for small businesses, stated the requirement to be alerted only if key business IT functions of its customers are at risk or already out of service. We used the Internet Service Monitor to check the named services (like email, internet accessibility, phone server, and so on). The end-to-end measurement approach ensures that a critical service status is detected. For more sophisticated services like web applications or SAP transactions, the Web Response Time Monitor delivers deep insight into transactions. To track the availability and performance of transactions outside business hours, the Robotic Response Time agent delivers valuable insight and reports unexpected outages.

All events coming from this discipline are good candidates for escalation even outside business hours.
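
The principle behind such an end-to-end check can be sketched in a few lines of Python: run a probe that behaves like a user, measure how long it takes, and classify the result. This is only an illustration of the idea, not the implementation of any of the agents named above. The probe functions are placeholders; a real probe would, for example, send a test mail or request a web page.

```python
import time


def check_service(probe, max_seconds):
    """Run one end-to-end probe and classify the service state."""
    start = time.perf_counter()
    try:
        probe()
    except Exception:
        return "down"  # the probe could not complete at all
    elapsed = time.perf_counter() - start
    return "slow" if elapsed > max_seconds else "ok"


# Placeholder probes standing in for real service checks.
def fast_probe():
    pass


def slow_probe():
    time.sleep(0.05)


def failing_probe():
    raise ConnectionError("service unreachable")


results = {
    "email": check_service(fast_probe, max_seconds=1.0),
    "portal": check_service(slow_probe, max_seconds=0.01),
    "phone": check_service(failing_probe, max_seconds=1.0),
}
```

Only the "down" and "slow" results would be turned into events, which keeps the alert volume tied to what the service consumer actually experiences.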

Resource Monitoring

Resources such as CPU usage, memory or disk consumption, database buffer pools, JEE heap size and so on are very important metrics for analyzing the health of the operating system or application. A single metric is only an indicator and too often not a good enough signal to raise a highly critical alert. This is exactly the question discussed in “Still configuring thresholds to detect IT problems? Don’t just detect, predict!” But yes, there may be single metrics that indicate a hard stop of a system or application and require immediate intervention, and this knowledge often comes from resource monitors. Additionally, the resource monitors gather important data for historical projections and capacity planning. Based on this data, predictive insight becomes actionable and gives us another source of meaningful events. Events detected by Predictive Insights are also good candidates for escalation even outside business hours, if you are interested in avoiding interruptions in IT services.
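
As a small illustration of what a historical projection can look like, the following Python sketch fits a straight line to daily disk usage samples and estimates how many days remain until the disk is full. It assumes linear growth, which is a strong simplification, and the function name and sample values are hypothetical.

```python
def days_until_full(usage_percent, limit=100.0):
    """Fit a least-squares line to daily samples and project the limit."""
    n = len(usage_percent)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage_percent) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage_percent))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    if slope <= 0:
        return None  # no growth, so no projected exhaustion
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    return day_at_limit - (n - 1)  # days counted from the latest sample


# Hypothetical history: disk usage in percent, one sample per day.
history = [60.0, 62.0, 64.0, 66.0, 68.0]
remaining = days_until_full(history)
```

With two percentage points of growth per day and 68 percent used, the projection yields 16 days, which is enough lead time to handle the issue during normal business hours.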

Suppressing Events

When I was a system programmer, my team’s main goal was to have as few calls as possible outside business hours. We tried to catch up with the events, including the less important ones, within our standard office hours. To achieve this goal, we created rules defining which kinds of events, or which combinations of events, were critical enough to initiate a call outside business hours. During normal business hours we monitored the systems with an extended set of rules to get early indications of unhealthy system conditions. This helped us maintain a pretty tidy IT environment that only seldom showed unexpected system behavior. All these extended events were suppressed by the event engine (in our case OMNIbus) outside business hours. When we came back on site, we reviewed the list of open and already closed events and checked the number of occurrences in the monitoring system to understand the situations we had missed while being off site.
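
The routing logic behind such a rule set is simple. The Python sketch below is not OMNIbus syntax; it merely illustrates the decision, assuming office hours from 8:00 to 18:00 on weekdays and two hypothetical severities.

```python
from datetime import datetime

BUSINESS_START, BUSINESS_END = 8, 18  # assumed office hours, Monday to Friday


def in_business_hours(when):
    return when.weekday() < 5 and BUSINESS_START <= when.hour < BUSINESS_END


def route(event, when):
    """Decide what happens to an event: 'notify', 'page' or 'hold'."""
    if in_business_hours(when):
        return "notify"  # the extended rule set is visible during the day
    # Outside business hours only critical events trigger a call.
    return "page" if event["severity"] == "critical" else "hold"


# Hypothetical events: Wednesday morning, Wednesday night, Saturday night.
events = [
    ({"name": "disk 80% full", "severity": "warning"}, datetime(2014, 7, 9, 10, 0)),
    ({"name": "disk 80% full", "severity": "warning"}, datetime(2014, 7, 9, 23, 0)),
    ({"name": "service down", "severity": "critical"}, datetime(2014, 7, 12, 3, 0)),
]
decisions = [route(event, when) for event, when in events]
```

Events that are put on hold are not lost. They stay in the event list and are reviewed when the team is back on site.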

In summary, there are ways to get a commitment to a monitoring solution from the system administrator team. The system administrator’s goal is to have a highly available, high-performance system environment with fully functioning services running on it. IBM Monitoring tools help them achieve this goal and offer them the flexibility to get filtered information about the system status as they need it.

For customers who want to avoid maintaining a monitoring infrastructure themselves, the new Monitoring as a Service offering fits perfectly.

So what is your impression? Are you also discussing powerful monitoring with your system administrators?


IT monitoring is out of style?

This blog was also published on Service Management 360 on 09-Jul-2014.

A few weeks ago I read a blog entry written by Vinay Rajagopal on Service Management 360 with the headline “Still configuring thresholds to detect IT problems? Don’t just detect, predict!” I was wondering what that new big data approach will imply and what it means to my profession focusing on IT monitoring. Is IT monitoring old style now?

The IT service management discipline today is really a big data business. We have to take a lot of data under consideration if we want to understand the health of IT services. In today’s modern application architectures, with their multitier processing layers and the requirement that everything be available all the time and that performance remains at an acceptable level, IT management becomes a threat that often ends in critical situations.

The “old” approach of monitoring a single resource or a dedicated response time of a single transaction doesn’t seem to be the way to succeed anymore. However, it is still essential to perform IT monitoring for multiple reasons:

  1. IT monitoring helps to gather performance and availability data as well as log data from all involved systems.

    This data may be used to understand and learn the “normal” behavior. Understanding this “normal behavior” is essential to predict upcoming situations and to send out alerts earlier.

    The more data we gather from different sources, the better our prediction accuracy gets.

    With this early detection mechanism in place, fed by the IT monitoring from so many different data sources, operations teams can gain enough time before the real outage takes place to avoid it.

     

  2. IT monitoring can help to identify very slow-growing misbehavior.

    Gathering large amounts of data does not guarantee that all misbehavior can be identified. If the response time of a transaction server system increases over a long period of time and all other monitored metrics evolve accordingly, an anomaly detection system will fail. There are no anomalies. Because growing workload is nothing unexpected and the growth takes place over a long period of time, only distinct thresholds will help. This is classical IT monitoring.

  3. IT monitoring helps subject matter experts to understand their silos.

    Yes, we should no longer think in silos, but for good system performance it is essential to have a good understanding of key performance metrics in the different disciplines, like operating systems, databases and middleware layers. IT monitoring gives the experts the required detailed insight and enables the teams to adjust performance tasks as required.
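
The second point in the list above can be demonstrated with a small Python sketch. It compares a very simple adaptive detector, which flags values that deviate strongly from the recent past, with a fixed threshold. The detector is a deliberately naive stand-in and not the algorithm of any product; the response times are made-up values that grow by 1 ms per sample.

```python
from statistics import mean, stdev


def rolling_anomalies(values, window=10, k=3.0):
    """Flag points more than k sigma away from the preceding window."""
    flagged = []
    for idx in range(window, len(values)):
        recent = values[idx - window:idx]
        sigma = stdev(recent)
        if sigma > 0 and abs(values[idx] - mean(recent)) > k * sigma:
            flagged.append(idx)
    return flagged


def threshold_violations(values, limit):
    """Classical monitoring: flag every value above a fixed limit."""
    return [idx for idx, value in enumerate(values) if value > limit]


# Hypothetical response times in milliseconds, creeping up very slowly.
response_ms = [200 + step for step in range(400)]
drift_alerts = rolling_anomalies(response_ms)
sla_alerts = threshold_violations(response_ms, limit=500)
```

The adaptive detector never fires because every new value looks normal compared with its recent history. The fixed limit of 500 ms fires as soon as the response time crosses it.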

So the conclusion is simple: monitoring is a kind of prerequisite for doing successful predictive analysis. Without monitoring you won’t have the required data to make the required decisions, whether manually or automatically, as described with IBM SmartCloud Analytics – Predictive Insights.

Prediction based on big data approaches is a great enhancement for IT monitoring and enables IT operation teams to identify system anomalies much earlier and thus to start reactive responses in time.

IBM SmartCloud Application Performance Management offers a suite of products to cover most monitoring requirements and gather the required data for predictive analysis.

So what is your impression? Is monitoring yesterday’s discipline?


IT Monitoring: Necessary, or just “nice to have”?

This blog was first published on Service Management 360 on 30-Oct-2013.

http://www.servicemanagement360.com/2013/10/30/it-monitoring-necessary-or-just-nice-to-have/

Why do I so often see poorly managed system environments in small and medium businesses?

Well, there are multiple reasons for that:

  • They are not aware that they deeply depend on reliable IT services.

  • IT is not their core business.

  • They don’t know what to take care of, and how to do it.

Monitoring is often seen as a nice-to-have discipline as long as IT service outages do not cause any business-relevant losses. This is often achieved by minimizing the dependencies on IT services even if this “backup scenario” is inefficient and cost intensive.

I’ve seen businesses who print every incoming email. The reason for that? Well, the email system might go down, which would prevent the company from getting any work done. Having every communication outside the IT systems protects them in the event of an outage, or so they believe. The vision of the paper-free office – forget about it!

Why don’t they introduce a monitoring solution to achieve more reliable IT services? Let’s do a quick review of the main reasons:

  • Complexity
    Monitoring requires a complex infrastructure, which has to be maintained. It requires deep knowledge of the monitoring mechanisms and the monitored applications, and the achieved results have to be frequently reviewed and amended.

  • Lack of knowledge
    What should be monitored? Which components should be under investigation? What thresholds should be set? All of these questions are absolutely valid and require time-consuming answers. And there are no “correct” answers. But there is experience in the market, and it is ready to be used.

  • Time consumption
    Small IT departments are not able to dedicate a team of people to perform professional IT monitoring. There are lots of things to do in such small operation centers, and watching a monitor wall all day is completely illusive. And to be honest, it is not a full-time job, because the number of systems is not large enough.

  • Monitoring is too expensive
    Well, cost is really the killer in every discussion. The ramp-up costs are too high. The products are too expensive. The implementation effort at the beginning is unaffordable.

But having said all that, is it now time to leave the room and tell my customers: “Yes, you are right, IT monitoring is for large businesses only”?

I think we have to address the above inhibitors very carefully and come up with alternatives for small and medium businesses. Our customers’ businesses are too valuable, and unreliable IT services should not be allowed to put them at risk.

Monitoring as a Service

As mentioned in the blog post linked above, I see the Monitoring as a Service delivery model as the best choice to cover all these challenges. But what should it look like?

First of all, we have to keep all the complexity out of sight of the service requester (that is, from you as the customer). The monitoring should just happen. The customer requests monitoring for its business services (emailing, customer portal and so on) and the service provider has to deal with it. The service provider has to have the knowledge, not the business owner.

Second, the service provider has to take care of the customer’s systems during business off-hours, to prevent system outages during normal operations, when the systems are being used frequently. This has often been seen in the past as one of the main inhibitors for small businesses. Even when the systems are not actively used, they are still doing background work, like database reorganizations, data backups and other administration tasks.

And now the third aspect, the investment. Customers expect to have managed availability and performance with low ramp-up costs and a quick initial implementation phase. This is only possible with service providers who have real solutions available that focus on their customers’ needs and the services these customers use. That means that industry-specific solutions are required to cover the different markets.

Below I’ll present an example scenario to illustrate these ideas.

Sample from the car manufacturing industry

A supplier in the car manufacturing industry is tightly connected to its main customer and shares IT services with it. To connect to the principal customer’s systems, this supplier has to have its own IT systems. Additionally, other specialized IT systems are on premises to serve the daily business processes, including:

  • Accounting (for example, SAP)

  • Billing (for example, SAP)

  • Emailing (for example, Lotus Notes)

  • Phone systems (for example, Ericsson)

  • IT network (for example, Cisco)

The supplier’s IT systems are essential to deliver all required parts to its principal customer in the required just-in-time chain. Any failure in this process might lead to significant penalties for the supplier. The Monitoring as a Service provider should have artifacts available to quickly monitor the health and availability of these infrastructure components and the IT services implemented on top of them:

  • Infrastructure monitoring

    • SAP monitoring

    • Lotus Notes monitoring

    • Network monitoring

  • Service monitoring

    • Accounting

    • Billing

    • Phone line availability

    • Applications on the principal customer’s systems

All these monitors require specialized, industry-specific know-how but are similar for all suppliers in this industry. The solutions could be provided in monitoring packages, including technical resource monitoring, process monitoring and availability tracking. Additionally, these packages should include reporting features for reviewing the achieved results.

Conclusion

By ordering Monitoring as a Service, small and medium businesses might overcome today’s existing inhibitors for implementing a strong control of their IT systems. With SmartCloud Application Performance Management the required products are there. Within IBM’s business partner organization and service providers, the infrastructure to deliver these services is also available. It is now your turn to act. What stops you from doing so?

Is IT monitoring for large businesses only?

This blog was first published on Service Management 360 on 13-Aug-2013.

http://www.servicemanagement360.com/2013/08/13/is-it-monitoring-for-large-businesses-only/

Could you imagine any large company leaving their IT systems unmanaged? I can’t.

These major companies have dedicated departments in place, responsible for monitoring IT system availability and performance. They are in charge of detecting issues before the business processes are affected. They also have to measure the fulfillment of service level agreements and deliver input for capacity planning on IT resources. Often they have to provide metrics to bill delivered services to the different business units.

Repeatable processes are in place to deliver these services:

  1. Sense
    First you must sense the user experience or the service quality (like sending email). If problems are detected go to phase 2.

  2. Isolate
    In multiple tier architecture application environments the isolation of the problem source is the key factor for a quick problem resolution. By having all resources under control and having transaction tracking mechanisms in place, you can quickly identify the failing resource.

  3. Diagnose
    After you have identified the bottleneck, you need to perform a detailed diagnosis. You must investigate the system and its applications and draw the right conclusions.

  4. Take action
    With the results from the previous step, you can then take the required actions.

  5. Evaluate
    By doing a re-evaluation you go back to step 1 and make sure that the alerted situation no longer exists and the applied action was successful.
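
The five steps above form a loop, which can be sketched in Python as follows. The four callables and the toy environment are purely hypothetical placeholders; in practice each phase is covered by monitoring tools and human experts rather than by a single function.

```python
def run_cycle(sense, isolate, diagnose, act, max_rounds=3):
    """Repeat the phases until sensing reports a healthy service."""
    log = []
    for _ in range(max_rounds):
        # 1. Sense (on later rounds this is also 5. Evaluate)
        if sense():
            log.append("healthy")
            return log
        resource = isolate()               # 2. Isolate the failing resource
        cause = diagnose(resource)         # 3. Diagnose the root cause
        log.append(act(resource, cause))   # 4. Take action
    log.append("escalate")  # still unhealthy after max_rounds attempts
    return log


# Toy environment: a mail server whose queue is full.
state = {"queue_full": True}


def sense():
    return not state["queue_full"]


def isolate():
    return "mail-server"


def diagnose(resource):
    return "queue_full"


def act(resource, cause):
    state[cause] = False
    return f"fixed {cause} on {resource}"


history = run_cycle(sense, isolate, diagnose, act)
```

In this toy run the first round detects and fixes the problem, and the second round confirms that the service is healthy again.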

The frequent measurement of key metrics is also used to gather data for historical analysis. Service-level measurement and capacity planning become actionable with this data.

And what about small and medium businesses?

Here I often see very limited attention given to system monitoring. Small and medium businesses often perform only a kind of system availability monitoring with a very limited scope. In some cases, awareness of this IT discipline doesn’t exist at all.

But what does it mean if IT systems are not available? Today, most kinds of businesses rely on IT services somehow. Any production facility, medical service, office, modern garage and so on is almost incapacitated when the IT systems are down. That means that staff can’t perform their core business roles, can’t earn money, can’t provide the services to their customers. This leads to a massive loss of revenue and reputation.

Additionally, a lot of these small and medium businesses are suppliers in a just-in-time supply chain for the large businesses (for example, a car manufacturer) and penalties apply if the delivery and production process is interrupted.

So the need for enterprise class monitoring systems exists. But why don’t they do it?

  1. It is too expensive!

  2. It is too complex!

  3. It is too time consuming!

These are the three favorite reasons I often hear. And all of them are partially true. Monitoring is a very special subject matter expert discipline. It requires detailed understanding of the monitored systems, applications and services as well as knowledge of the monitoring product used.

The purchase of an enterprise-class monitoring system might require a huge amount of money and a remarkable education effort. And it requires a kind of sustainability small IT departments can’t dedicate to monitoring. Monitoring requires repeated reviews to enhance quality, but it is not possible to keep two or three persons focused on monitoring questions because the workload for this discipline is not high enough in a small IT department and they have responsibility for a lot of other things. In consequence, the skill level for this discipline declines and the results no longer justify the investment in enterprise-class monitoring.

So what now?

Is the answer “yes” to my initial question? No, it isn’t. A new delivery model is required. Enterprise-class monitoring is needed in all businesses relying on stable IT services.

The answer might be a Monitoring as a Service model. A trusted service provider could deliver such a monitoring service and overcome the above inhibitors. Because the provider delivers this service to multiple clients, it can lower the ramp-up costs for the software purchase, offer the required sustainability and bring in the expertise for monitoring systems, applications and services.

In my blog series “Monitoring as a Service” (see parts 1, 2, 3 and 4) I described a business model for using IBM monitoring solutions to set up such a service.

IBM SmartCloud Application Performance Management offers a suite of products to cover the above described five-step monitoring process, including reporting features for historical reviews and projections to the future.

So what is your impression? Are we covering the right markets? How could we improve? Please share your thoughts below.