Implementing naming conventions using labels in checkmk

Most IT organizations have strict naming conventions for their hosts (servers, network devices and others) representing valuable information about:

  • country
  • region
  • customer
  • application
  • server
  • client
  • network
  • environment (test/production)

This list might be longer, or only a subset of these attributes might be used. As a sample, I use the devices in my home environment. (A review of my naming convention seems to be very urgent…)

Naming conventions often make it possible to infer the usage of a given device, its location and similar properties. This information is also required to make your monitoring journey a success.

checkmk offers two key features to attach information of this kind to discovered hosts: labels and tags. In my opinion, the most valuable advantage of labels, compared to tags, is the ability to assign labels dynamically, without any dependency on folders or other mechanisms.

This makes labels well suited to applying “attributes” to hosts dynamically, based on the naming conventions currently in force.

My home naming conventions are:

I’d like to identify the following things for monitoring:

  • the owner of the system
  • the type of the system (virtual system or physical)
  • the application, which is hosted on the system
  • the classification of the physical systems (e.g.: always online or not)

To do so, follow these steps in checkmk:

First Step: Open the setup dialog

  1. Type “host labels” in the search area
  2. In the navigation bar click “Setup”
  3. Click on host labels to enter the rules area for labels

Second Step: Use the “Create rule in folder” button
As naming conventions should apply across all systems in my monitoring environment, I create these rules in the “Main directory”.


Fill in your settings

  1. A description of your Rule
  2. The label you want to attach to this set of devices (how labels work)
  3. Select “Explicit hosts” and enter a regular expression that matches the host names in question. The leading “~” indicates that a regular expression follows.
  4. Don’t forget to save your changes.
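
The idea behind such rules, deriving labels from host names with regular expressions, can be sketched outside of checkmk as well. The following Python snippet is only an illustration: the patterns, the label names and the helper `labels_for` are hypothetical and are not my actual rule set, and it uses `re.match`, which anchors at the start of the name. Whether this corresponds exactly to the matching semantics of your checkmk version should be verified in the checkmk documentation.

```python
import re

# Hypothetical rules: (regular expression, label key, label value).
# Patterns and label names are illustrative only, not the author's rule set.
LABEL_RULES = [
    (r"vm-", "home/type", "virtual"),
    (r"pi-", "home/type", "physical"),
    (r".*-nas$", "home/application", "storage"),
]


def labels_for(hostname: str) -> dict[str, str]:
    """Return every label whose pattern matches the start of the host name."""
    labels = {}
    for pattern, key, value in LABEL_RULES:
        # re.match only succeeds if the pattern matches at the beginning.
        if re.match(pattern, hostname):
            labels[key] = value
    return labels


if __name__ == "__main__":
    for host in ("vm-web01", "pi-nas", "printer"):
        print(host, labels_for(host))
```

Testing patterns like this against a list of known host names before entering them into the rules can help to catch expressions that match too many or too few hosts.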

After repeating the above steps for all entries of my weak naming convention, I have the following rule set.

Please note that my label names follow some conventions of their own, so that no chaos is introduced into the checkmk setup.

Summary:

Labels can be applied dynamically to the hosts in your checkmk monitoring environment. Tags can’t be applied in this way. Labels can be used later on when defining monitoring rules, views, dashboards, filters and so on. Labels are a good mechanism to group hosts in different dimensions.

Discover new devices in a Network with checkmk

After installing checkmk, one of the first tasks is to discover all devices in a given network.

To do so please follow the steps below:

Enter Setup Hosts

Create a stage folder to place newly discovered hosts in.

Click Add Folder
Type in the Title within the basic settings and press the save button

To handle several networks, several sub-folders will be created (this is optional).

Enter stage folder and click Add folder again

Type in your specific settings and don’t forget to press the Save button
  • Select Network Scan
  • Add new IP range
  • Select Set IPv4 address
  • Set criticality host tag to “Do not monitor this host”

The Scan Interval is important if you want to detect new devices quickly.

There are a lot of other settings possible. Please consult the documentation for further details (Section 6 of the linked article).
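
Before entering an IP range for the network scan, it can be useful to check how many addresses the range actually covers. The following Python sketch uses only the standard library; the helper name `hosts_in_range` and the example addresses are assumptions made for this illustration and have nothing to do with checkmk’s internal implementation of the scan.

```python
import ipaddress


def hosts_in_range(first: str, last: str) -> list[str]:
    """List every IPv4 address from first to last, both inclusive."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if end < start:
        raise ValueError("last address must not be below first address")
    return [str(ipaddress.IPv4Address(n)) for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    # Example range only; replace it with the range planned for the scan.
    addresses = hosts_in_range("192.168.1.10", "192.168.1.20")
    print(len(addresses), "addresses, from", addresses[0], "to", addresses[-1])
```

A very large range combined with a short scan interval means many probes per scan, so a quick count like this helps to choose a sensible range per sub-folder.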

Embedding Monitoring into Existing Organizations

To ensure the success of an IT monitoring project implementation in the long run, the monitoring should be embedded into the existing organization with clear responsibilities and roles for each stakeholder.

In the slide below a sample organization is shown, with a typical matrix organization. This is often seen in enterprises.

There are four groups in focus regarding the monitoring deployment:

  • Users
    Most enterprises earn their money with this group of people. They expect a running application that performs well and supports them in getting their job done. These could be internal or external users or both.
  • Business
    This group consists of the stakeholders of the business processes. They focus on getting things done to maximize their contribution to the company they work for.
  • IT Application
    These people design, code, test and deploy the required applications.
  • IT Infrastructure
    This group of people is responsible for providing and running a reliable IT infrastructure and architecture for the future, including internal and external cloud platforms.

These definitions are samples, seen with different customers over the last years. Other organizations may have other setups that work very well for them.

The point I want to highlight is how to embed the monitoring into such an organization:

  • Support Level 1
    The business support organization handles questions from users about the applications and processes it supports. Often the product catalog, the order process itself, the configuration of products and so on are in focus. The questions are business driven rather than technical. Technical IT questions are rerouted to the IT support desk.
    The IT support desk is in charge of all questions regarding IT-related objects. This includes help with program usage, user login issues, performance issues and so on.
  • Support Level 2
    This function is covered by an operations center. This organization keeps IT services up and running and monitors the state of key resources, applications, the network and everything else that makes up a well-performing IT environment. They are in charge of initiating requests to level 3 if problems arise that they can’t handle themselves. Regarding monitoring, these people are the power users of the monitoring solution. All acquired data is used here to provide a comprehensive overview of the IT status and to enable decisions on how to handle given situations.
  • Support Level 3
    Level 3 has several teams contributing to this mission. Developers and system programmers have to collaborate to find solutions for serious problems arising while operating the application.
    • System programmers are in charge of fixing problems with hardware and software packages and their configuration.
    • Developers have to deal with issues originating from self-developed application code.
    Today, these roles are often combined in so-called DevOps teams. These teams are responsible for performing deep-dive analysis in case of application errors, including log analysis, detailed performance measurement and threshold definition. They also have to continually refine monitoring thresholds to increase alarm accuracy. They install new monitoring tools and integrate these into the existing solution. DevOps teams always keep in mind that any new technology also requires a new monitoring review and potentially a new monitoring component.

Monitoring should be embedded into IT management processes, best described in ITIL. Incident management, problem management and change management are the disciplines in focus. For more details on monitoring and ITIL see the Process Symphony Knowledge Base.

As I described in my blog entry Finding the best Monitoring Solution, monitoring is a process rather than a status. It is a never-ending iteration of requirement review, software upgrade, solution design and execution. That’s why it is so important to embed monitoring into the IT organization’s daily business.

Monitoring Focus Areas

A monitoring discussion without a product focus.

In discussions with customers, the expectations regarding a monitoring system differ considerably across the departments involved. As mentioned in the blog entry Finding the best Monitoring solution, it is essential to understand these requirements and keep these needs in focus while moving on. The monitoring solution has to support a wide range of requirements.

So let us take a look at generalized requirements without thinking in products. The slide below shows four different work areas, often causing trouble in the daily business of IT departments.

  • Monitoring
    This is the core area of a monitoring system. Gather data about any resource, regarding performance and availability.
  • Prediction
    Fed with time series data from the monitoring component and other sources (like the event management system or log analysis), this system makes it possible to detect anomalies earlier and helps to differentiate between common behavior and unexpected processing.
  • Event Management
    Events are consolidated and correlated in one place, decreasing the complexity of following SOS situations in which multiple components are involved and monitored with different tools. This discipline is tremendously important, because most customers use specialized tools to monitor their IT infrastructure, their IT network, the user experience, the cloud components/resources and so on.
  • Log Analysis
    Most IT issues are fixed by consulting the required logs. Logs scattered across many places hamper this activity. It is very important to collect and consolidate these logs in one place and to make it possible to search, scan, sort and analyze these information sources efficiently.

Monitoring is more than sensing the availability and performance of a single IT resource. It is essential to do so, but several other steps have to follow.

Most IT monitoring projects fail over time because their value for the business can’t be demonstrated in the long run. IT monitoring is not a one-shot project. It is more like a journey that never ends. It has to change as the IT landscape changes, which means it must be flexible enough to support the requirements of tomorrow. A dedicated responsibility should be defined that supports the monitoring requirements over time. Today, DevOps departments take over exactly this role, but it helps to have a common monitoring infrastructure on hand, ready to be used and flexible enough to integrate new tools.

Finding the best Monitoring solution

Finding the best-fitting monitoring solution is not an easy exercise. The market is shifting fast and a large number of solutions are out there. Find a huge list of monitoring solutions at “100 Top Server Monitoring & Application Performance Monitoring (APM) Solutions”.

This list is a good starting point to review the different choices found in the market. All mentioned products have their strengths and weaknesses in different areas. Each solution – let us call it product – has its own focus areas:

  • Network monitoring
  • Time series processing
  • Logging
  • Application Performance
  • System Monitoring

It is important to start with the expectations of the users and stakeholders:

  • Requirement specifications
    • System Monitoring
    • Cloud Services Monitoring
    • Usage Monitoring
    • User Response Time
    • Transaction Tracking
  • User Stories & Playbooks
    • Define usage scenarios
    • User stories can help
    • Playbooks help prioritize activities
  • Review the existing solution(s)
    • Gap analysis
    • Efficiency
    • Serviceability
    • Support
    • Knowledge base and familiarity
  • Solution design
    • Agile Development
    • Iterate Frequently
    • Collaborate with the different teams in your organization

Having a system management solution is not a status you reach; it is more like a tour. You can’t buy a solution, install it, implement it and be happy forever. It is more like a journey with good equipment that makes things easier, but the work still has to be done. This leads to the realization that your organization has to support that function.

System management is a daily business which has to support the organization’s needs and be agile enough to fulfill changing requirements in an elastic environment.

So what is the best solution now for you and your organization?

Well, it depends on the outcome of the workshops, where the above questions are discussed and answers are defined. In most cases it is not a single product but a combination of two or more products, using interfaces to build a very customer-specific solution.

Find a discussion of the monitoring focus areas in more detail under the blog entry Monitoring Focus Areas.

Eight steps to migrate IBM Agent Builder solutions

With the introduction of IBM Monitoring V8, a completely new user interface has been introduced, and the way the agents communicate with the server has changed as well.

This implies that all existing Agent Builder solutions you created have to change as well. Agents created for your ITM V6 deployment also have to be adapted.

In this article I use one of my Agent Builder solutions, which I created for OpenVPN, as an example. I extend the solution so that I can use it under ITM V6 as well as under APM V8.

The work to be performed is pretty limited. The documentation describes a few major prerequisites for successfully deploying your agent to an IBM Performance Management infrastructure:

  • A minimum of one attribute group, used to generate an agent status overview dashboard.

  • This single-row attribute group (or groups) is used to provide the four other required fields for this overview

    • A status indicator for the overall service quality

    • A port number, where the service is provided

    • A host name where the agent service is located

    • An IP address where the monitored service resides

For more details, please consult the documentation.

In my situation, these attributes were not provided with my ITM V6 agent builder solution. So I expanded my existing solution:

  1. Changing the version number of my agent builder solution (optional)

  2. Create a new data source “openvpnstatus_sh”, which is a script data provider delivering one single line with all attributes defined to it.

  3. The attribute “ReturnCode” will be used later on to describe the overall status of my OpenVPN server. So I have to define the good value and the bad value (see documentation for more details)

  4. Make sure that the check box under Self Describing Agent is activated.

  5. Run the Dashboard Setup Wizard to produce the dashboard views

    Please make sure that you have checked “Show agent components in the dashboard”! Otherwise…

  6. Now you can select the different values:

    1. Status

    2. Additional attributes for the summary widget

    3. Select the attribute groups to be displayed in the details view for the agent
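
To illustrate what a script data provider that delivers one single line could look like, here is a minimal Python sketch. It is not the script behind “openvpnstatus_sh”. The separator, the field order, the example port and addresses, and the helper `status_line` are all assumptions made for this illustration and have to match whatever is defined for the data source in Agent Builder.

```python
import socket

# Assumed delimiter; use the separator configured for the data source.
SEPARATOR = ";"


def status_line(return_code: int, port: int, hostname: str, ip_address: str) -> str:
    """Build one delimited line: status, port, host name and IP address."""
    fields = [str(return_code), str(port), hostname, ip_address]
    for field in fields:
        # A separator inside a value would shift all following attributes.
        if SEPARATOR in field:
            raise ValueError(f"field contains the separator: {field!r}")
    return SEPARATOR.join(fields)


if __name__ == "__main__":
    # Illustrative values only: return code 0, example port and address.
    print(status_line(0, 1194, socket.gethostname(), "127.0.0.1"))
```

Keeping all four required fields in one row keeps the attribute group single-row, which is what the summary dashboard described above expects.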