Why is Monitoring IT Infrastructure So Hard? Part 2

January 20, 2020

Topics: Cloud Insights Advanced | 6 minute read

In a previous article, we focused on IT modernization with regard to changes to roles and working practices and how to tackle these challenges from an organizational perspective. Now, we’ll build on that blog by focusing on tooling and how you can select the right tools to help you successfully manage a modern IT infrastructure.

Modernization of IT is generally synonymous with the usage of more diverse resources and services, including but not limited to those offered by public cloud providers. This presents a monitoring challenge, as the legacy tooling we’ve been using for years is no longer up to the task. But there are so many modern monitoring tools out there; how do you choose the right one?

What Tooling Will Meet Your Needs?

Two answers: “Nothing” and “it depends.” No tool will simply fit your organization right out of the box. You have to configure, train, tune, customize, and even, in some instances, develop it yourself to create a good fit. But how do you know that all the hours spent perfecting your tool are worth it?

In part 1 of this blog, we discussed legacy issues within an organization and how you have to create new routines to fit today’s methodologies. Routines also have an important say in what tooling you’ll choose. If nothing fits, it’s time to change.

A good fit is not just about altering and tuning tooling to align with your organization’s needs. It’s also about how a tool can scale. If you find yourself choosing between a good tool and a well-scaling tool, it’s crucial to think long term and ask yourself which will be the optimum fit in the end.

And, of course, one key consideration is simply: What does your team already know? What have they already done with similar technology? If your team is already highly skilled with Python, then it may be beneficial to look at tooling that will utilize their skills. In other words, look at what’s right in front of you; don’t just continue to do what you’ve always done when you’re facing totally new challenges.

How Does It Feel to Automate?

This is a critical area when it comes to tools for IT monitoring. Does automating a certain task just not feel quite right for your team? It might be because not everything is accessible through the API, or because you have to perform otherwise simple tasks in a roundabout fashion.

Most modern tools are highly adaptable and accessible through APIs and other techniques and are made for automation. Some might feel more clunky than others, but before you cross automation off your list, look at your team, your routines, and what other organizations are already doing. You may just have to change something on your side to achieve that perfect fit.

What You Want vs. What You Need

A good place to start is to define, up front, the scope you expect your tooling to cover. Some good questions to ask yourself when determining this scope are:

  • What scale do you expect to reach during the lifetime of the tooling?
  • Who do you expect to gain value from using the tool?
  • What functionality do you expect?
  • Can you extend the tooling if you need to?
  • Can you use other tooling to extend it if needed?
  • What problems are you aiming to solve?

Determining scope, and then sticking to it, is a good way to keep things from getting out of hand. It will save you from ending up with a Swiss Army Knife: a tool that can do a lot of things, but none of them very well. One important criterion to stick to? Make sure that you have the capability to extend your tooling at a later time by either developing your own functions or by integrating with other tooling.

Machine Learning: Do We Need It?

Let’s be honest: Machine learning (ML) is pretty cool. With all the development being put into this arena on a global level, it’s growing harder and harder to find anything that’s been left untouched by it.

Machine learning is not only cool, but also very powerful. And going forward, it will make up a large part of ops work. Should you decide to add ML to your strategy, it’s crucial that you set realistic expectations and aim for a proper implementation. ML is pretty good at telling you that something isn’t right. But getting it to tell you the actual problem tends to be a challenge.

Start small: Use machine learning to grab the low-hanging fruit, like detecting anomalies in logs and metrics. Keep ML in your toolbox in case you need it in the future.
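
To make "low-hanging fruit" a little more concrete, here is a minimal sketch of anomaly detection on a metric series. It is not machine learning proper; it is a simple statistical baseline (a trailing-window z-score) of the kind you might try before reaching for a real ML model. The function name, the window size, the threshold, and the sample CPU values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of values that deviate from the mean of the
    preceding `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization samples (percent), with one obvious spike.
cpu_percent = [41, 43, 42, 40, 44, 42, 43, 97, 42, 41]
print(find_anomalies(cpu_percent))  # prints [7]
```

Note what this does and does not tell you: it flags that sample 7 looks wrong, but says nothing about why. That matches the expectation set above, where detecting that something is off is the easy part and diagnosing the cause is the hard part.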

Multiple Tools Are OK

It’s fine to use multiple tools. In most cases, we encourage it. Most tools are developed so that you can pick and choose what you use. And you can use all of those parts together to create the ultimate tooling to suit your needs. Similarly, it doesn’t make sense to force a single-pane-of-glass tool on your SMEs for low-level tasks when they’ve likely already identified the best specialized tool for the job.

That’s why it’s important that the tooling you choose can communicate with other systems, whether through APIs, webhooks, events, or other methods of data output.

But don’t let things get out of control. Define a clear strategy that maps how the tools communicate with each other, using event-driven design or other techniques.
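
As a rough illustration of event-driven design between tools, the sketch below uses a tiny in-process event bus: one tool publishes an event, and any number of other tools subscribe to it without knowing about each other. Everything here is hypothetical, including the `EventBus` class, the `alert.critical` event name, and the two handlers. In practice the bus would more likely be a message queue or a set of webhooks than a Python object.

```python
class EventBus:
    """Minimal publish/subscribe hub mapping event types to handlers."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        # Call every handler registered for this event type, in order,
        # and collect what each one returns.
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


# Hypothetical downstream tools reacting to the same event.
def open_ticket(alert):
    return f"ticket opened for {alert['host']}: {alert['summary']}"


def notify_on_call(alert):
    return f"paged on-call about {alert['host']}"


bus = EventBus()
bus.subscribe("alert.critical", open_ticket)
bus.subscribe("alert.critical", notify_on_call)

results = bus.publish("alert.critical", {"host": "db-01", "summary": "disk 95% full"})
for line in results:
    print(line)
```

The point of the design is that the list of subscriptions doubles as the map of how your tools talk to each other. Adding a new consumer is one `subscribe` call, and the monitoring tool that publishes the alert does not need to change.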

Adding New Systems

Adding new systems is one of the most difficult considerations. How do you make your tooling support new systems and services? You need a well-thought-out process (see part 1), and the tooling should make technology onboarding as straightforward as possible.

The streamlining and automation you need for adding new systems will be hard to standardize fully, but they should at least give you a good base and a common strategy for onboarding. That’s why the routines that you create around tooling are of central importance: You will almost always fall back on those routines while onboarding. But remember that procedures that are too strict will slow you down and increase your technical debt.

Use tooling that supports a number of vendors out of the box and that has steady development and/or an active and involved community. Good tooling should enable change, not define it.
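
One way to picture a "common base" for onboarding is a small registry in which every kind of system plugs into the same entry point. The sketch below is purely illustrative. The `collector` decorator, the `onboard` function, the system types, and the returned metric names are all made-up placeholders, not the API of any particular monitoring product.

```python
# Registry mapping a system type to the function that collects its metrics.
COLLECTORS = {}


def collector(system_type):
    """Decorator that registers a collector for one kind of system."""
    def register(func):
        COLLECTORS[system_type] = func
        return func
    return register


@collector("linux_host")
def collect_linux_host(target):
    # Placeholder result; a real collector would query the host.
    return {"target": target, "type": "linux_host", "metrics": ["cpu", "memory", "disk"]}


@collector("object_store")
def collect_object_store(target):
    # Placeholder result; a real collector would call the storage service.
    return {"target": target, "type": "object_store", "metrics": ["capacity", "latency"]}


def onboard(system_type, target):
    """Shared onboarding path: fail loudly if no collector exists yet."""
    if system_type not in COLLECTORS:
        raise ValueError(f"no collector registered for {system_type!r}")
    return COLLECTORS[system_type](target)


print(onboard("object_store", "bucket-archive"))
```

Supporting a new kind of system then means writing one collector and registering it, while the onboarding routine itself stays the same. That keeps the routine loose enough to absorb new technology without being rewritten each time.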

Lifecycle Management

Lifecycle management isn’t necessarily difficult, but it’s time-consuming and often overlooked in favor of more important tasks. Therefore, you should also consider the lifecycle management of your tooling and make sure to have routines and procedures that make it as easy as possible.

Configuration managers are a good example here. Chef, Puppet, Ansible, and other similar tools are all great when it comes to configuration management. Some require agents, and some don’t. But they are all widely used. When choosing among them, consider the agent and how it’s handled, but don’t let the agent requirement scare you away.

So How Do You Choose?

This is a complex question, but it all boils down to one thing: You choose what you think you can grow into.

Tooling should help you grow, and you should strive to be enabled by your tooling, not controlled by it. It’s even more important to create routines around your tools that allow for modern infrastructure management.

Take small chunks and experiment; set up a lab and have sessions with your team where they can try out different tools. There is no easy one-way path, so letting your team collaborate and explore will give you a good start and make room for some great ideas to form.

Open collaboration is ultimately the best way to prepare your organization for big changes in its monitoring strategy.



Principal Technologist