AWS Monitoring Tools and Best Practices: Monitor What Matters

Written by Cloud Insights Team | Dec 10, 2020 1:10:24 PM

What Is AWS Monitoring?

Amazon Web Services (AWS) monitoring is a set of practices you can use to verify the security and performance of your AWS resources and data. These practices rely on various tools and services to collect, analyze, and present data insights. You can then use these insights to identify vulnerabilities and issues, predict performance, and optimize configurations.

This is part of an extensive series of guides about performance testing.

In this article, you will learn:

AWS First-Party Monitoring Tools
AWS Third-Party Monitoring Tools
Steps for Successful AWS Resource Monitoring
AWS Monitoring Best Practices

AWS First-Party Monitoring Tools

There are multiple services and utilities available from AWS that you can use to monitor your systems and access. Some of these tools are included in existing services, while others are available for additional costs.

AWS CloudTrail

CloudTrail is a service that you can use to track events across your account. The service automatically records event logs and activity logs for your services and stores the data in S3. Collected data includes user identities, traffic origin IPs, and timestamps. You can view all management events for free for the most recent 90 days. Data events and insights based on your data are also available for an additional fee.
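As a rough illustration of the data CloudTrail collects, the sketch below reads a trimmed, hypothetical log file body and pulls out the identity, origin IP, and timestamp for each event. The sample record and the `summarize_events` helper are assumptions made for this example. Real log files delivered to S3 are gzip-compressed and carry many more fields.

```python
import json

# Hypothetical, trimmed CloudTrail log file body (real records have more fields).
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2020-12-10T13:10:24Z",
   "eventSource": "s3.amazonaws.com",
   "eventName": "DeleteBucket",
   "sourceIPAddress": "203.0.113.10",
   "userIdentity": {"type": "IAMUser", "userName": "alice"}}
]}
"""


def summarize_events(raw):
    """Return (time, who, source IP, action) tuples from a CloudTrail log body."""
    summaries = []
    for record in json.loads(raw).get("Records", []):
        identity = record.get("userIdentity", {})
        # Not every identity type carries a userName; fall back to the type.
        who = identity.get("userName") or identity.get("type", "unknown")
        action = f'{record.get("eventSource", "?")}:{record.get("eventName", "?")}'
        summaries.append(
            (record.get("eventTime"), who, record.get("sourceIPAddress"), action)
        )
    return summaries


for row in summarize_events(SAMPLE_LOG):
    print(row)
```

A summary like this can feed a review of who changed what, and from where.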

Amazon CloudWatch

CloudWatch is a service you can use to aggregate, visualize, and respond to service metrics. CloudWatch has two main components: alarms, which create alerts according to thresholds for single metrics, and events, which can automate responses to metric values or system changes.
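To make the idea of a single-metric threshold alarm concrete, here is a minimal sketch that builds the parameters for the CloudWatch PutMetricAlarm API. The helper name `build_cpu_alarm`, the alarm naming scheme, the 80 percent threshold, and the instance ID are assumptions chosen for illustration. The block only builds a dictionary and makes no AWS calls.

```python
def build_cpu_alarm(instance_id, threshold_percent=80.0, topic_arn=None):
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    params = {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # seconds per datapoint
        "EvaluationPeriods": 2,  # consecutive periods that must breach
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if topic_arn:
        # Optional notification target, for example an SNS topic ARN.
        params["AlarmActions"] = [topic_arn]
    return params


alarm = build_cpu_alarm("i-0123456789abcdef0")  # hypothetical instance ID
print(alarm["AlarmName"])
```

With boto3 installed and credentials configured, the dictionary could be passed along as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`.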

AWS Certificate Manager

Certificate Manager is a tool you can use to provision, manage, and apply transport layer security (TLS) and secure sockets layer (SSL) certificates. These certificates are used to prove your services or devices' authenticity and enable you to secure network connections.

Amazon EC2 Dashboard

EC2 Dashboard is a monitoring tool for the Amazon EC2 virtual machine service. You can use this dashboard to monitor and maintain your EC2 instances and infrastructure. The dashboard lets you view instance states and service health, manage alarms and status reports, view scheduled events, and assess volume and instance metrics.

AWS Third-Party Monitoring Tools

In addition to native tools, many AWS users also adopt third-party tools. These tools are useful for separating monitoring operations from your primary resources and can often provide support for hybrid or on-premises resources as well.

NetApp Cloud Insights

NetApp Cloud Insights is a monitoring tool that you can use to visualize your infrastructure. It enables you to monitor, optimize, and troubleshoot resources in public clouds, private clouds, and on-premises environments. Cloud Insights includes features for conditional alerting, optimization recommendations, predictive analytics, machine-learning-based anomaly detection, and compliance auditing.

SolarWinds AppOptics

AppOptics is a tool that you can use to supplement metrics collected by CloudWatch. It enables you to track performance statistics, log trends, and capacity limits. You can integrate AppOptics with other AWS services and generate automatic analyses of your operations. AppOptics also includes features that enable you to monitor multiple AWS accounts from a single interface.

Zenoss ZenPack

ZenPack is an open source tool you can use to aggregate CloudWatch metrics and external resource metrics data. It includes an easy-to-use graphical user interface (GUI) and is compatible with a variety of AWS services. These services include S3, Amazon Virtual Private Cloud (VPC), and Amazon Suite.

Zabbix

Zabbix is an open source tool for collecting metrics from AWS and a variety of other applications, services, and databases. It includes features for dashboards and alert escalation, and it is backed by a robust online support community. The downside of Zabbix is that it cannot import data or generate performance reports.

Weave Scope

Weave Scope is an open source tool you can use to monitor and visualize your microservices. It includes features for service discovery and is compatible with Amazon Elastic Container Service (ECS). Weave Scope is based on three components (an interface, an app, and a probe) and enables you to troubleshoot service performance in real time.

Steps for Successful AWS Resource Monitoring

Phase A: Assess Your AWS Monitoring Needs

Before introducing monitoring into your pipeline or making changes to your existing workflow, you should carefully assess your existing infrastructure, tooling, resources, and skillset. Taking the time to assess your situation can help you develop a strategy that suits your needs.

Step 1: assessment questions
Here are key questions to ask when assessing your AWS monitoring needs:

Infrastructure—where is your network located? Is it on-premises? Do you want a dedicated monitoring system for each environment, or do you want one tool that covers both on-premises and cloud monitoring?
Compliance—what are your current compliance policies? What legal precautions do you need to take in order to comply with industry standards? Can you introduce a SaaS monitoring and logging solution into your ecosystem and remain compliant?
Inventory—do you need a new tool for AWS monitoring or can your current stack perform this task?
Complexity—what are the complexities and costs involved in removing any and all legacy agents from all servers, to clear space for the installation of new agents?
Metrics—do you know which metrics you absolutely need to monitor, and which metrics might be redundant?

Step 2: develop a strategy to tag AWS resources
Once you gain insight into your current monitoring needs and prioritize metrics, you can start developing a strategy for tagging AWS resources. Tags help you keep track of your resources, and monitor usage and behavior.

If you don’t have a tagging system in place, it can take some time to figure out how to organize resources. While every project and organization is unique, it is important to create a tagging system that can be used by a wide variety of professionals and collaborators. This way, all relevant parties can gain access to monitoring insights when needed.
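One way to keep a tagging system consistent is to check resources against a small written policy. The sketch below is a minimal example of that idea. The required keys, the allowed environment values, and the resource IDs are all hypothetical and would be replaced by whatever your organization agrees on.

```python
# Hypothetical tagging policy; adjust the keys and values to your organization.
REQUIRED_TAG_KEYS = {"owner", "environment", "cost-center"}
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def tag_problems(tags):
    """Return a sorted list of human-readable problems for one resource's tags."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAG_KEYS - set(tags)]
    env = tags.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unexpected environment: {env}")
    return sorted(problems)


# Hypothetical inventory: resource ID mapped to its tags.
resources = {
    "i-0aaa": {"owner": "web-team", "environment": "prod", "cost-center": "1001"},
    "vol-0bbb": {"owner": "data-team", "environment": "qa"},
}

for resource_id, tags in resources.items():
    print(resource_id, tag_problems(tags) or "ok")
```

Running a check like this on a schedule helps catch untagged or mistagged resources before they disappear from monitoring views that filter by tag.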

Phase B: Select the Right Solution for Your Organization

After assessing your needs and setting up a tagging system for AWS resources, you can look for the solution that suits your needs. Often, it is effective to start with a simple solution and then expand as needed. However, if you know in advance that you need a robust set of features, it is best to choose a solution that already meets all of your criteria or that can be scaled easily to meet them.

Step 3: start simple with Amazon CloudWatch
CloudWatch metrics can help you monitor practically any AWS resource. CloudWatch provides a wide range of pre-built counters like DiskQueueDepth and CPUUtilization. Some AWS services, such as RDS and EC2, can provide additional counters when integrated with CloudWatch.

CloudWatch counters enable you to create dashboards, which you can leverage when you need visualized data. In addition to counters and dashboards, CloudWatch offers an alerting system, which lets you know when incidents occur. If you are not using a dedicated monitoring system, and you need simple features, you can use CloudWatch.
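As a sketch of how counters map onto a dashboard, the block below builds a CloudWatch dashboard body with one CPUUtilization widget per EC2 instance. The helper name `cpu_dashboard_body`, the layout values, the default region, and the instance IDs are assumptions for illustration. The block only produces JSON and makes no AWS calls.

```python
import json


def cpu_dashboard_body(instance_ids, region="us-east-1"):
    """Build a CloudWatch dashboard body with one CPU widget per instance."""
    widgets = []
    for index, instance_id in enumerate(instance_ids):
        widgets.append({
            "type": "metric",
            # Stack the widgets vertically, each 12 units wide and 6 high.
            "x": 0, "y": index * 6, "width": 12, "height": 6,
            "properties": {
                "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
                "stat": "Average",
                "period": 300,
                "region": region,
                "title": f"CPU {instance_id}",
            },
        })
    return json.dumps({"widgets": widgets})


body = cpu_dashboard_body(["i-0aaa", "i-0bbb"])  # hypothetical instance IDs
print(body)
```

With boto3 installed and credentials configured, the body could be published with `boto3.client("cloudwatch").put_dashboard(DashboardName="cpu-overview", DashboardBody=body)`, where the dashboard name is again only an example.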

Step 4: leverage best-of-breed solutions
When it comes to visibility, the more resource types you monitor, the more you can ensure the performance and safety of your assets. However, not all monitoring systems can provide visibility for all resources. Some monitoring solutions are designed for infrastructure while others are built for network traffic.

To avoid losing visibility over parts of your environment, you can either use a stack of tools or extend the capabilities of existing systems. If you opt to use a stack of monitoring tools, first check that the tools provide the features you require and are compatible with each other and with your existing stack.

Additionally, you should consider adding a tool to centralize the stack, so that your team does not lose time switching between separate interfaces. If you choose to extend existing systems by installing plugins or integrating with APIs, you should enable AWS integration and ensure that each extension complies with the regulatory requirements that apply to you.

Phase C: Capture Logs

Once you set up your monitoring solution or stack, you should decide which logs you want to capture and how you want to set this up. Logs are highly effective for keeping track of compliance requirements and troubleshooting issues.

Here is a list of logs you might want to capture:

Database logs—help you detect queries that are slow to run.
Application logs—point out application failures.
AWS CloudTrail—detects API calls made to AWS.
Elastic Load Balancing and host logs—might indicate availability or latency changes.
OS logs—can identify host failure reasons.
Web server, firewall, and VPC flow logs—can reveal patterns of access and attacks.

The majority of monitoring systems are either suited for metrics or logs, rather than prioritizing both of these tasks equally. To ensure full coverage, you should either use a stack or find a solution that enables you to capture both metrics and logs from AWS.
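To show how one of the log types listed above can reveal access patterns, here is a minimal sketch that parses VPC flow log lines in the default format and counts rejected connections per source address. The sample lines, addresses, and the `rejected_sources` helper are made up for this example. Flow logs configured with a custom format would need a different field list.

```python
from collections import Counter

# Field order of the default VPC flow log format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()

# Hypothetical sample lines for illustration only.
SAMPLE_LINES = [
    "2 123456789010 eni-0abc 203.0.113.12 172.31.16.21 49761 22 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-0abc 172.31.16.139 172.31.16.21 20641 443 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-0abc 203.0.113.12 172.31.16.21 49762 3389 6 1 40 1418530070 1418530130 REJECT OK",
]


def rejected_sources(lines):
    """Count REJECT records per source address."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip header rows or lines in a custom format
        record = dict(zip(FIELDS, parts))
        if record["action"] == "REJECT":
            counts[record["srcaddr"]] += 1
    return counts


for source, hits in rejected_sources(SAMPLE_LINES).most_common():
    print(source, hits)
```

A source address that accumulates many rejected attempts across different ports is a reasonable candidate for closer inspection.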

AWS Monitoring Best Practices

When monitoring your AWS resources, the following best practices can help you ensure that no resources are overlooked and that you can troubleshoot efficiently.

Use Automation Where Possible

Production deployments in AWS are typically too large and dynamic to monitor manually. The volume of metrics and log data that is generated is too large for humans to efficiently analyze. To ensure that critical data is not missed and responses are timely, you should use automation to handle most of your monitoring tasks.

Create Policies to Define Priority Levels

Prioritizing monitoring tasks helps ensure that critical services remain operational and that data remains protected. Additionally, prioritizing alerts or alert categories helps ensure that IT teams effectively distribute their time and efforts.
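A priority policy can be as simple as a lookup table that maps alert categories to urgency levels. The sketch below assumes a hypothetical four-level policy and made-up alert names, and it sorts an incoming queue so that the most urgent alerts are handled first.

```python
from dataclasses import dataclass

# Hypothetical policy: a lower number means a more urgent alert.
PRIORITY_BY_CATEGORY = {"security": 1, "availability": 1, "performance": 2, "cost": 3}
DEFAULT_PRIORITY = 4  # categories the policy does not mention go last


@dataclass
class Alert:
    name: str
    category: str


def triage(alerts):
    """Return alerts ordered by policy priority, keeping input order within a level."""
    return sorted(
        alerts,
        key=lambda alert: PRIORITY_BY_CATEGORY.get(alert.category, DEFAULT_PRIORITY),
    )


queue = triage([
    Alert("EBS volume underutilized", "cost"),
    Alert("Root login from new IP", "security"),
    Alert("High CPU on web tier", "performance"),
])
for alert in queue:
    print(alert.category, "-", alert.name)
```

Keeping the policy in one table makes it easy to review and change without touching the code that routes alerts.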

Resolve Problems Early On

Monitoring data should be used to respond to issues like potential service interruptions proactively. It is much easier to scale resources or throttle traffic in advance than manage a service outage. Additionally, addressing potential issues early on can help you avoid wasted resources and costs.
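One simple way to act early is to fit a straight line to recent datapoints and estimate when the metric would reach a limit. The sketch below uses an ordinary least-squares fit over equally spaced samples. The function name, the sample values, and the 90 percent threshold are assumptions, and a linear trend is only a rough guide for metrics that are bursty or seasonal.

```python
def periods_until_threshold(values, threshold):
    """Estimate how many future periods remain before a linear trend hits threshold.

    Returns None when the trend is flat or falling (or there is too little data),
    and 0 when the latest value is already at or above the threshold.
    """
    n = len(values)
    if n < 2:
        return None
    if values[-1] >= threshold:
        return 0
    # Ordinary least-squares slope and intercept over x = 0, 1, ..., n - 1.
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope_den = sum((x - mean_x) ** 2 for x in range(n))
    slope = slope_num / slope_den
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_x = (threshold - intercept) / slope
    # Express the crossing point relative to the most recent sample.
    return max(0.0, crossing_x - (n - 1))


disk_used_percent = [50, 55, 60, 65, 70]  # hypothetical daily samples
print(periods_until_threshold(disk_used_percent, threshold=90))
```

If the estimate falls inside your lead time for adding capacity, that is a signal to scale before the limit is reached.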

Use the Cloud to Your Advantage

Cloud environments are flexible and can enable you to experiment with configuration changes without affecting services. When optimizing based on metrics, take time to test your configurations. This way, you can verify if changes are more efficient before implementing them in production.

AWS Monitoring with NetApp Cloud Insights

NetApp Cloud Insights is an infrastructure monitoring tool that gives you visibility into your complete infrastructure. With Cloud Insights, you can monitor, troubleshoot and optimize all your resources including your public clouds and your private data centers.

Cloud Insights helps you find problems before they impact your business. It also helps you optimize usage so you can defer spend and do more with a limited budget, detect ransomware attacks before it is too late, and report on data access for security compliance auditing.

In particular, NetApp Cloud Insights lets you automatically build topologies, correlate metrics, detect greedy or degraded resources, and alert on anomalous user behavior.

Start a 30-day free trial of NetApp Cloud Insights. No credit card required

Learn More About AWS Monitoring

AWS Monitoring Best Practices
Monitoring cloud environments can be quite different from monitoring on-premises ones. These environments are dynamic, highly distributed, and inherently more vulnerable to cyber threats. To ensure that you are applying the proper strategies when monitoring your cloud resources, it is important to make sure you are following best practices.

This article explains what AWS monitoring best practices are, how monitoring in AWS works, and highlights 6 best practices for ensuring effective monitoring in AWS.

AWS Monitoring Dashboard
Dashboards are an effective way to centralize your metrics monitoring and provide information to teams quickly. You can use these tools to ensure that your entire team is working from reliable information or to share the status of your operations with executives and shareholders.

This article explains what AWS monitoring dashboards are, the components of a dashboard, provides two tutorials for creating dashboards, and highlights some best practices.

CloudWatch Monitoring
Monitoring your AWS resources is one of the best ways to ensure that your services and applications remain performant and cost effective. To make this monitoring easier, AWS offers a service called CloudWatch which you can use to collect and visualize metrics across your services.

This article explains what CloudWatch monitoring is, how CloudWatch works, some key concepts to know in CloudWatch, and highlights a few metrics to watch for EBS and EC2.

CloudWatch Logs Insights
The ability to query and interpret logs enables you to derive greater insights from your data. It also enables you to diagnose and identify issues or opportunities for improvement faster. In AWS, you can use CloudWatch Logs Insights to perform these tasks and ensure your operations continue smoothly.

This article explains what CloudWatch Logs Insights is, how to get log data to the service, what the syntax for queries is, and how to perform a sample query.

Monitoring the Costs of Underutilized EBS Volumes
Overprovisioning your resources can eat away at your carefully planned cloud resource budgets and limit the amount of value you gain from services. To prevent wasted costs, it’s important to make sure that your resources are right-sized for your operations and are being used efficiently.

In this article you’ll learn how to find underperforming resources in EBS, how to evaluate your resource use, and how to apply metrics to improve your resource efficiency.
