AWS X-Ray - Application Performance Monitoring

Overview

It’s no secret that DevOps teams must be able to debug issues in their operational environments quickly and efficiently. A key measure of a DevOps team’s maturity and quality is the time it takes to identify and resolve production issues. Application Performance Monitoring (APM) is a key component of this capability, and AWS X-Ray gives DevOps teams the features of a leading APM solution in an easy-to-implement managed service.

With AWS X-Ray, the DevOps team can dig deep into the internals of their application to understand where errors are occurring, where bottlenecks or performance degradation are happening, and how downstream dependencies are affecting the application’s responses. It provides advanced application tracing for a wide variety of programming languages, and it is well suited to microservice architectures and to identifying problems that cross serialization boundaries.

Feature Highlights

There are too many useful features in X-Ray to cover in a single post, but three that rise to the top of the list are:

  • Service Map
  • Latency Detection
  • Trace Mapping

The service map visualizes how requests move through an application and which dependent calls are made, along with their response times. In effect, it gives an overview of how communication flows through a microservice or service-oriented application. It also provides near-real-time data about the health of dependent services, showing when requests to an external service start failing or getting throttled. It’s a screen that should be part of every DevOps team’s operational dashboard.
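
To make the idea concrete, here is a minimal sketch of the kind of aggregation a service map represents: individual calls between services are rolled up into edges with a call count, an average response time, and failure and throttle rates. This is not X-Ray’s implementation or API; the `Call` record and `build_service_map` function are hypothetical names invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    """One observed call between two services (hypothetical record shape)."""
    caller: str
    callee: str
    duration_ms: float
    failed: bool = False
    throttled: bool = False


def build_service_map(calls):
    """Roll individual calls up into one summary per (caller, callee) edge."""
    totals = defaultdict(
        lambda: {"count": 0, "total_ms": 0.0, "failed": 0, "throttled": 0}
    )
    for call in calls:
        edge = totals[(call.caller, call.callee)]
        edge["count"] += 1
        edge["total_ms"] += call.duration_ms
        edge["failed"] += int(call.failed)
        edge["throttled"] += int(call.throttled)

    # Convert raw totals into the figures a map would display per edge.
    return {
        key: {
            "count": e["count"],
            "avg_ms": e["total_ms"] / e["count"],
            "failure_rate": e["failed"] / e["count"],
            "throttle_rate": e["throttled"] / e["count"],
        }
        for key, e in totals.items()
    }
```

Each key in the result is one arrow on the map, so a rising `failure_rate` or `throttle_rate` on an edge to an external service is the signal the paragraph above describes.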

Latency detection is a feature of X-Ray that alerts the team to changes in the latency of requests from client to server. Many teams use tools like Google Analytics to monitor the client-side performance of their application, but that gives the engineer only a partial picture of what is happening. With X-Ray, the team can be alerted to increased latency on the client side and can trace it through the application to find the root cause. Latency detection can also identify traffic coming from new regions, which may indicate a need to scale out the application to those regions. For example, the team may see sub-second response times for requests coming from the west coast, while requests from Florida take three or more times as long. In that case, the team can quickly determine whether they need to scale the application out to us-east-1 to bring latency down to an acceptable level or use other tools to resolve the performance issue. Being able to associate client-side performance with server-side activity allows the DevOps team to understand the cause of latency and choose how to resolve it, whether that means scaling in the same region, resolving an issue with an external service, or deploying the application to new regions.
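
The west coast versus Florida comparison can be sketched in a few lines. The example below flags any region whose median latency is at least three times that of a baseline region. It is only an illustration of the reasoning, not X-Ray’s detection logic; the function name, the region labels, and the sample numbers are all made up.

```python
from statistics import median


def slow_regions(latencies_ms, baseline_region, factor=3.0):
    """Return regions whose median latency is >= factor x the baseline median."""
    baseline = median(latencies_ms[baseline_region])
    return sorted(
        region
        for region, samples in latencies_ms.items()
        if region != baseline_region and median(samples) >= factor * baseline
    )


# Hypothetical response times in milliseconds, grouped by client region.
samples = {
    "west-coast": [420, 480, 510, 450],
    "florida": [1500, 1650, 1400, 1800],
    "texas": [600, 700, 650, 620],
}

print(slow_regions(samples, baseline_region="west-coast"))  # ['florida']
```

A flagged region is only a starting point: the traces behind those requests still have to show whether the cause is distance, an external dependency, or the application itself.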

Trace Mapping is another invaluable feature of X-Ray. With Trace Mapping, the DevOps team can follow a single request trace through the entire application, from the client through all of the services it interacts with. In microservice environments, it can be very difficult to pinpoint what is causing an unwanted behavior, because requests travel through many services and interact with multiple third parties. With Trace Mapping, it is easy to see exactly which services a single request interacts with and how each of those services performed in the context of that request, and ideally to pinpoint issues that would otherwise be nearly impossible to troubleshoot.
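
What ties one request together across services is a trace ID carried from hop to hop. X-Ray passes it in the `X-Amzn-Trace-Id` HTTP header, in the form `Root=1-<8 hex digits of epoch time>-<24 hex digits>;Parent=<segment id>;Sampled=1`. The sketch below builds and parses that header using only the standard library. The helper names are hypothetical, and in practice the X-Ray SDK handles this for you.

```python
import os
import time


def new_trace_id(now=None):
    """Build a trace ID: version, epoch seconds as 8 hex digits, 24 random hex digits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def format_trace_header(root, parent=None, sampled=None):
    """Serialize the fields into an X-Amzn-Trace-Id header value."""
    parts = [f"Root={root}"]
    if parent is not None:
        parts.append(f"Parent={parent}")
    if sampled is not None:
        parts.append(f"Sampled={1 if sampled else 0}")
    return ";".join(parts)


def parse_trace_header(value):
    """Split a header value into a dict such as {'Root': ..., 'Parent': ...}."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:  # skip fragments that are not key=value pairs
            fields[key] = val
    return fields
```

A downstream service would parse the incoming header, keep the same `Root`, and set `Parent` to its own segment ID before calling the next service, which is what lets every hop be stitched into a single trace.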

Some people might assume that simple log aggregation solves this problem, but it does not at scale. Imagine an environment with hundreds of instances of a service and intermittent failures occurring, perhaps because a single compute node or even a single instance of the service is not functioning properly. With thousands of log entries per second coming in, the information the DevOps team needs becomes buried in the noise from other, healthy instances.

Using AWS X-Ray

As we previously discussed, AWS X-Ray works with a variety of programming languages to provide leading-edge APM for services deployed to AWS. Whether the team is programming in Java, .NET, Go, or one of the other supported languages, getting started with X-Ray is essentially as easy as installing the X-Ray SDK in the application and applying configuration. Depending on how the servers are provisioned, the team may also need to deploy the X-Ray daemon on them. After the SDK is installed and configured, it’s as simple as generating traffic on the application and monitoring it in the X-Ray console.
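
For a sense of what the SDK does on the application’s behalf, the sketch below hand-builds a minimal segment document and sends it to the agent process running on the server, which AWS calls the X-Ray daemon. It assumes the daemon is listening on its default UDP address, 127.0.0.1:2000. The function names are hypothetical, and a real application would normally rely on the SDK rather than code like this.

```python
import json
import os
import socket
import time

# Assumption: the daemon runs locally on its default UDP port.
DAEMON_ADDRESS = ("127.0.0.1", 2000)
# Header line the daemon expects before each JSON segment document.
HEADER = '{"format": "json", "version": 1}'


def build_segment(name, start, end, trace_id=None):
    """Build a minimal segment document for one unit of work."""
    if trace_id is None:
        trace_id = f"1-{int(start):08x}-{os.urandom(12).hex()}"
    return {
        "name": name,                # logical service name
        "id": os.urandom(8).hex(),   # 16 hex digit segment ID
        "trace_id": trace_id,
        "start_time": start,         # epoch seconds
        "end_time": end,
    }


def encode_segment(segment):
    """Prefix the segment JSON with the daemon header, separated by a newline."""
    return (HEADER + "\n" + json.dumps(segment)).encode("utf-8")


def send_segment(segment, address=DAEMON_ADDRESS):
    """Send one segment to the daemon over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_segment(segment), address)


# Example usage, timing a hypothetical "checkout" operation:
#   start = time.time()
#   ...do the work...
#   send_segment(build_segment("checkout", start, time.time()))
```

Because the transport is UDP, the application does not block on the daemon; the daemon batches what it receives and forwards it to the X-Ray service.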

After the team has monitored the application, alerts and notifications should be configured so that the team is proactively informed of issues with the application. There is also an X-Ray API built into each programming language’s SDK that can be used for more advanced features of X-Ray.

APM With X-Ray

AWS X-Ray is an APM service that all DevOps teams should be using to quickly debug issues in the operational environment, identify performance bottlenecks in their applications, and understand how communication flows through their applications. It provides invaluable tools to trace errors across serialization boundaries, monitor the performance of external requests, and quickly visualize the health of the team’s applications.
