How do you design your workload so that you can understand its state?

When designing your workload, make sure you can understand its state at any point in time. Every component should emit enough information, such as metrics, logs, and traces, for you to do so. Reliable and detailed information lets you respond effectively when a response is needed. In this article, we will explore the design aspects that contribute to your understanding of your workload’s state and share relevant resources for your reference.


Implementing application telemetry

Your application code should be instrumented to emit information about its internal state, status, and achievement of business outcomes. Information such as queue depth, error messages, and response times will allow you to determine when a response is required.
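As a minimal sketch of what this instrumentation could look like, the example below writes one structured JSON log line per handled request using only the Python standard library. The component name, field names, and values are assumptions made for illustration, not a prescribed schema.

```python
import json
import logging
import time


def telemetry_record(event, **fields):
    """Build one JSON log line describing the application's internal state."""
    record = {"event": event, "timestamp": time.time(), **fields}
    return json.dumps(record, sort_keys=True)


logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("orders")  # hypothetical component name

start = time.perf_counter()
# ... handle a request here ...
elapsed_ms = (time.perf_counter() - start) * 1000

line = telemetry_record(
    "request_handled",
    queue_depth=12,  # illustrative value
    response_time_ms=round(elapsed_ms, 2),
    error=None,  # would carry the error message on failure
)
log.info(line)
```

Structured lines like this are easier to search and aggregate than free-form messages, which helps when you need to decide quickly whether a response is required.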

You can collect metrics and logs from both Amazon EC2 instances and on-premises servers using the CloudWatch agent. Amazon CloudWatch is essentially a metrics repository: AWS services and your own applications put metrics into it, and you retrieve statistics based on that data. CloudWatch also accepts custom metrics, so you can derive the statistics your business requirements call for.
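To illustrate custom metrics, here is a hedged sketch that builds a data point and, optionally, sends it with the boto3 `put_metric_data` call. The namespace, metric name, and dimension are hypothetical, and the publish step assumes boto3 is installed and AWS credentials are configured, so it is left commented out.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, dimensions):
    """Return one entry for the MetricData list of PutMetricData."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def publish(namespace, data):
    """Send the data to CloudWatch. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=data)


# Hypothetical metric: current depth of an order queue.
datum = build_metric_datum("QueueDepth", 12, "Count", {"Service": "orders"})
# publish("MyApp/Orders", [datum])  # uncomment in an environment with AWS access
```

Keeping the payload construction separate from the network call makes the metric definition easy to check without touching AWS.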

Useful resources:

What is Amazon CloudWatch?

How Amazon CloudWatch works

Using Amazon CloudWatch metrics

What is Amazon CloudWatch Logs?

Using CloudWatch Logs with container instances

Accessing Amazon CloudWatch Logs for AWS Lambda

Publish custom metrics

Gaining better observability of your VMs with Amazon CloudWatch - AWS Online Tech Talks


Implementing and configuring workload telemetry

Your workload should be designed and configured to provide information about its internal state and current status. Monitoring information such as API call volume, HTTP status codes, and scaling events lets you determine accurately when a response is required and how to act most effectively.
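As a small illustration of turning HTTP status codes into a signal, the sketch below tallies responses by class and computes a server error rate. The sample codes and the 5% threshold are assumptions chosen for the example, not recommended values.

```python
from collections import Counter


def status_classes(status_codes):
    """Count responses per class, for example '2xx' or '5xx'."""
    return Counter(f"{code // 100}xx" for code in status_codes)


def error_rate(status_codes):
    """Fraction of responses that were server errors (5xx)."""
    if not status_codes:
        return 0.0
    return status_classes(status_codes)["5xx"] / len(status_codes)


codes = [200, 200, 503, 404, 200, 500, 201, 200]  # illustrative sample
needs_response = error_rate(codes) > 0.05  # hypothetical threshold
```

In practice a monitoring service would perform this aggregation and alerting for you; the point here is only the shape of the calculation.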

Useful resources:

Amazon CloudWatch metrics and dimensions reference

AWS CloudTrail

What Is AWS CloudTrail?

VPC Flow Logs


Implementing user activity telemetry

A clear understanding of user activity is key to determining when a response is required and to choosing the most effective one. Your application code should therefore be instrumented to emit information about user activity. You should be able to record and monitor things like clickstreams and started, abandoned, and completed transactions. This information lets you piece together how your application is used, identify patterns of usage, and pinpoint instances in which a response might be required.
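The following sketch shows one way to summarize started, abandoned, and completed transactions from a stream of user events. The event tuples and the `in_progress` bucket are assumptions made for this example.

```python
from collections import defaultdict

# Illustrative events: (transaction_id, event_type)
events = [
    ("t1", "started"), ("t1", "completed"),
    ("t2", "started"), ("t2", "abandoned"),
    ("t3", "started"),
    ("t4", "started"), ("t4", "completed"),
]


def summarize(events):
    """Count started, completed, abandoned, and still-open transactions."""
    seen = defaultdict(set)
    for tx_id, kind in events:
        seen[tx_id].add(kind)
    summary = {"started": 0, "completed": 0, "abandoned": 0, "in_progress": 0}
    for kinds in seen.values():
        if "started" in kinds:
            summary["started"] += 1
        if "completed" in kinds:
            summary["completed"] += 1
        elif "abandoned" in kinds:
            summary["abandoned"] += 1
        else:
            summary["in_progress"] += 1
    return summary


summary = summarize(events)
completion_rate = summary["completed"] / summary["started"]
```

A falling completion rate or a rising abandonment count is the kind of usage pattern that can indicate a response is needed.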


Implementing dependency telemetry

Workloads rarely exist in a vacuum. Yours may depend on a number of resources to work effectively, such as external databases, DNS, network connectivity, and external credit card processing services. When designing and configuring your workload, take the necessary steps to ensure that you can obtain information about the status of every resource it depends on.
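One possible approach, sketched below, is to wrap each call to a dependency so that its latency and outcome are always recorded. The in-memory list, the dependency names, and the `charge_card` function are hypothetical stand-ins for illustration.

```python
import time
from contextlib import contextmanager

dependency_stats = []  # in-memory sink; a real workload would ship these elsewhere


@contextmanager
def observe_dependency(name):
    """Record latency and outcome of one call to a dependency."""
    start = time.perf_counter()
    outcome = "success"
    try:
        yield
    except Exception:
        outcome = "failure"
        raise
    finally:
        dependency_stats.append({
            "dependency": name,
            "outcome": outcome,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })


def charge_card():
    """Hypothetical stand-in for an external payment call that times out."""
    raise TimeoutError("payment processor did not answer")


with observe_dependency("dns"):
    pass  # stand-in for a DNS lookup that succeeds

try:
    with observe_dependency("payment-processor"):
        charge_card()
except TimeoutError:
    pass  # the failure has already been recorded by the wrapper
```

Because the record is written in the `finally` clause, failed calls are captured as reliably as successful ones.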

Useful resources:

Amazon CloudWatch Agent with AWS Systems Manager integration - unified metrics & log collection for Linux & Windows

Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch Agent


Implementing transaction traceability

To determine when a response is required and to correctly identify the factors that might contribute to an issue, it’s essential to have information about the transaction flow across your workload. To achieve this, design your application code and configure your workload components to emit information such as transaction stage, active component, and time to complete the activity. This will enable you to determine which activities are in progress and which are complete, and to understand the results of completed activities.
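To make the idea concrete, the sketch below records transaction stage, active component, and time to complete under a shared trace ID, using only the standard library. It is not the AWS X-Ray API; the `span` helper, stage names, and component names are assumptions for illustration, and a tracing service would normally collect these records for you.

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # collected records; a tracing service would receive these instead


@contextmanager
def span(trace_id, stage, component):
    """Record one stage of a transaction under a shared trace ID."""
    record = {"trace_id": trace_id, "stage": stage,
              "component": component, "status": "in_progress"}
    spans.append(record)
    start = time.perf_counter()
    try:
        yield record
        record["status"] = "complete"
    except Exception:
        record["status"] = "failed"
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000


trace_id = uuid.uuid4().hex  # one ID follows the transaction across components
with span(trace_id, "validate-order", "api"):
    pass  # stand-in for real work
with span(trace_id, "reserve-stock", "inventory"):
    pass  # stand-in for real work

in_progress = [s["stage"] for s in spans if s["status"] == "in_progress"]
complete = [s["stage"] for s in spans if s["status"] == "complete"]
```

Filtering the records by status answers which activities are in progress and which are complete, and the shared trace ID ties the stages of one transaction together.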

Useful resources:

AWS X-Ray

What is AWS X-Ray?