tl;dr - We saw a noticeable decrease in our AWS bill by batching CloudWatch metrics.
Insight into all aspects of our company is core to Mixmax, so we love our metrics. We have metrics on everything, from the product all the way through internal development. One place we store and inspect a lot of our engineering metrics is AWS CloudWatch, as it allows us to seamlessly integrate metrics into our alerting and monitoring system.
CloudWatch metrics are awesome because you can add extra dimensions to them. This gives you the ability to segment them visually in the CloudWatch dashboard and to build highly granular alerts. Since the AWS CloudWatch API is also pretty easy to use, we can programmatically build alerts and dashboards whenever we deploy code that gathers a new metric.
Why do we need to batch them?
For a long time, we sent metrics to CloudWatch as soon as they happened, and everything was fine. As we scaled, however, we ran into the default rate limit of 150
put-metric-data calls per second. We decided to batch our requests, but we didn’t want to jump through hoops modifying our code when sending requests to CloudWatch, so we open sourced a super easy to use Node module for batching these
put-metric-data requests: cloudwatch-metrics.
By default, the library will log metrics to the us-east-1 region and read AWS credentials from the AWS SDK's default environment variables. If you want to change these values, you can call initialize:
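For example (a sketch based on the module's README; the region value here is just an illustration):

```javascript
// Override the default region. Credentials still come from the AWS SDK's
// standard environment variables unless supplied here.
var cloudwatchMetrics = require('cloudwatch-metrics');

cloudwatchMetrics.initialize({
  region: 'us-west-2'
});
```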
Creating a metric is pretty basic: we simply need to provide the namespace and the type of metric:
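Sketched from the module's README (the namespace and unit are placeholders):

```javascript
var cloudwatchMetrics = require('cloudwatch-metrics');

// A counter metric in the given CloudWatch namespace.
var myMetric = new cloudwatchMetrics.Metric('namespace', 'Count');
```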
We can also add our own default dimensions:
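For instance, to tag every data point with the environment it came from (again sketched from the README):

```javascript
var cloudwatchMetrics = require('cloudwatch-metrics');

// Every data point put to this metric will carry the environment dimension.
var myMetric = new cloudwatchMetrics.Metric('namespace', 'Count', [{
  Name: 'environment',
  Value: 'PROD'
}]);
```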
The metric constructor also accepts a set of optional arguments controlling: whether we actually send the metric (useful for dev environments), a callback invoked if a request to CloudWatch fails, the default interval to wait before sending buffered metrics, and the maximum number of events to buffer before we send to CloudWatch (useful if you’re buffering a lot of events in a bursty fashion).
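Roughly, assuming the option names from the module's documentation (`enabled`, `sendCallback`, `sendInterval`, `maxCapacity`):

```javascript
var cloudwatchMetrics = require('cloudwatch-metrics');

var myMetric = new cloudwatchMetrics.Metric('namespace', 'Count', [{
  Name: 'environment',
  Value: 'PROD'
}], {
  enabled: process.env.NODE_ENV === 'production', // skip sending in dev
  sendCallback: function(err) {
    // Called if a put-metric-data request to CloudWatch fails.
    if (err) console.error('CloudWatch error', err);
  },
  sendInterval: 5 * 1000, // flush buffered metrics every 5 seconds...
  maxCapacity: 100        // ...or as soon as 100 data points are buffered
});
```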
Sending metrics to CloudWatch
Sending data for a metric to CloudWatch is then extremely simple:
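Assuming the `put(value, metricName, additionalDimensions)` signature from the README, recording a data point looks like:

```javascript
// Record a single occurrence of "requestCompleted", adding a per-call
// dimension on top of the metric's defaults (the names are illustrative).
myMetric.put(1, 'requestCompleted', [{
  Name: 'RequestType',
  Value: 'POST'
}]);
```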
The only thing to keep in mind is that data is sent to CloudWatch asynchronously, so calling this function will not immediately send data. It will wait for the
sendInterval to expire or for the
maxCapacity to be reached, whichever happens first.
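To make that flush behavior concrete, here is a minimal, self-contained sketch of the same interval-or-capacity logic (not the module's actual implementation):

```javascript
// Buffer data points and flush either on a timer or when capacity is hit.
function Batcher(sendInterval, maxCapacity, send) {
  this.buffer = [];
  this.maxCapacity = maxCapacity;
  this.send = send;
  var self = this;
  this.timer = setInterval(function() { self.flush(); }, sendInterval);
}

Batcher.prototype.put = function(datum) {
  this.buffer.push(datum);
  // Hitting capacity flushes immediately instead of waiting for the timer.
  if (this.buffer.length >= this.maxCapacity) this.flush();
};

Batcher.prototype.flush = function() {
  if (!this.buffer.length) return;
  // splice(0) hands the whole buffer to send() and empties it.
  this.send(this.buffer.splice(0));
};

Batcher.prototype.stop = function() { clearInterval(this.timer); };
```

With a long interval and a capacity of 3, the first two `put` calls buffer silently and the third triggers an immediate flush of all three data points.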
Why you should batch your CloudWatch metrics
As we said before, there is a rate limit on how many data points you can send to your CloudWatch metrics. You can of course have this limit raised, but since you’re also charged per
put-metric-data request, you can save a lot of $$$ by batching your requests. In fact, we saw a very noticeable decrease in our monthly AWS bill! The one constraint to keep in mind is that a POSTed
put-metric-data call to AWS CloudWatch is capped at 40KB, so there is a limit to how large each batched request can be.
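In practice that means batches have to be split by payload size as well as by count. A rough, self-contained sketch of size-aware chunking (the 40KB figure is CloudWatch's documented cap; using JSON length as a stand-in for the real wire format):

```javascript
var MAX_PAYLOAD_BYTES = 40 * 1024; // CloudWatch caps a POSTed request at 40KB

// Split an array of metric data points into chunks whose serialized size
// stays under the cap, so no single request exceeds the limit.
function chunkBySize(data, maxBytes) {
  var chunks = [], current = [], currentBytes = 0;
  data.forEach(function(datum) {
    var bytes = Buffer.byteLength(JSON.stringify(datum));
    if (current.length && currentBytes + bytes > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(datum);
    currentBytes += bytes;
  });
  if (current.length) chunks.push(current);
  return chunks;
}
```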
Enjoy simplifying infrastructure costs without jumping through hoops? Drop us a line.