By Allen Fung, Senior Software Engineer
At ShareThis, we use Graphite for application monitoring. Here’s a typical graph from Graphite.
This shows the “events per minute” that an application has consumed over time. As you can see, no records were consumed for a few hours on Monday afternoon. By combining release markers with the graph above, we can see that consumption stopped due to a bad release. Here’s how this looks.
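While implementing the release marker, we found that the marker's metric name cannot be arbitrary: it needs to end in “.count”. This is because the xFilesFactor for the pattern “.count$” is set to zero by default, but not for other patterns. The xFilesFactor is the fraction of data points in an interval that must be non-null for Graphite to keep a value when it rolls data up into a lower-resolution archive, and a release marker is a single isolated point, so any non-zero setting can discard it. As a result, if a metric has a non-zero xFilesFactor, its values are only guaranteed to be visible for the last 24 hours. The xFilesFactor can be configured in the following file.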
/opt/graphite/conf/storage-aggregation.conf
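For reference, the rule in question looks roughly like the fragment below. This is a sketch based on the example configuration that ships with Graphite, so the section name and values in your own installation may differ.

```ini
# Sketch of the stock rule for counters; check your own file, it may differ.
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
```

With xFilesFactor at 0, a single non-null point in an interval is enough for the rolled-up value to be kept.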
Here’s the actual command that we used in auto-deploy to generate the release marker.
echo "release_marker.$APPLICATION.$HOST.count 1 $(date +%s)" | nc -w 2 graphite.ops.sharethis.com 2003
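The command writes one line in Graphite's plaintext protocol: the metric path, the value, and a Unix timestamp, separated by spaces. As a minimal sketch, the same step could be wrapped in two small shell functions. The function names and the GRAPHITE_HOST and GRAPHITE_PORT variables are hypothetical and not part of our deploy scripts.

```shell
#!/bin/sh
# Hypothetical helpers; names and environment variables are illustrative only.

# Print one plaintext-protocol line: "<metric path> <value> <unix timestamp>".
# Usage: release_marker_line APPLICATION HOST [TIMESTAMP]
release_marker_line() {
    # Default to the current time when no timestamp is passed.
    printf 'release_marker.%s.%s.count 1 %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Send the marker to carbon, giving up after 2 seconds like the original command.
send_release_marker() {
    release_marker_line "$1" "$2" |
        nc -w 2 "${GRAPHITE_HOST:-graphite.ops.sharethis.com}" "${GRAPHITE_PORT:-2003}"
}

# Example (not run here): send_release_marker "$APPLICATION" "$HOST"
```

Keeping the formatting separate from the network call makes it easy to check the generated line before anything is sent.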
The above command sets the value of the metric to 1 at the current time. If we just display the metric as is, we won’t see anything, because its values are too small to show up on the graph. To make the metric visible, we need to pass it into the drawAsInfinite function, which draws each non-zero data point of the metric as a vertical line.
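As a rough illustration, the marker can be requested through Graphite's render API with the metric wrapped in drawAsInfinite. The address of the web app is not given above, so GRAPHITE_URL is a placeholder. The function names and the 24-hour window are also assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical helpers; GRAPHITE_URL must point at your Graphite web app.

# Build a render target that draws every non-zero marker point as a vertical line.
# The "*" wildcard matches the markers from all hosts of the application,
# assuming host names contain no dots (each dot adds a level to the metric path).
marker_target() {
    printf 'drawAsInfinite(release_marker.%s.*.count)\n' "$1"
}

# Download a PNG of the last 24 hours of release markers for one application.
# Usage: fetch_marker_graph APPLICATION OUTPUT_FILE
fetch_marker_graph() {
    curl -sS -G "${GRAPHITE_URL:?set GRAPHITE_URL first}/render" \
        --data-urlencode "target=$(marker_target "$1")" \
        --data-urlencode "from=-24h" \
        --data-urlencode "format=png" \
        -o "$2"
}
```

Adding a second target for the application's own metric to the same request overlays the markers on that graph.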