I recently stumbled upon Netflix Vector, a nice tool for deep insight into system performance counters at high resolution. It relies on PCP (http://pcp.io) for metrics collection, which I wanted as an RPM in version >= 3.10. Since…

I recently needed to send some custom metrics to AWS CloudWatch. This is not rocket science, but a lightweight example may save some time when implementing it.
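As a rough idea of what such a lightweight example could look like, here is a minimal sketch using boto3's CloudWatch client and its `put_metric_data` call. The helper names, the namespace `Custom/Example`, the metric name and the default region are assumptions made for illustration and are not taken from the original post.

```python
import datetime


def build_metric_datum(name, value, unit="Count", dimensions=None):
    """Build one entry for the MetricData list of put_metric_data.

    `dimensions` is an optional dict such as {"Host": "web-1"}.
    """
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.datetime.now(datetime.timezone.utc),
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": key, "Value": val}
            for key, val in sorted(dimensions.items())
        ]
    return datum


def send_metric(namespace, datum, region="eu-west-1"):
    """Send a single datum; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("cloudwatch", region_name=region)
    client.put_metric_data(Namespace=namespace, MetricData=[datum])


if __name__ == "__main__":
    # Hypothetical namespace and metric, for illustration only.
    datum = build_metric_datum("QueueDepth", 42, dimensions={"Host": "web-1"})
    send_metric("Custom/Example", datum)
```

Splitting payload construction from the network call keeps the part that usually goes wrong (field names, units, dimensions) easy to check without touching AWS.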

AWS recently released its fairly new CloudWatch feature for log analysis in eu-west-1 and us-west-1. It launched in us-east-1 first and was rolled out to Ireland yesterday. After installing and configuring a small daemon called awslogs, instance logs appear in…

As mentioned before, we run a four-node cluster for redundancy and sharding. A pair of nodes persists all metrics. To see whether anything is wrong, a monitoring check tests for duplicate metrics where they shouldn’t be….

After building a Graphite cluster that can handle a huge number of metrics, there is some maintenance to be done on it: updates, config changes and so on. No problem so far, but if nearly 100% uptime is needed, some tricks are necessary.

Graphite is a toolset for distributed metric collection from a nearly unlimited number of sources. After some months of beta-testing Graphite, we decided to make it the backend for a new monitoring architecture that measures nearly everything but alerts on only some of the metrics. This prevents redundant data gathering and enables really nice things like self-learning anomaly detection on this huge number of metrics.
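To illustrate how a source can feed such a backend, here is a minimal sketch of Graphite's plaintext protocol, where each metric is one line of the form `path value timestamp`. The host `localhost`, the port 2003 (Carbon's usual plaintext listener) and the metric path are assumptions for illustration; the setup described in the post may differ.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Return one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003, timeout=5.0):
    """Send already formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path, for illustration only.
    send_metrics([format_metric("servers.web1.load.shortterm", 0.42)])
```

Because every data point is just a line of text, nearly any script or daemon can act as a source, which is what makes measuring almost everything practical.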