Graphite is a toolset for distributed metric collection from a nearly unlimited number of sources: anything able to send an IP packet can report metrics to it. It consists of multiple components, each dedicated to a specialized task. There is carbon-relay, which receives metrics and forwards them in a configurable manner: it can spread metrics across other instances using a consistent-hashing algorithm, or simply duplicate them. Another component is carbon-cache, which persists metrics to so-called whisper database files and holds as many of them as possible in a RAM cache to speed up queries. For quick and easy access to the metric data there is graphite-web, a powerful tool that allows graphing of virtually everything using wildcards, overlays and a rich set of functions, all accessible through a convenient URL-based API.
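To illustrate how low the barrier to entry is: reporting a metric only requires writing one line of text to carbon's plaintext listener (TCP port 2003 by default). A minimal sketch in Python — the hostname is a placeholder, not part of the setup described here:

```python
import socket
import time

def format_metric(path, value, timestamp):
    # carbon's plaintext protocol: "<metric path> <value> <unix timestamp>\n"
    return "%s %s %d\n" % (path, value, timestamp)

def send_metric(path, value, host="graphite.example.com", port=2003):
    # "graphite.example.com" stands in for the real service address
    line = format_metric(path, value, int(time.time()))
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("ascii"))
```

Anything that can produce such a line over TCP (or UDP, if enabled) — shell scripts, cron jobs, application code — can feed the cluster.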
After some months of beta-testing Graphite, we decided to make it the backend for a new monitoring architecture that measures nearly everything but alerts only on some of the metrics. This prevents redundant data gathering and enables us to do really interesting things, such as self-learning anomaly detection on this huge amount of metrics.
While the testing environment consisted of two servers, each persisting all metrics received, we needed something bigger to handle the huge amount of data we wanted to collect. There were several requirements to fulfill:
- geo-redundant data storage (saving metrics in two datacenter locations)
- ability to handle 500,000 metrics/min, with bursts of up to 1,000,000 metrics/min
- support for wave deployments of new software versions without impacting read or write access to the cluster
- distributed I/O load to prevent massive load on single hypervisors
- possibility to easily extend the cluster in the future if necessary
With these requirements in mind, we tried out different setups, each with its own disadvantages. The only one matching all requirements is shown below.
A DNS-view-based service address, which resolves according to the location of the requesting system, points to the responsible load balancer. The load balancers route requests to the first level of carbon-relay instances, two of them per node. These first-level relays duplicate data to their partner node in the other datacenter. Second-level relays are configured to distribute metric data across a number of carbon-cache instances in the same location. This allows redundant and distributed storage of the data.
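In carbon.conf terms, a two-level relay setup of this kind might look roughly as follows. This is a hedged sketch with hypothetical hostnames and ports, not our actual configuration; `RELAY_METHOD` and `DESTINATIONS` are standard carbon settings:

```ini
# first-level relay: duplicate every metric to both datacenters
# (with RELAY_METHOD = rules, relay-rules.conf lists both destinations
#  for the default rule)
[relay:a]
LINE_RECEIVER_PORT = 2013
RELAY_METHOD = rules
DESTINATIONS = relay-dc1.example.com:2023:a, relay-dc2.example.com:2023:a

# second-level relay: distribute across the local carbon-cache instances
[relay:b]
LINE_RECEIVER_PORT = 2023
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b, 127.0.0.1:2204:c
```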
This is realized by separate config files for each instance, configuring different ports and the parameters needed for metric routing. Specialized multi-init scripts iterate over the configured instances for service start/stop/status. All carbon-cache instances on a node write into the same folder. Using consistent hashing not only distributes the I/O load but also ensures that two cache instances never write to the same file at the same time: a metric (and its whisper data file) is always handled by the same cache instance on a node.
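The routing logic behind this guarantee can be sketched as a hash ring. Carbon places md5-derived positions for each destination on a ring (100 replicas per node by default) and routes each metric to the next position clockwise; the following is a simplified reimplementation for illustration, not carbon's actual code:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Simplified sketch of a consistent hash ring, modeled loosely on
    carbon's (md5-based ring positions, multiple replicas per node)."""

    def __init__(self, nodes, replicas=100):
        self.ring = []
        for node in nodes:
            for i in range(replicas):
                pos = self._hash("%s:%d" % (node, i))
                bisect.insort(self.ring, (pos, node))

    def _hash(self, key):
        # first 4 hex chars of the md5 digest -> position in [0, 65535]
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:4], 16)

    def get_node(self, metric):
        # the next ring position at or after the metric's hash wins
        pos = self._hash(metric)
        idx = bisect.bisect_left(self.ring, (pos, "")) % len(self.ring)
        return self.ring[idx][1]
```

Because the mapping depends only on the metric name and the set of destinations, the same metric always lands on the same cache instance, so no two instances ever own the same whisper file.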
On each node there is a graphite-web instance running, which knows all carbon-cache instances on the same node for access to local metrics. Furthermore, all graphite-web instances on the other nodes are configured as CLUSTER_SERVERS, to be queried whenever graphite-web cannot satisfy a request from the local carbon-cache instances. This ensures that every graphite-web instance is able to access all the data stored in the cluster, which in turn allows read access to be load-balanced (round-robin).
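In graphite-web's local_settings.py this looks roughly as follows; all hostnames and instance names here are placeholders, adjusted per node in practice. CARBONLINK_HOSTS uses the "host:port:instance" format, which lets graphite-web query the RAM caches of the local carbon-cache instances for not-yet-persisted datapoints:

```python
# local_settings.py (sketch) — hypothetical hosts, adjust per node

# the other nodes' graphite-web instances, queried for non-local metrics
CLUSTER_SERVERS = [
    "graphite02.dc1.example.com:80",
    "graphite01.dc2.example.com:80",
    "graphite02.dc2.example.com:80",
]

# the local carbon-cache instances, format "host:port:instance"
CARBONLINK_HOSTS = [
    "127.0.0.1:7002:a",
    "127.0.0.1:7102:b",
    "127.0.0.1:7202:c",
]
```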
The following graph displays the total number of metrics received by the cluster (green line), i.e. the metricsReceived metric summarized over all first-level relay instances. The red line represents the sum of metricsReceived over all carbon-cache instances. The red line should always be twice the green line because of the geo-redundant data duplication inside the cluster: 300,000 received metrics per minute require the caches to persist 600,000 metrics. The purple line sits directly behind the red line, which means that every metric reaching the second-level relays is also persisted by the carbon-caches. These simple graphs tell a lot about the health of a Graphite cluster.
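Graphs like these can be built from carbon's self-instrumentation via the render API; carbon-relay instances report under carbon.relays.<host>-<instance> and carbon-cache instances under carbon.agents.<host>-<instance>. Assuming, purely for illustration, that first-level relays are named <host>-a and second-level relays <host>-b, the targets might look like:

```
/render?target=sumSeries(carbon.relays.*-a.metricsReceived)   # received by the cluster
/render?target=sumSeries(carbon.relays.*-b.metricsReceived)   # reaching second-level relays
/render?target=sumSeries(carbon.agents.*.metricsReceived)     # persisted by the caches
```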
Taking a closer look inside the cluster and at the single instances reveals some interesting facts. Since carbon-relay hashes the metric target name (web.devweb01.system.cpu.iowait) in its consistent-hashing algorithm, there should be a roughly even distribution of metrics over all carbon-cache instances, provided there is enough randomness in the metric names. But reality looks a little different. The following graph shows metricsReceived for all carbon-caches while random metrics were sent by a simple performance-testing tool.