SNMP and Monitoring Rant

This is a rant. It won’t be too long, but…

So I just started playing with graphite, which is all sorts of awesome. However, most of the data I care about has to come from SNMP, because either A) it’s a SAN and I can’t just install agents on it, or B) it’s a locked-down Solaris box with ancient versions of everything that won’t work with any of the new hotness. So I looked at collection agents that work with graphite, and… yeah, nothing except collectd. Collectd sucks. I had nothing but issues getting it working, and it doesn’t play nice with graphite. I like graphite because it’s simple: a name, a value and a timestamp over a TCP port. That’s the sort of stuff that is awesome. Collectd tries to do all this other stuff… and while I’m sure it’s great once it’s set up, I don’t like it. I don’t use agents, because my servers have more important things to do than hang on a crappy plugin that doesn’t handle edge cases well.
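That simplicity is the whole appeal. Here’s a minimal Python sketch of Carbon’s plaintext protocol: one `path value timestamp` line per datapoint, newline-terminated, written to a TCP socket. It assumes Carbon is listening on its default plaintext port 2003 on `localhost`; the function names and the example metric path are made up for illustration.

```python
import socket
import time

# Assumptions: Carbon runs on this host and listens on its default
# plaintext-protocol port.
CARBON_HOST = "localhost"
CARBON_PORT = 2003


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: path, value, Unix timestamp, newline."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines, host=CARBON_HOST, port=CARBON_PORT, timeout_s=5.0):
    """Open a TCP connection, write every line in one go, then close it."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(payload)
```

Something like `send_metrics([format_metric("san01.ctrl_a.cpu_pct", 42)])` would push a single point. No agent, no plugin.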

Enough bitching, though. I want to go back to the Unix philosophy: do one thing and do it well. Graphite does one thing well, graphing; carbon does one thing well, getting data. SNMP does one thing… poorly… very poorly. Unfortunately, I have yet to find a decent system that can poll SNMP data and present it in a useful fashion without going through a million configuration steps. I don’t want my SNMP poller to autodiscover my network and try to be smarter than me (OpenNMS, I’m looking at you). I just want to say: these MIBs, these hosts, this interval, go.
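Here’s roughly what “these MIBs, these hosts, this interval, go” could look like as a minimal Python sketch. It’s an assumption-heavy outline rather than a finished poller: the actual SNMP GET is left as a pluggable `fetch(host, oid)` callable (you could back it with pysnmp or net-snmp’s `snmpget`), and `poll_once`, `metric_path`, `run_forever` and the `snmp.<host>.<name>` naming scheme are all hypothetical. The OID shown is IF-MIB’s ifInOctets for interface index 1.

```python
import time
from typing import Callable, Optional

# Hypothetical fetcher signature: (host, oid) -> numeric value.
Fetcher = Callable[[str, str], float]


def metric_path(prefix: str, host: str, name: str) -> str:
    # Graphite splits paths on dots, so dots in hostnames become underscores.
    return ".".join([prefix, host.replace(".", "_"), name])


def poll_once(hosts: list, oids: dict, fetch: Fetcher,
              prefix: str = "snmp", timestamp: Optional[float] = None) -> list:
    """Poll every OID on every host once; return Carbon plaintext lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    lines = []
    for host in hosts:
        for name, oid in oids.items():
            try:
                value = fetch(host, oid)
            except Exception:
                # A host whose fetch raises (e.g. a timeout) is skipped for
                # this round instead of crashing the whole poll.
                continue
            lines.append(f"{metric_path(prefix, host, name)} {value} {ts}\n")
    return lines


def run_forever(hosts, oids, interval_s, fetch, emit):
    """Poll on a fixed interval and hand each batch of lines to emit()."""
    while True:
        started = time.time()
        emit(poll_once(hosts, oids, fetch, timestamp=started))
        # Sleep for whatever is left of the interval after polling.
        time.sleep(max(0.0, interval_s - (time.time() - started)))


# Example config: these MIBs, these hosts, this interval.
HOSTS = ["san01.example.com", "solaris01.example.com"]
OIDS = {"if1.in_octets": "1.3.6.1.2.1.2.2.1.10.1"}  # IF-MIB::ifInOctets.1
INTERVAL_S = 60
```

Wiring it up would be `run_forever(HOSTS, OIDS, INTERVAL_S, my_fetch, my_emit)`, where `my_fetch` does the SNMP GET and `my_emit` writes the lines to Carbon’s plaintext port (both hypothetical). Keeping the fetch pluggable is what keeps the poller down to “do one thing”.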

As a pet project I’m going to start working on one in Python. Until it’s ready for primetime, I’ll deal with collectd and it screwing up my nice graphite naming scheme.


3 Responses to SNMP and Monitoring Rant

  1. I got graphite + collectd to play reasonably well here (see link below)… you have to run 0.9.10 though to get graphite’s `scaleToSeconds()` function
    http://serverfault.com/questions/425828/graphite-snmp-counter-transforms

    BTW, do you have thoughts on why I need to use scale(, 0.125) in the question?

  2. admin says:

    `scaleToSeconds()` doesn’t do what it sounds like. Basically it fixes the issues with aggregation: if you specify 60, it will give a value for every 60 seconds, but it won’t scale it. You still need to scale by the time yourself. So instead of just dividing by 8, the scaling factor should be 1/8/60 (or whatever your aggregation period is set to), or .0020833. That will give you Mbits/sec. Let me know if that works.

  3. I’m polling my interfaces at the lowest aggregation interval… i.e. collectd sends data to carbon every 60 seconds, and I aggregate with this in storage-schemas.conf

    [default]
    pattern = .*
    retentions = 60s:1w, 5m:1y

    This seems to work fine. I am using scaleToSeconds() so I can avoid issues like this…
    http://obfuscurity.com/2012/05/A-Precautionary-Tale-for-Graphite-Users
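Since the replies are all about turning raw interface counters into sensible units, here’s a hedged Python sketch of doing that conversion by hand, for comparison. It assumes IF-MIB octet counters (ifInOctets/ifOutOctets are Counter32, so they wrap at 2^32) sampled a fixed number of seconds apart; `counter_rate` and `octets_to_bits` are illustrative names. An octet is 8 bits, so bits per second is octets per second times 8.

```python
# IF-MIB ifInOctets/ifOutOctets are Counter32 values, so they wrap at 2**32.
COUNTER32_MODULUS = 2**32


def counter_rate(prev, curr, interval_s, modulus=COUNTER32_MODULUS):
    """Per-second rate from two counter samples taken interval_s apart.

    Assumes the counter wrapped at most once between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Python's % is always non-negative here, so a wrap past 2**32 comes out
    # as the true positive delta.
    delta = (curr - prev) % modulus
    return delta / interval_s


def octets_to_bits(octets_per_s):
    """Convert octets (bytes) per second to bits per second."""
    return octets_per_s * 8
```

One caveat: a device reboot resets the counter, and this function can’t tell that apart from a wrap. A real poller would probably check sysUpTime as well.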
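For reference, each `precision:duration` pair in a `retentions` line like the one in the last reply is one whisper archive. Here’s a small Python sketch of the arithmetic. It is a simplified, hypothetical parser, not whisper’s own, and it assumes whisper’s unit meanings where `m` is minutes, `w` is 7 days and `y` is 365 days. By that math `60s:1w` is 10,080 one-minute points and `5m:1y` is 105,120 five-minute points, so past the first week each stored point covers 300 seconds instead of 60.

```python
import string

# Simplified unit table (assumption: mirrors whisper's short unit names).
UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400,
                "w": 7 * 86400, "y": 365 * 86400}


def to_seconds(text, default_unit=None):
    """'5m' -> 300. A bare number uses default_unit, if one is given."""
    digits = text.rstrip(string.ascii_letters)
    unit = text[len(digits):] or default_unit
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown or missing unit in {text!r}")
    return int(digits) * UNIT_SECONDS[unit]


def parse_retention(retention):
    """Return (seconds per point, number of points) for one archive."""
    precision, duration = retention.split(":")
    step = to_seconds(precision, default_unit="s")
    if duration.isdigit():
        # Legacy form such as "60:1440": a bare duration is a point count.
        return step, int(duration)
    return step, to_seconds(duration) // step


def parse_retentions(line):
    """Parse a comma-separated retentions value into archive tuples."""
    return [parse_retention(part.strip()) for part in line.split(",")]
```

`parse_retentions("60s:1w, 5m:1y")` gives one tuple per archive, finest first.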
