Well, nobody's heard from me in quite a while because I've been hard at work building my server monitor. Sorry, I suck. I have been, among other things, trying to get charting right.

It seems that while everybody has a different monitoring tool, most of the open source monitors use either MRTG or some other sort of pre-built image for their charts. The problems with that approach are numerous. You might have to alter the data to clip big outliers just to be able to see smaller details. Oftentimes you only get a simple averaged chart when the minimum or maximum values are more interesting. And without real-time graphing, you might have to wait several minutes or hours to see changes on your chart.

For Leemba, my monitor and all-around time sink, I decided on a hybrid approach. Charts for overall historical values are generated nightly, but recent data is charted on the fly with Flot, which uses the <canvas> tag to basically draw an image with JavaScript.

As it turns out, client-side charting can be dramatically better. When it's not possible to anticipate the client's needs on the server, like charting arbitrary time ranges, logic can and should be pushed to the client. Fortunately for us, browsers are getting a lot smarter. For example, here's a chart of Twitter.com response times from noon December 15th to noon December 19th:

[Chart: Twitter Response Times]

We can see increased response times peaking around noon every day. The maximum response time for this period was 10047ms and the minimum was 36ms.

By default Leemba displays averages. Without some sort of downsampling, every minute in the range would be a point on the chart, which is just too much data to serialize and ship to the browser fast enough. As shown in the top-left options panel, the current period is 30 minutes (i.e. every 30 minutes is one point).

In many monitors, that'd be it. But with the wonders of Flot, Leemba can make charts on the fly. For one, we can change the aggregate method (average, minimum or maximum) and the chart will update right away. Also, checking the "Show Standard Deviation" box adds a line for the deviation and a second Y axis, which can be useful for spotting erratic data.

You can do a couple of things to deal with outliers. The default mouse mode lets you select a new time range by just clicking and dragging on the chart itself, so the chart can simply exclude the offending time period. The other mode lets you zoom and pan the chart. Either way, the smaller details can be viewed without changing historical data.

Selecting time ranges is also a great way to see even more detailed charts. With a small enough time range, Leemba doesn't need to downsample at all and shows the raw points instead. How small depends on how often the tests run. For example, we can go back to the 17th, when Twitter was hacked (*), without averaging:

[Chart: Twitter Response Times]
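To make the period and aggregate options described above a bit more concrete, here is a minimal sketch of that kind of downsampling in JavaScript. Treat it as an illustration rather than Leemba's actual code: the function name `downsample`, the `[timestampMs, value]` point layout, skipping null values, and the use of a population standard deviation are all assumptions made for the example.

```javascript
// Hypothetical helper, not taken from Leemba.
// points:   array of [timestampMs, value] pairs; value may be null for a gap
// periodMs: bucket width in milliseconds, e.g. 30 * 60 * 1000 for 30 minutes
// method:   "average", "minimum" or "maximum"
function downsample(points, periodMs, method) {
  // Group the non-null values by the start time of the period they fall into.
  const buckets = new Map();
  for (const [time, value] of points) {
    if (value === null) continue; // assumption: gaps are simply skipped
    const start = Math.floor(time / periodMs) * periodMs;
    if (!buckets.has(start)) buckets.set(start, []);
    buckets.get(start).push(value);
  }

  const series = [];    // one aggregated point per period
  const deviation = []; // standard deviation per period, for a second Y axis
  const starts = [...buckets.keys()].sort((a, b) => a - b);
  for (const start of starts) {
    const values = buckets.get(start);
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;

    let picked = mean;
    if (method === "minimum") picked = Math.min(...values);
    if (method === "maximum") picked = Math.max(...values);
    series.push([start, picked]);

    // Population standard deviation of the values inside this period.
    const variance =
      values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / values.length;
    deviation.push([start, Math.sqrt(variance)]);
  }
  return { series, deviation };
}
```

Calling `downsample(points, 30 * 60 * 1000, "maximum")` would produce one point per half hour holding the slowest response in that window. Both returned arrays use the same `[x, y]` pair layout, so they could be handed to a plotting library as two separate series.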

Lessons learned

Flot is pretty good, but I had some problems along the way. For one, Flot only displays UTC times by default, which is not terribly helpful for us mere mortals. To fix that, I currently loop over the dataset, apply the time zone difference to each point, and provide custom date formatting functions. (This is a known issue.)

Unfortunately, the zoom, pan and selection modes in Flot are pretty much mutually exclusive, which is why the "Mouse Mode" is selectable. Even more unfortunate, switching modes forces a redraw of the plot, which resets the current zoom and pan settings. jqPlot was started just a few months ago and solves some of these problems, but it doesn't yet deal with null values like those used in the above chart.

Flot is pretty quick. And it's nice not to have to worry about the Flash plugin crashing, as it so often does on Linux. Even on IE with the excanvas emulation, it performs. Drawing times on IE are not unlike loading and displaying a Flash chart, so it's acceptable. Zooming is a bit slow, but then... Oh, well. It's IE. Browser support has been a non-issue. Even IE6 works without problems, and the Android and iPhone browsers display the charts just fine.

Of course, the database is still the bottleneck here. The difficulty of pulling information out of a standard relational database fast enough is undoubtedly why most monitors depend on pre-generating images. I've employed a mess of performance tricks in Postgres to make this work. I'll have to detail those in another post.

If you want to check out Leemba, the open source project is hosted on SourceForge. It's not ready for end users yet though. :-)
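For the curious, the time zone workaround mentioned above can be sketched roughly like this. It is an illustration under assumptions, not the code Leemba ships: `shiftToLocal` and `formatTick` are hypothetical names, points are assumed to be `[utcMs, value]` pairs, and the offset is looked up per point with the standard `Date.prototype.getTimezoneOffset` so that a daylight saving change inside the range is handled.

```javascript
// Hypothetical helpers, not taken from Leemba or Flot.

// Shift each [utcMs, value] point so that a chart which renders timestamps
// as UTC ends up showing local wall-clock time. getTimezoneOffset() returns
// (UTC - local) in minutes, so the offset is subtracted. The lookup can be
// overridden, which is handy for testing or for charting in another zone.
function shiftToLocal(points, offsetMinutesAt = (t) => new Date(t).getTimezoneOffset()) {
  return points.map(([time, value]) => [time - offsetMinutesAt(time) * 60 * 1000, value]);
}

// Format an already shifted timestamp as HH:MM. The UTC getters are used on
// purpose: the shift has been applied, so it must not be applied twice.
function formatTick(shiftedMs) {
  const d = new Date(shiftedMs);
  const pad = (n) => String(n).padStart(2, "0");
  return pad(d.getUTCHours()) + ":" + pad(d.getUTCMinutes());
}
```

Null values pass through untouched, so gaps in the data survive the shift. A formatter like `formatTick` is the sort of function that could be supplied through Flot's `tickFormatter` axis option.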

(*) To be fair, it sounded like a DNS exploit that could happen to just about anybody.
