Archive for January, 2012


Uptime check frequency – why does it matter?

Since we started planning our uptime monitoring service we wanted to offer something different – not just a me-too service but a game changer. In short, we wanted to do uptime checking better than anyone else.

One of the big differentiators is how often a monitoring service checks your site. Is it once per hour? Once every 5 minutes? The industry consensus seems to be that once per minute is adequate for everyone. That’s an assumption we wanted to challenge. At first checking every minute may seem pretty ok. When you receive a downtime alert SMS, does it really matter that it came perhaps 50 seconds after your site went down? We say yes, and here’s why.


The Slashdot effect

This is a pretty common issue when running a site. You want lots of visitors; millions would be nice. You’ve paid for a server or hosting service which can deal with your normal amount of traffic, but then a massive spike comes along from a popular link. Now your site should be serving a huge number of requests. But it doesn’t; it falls over under the weight of the traffic.

If you know the site has gone down you may be able to quickly add capacity, or deal with the influx by replacing some of the big images, and keep it online. In that situation 50 seconds is too long to wait. You could easily have lost several thousand viewers. If you’re selling stuff, how many sales will that lose you? We’d bet quite a lot.

Even for your regular traffic 50 seconds could mean losing important viewers. Not good.

Another time you need to know right away is when you’re doing updates to your site. Perhaps a piece of hardware is being changed out. Perhaps the server settings are being tweaked. In that situation isn’t it best to know immediately if there’s a problem? Especially since you’re probably sitting right in front of your computer ready to fix any issues.

History matters

The next reason for more frequent checks is viewing the uptime history of your site. The more often the checks, the better the history.

[Chart: uptime history]

When viewing charts like the one above it’s important to know the figures are accurate. Is your service provider really maintaining the uptime you expect? Is it time to switch? Better data allows you to make better decisions.
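To make the point about history concrete, here is a minimal sketch of how check frequency changes what ends up in the record. It is not how CloudTrawl computes its charts; the function names, the one-hour window and the 40-second outage are all assumptions chosen for illustration.

```python
def sampled_checks(outage_start, outage_end, interval, horizon):
    """Simulate checks at t = 0, interval, 2*interval, ... (seconds).

    Returns a list of (check_time, is_up) pairs. The site is treated as
    down for outage_start <= t < outage_end. All values are hypothetical.
    """
    results = []
    t = 0
    while t < horizon:
        is_up = not (outage_start <= t < outage_end)
        results.append((t, is_up))
        t += interval
    return results


def estimated_uptime(results):
    """Fraction of checks that found the site up."""
    up = sum(1 for _, is_up in results if is_up)
    return up / len(results)


# A hypothetical 40-second outage (t=65 to t=105) during one hour.
minute_checks = sampled_checks(65, 105, interval=60, horizon=3600)
half_minute_checks = sampled_checks(65, 105, interval=30, horizon=3600)

print(estimated_uptime(minute_checks))       # outage falls between checks
print(estimated_uptime(half_minute_checks))  # one check lands in the outage
```

In this made-up case the 60-second checks at t=60 and t=120 both find the site up, so the outage never appears in the history, while the 30-second check at t=90 records it.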

So for these and many less dramatic reasons we decided that 1-minute checks simply aren’t good enough. We’re developing our uptime monitoring to check every 30 seconds. That’s more frequent than anything we’ve seen anyone offer. In fact it’s twice as frequent as the existing market leaders.
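The arithmetic behind the interval is simple enough to sketch. The snippet below assumes checks fire at fixed multiples of the interval and ignores the time it takes to run a check and deliver an alert; the function names are made up for this example.

```python
def detection_delay(failure_time, interval):
    """Seconds from a failure until the next scheduled check.

    Assumes checks run at t = 0, interval, 2*interval, ... and that a
    check detects the failure instantly (a simplification).
    """
    return (-failure_time) % interval


def average_delay(interval):
    """Average delay over failures starting at each whole second of one interval."""
    return sum(detection_delay(t, interval) for t in range(interval)) / interval


# A site that fails 10 seconds after a check:
print(detection_delay(10, 60))  # 50 seconds with 1-minute checks
print(detection_delay(10, 30))  # 20 seconds with 30-second checks

print(average_delay(60))  # 29.5
print(average_delay(30))  # 14.5
```

Under these assumptions, halving the interval halves the worst-case wait and roughly halves the average wait.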


Let’s talk a little about confirmations

There’s another thing to think about when trying to get meaningful downtime alerts as fast as possible: false positives and how we deal with them. A lot of existing services will only send you an alert once more than one of their monitoring stations has confirmed your site is down. This is because monitoring stations themselves aren’t infallible. The network could be flaky near one of the stations, and that could be why it sees your site as down. So it’s a good idea to get another station to confirm before alerting you.

The problem with a lot of services we’ve seen is that they’ll do this in their own sweet time. A station will check your site, see it’s down and then wait at least another minute for another one to confirm it.

We made an architectural decision that when a station sees your site is down it will immediately ask another station to check it. That way you get alerts right away, and we make sure there aren’t any false positives.
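As a rough illustration of the idea, immediate confirmation could look like the sketch below. It is not CloudTrawl’s actual code: the `should_alert` function and the station callables are hypothetical stand-ins, with each station modelled as a function that returns True when it can reach the site.

```python
from typing import Callable

# A station is modelled as a callable: given a URL, it returns True if
# the site responded. This type and the names below are assumptions.
Station = Callable[[str], bool]


def should_alert(url: str, first_station: Station, confirming_station: Station) -> bool:
    """Alert only when a second station confirms the failure right away."""
    if first_station(url):
        return False  # site is up, nothing to confirm
    # The first station saw a failure: ask another station immediately
    # instead of waiting for its next scheduled check.
    return not confirming_station(url)


# Hypothetical stations for demonstration.
def always_up(url: str) -> bool:
    return True


def always_down(url: str) -> bool:
    return False


print(should_alert("http://example.com", always_down, always_down))  # True: confirmed outage
print(should_alert("http://example.com", always_down, always_up))    # False: likely a flaky station
```

The confirming station is only contacted after a failure, so the extra check adds no load while the site is healthy.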

One last thing: Realtime

One final observation we made was that other services give you no feedback about what’s happening until something goes wrong. This unnerved us. You can sit and look at some services’ web interfaces and have no clue whether anything is happening at all. For peace of mind we added the Realtime view.

[Screenshot: Realtime view]

This constantly shows what CloudTrawl is doing. You can actually see the countdown until the next check, where it’ll come from and the results of the last check from every worldwide monitoring station.

To sum up, we:

Check your site every 30 seconds (2x better than our competition).

Perform immediate confirmations (no false positives – no delays).

Show a realtime view allowing you to see exactly what CloudTrawl is doing and exactly what the state of your site is at any time.

These three reasons are why I’m personally very proud of what we’re doing with our uptime checking, and why I genuinely believe there is no service out there which can beat us.

Want cool charts for your site?

With the release of CloudTrawl drawing closer we’ve been concentrating on polishing the user interface. We’ve been working hard to make it feel like something webmasters intuitively already know how to use. To achieve this we’ve taken some inspiration from services such as Google Analytics. Their interface gets one thing really, really right: charts.

Analytics charts look awesome and are really easy to use. Based on that inspiration we’ve come up with our own charting system which we believe is just as cool and intuitive:

[Chart: CloudTrawl interactive chart]

The fully interactive chart above took only two days to implement. Now we can re-use it over and over to show lots of different kinds of data. It isn’t Flash, it isn’t a static image, and it has rollovers and all the other neat stuff you’d expect.

So how did we get something so feature rich up and running so fast? Easy: Google Charts.

Like a lot of the stuff Google does, this is so easy to use it makes you want to cry. We’re a Java shop, so we used the GWT API, which allowed us to create the extra controls for viewing data between two dates.

But if all you need is a simple chart with some copy-and-paste HTML, this is really easy. Google’s Quick Start guide has some script which you can copy, paste and edit to show your first chart with your own data in a couple of minutes.

A shot of their chart gallery is below to give you an idea of some of the cool stuff you can use to jazz up your site:

[Screenshot: Google Charts gallery]