Since we started planning our uptime monitoring service we wanted to offer something different: not just a me-too service but a game changer. In short, we wanted to do uptime checking better than anyone else.
One of the big differentiators is how often a monitoring service checks your site. Is it once per hour? Once every 5 minutes? The industry consensus seems to be that once per minute is adequate for everyone. That’s an assumption we wanted to challenge. At first, checking every minute may seem perfectly fine. When you receive a downtime alert SMS, does it really matter that it came perhaps 50 seconds after your site went down? We say yes, and here’s why.
The Slashdot effect
This is a pretty common issue when running a site. You want lots of visitors; millions would be nice. You’ve paid for a server or hosting service which can deal with your normal amount of traffic, but then a massive spike comes along from a popular link. Now your site should be serving a huge number of requests. But it doesn’t. It falls over under the weight of the traffic.
If you know the site has gone down, you may be able to quickly add capacity, or lighten the load by replacing some of the big images, and keep it online. In that situation 50 seconds is too long to wait. You could easily have lost several thousand viewers. If you’re selling stuff, how many sales will that lose you? We’d bet quite a lot.
Even for your regular traffic 50 seconds could mean losing important viewers. Not good.
Another time you need to know right away is when you’re doing updates to your site. Perhaps a piece of hardware is being changed out. Perhaps the server settings are being tweaked. In that situation isn’t it best to know immediately if there’s a problem? Especially since you’re probably sitting right in front of your computer ready to fix any issues.
The next reason for more frequent checks is viewing the uptime history of your site. The more often the checks, the better the history.
When viewing charts like the one above it’s important to know the figures are accurate. Is your service provider really maintaining the uptime you expect? Is it time to switch? Better data allows you to make better decisions.
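To put a rough number on why more frequent checks give a better history, here is a minimal sketch. It assumes checks are evenly spaced and instantaneous, and that a short outage can start at any moment between two checks; the function name is ours, purely for illustration, and is not part of any product.

```python
def chance_outage_is_recorded(outage_s: float, interval_s: float) -> float:
    """Probability that at least one periodic check lands inside an outage.

    Assumption: the outage starts at a uniformly random moment between
    two checks, and each check takes no time.
    """
    if outage_s < 0 or interval_s <= 0:
        raise ValueError("outage_s must be >= 0 and interval_s must be > 0")
    # An outage at least as long as the interval is always seen;
    # a shorter one is seen only if a check happens to fall inside it.
    return min(1.0, outage_s / interval_s)


if __name__ == "__main__":
    for interval in (60, 30):
        p = chance_outage_is_recorded(20, interval)
        print(f"{interval}s checks: a 20s outage is recorded {p:.0%} of the time")
```

Under those assumptions, a 20-second blip has about a 1 in 3 chance of showing up in the history with 60-second checks and about 2 in 3 with 30-second checks.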
So for these and many less dramatic reasons we decided that 1 minute checks simply aren’t good enough. We’re developing our uptime monitoring to check every 30 seconds. That’s more frequent than anything we’ve seen anyone offer. In fact it’s twice as often as the existing market leaders.
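The arithmetic behind that choice can be sketched in a few lines. This is a back-of-the-envelope illustration only: it assumes an outage begins at a random moment between two evenly spaced checks, and it ignores the time taken to confirm the outage and deliver the alert. The traffic figure in the usage example is made up.

```python
def detection_delay(interval_s: float) -> tuple[float, float]:
    """Return (average, worst-case) seconds from outage start to the next check.

    Assumption: the outage starts at a uniformly random point in the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be > 0")
    return interval_s / 2, interval_s


def requests_missed(interval_s: float, requests_per_s: float) -> tuple[float, float]:
    """Requests that fail before the outage is even noticed (average, worst case)."""
    average, worst = detection_delay(interval_s)
    return average * requests_per_s, worst * requests_per_s


if __name__ == "__main__":
    # Hypothetical traffic spike of 100 requests per second.
    for interval in (60, 30):
        avg, worst = requests_missed(interval, 100)
        print(f"{interval}s checks: ~{avg:.0f} requests missed on average, up to {worst:.0f}")
```

Halving the interval halves both numbers: the average wait drops from 30 seconds to 15, and the worst case from 60 seconds to 30.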
Let’s talk a little about confirmations
There’s another thing to think about when trying to get meaningful downtime alerts as fast as possible: false positives and how we deal with them. A lot of existing services will only send you an alert once more than one of their monitoring stations has seen that your site is down. This is because monitoring stations themselves aren’t infallible. The network could be flaky near one of the stations, making your site look down when it isn’t. So it’s a good idea to get another station to confirm before alerting you.
The problem with a lot of services we’ve seen is that they’ll do this in their own sweet time. A station will check your site, see it’s down and then wait at least another minute for another one to confirm it.
We made an architectural decision that when a station sees your site is down it will immediately ask another station to check it. That way you get alerts right away, and we make sure there aren’t any false positives.
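As a rough illustration of the idea (not CloudTrawl’s actual code), immediate confirmation might look something like the sketch below. The stations are modelled as plain functions that return True when the site responds; the names `Check` and `is_confirmed_down` and the `confirmations_needed` parameter are hypothetical.

```python
from typing import Callable, Sequence

# A station is modelled as a function: given a URL, return True if the site responds.
Check = Callable[[str], bool]


def is_confirmed_down(
    url: str,
    first_station: Check,
    other_stations: Sequence[Check],
    confirmations_needed: int = 1,
) -> bool:
    """Return True only if the first station and enough other stations see the site down."""
    if first_station(url):
        return False  # site is up, nothing to confirm

    # The first station saw a failure: ask the others right away
    # rather than waiting for their next scheduled check.
    confirmations = 0
    for check in other_stations:
        if not check(url):
            confirmations += 1
            if confirmations >= confirmations_needed:
                return True  # confirmed, safe to send the alert
    return False  # nobody else agrees, so treat it as a false positive


if __name__ == "__main__":
    def station_up(url: str) -> bool:
        return True

    def station_down(url: str) -> bool:
        return False

    print(is_confirmed_down("http://example.com", station_down, [station_down]))  # True
    print(is_confirmed_down("http://example.com", station_down, [station_up]))    # False
```

The point of the design is that the confirmation happens in the same pass as the first failed check, so the alert is not held up by another full check interval.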
One last thing: Realtime
One final observation we made was that other services give you no feedback about what’s happening until something goes wrong. This unnerved us. You can sit and look at some services’ web interfaces and have no clue whether anything is happening at all. For peace of mind we added the Realtime view.
This constantly shows what CloudTrawl is doing. You can actually see the countdown until the next check, where it’ll come from and the results of the last check from every worldwide monitoring station.
To sum up, we:
Check your site every 30 seconds (twice as often as our competition).
Perform immediate confirmations (no false positives – no delays).
Show a realtime view allowing you to see exactly what CloudTrawl is doing and exactly what the state of your site is at any time.
These three reasons are why I’m personally very proud of what we’re doing with our uptime checking and why I genuinely believe there is no service out there which can beat us.