Archive for December, 2011


Many data centers run far too cold

I remember a few years ago I was working in a data center which was so cold we needed to wear sweaters and gloves just to work.

Recently we’ve heard a lot about hosting providers moving to colder countries just to save on the expense and environmental impact of keeping servers cool while avoiding downtime due to machines running too hot.

I just read an excellent Wired article which concludes that many machine rooms run far too cold anyway. Many of us could save a lot of money, CO2 & frozen hands by just dialing up the temperature. It’s a highly recommended read for anyone involved in operating a data center:

World’s Data Centers Refuse to Exit Ice Age

Regression tests are awesome things

So I’m sitting here waiting for my regression tests for a new CloudTrawl component to run. If you’re not in the know, a regression test is a bit of code which tests some other code. Kind of like a checklist to stop a programmer making dumb mistakes.
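If you’ve never seen one, the simplest possible regression test looks something like this in JavaScript (isExternalLink and the test values here are invented purely for illustration, not real DeepTrawl or CloudTrawl code):

// A small function we want to protect against future mistakes:
function isExternalLink(url) {
    return url.indexOf("http://") === 0 || url.indexOf("https://") === 0;
}

// The regression tests: if a later change breaks isExternalLink, these complain loudly.
if (!isExternalLink("http://example.com/page.html")) {
    throw new Error("Regression: external link not recognized");
}
if (isExternalLink("somedoc.html")) {
    throw new Error("Regression: relative link wrongly treated as external");
}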

While my mind wandered I thought:

“Why on earth don’t regular web sites have regression tests?!”

And then it hit me: that’s what we do. Seriously, DeepTrawl & CloudTrawl are the regression test. Awesome. Job done. Thought over.

Link checking JavaScript links

Having worked on DeepTrawl for a long time, I’ve found there are a few questions which come up again and again. Some of these are equally applicable to CloudTrawl (since it also does link checking), and I’ll try to address some of them in this blog.

One of the really common support questions is “does your link checker work with JavaScript links?”. The answer is always no, and it’s likely neither product ever will. Here’s why.

JavaScript is a tricky beast. In the simplest case it might seem easy for our software to follow a JavaScript link. Perhaps one looking like this:

<a href="#" onclick="window.open('somedoc.html', 'newWindow')">Click Me</a>

The problem comes in that onclick attribute. You really could have anything in there. For instance, there could be JavaScript to:

– Ask the user which of several pages they’d like

– Take account of their session (are they logged in?) and act accordingly

– Get some information from the server

The examples above can stretch out into infinity because JavaScript code can do anything you want; that’s the power of using a programming language in a web page.
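To make the first of those concrete, here’s a contrived example (the page names are invented, it isn’t taken from any real site) where no link checker can know the destination in advance, because it depends entirely on what the user types:

<!-- A made-up example: the destination depends on the user's answer to a prompt -->
<a href="#" onclick="var page = prompt('Which page would you like, news or pricing?');
    if (page) { window.open(page + '.html', 'newWindow'); }
    return false;">Click Me</a>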

So the answer might appear to be that we should build in a JavaScript engine and have our products do exactly what the browser would have done when it encountered that link. The problem really comes when the script has some interaction with the user. There’s no way to know what the user would have done, so there’s really no way to know where that link should lead. The code might be asking the user to click one of two buttons, but it could just as easily be asking them to sign up for a service, enter a new username, address, etc., and then giving them some unique content based on what they entered.

If we tried to implement this we’d get into a spiral of problem solving, and at every step there would be things we couldn’t solve perfectly. We’d have to fudge it & take guesses. I don’t think guesses are what anyone wants from a link checker. We want certainty.

When I first started thinking about this problem I decided to take a look at what some of our competitors were doing. Some claim to be able to follow JavaScript links, so I tested them out with some real examples and found results I didn’t like.

They would cope with very simple examples like the one above, perhaps even slightly more complex ones, like this:

<a href="#" onclick="window.open('somedoc' + '.html', 'newWindow')">Click Me</a>

(Note that here I’m building somedoc.html by adding together two strings).

But if things got a little more complicated they just refused to check the link. That isn’t what I want for our customers. I’m not saying our competitors are intentionally misleading in their marketing; there are a lot of very simple JavaScript links on the web. But I think people would expect all types of links to be handled, and I just don’t believe we could ever live up to that promise.

Since we can’t make a good guarantee, I didn’t want to make one at all. False negatives are really bad; our products saying they’ll find an error and then failing to find it is the kind of issue that keeps me awake at night.

Usually when people ask, I tell them it’s a good idea to always have a regular <a> link with an href attribute to match every JavaScript link, even if the ‘copy’ is hidden away at the bottom of the page. This means DeepTrawl, and now CloudTrawl, will be able to scan the site for broken links. But there’s another really important reason to make sure all your links can be found somewhere in the page in regular, old-fashioned <a> tags.
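As a sketch of what I mean (somedoc.html is just the placeholder file name from the earlier examples), the pair might look like this:

<!-- The JavaScript version your visitors click: -->
<a href="#" onclick="window.open('somedoc.html', 'newWindow'); return false;">Click Me</a>

<!-- A plain copy, perhaps tucked away in the footer, that link checkers and search engines can follow: -->
<a href="somedoc.html">Click Me</a>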

Google and other search engines don’t guarantee to follow JavaScript links. A lot of the time they discover the content in a site the same way our technology does, by starting at the first page and following all the links. If a link is in JavaScript the search engines may not follow it. So they may not see your pages and won’t list them in the search results. Ouch. Better start putting in those regular <a> tags!


Legacy Internet Explorer vs. 99.999

Right now we’re hearing a lot about individuals being asked (begged?) to move away from Internet Explorer 6. As many readers may know, even Microsoft is getting in on the act.

This is important for home users. Security is a big problem for all web users, and combining a lack of security awareness with a browser which won’t be patched at all in coming years is a really, really nasty mix.

There’s a reason why many of these users haven’t already upgraded: if they’re really happy with IE6, that probably means they’re not into the latest and greatest web apps. They spend their time doing email, browsing Amazon & eBay. Productivity boosts possibly aren’t their bag.

But here’s the thing. There’s a very large group of people who really would benefit from using the latest web technologies: pretty much everyone who works in an office.

Google Docs is basic but awesome for collaboration. Lucid Charts freaking rocks. I challenge anyone to watch their demo and tell me Visio is more compelling. More and more web apps are appearing which push not just what a web browser can do, but what a productive professional can do.

Even better, I.T. departments don’t need to get involved for users to start playing with these things and proving their value. Maybe for free, maybe by using their department’s expense account, users themselves can start using these apps and testing their value without I.T. needing to invest time or budget. I would imagine I.T. departments would be thrilled by this.

In the I.T. world there’s long been talk about the five nines. This means systems should have a guaranteed uptime of 99.999%, which amounts to no more than about 5 minutes of downtime per year (0.001% of the 525,600 minutes in a year is roughly 5.3 minutes). This is an excellent goal (which, of course, as an uptime monitor we applaud).

Seeking this value means I.T. has its head screwed on when buying systems; they’re serving the organization well by keeping its employees productive.

So here’s the elephant in the room. The next improvement in productivity may not come from internal systems; it may very well come from applications chosen by end users and hosted in the cloud.

That makes it seem really, really crazy that many I.T. departments hang on to historic browsers which won’t work well with these new productivity aids.

We understand the reasons. IE6 at least has an understood security profile. Probably more importantly, many organizations have internal applications which were written specifically for IE6 and absolutely will not work with anything else.

Since Microsoft plans to stop supporting it, organizations are going to have to move beyond IE6, and are doing so right now. At the same time they may need to upgrade or dump their creaking IE6-compatible applications. That will hurt. Users will lose data. Hair will be torn out.

I believe many applications used by organizations in the future will be chosen by the users and won’t have much to do with the I.T. department. But there will still be many applications which will be developed for or sold to organizations from the top down. These will cost a lot of money, probably $100k+.

So here’s my plea to larger organizations: when considering buying any new software system please, please make sure it’s standards-compliant. It should work perfectly in every modern browser. Its HTML should validate. It shouldn’t use Flash or anything else which keeps you locked into technologies which may disappear.

This way, next time I.T. wants to upgrade browsers company-wide they’ll be able to do it without fear of breaking everything, and users can use the latest and greatest innovative web apps without being held back by an aging browser no one in the company wants to keep.

The commitment to 99.999% uptime is awesome; let’s make sure we can keep a similar commitment to keeping workers as productive as they want to be.

Why does CloudTrawl exist?

For a long time we’ve been selling DeepTrawl, a desktop application which checks sites for errors such as broken links, bad spelling, invalid HTML, missing meta-tags and many other issues a webmaster can introduce without realizing.

It makes sense to use a desktop app to check for these things. For instance, when your site is live and not being updated you aren’t going to get invalid HTML appearing. There’s no magic; that only happens because you’ve changed something. It makes sense that you have a tool you run after you make changes, perhaps even running it over the HTML on your hard drive before you put it live.

The problem is, not every type of error needs your intervention to pop up and grab a user. The bug trolls can move in under the bridge at any time; it doesn’t matter if you checked for trolls when you built the bridge.

Some of these issues cross over with what DeepTrawl does: just as a broken link can be created when you update a site, it can happen on its own. The content you’re pointing to on an external site changes, and suddenly you have a broken link. No warning, no alarms ringing, just bam… your users are sent down a black hole when they click your link. Not good.

But it gets worse: your site could go offline. Not because you did something wrong, just because that’s what websites do. The internet was designed to be very resilient; unfortunately that doesn’t apply to the server running your site, or to any single server. They go down… website visitors get unhappy.

So we need something online, constantly checking for stuff that goes wrong on its own. That’s why we’re now working hard on CloudTrawl. There are a lot of services which will check for downtime. There are a few which will check for broken links automatically on a weekly basis. The problem is that’s several services, several monthly payments, several things to keep track of.

The idea behind CloudTrawl is that you shouldn’t need several of these things; you should have one. One service which takes care of your site and tells you when bad stuff happens.

So this is what we’re spending our time working on. Hopefully sometime early in 2012 DeepTrawl will get a new baby sister, and CloudTrawl can fill the gaps DeepTrawl leaves… answering the question “what about when my site breaks on its own?”

Oh, and yes, this is the first ever blog post on CloudTrawl.com, so in time-honored tradition:

Hello World!