Archive for February, 2013


The world’s biggest companies have boatloads of broken links

Since we recently released CloudTrawl, we decided to undertake some research to show just how valuable it is. The uptime of major websites, and the damage downtime does to reputation and profits, has been written about extensively, so we decided to go a different way. Every web user has seen a broken link. They often make our blood boil, and people frequently leave a site on seeing one, assuming the content they’re seeking simply doesn’t exist. 404 has become the new blue screen of death. Broken links are a real risk to reputation & profit, but we’ve never seen a comprehensive study on just how common they are in major sites.

We decided to undertake that study and to perform it on the group of sites whose owners aren’t lacking in resources: the Fortune 500.

The Results

Here’s a big figure to open with:

[Infographic: Fortune500_1]

You read that right: 92% of the sites in our sample included at least one broken link, and most had several. 68% had more than 10 broken links, 49% had more than 50, and a surprising 43% had more than 100.

We also compared the number of pages containing broken links against the total number of pages in each site. A stunning 13% of all pages in the Fortune 500 sites we crawled have at least one broken link (many pages have several).
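To make the arithmetic behind that percentage concrete, here is a minimal sketch. It assumes the figure is pooled across every crawled page rather than averaged site by site, which this post does not spell out. The function name and the sample numbers are made up for illustration and are not taken from the study.

```python
def pooled_broken_page_share(sites):
    """Percentage of all crawled pages that have at least one broken link.

    `sites` is a list of (pages_with_broken_links, total_pages) tuples,
    one per crawled site. Pooling across sites is an assumption.
    """
    total_pages = sum(total for _, total in sites)
    if total_pages == 0:
        return 0.0
    affected_pages = sum(affected for affected, _ in sites)
    return 100.0 * affected_pages / total_pages


# Illustrative numbers only: three imaginary sites of 100 pages each.
example_sites = [(13, 100), (0, 100), (26, 100)]
share = pooled_broken_page_share(example_sites)
```

Pooling means large sites weigh more heavily than small ones; averaging the per-site percentages instead would give every site equal weight and could produce a different headline number.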

[Infographic: Fortune500_2]

What isn’t shown in the figures is the importance of some of these links. We saw examples of broken links to annual reports, quarterly statements, social presences (e.g. broken Facebook links) & both external and internal news articles. Perhaps most worrying were the unreachable legal notices and terms & conditions documents. Along with making users leave the sites (& possibly making lawyers pass out!) these things are bad for search engine optimization. Search engine crawlers can’t reach pages through dead links, and sites may be penalized.

Our Method

To get a fair cross section of the Fortune 500 we chose 100 companies at random across the set. We entered their names into Google and picked the first US / international result owned by that company. This resulted in a mix of sites. Some were corporate (company news, quarterly statements etc.) and some were online presences for customers (stores & marketing). We rejected any sites which CloudTrawl didn’t finish crawling in 5 hours or which contained more than 5,000 pages. Sites like these can sometimes spawn loops in page generation and unfairly bias results; search engines also stop crawling sites if they think this is happening.
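The selection and rejection rules above can be sketched in a few lines. This is only an illustration of the stated rules, not the tooling we used: the function names, the seed parameter and the shape of the crawl data are all assumptions.

```python
import random

MAX_PAGES = 5000       # sites with more pages than this were rejected
MAX_CRAWL_HOURS = 5    # crawls that had not finished by this point were rejected


def pick_sample(companies, k=100, seed=None):
    """Choose k companies at random, without repeats.

    The optional seed (an assumption, for repeatability) fixes the draw.
    """
    rng = random.Random(seed)
    return rng.sample(list(companies), k)


def keep_crawl(pages_crawled, crawl_hours, finished):
    """Return True if a crawl passes both rejection rules."""
    return (
        finished
        and crawl_hours <= MAX_CRAWL_HOURS
        and pages_crawled <= MAX_PAGES
    )


# Hypothetical usage: company IDs 1..500 stand in for the Fortune 500 list.
sample = pick_sample(range(1, 501), k=100, seed=2013)
```

Using `random.sample` guarantees that no company is picked twice, which a loop of independent random picks would not.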

To eliminate false positives we quality checked results both randomly and where sites contained a high percentage of broken links. To make sure the headline figures weren’t biased we only checked links (not images) and only counted 404 & 410 HTTP error codes, ignoring server timeouts etc. as these can sometimes be temporary.
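As a rough sketch of that counting rule (not CloudTrawl’s actual code), a link only counts as broken when the server answers 404 or 410. The names below are hypothetical, and treating every other error status as "ignored" is our reading of "ignoring server timeouts etc."

```python
from enum import Enum


class LinkStatus(Enum):
    OK = "ok"
    BROKEN = "broken"
    IGNORED = "ignored"


# Only these two status codes count towards the headline figures.
BROKEN_CODES = frozenset({404, 410})


def classify(status_code):
    """Classify one link check result.

    `status_code` is the HTTP status as an int, or None when no response
    arrived at all (timeout, refused connection and so on).
    """
    if status_code is None:
        return LinkStatus.IGNORED      # possibly temporary, so not counted
    if status_code in BROKEN_CODES:
        return LinkStatus.BROKEN       # Not Found or Gone
    if 200 <= status_code < 400:
        return LinkStatus.OK           # success or redirect
    return LinkStatus.IGNORED          # e.g. 500 or 503, may be transient


def count_broken(status_codes):
    """Count how many results in an iterable are broken links."""
    return sum(1 for code in status_codes if classify(code) is LinkStatus.BROKEN)
```

For example, `count_broken([200, 404, None, 503, 410])` counts only the 404 and the 410, so a flaky server that times out does not inflate a site’s total.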

Conclusion

Although there are some big headline figures above, the one that troubles us most is the 13%. Essentially we’re saying that more than 1 in 10 pages on the Fortune 500 sites we sampled has a severe bug that’s waiting to pop up and grab unsuspecting users.

Next time you see a 404 error you’ll at least have the consolation that they’re proven to be really common. Of course, we do give webmasters the tools to fix these issues, and we think we’ve presented a decisive demonstration of why they’re needed.

Note: feel free to use the infographics in this post. We hereby release them for use on other sites.

OMG; We’ve Launched!

It’s a proud day over at CloudTrawl.com; we just launched the full live service!

We’d love it if you signed up for the free trial, and we’re all ears for new feature requests & suggestions.

So, what made it into the first version? CloudTrawl is designed to watch out for stuff that goes wrong on its own, even if you don’t change your site. So for the first version we have:

Link Checking (we check every page of your site, daily or weekly)

Uptime Monitoring (we check your site is online every 30 seconds / 24×7)

We also have features like complete history charting, the ability to share site reports and settings with colleagues & customers, very cool looking real time views for uptime checks, the ability to “Start Now” for link checks, image validation and a lot more.

Even this tidy set of features is really just the tip of the iceberg of what’s planned for CloudTrawl. The ultimate goal: monitor absolutely everything that could go wrong with your site on its own. Over time we’ll be adding more checks, and we’d love for you to tell us what extra features and checks you think CloudTrawl should have.

Happy Trawling!