Tuesday, July 25, 2006

Quality Control of Content on the Internet

The most frustrating situations that I encounter as an Internet consultant are those where clients come to me after they have deployed a new website or made major changes to an existing website and ask me for my comments and recommendations. When this happens, I always preface my comments and recommendations with the hortatory statement that any content that they publish on their static websites should be subject to a scheduled content modification and release process. Moreover, I advise them that any changes that they make to their static websites should be gradual and reversible.

For reasons that defy rational explanation, few of my clients ever seem to heed my warnings about the importance of a scheduled content modification and release process, and then they call upon me to clean up the mess when one of their websites becomes invisible to Google. When this happens, I walk them through a laundry list of possible snafus and direct their attention to the fact that all of these snafus can be easily avoided by implementing a scheduled content modification and release process. Eventually, I am able to impress upon them that any changes that they make to their website should be properly vetted, at which point I encounter a long list of reasons why quality control is too much trouble.

When it comes to implementing quality control procedures on my own websites, I am probably guilty of telling my clients to "do as I say, not as I do." In this regard, I am not unlike the master mechanic who neglects the maintenance of his own car: most of my websites are legacy websites and/or experiments in content indexing that don't generate much income for me, so they receive less attention than they deserve.

The situation is quite different when I launch a new static website for a client. To wit, I make a point of acquiring both a dot-com domain and a dot-net domain, then I password-protect the dot-net domain so that content under development is invisible to the probing eyes of search engine spiders. I then make sure that all appropriate Apache directives are put into place so that ambiguous URLs are eliminated. For example, < http://somesite.com >, < http://somesite.com/ >, < http://somesite.com/index.html >, < http://www.somesite.com >, and < http://www.somesite.com/index.html > all redirect to < http://www.somesite.com/ >. Beyond that, I make sure that a customized 404 page is displayed for non-existent URLs and non-existent third-level subdomains such as < http://www.somesite.com/non-existent-url.html > and < http://non-existent-subdomain.somesite.com/ >.
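The Apache side of that setup can be sketched roughly as follows. This is a minimal mod_rewrite example using the hypothetical SOMESITE.COM domain, not a copy of any particular client configuration; handling the non-existent-subdomain case additionally assumes a wildcard DNS record pointing at the same server.

```apache
# Canonicalize the bare domain to www.somesite.com (hypothetical domain)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^somesite\.com$ [NC]
RewriteRule ^(.*)$ http://www.somesite.com/$1 [R=301,L]

# Collapse direct requests for /index.html into the directory URL,
# so the homepage is reachable under exactly one address
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.html\sHTTP [NC]
RewriteRule ^index\.html$ http://www.somesite.com/ [R=301,L]

# Serve a customized 404 page for non-existent URLs
ErrorDocument 404 /404.html
```

A request for < http://somesite.com/index.html > takes two hops under this sketch (first to the www host, then to the root URL), which is acceptable since both hops are permanent 301 redirects that search engine spiders will consolidate.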

At this point, SOMESITE.COM is the proverbial blank slate, and I set up an Excel spreadsheet to track content release and modification for all the URLs to be published on SOMESITE.COM. I also use this spreadsheet to track the indexing of URLs from SOMESITE.COM on Google, Yahoo!, and MSN, including PageRank and inbound links for each URL. Properly executed, a content modification and release process for a static website should result in a slow and steady growth of content where (1) substantial changes to the content of a particular URL take place at most once every three months; (2) inbound links increase only as fast as they can be indexed by Google; and (3) the release of new URLs constitutes no more than a 20% increase of the overall site content over any three-month period.
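Rules (1) and (3) are mechanical enough to check automatically before a release. Here is a small sketch of what such a check might look like; the function names and the record layout are my own invention, meant to mirror the columns of the tracking spreadsheet rather than any real tool.

```python
from datetime import date, timedelta

# Hypothetical thresholds taken from the rules above
MAX_QUARTERLY_GROWTH = 0.20              # new URLs per quarter, as a fraction of existing content
MIN_EDIT_INTERVAL = timedelta(days=90)   # one substantial change per URL per three months

def release_plan_ok(existing_urls, planned_urls):
    """Reject a release batch that would grow the site by more than 20% in a quarter."""
    return len(planned_urls) <= MAX_QUARTERLY_GROWTH * len(existing_urls)

def edit_ok(last_modified, today):
    """Allow a substantial edit only if the URL has been stable for three months."""
    return today - last_modified >= MIN_EDIT_INTERVAL
```

On a site with 100 published URLs, a batch of 20 new pages would pass while a batch of 21 would be held back for the next quarter; likewise, a page last rewritten in July cannot be substantially rewritten again until October.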