Saturday, August 19, 2006

Ghost URLs

In a previous blog post, I set forth the site configuration procedures that I go through prior to launching a static website and the ongoing content modification and release process that I go through to keep a static website Google-friendly. Few webmasters follow quality control procedures like these; even fewer check their server logs to find and correct 404 errors (i.e., "page not found") with 301 redirects. Consequently, search engine databases suffer from problems that can be summed up by the aphorism "garbage in; garbage out." To wit, "ghost URLs."

I use the term ghost URL to refer to a wide variety of site configuration issues that haunt most of the websites that I encounter. One of these issues is caused by the fact that most URLs are ambiguous -- i.e., there are usually several URLs that point to the exact same content because (1) a numerical IP address has more than one domain name associated with it; (2) a particular webserver fails to correct ambiguous requests for site content; or (3) the same content is mirrored in two or more documents. There are some legitimate reasons for mirroring static content, but most of the time these mirrors are unintentional spam that dilutes Google PageRank for the preferred URL.
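To make the ambiguity concrete, here is a minimal Python sketch of how several URLs might be collapsed into one preferred URL. The host names, the IP address, and the index file names are all hypothetical placeholders, and the function name canonical_url is my own; treat this as an illustration of the idea rather than a complete canonicalization routine.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values for illustration: one preferred host, plus the
# alias domains and numerical IP address that serve the same content.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "example.net", "www.example.net", "192.0.2.10"}
INDEX_NAMES = ("index.html", "index.htm")


def canonical_url(url):
    """Return the preferred URL for a request that may use an alias host
    or an explicit index file name. Ports and fragments are dropped."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    path = parts.path or "/"
    for name in INDEX_NAMES:
        # "/docs/index.html" and "/docs/" are the same document.
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    return urlunsplit((parts.scheme or "http", host, path, parts.query, ""))
```

In this sketch, requests for http://example.com/index.html and http://192.0.2.10/ both map to http://www.example.com/, which is the URL a webserver would ideally answer with after a 301 redirect from the others.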

The second type of ghost URL that I encounter is created when a webmaster moves content from one URL to another without implementing a 301 redirect or "refresh redirect." On rare occasions, a webmaster who has created a ghost URL will have had the foresight to set up a customized 404 error message, but most of the time webmasters who move content without implementing 301 redirects are oblivious to the problems caused by ghost URLs. Consequently, the end user will usually encounter the default 404 error message that his or her browser displays. As I stated previously, most recently in the blog post referenced above, a scheduled content modification and release process is the best way to avoid these types of ghost URLs.
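The logic behind a 301 redirect for moved content can be sketched in a few lines of Python. The paths below are hypothetical, and the resolve function is only an illustration of the decision a webserver makes; in practice the mapping would normally live in the server's own configuration.

```python
# Hypothetical mapping of old URLs to the locations the content moved to.
MOVED = {
    "/old-page.html": "/new-page.html",
    "/articles/ghosts.html": "/blog/ghost-urls.html",
}

# Hypothetical set of paths that currently exist on the site.
EXISTING = {"/", "/new-page.html", "/blog/ghost-urls.html"}


def resolve(path):
    """Return (status code, extra headers) for a requested path."""
    if path in MOVED:
        # Permanent redirect: browsers and spiders are told the new URL.
        return 301, {"Location": MOVED[path]}
    if path in EXISTING:
        return 200, {}
    # Content was never here, or it moved without a redirect: a ghost URL.
    return 404, {}
```

With this mapping, a request for /old-page.html is answered with a 301 pointing to /new-page.html, while a request for a path that appears in neither table falls through to a 404.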

A third type of ghost URL that I encounter is created by inbound links from one website that point to non-existent content on another website. As a webmaster, you have no direct control over these inbound links. What you can do is set up a customized 404 error page, monitor your site referral logs for recurring 404 errors, contact the webmasters who set up the offending inbound links and ask them to correct the problem, and implement 301 redirects. Whether or not the offending inbound links are fixed, the 301 redirects should stay in place until the 404 errors for a particular content request disappear.
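Monitoring the logs for recurring 404 errors is easy to automate. The Python sketch below assumes that the server writes its access log in the common "combined" format, with the referrer in quotation marks after the status code and response size; the function name recurring_404s and the min_hits threshold are my own inventions for illustration.

```python
import re
from collections import Counter

# Assumes the "combined" access log format:
#   host ident user [date] "METHOD path protocol" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def recurring_404s(lines, min_hits=2):
    """Return (path, referrer, count) for each 404 that was logged at
    least min_hits times, most frequent first."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referrer"))] += 1
    return [
        (path, referrer, count)
        for (path, referrer), count in hits.most_common()
        if count >= min_hits
    ]
```

Each tuple that comes back names a missing path and the page that links to it, which is the information needed both to write to the other webmaster and to decide which 301 redirects to put in place.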

Some ghost URLs are inadvertently created by webmasters who are vetting pages of beta content. The links on these beta pages often point to non-existent URLs, and the webmasters who set up these beta pages will often mistakenly assume that no one else knows about these beta pages because they haven't linked to them or told anyone about them. Little do they realize that all they have to do to let the secret out is click on an outbound link on one of their beta pages and follow that link to another website, because the browser passes the URL of the beta page to that website as the referrer. A curious webmaster or spider will then follow his, her, or its site referral logs and find the beta page along with all of its links.

Given the highly decentralized nature of the Internet, there's very little hope of a centralized strategy of quality control for static content on the World Wide Web emerging anytime soon, and as an Internet consultant, I have my hands full trying to bring webmasters up to speed on the site configuration issues that I described above. However, when I was sitting in the WiFi lounge at Search Engine Strategies San Jose 2006, a somewhat inexperienced webmaster asked me if I "knew anything about 301 redirects," and then listened quite intently to what I had to say about site configuration issues, so I suspect that there are quite a few other people who are actively addressing these issues. Even so, I suspect that ghost URLs will continue to haunt the Web for the foreseeable future.

