Tuesday, August 01, 2006

Wikipedia and the Semantic Web

On Monday July 31, 2006, The Colbert Report aired a feature on Wikipedia, and (at his prompting) Stephen Colbert's minions quickly descended on Wikipedia's article on elephants, deliberately introducing the clearly erroneous assertion that the population of elephants in Africa has tripled in the last six months. After checking out the action on Wikipedia, I searched the blogosphere for commentary on Wikipedia, hoping to find more media buzz about the impact of Colbert's broadcast. However, I ended up reviewing content that made me reflect (once again) on how important Wikipedia has become to the indexing of content on the Internet.

In a post entitled Wikipedia 3.0: The End of Google?, the Evolving Trends blog provided a rather esoteric treatise about the Semantic Web. To wit:
"The Semantic Web requires the use of a declarative ontological language like OWL to produce domain-specific ontologies that machines can use to reason about information and make new conclusions, not simply match keywords."
For those of you who are not familiar with OWL, it is an intentionally dyslexic acronym for Web Ontology Language. The problem with using OWL to power the Semantic Web is that it relies upon present-day Web-based publishers for quality control -- i.e., the same people who are likely to stay awake nights dreaming up scams and schemes to exploit the Web for personal profit. The bloggers at Evolving Trends would have you believe that, given the right tools, Wikipedians will be able to deal with these issues. However, I do not share their optimism.
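The kind of machine reasoning OWL is meant to enable can be sketched in a few lines of Python. This is a toy illustration only (the class names and facts below are my own invention, not drawn from any real ontology): given a subclass hierarchy, the program derives a conclusion that was never stated directly, rather than simply matching keywords.

```python
# Toy ontology: subclass assertions of the kind OWL expresses formally.
# (Illustrative names only -- not from any published ontology.)
subclass_of = {
    "AfricanElephant": "Elephant",
    "Elephant": "Mammal",
    "Mammal": "Animal",
}

def is_a(cls, ancestor):
    """Reason over the hierarchy by following subclass links transitively."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False

# A keyword matcher sees no connection between these two strings;
# the reasoner infers one from the chain of subclass assertions.
print(is_a("AfricanElephant", "Animal"))  # True -- inferred, not asserted
```

The point of the sketch is the gap it exposes: the inference is only as good as the assertions in the hierarchy, which is exactly where quality control by Web-based publishers comes in.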

Wikipedia is an unqualified success when it comes to large-scale collaboration for online content generation, and -- as alluded to by the bloggers at Evolving Trends -- Wikipedia has the capacity to make Google obsolete. But not for the reasons those bloggers cite -- i.e., by using a highfalutin ontology generated by conscientious Web-based publishers. Rather, Wikipedia has the capacity to out-Google Google by providing relevant responses to keyword-based queries. To this end, Wikipedia spends a great deal of time disambiguating keyword-based queries.
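Mechanically, this disambiguation amounts to a hand-curated lookup: an ambiguous keyword maps to a list of senses, each with a human-written gloss. A minimal sketch in Python (the entries below are illustrative examples of my own, not taken from Wikipedia's actual disambiguation pages):

```python
# Hand-curated disambiguation, in the spirit of Wikipedia's
# disambiguation pages (entries here are illustrative only).
disambiguation = {
    "mercury": [
        ("Mercury (planet)", "the innermost planet of the Solar System"),
        ("Mercury (element)", "the chemical element with symbol Hg"),
        ("Mercury Records", "an American record label"),
    ],
}

def resolve(query):
    """Return the curated senses for a keyword, or an empty list."""
    return disambiguation.get(query.lower(), [])

for title, gloss in resolve("Mercury"):
    print(f"{title}: {gloss}")
```

No algorithm decides which sense the searcher meant; a human editor enumerated the plausible senses in advance, which is precisely the labor-intensive work the paragraph above describes.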

On a much smaller scale, albeit one that has provided proof of concept, I started doing something similar a few years ago with the XODP Web Guides. To wit, following much the same process that I used when creating a new ODP category back in the day, I review the search results for a particular search term, separate the Roman Meal from the Hormel, and provide appropriate titles and descriptions for the wheat. With the XODP Web Guides I also go two steps further: As appropriate, I provide an introductory blurb narrating the presumed relevance of a particular search term and parse the annotated links for that search term with appropriate interrogatories. Even so, the Semantic Web envisioned by Timothy Berners-Lee is not likely to be realized anytime soon. Certainly not in the context of Wikipedia, much less in the context of annotated link lists.

As one might expect, the biggest challenge with creating the Semantic Web is and will continue to be semantics. Most human beings take for granted their ability to understand what somebody else means when they say something, but this ability is hardly trivial and virtually impossible to code into a language processing program. Indeed, eloquent orators, writers, and translators are usually hard pressed to explain the rationale for their choice of words on a particular occasion. The words they use just seem right as they bubble up from their subconscious mind. If they're lucky, they have a chance to censor the words that might be misinterpreted, and, after careful consideration, to explain why the words they did choose were in fact the right ones.

Assuming that linguists are able to come up with an artificial intelligence (AI) that can read and write and/or converse with human beings, there will still be vast uncharted oceans of knowledge on the World Wide Web, unless said linguists are also able to give AI the ability to look at pictures and video and interpret their relevance as well. However, such abilities are not yet the province of AI; they are the province of sci-fi. Meanwhile, Wikipedia remains the most successful experiment in online content generation and indexing to date.

Inherent problems with quality control notwithstanding, Wikipedia has succeeded where ODP/dMOZ failed in that the Wikipedia community is truly open. To wit, as there are no meaningful barriers to entry, anyone can contribute to Wikipedia. That's not to say that there aren't some control freaks at Wikipedia doing their best to assert themselves and turn Wikipedia into a complicated bureaucracy. However, if one has a vested interest in quality control, there are meaningful ways of challenging the mediocre status quo at Wikipedia.

While Wikipedia is at the vanguard of progressive and pragmatic efforts for online content generation and indexing, the ongoing efforts to develop the Semantic Web provide a surprisingly coherent vision of how the Internet should be indexed. In the decades to come, this vision will probably be explored in the context of academia and inform the research conducted by cognitive scientists. However, by the time this vision reaches the public at large, it will almost certainly be watered down to something much less sublime.

