Monday, February 19, 2007

WordNet, Disambiguation, and the Semantic Web

While following up on a previous XODP Blog post that covered Powerset's attempts to outdo Google by using Natural Language Processing (NLP), I discovered a book entitled Computational Linguistics, available for free in both HTML and PDF formats. As someone with a background in both linguistics and cognitive science, I found the book a fascinating read, and about a third of the way through it, I discovered Princeton's WordNet. While WordNet is anything but obscure as NLP resources go, I think it is one of the world's best-kept secrets.

A while back I heaped praise on Wikipedia for its ability to disambiguate keyword-based queries. Without taking anything away from Wikipedia's ongoing efforts at disambiguation, WordNet is already the most comprehensive open source and open content online resource for disambiguation of English nouns, verbs, adjectives, and adverbs, providing extensive semantic analysis for just under 150,000 unique word strings. Noticeably missing from WordNet's database are articles, prepositions, pronouns, conjunctions, and word particles. Moreover, WordNet does not provide any information about a word's etymology or pronunciation. However, WordNet does provide operational definitions for word strings, information about their common usage, and comprehensive semantic grouping information, including synonyms, antonyms, hyponyms, hypernyms, meronyms, holonyms, and troponyms. And if you want to know what any of those words mean, I suggest that you go on over to WordNet and enter them into the online interface.
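To make those relation types concrete, here is a minimal sketch in Python of how a WordNet-style lexicon hangs together. The tiny lexicon, the `Synset` class, and its field names are my own invention for illustration; real WordNet groups words into "synsets" connected by exactly these kinds of links:

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    """A set of synonymous words plus its semantic links (illustrative only)."""
    words: list                                    # synonyms sharing one sense
    gloss: str                                     # operational definition
    hypernyms: list = field(default_factory=list)  # "is a kind of"
    hyponyms: list = field(default_factory=list)   # "kinds of this"
    meronyms: list = field(default_factory=list)   # "has part"
    holonyms: list = field(default_factory=list)   # "is part of"

# A toy fragment of the noun taxonomy: dog -> canine -> animal,
# with "paw" attached as a part of "dog".
animal = Synset(["animal", "beast"], "a living organism that can move")
canine = Synset(["canine"], "a mammal of the dog family", hypernyms=[animal])
dog    = Synset(["dog", "domestic dog"], "a domesticated canine",
                hypernyms=[canine])
paw    = Synset(["paw"], "the foot of a quadruped", holonyms=[dog])
canine.hyponyms.append(dog)
dog.meronyms.append(paw)

def hypernym_chain(s):
    """Walk the 'is a kind of' links up to the top of the taxonomy."""
    chain = []
    while s.hypernyms:
        s = s.hypernyms[0]
        chain.append(s.words[0])
    return chain

print(hypernym_chain(dog))  # ['canine', 'animal']
```

Walking the hypernym chain like this is what lets software conclude that a dog is a kind of animal without anyone having stated that fact directly.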

WordNet has more or less solved one of the most basic challenges facing developers of the Semantic Web, as it provides a relatively comprehensive operational lexical ontology for the English language whose taxonomy is both clear and definite, yet still very, very flexible. Unlike a standard search engine algorithm, which is limited to determining what I call "keyword relevancy," a properly configured Semantic Web user agent would be able to determine the actual semantic relevancy of online resources. Moreover, the end user would not have to resort to a series of keyword-based searches. Rather, an end user would have an ongoing conversation with his or her user agent, providing more and more feedback about what he or she wanted to know.
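One simple way an agent could move from keyword relevancy toward semantic relevancy is gloss overlap: pick the sense of an ambiguous word whose definition shares the most words with the query's context. This is a stripped-down sketch of the classic Lesk idea; the two "bank" senses and their glosses below are invented stand-ins for the thousands of glossed senses real WordNet supplies:

```python
# Toy sense inventory: each sense maps to an invented WordNet-style gloss.
SENSES = {
    "bank/1": "sloping land beside a body of water such as a river",
    "bank/2": "a financial institution that accepts deposits and makes loans",
}

def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = set(context.lower().split())
    def overlap(item):
        return len(ctx & set(item[1].split()))
    return max(senses.items(), key=overlap)[0]

print(disambiguate("fishing on the bank of the river"))    # bank/1
print(disambiguate("the bank approved my loan deposits"))  # bank/2
```

A real agent would add stemming, stop-word removal, and the synonym and hypernym links discussed above, but even this crude overlap test captures something a bag of keywords cannot: which sense of the word the user actually meant.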

The Semantic Web envisioned by Tim Berners-Lee suffers from the perception that it's a highfalutin enterprise; many web developers deride it as the "Pedantic Web." This ivory-tower disconnect could easily be remedied by creating user-friendly user agents. In time, casual end users would be willing to use a semantic search agent when they encountered problems finding useful online resources with a standard search engine. Meanwhile, people who want to use the Web for serious research would be able to carve out a semantic niche for themselves.

