Tuesday, February 20, 2007

WordNet Redux

In a recent XODP Blog post, I extolled the virtues of WordNet and declared it one of the best-kept secrets among Natural Language Processing (NLP) resources. Following up on that post, I found that few people outside the NLP arena seem to have even heard of WordNet, and that even experts in the field do not seem to appreciate its potential. For instance, in a post at the Artificial Artificial Intelligence Blog, Lukas Biewald laments:
"Are concepts really a hierarchy? I’ve heard cognitive scientists think so, but I disagree. And I think that trying to make all the concepts conform to this artificial hierarchal structure has turned WordNet into a much less useful resource.

"[ . . .]

". . . [Some] groups of concepts . . . actually have a hierarchical structure for an unrelated real-world reason. . . .

"[ . . .]

"But this hierarchy completely breaks down for more conceptual things. Is respect in the sense of 'respect for my Father,' a type of 'attitude' or 'politeness' or 'filial duty' or 'affection?' Clearly it’s all these things. But the guys making WordNet didn’t want to believe that, so they make respect as a type of attitude one semantic category, and respect as a type of politeness another, and so on, until there are ten separate senses for respect."
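To make the disagreement concrete, here is a minimal Python sketch of the two designs Biewald contrasts: one sense per parent concept versus a single entry with several parents. The sense labels, parent concepts, and function names are hypothetical and chosen only for illustration; none of them is copied from WordNet's actual data or API.

```python
from collections import defaultdict

# Hypothetical toy data, NOT taken from WordNet: each sense label maps
# to exactly one parent concept, as in a strict hierarchy.
TOY_SENSES = {
    "respect.sense_a": "attitude",
    "respect.sense_b": "politeness",
    "respect.sense_c": "affection",
}


def senses_per_word(senses):
    """Group sense labels by word: one separate sense for each parent."""
    grouped = defaultdict(list)
    for sense_id, parent in senses.items():
        word = sense_id.split(".")[0]
        grouped[word].append((sense_id, parent))
    return dict(grouped)


def parents_per_word(senses):
    """Collapse the senses into one entry per word with several parents."""
    parents = defaultdict(set)
    for sense_id, parent in senses.items():
        word = sense_id.split(".")[0]
        parents[word].add(parent)
    return {word: sorted(found) for word, found in parents.items()}


if __name__ == "__main__":
    print(senses_per_word(TOY_SENSES))
    print(parents_per_word(TOY_SENSES))
```

Both views are built from the same underlying pairs, so the second can be derived from the first whenever an application wants the "all of the above" reading.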
In their book, Computational Linguistics, which I mentioned in my previous post about WordNet, Igor Bolshakov and Alexander Gelbukh anticipate this sort of objection to creating linear representations (i.e., Text) of non-linear entities (i.e., Meaning):
". . . The human had to be satisfied with the instrument of speech given to him by nature. This is why we use while speaking a linear and rather slow method of acoustic coding of the information we want to communicate to someone else.

"[ . . . ]

"While the information contained in the text can have a very complicated structure, with many relationships between its elements, the text itself has always one-dimensional, linear nature, given letter by letter. . . . [A] text represents non-linear information transformed into linear form. What is more, the human cannot represent in usual texts even the restricted non-linear elements of spoken language, namely, intonation and logical stress. . . .

". . . A text consists of elementary pieces having their own, usually rather elementary, meaning. This meaning is determined by the meaning of each one of their components, though not always in a straightforward way. These structures are organized in even larger structures like sentences, etc. . . . Such organization provides linguistics with the means to develop the methods of intelligent text processing."
Like Wikipedia and its ongoing efforts at disambiguation, WordNet cannot be all things to all people, nor should it try, notwithstanding Lukas Biewald's assertion that WordNet should make allowances for words meaning "all of the above." What WordNet can be, is, and should remain is a strong foundation for more sophisticated semantic analysis.


