Semantic Technologies in IBM Watson
Question analysis: How Watson reads a clue
IBM J. RES. & DEV. VOL. 56 NO. 3/4 PAPER 2 MAY/JULY 2012
A. LALLY ET AL. 2 : 1
… Although these components are largely domain-independent, some tuning to the special locutions of Jeopardy! questions has been done.
… These Question Classes (QClasses) are used to tune the question-answering process by invoking different answering techniques, different machine learning models, or both.
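The idea of invoking a different answering technique per QClass can be sketched as a simple dispatch table. This is only an illustration under stated assumptions: the function names, the `TECHNIQUES` mapping, and the fallback behavior are hypothetical and are not taken from Watson.

```python
from typing import Callable, Dict, List


def answer_default(clue: str) -> str:
    """Stand-in for the general-purpose answering pipeline (hypothetical)."""
    return "default:" + clue


def answer_fitb(clue: str) -> str:
    """Stand-in for a technique specialized for fill-in-the-blank clues (hypothetical)."""
    return "fitb:" + clue


# Assumed mapping from a QClass label to a specialized technique.
TECHNIQUES: Dict[str, Callable[[str], str]] = {"FITB": answer_fitb}


def route(clue: str, qclasses: List[str]) -> str:
    """Use the first detected QClass that has a specialized technique, else the default."""
    for qclass in qclasses:
        handler = TECHNIQUES.get(qclass)
        if handler is not None:
            return handler(clue)
    return answer_default(clue)
```

A real system could just as well select a different machine learning model rather than a different function; the table lookup is only the simplest way to show the routing step.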
Most of our rule-based question analysis components are implemented in Prolog [6, 7], a well-established standard for representing pattern-matching rules.
… we explain how we implemented rule-based portions of question analysis using Prolog.
a named entity recognizer (NER), a co-reference resolution component, and a relation extraction component.
ESG has been adapted in several ways to the special locutions of Jeopardy! questions. In place of “wh” pronouns …
In spite of these adaptations, care was taken not to degrade parsing of normal English. This is done in part by use of switches for the parser that are turned on only when parsing Jeopardy! questions.
Most of the question analysis tasks in the Watson project are implemented as rules over the PAS and various external databases such as WordNet.
… In all, these rule sets consist of more than 6,000 Prolog clauses.
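Watson's rules are written in Prolog; the sketch below uses Python instead, purely to illustrate what "a rule over the PAS" can look like. The `Node` and `Predicate` classes, the `CREATE_VERBS` lexicon, and the `author_of` rule are all assumptions made for this example and do not reproduce Watson's data structures or rule sets.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Node:
    """One argument in a toy predicate-argument structure (hypothetical schema)."""
    lemma: str
    sem_type: str = ""  # e.g. "author" or "composition", as an NER might assign


@dataclass(frozen=True)
class Predicate:
    """A verb together with its subject and object arguments."""
    verb: str
    subject: Node
    obj: Node


# Assumed lexicon of verbs that express creating a work.
CREATE_VERBS = {"write", "compose", "publish"}


def author_of(pas: Iterable[Predicate]) -> Iterator[Tuple[str, str]]:
    """Yield (author, composition) pairs matched by one hand-written rule."""
    for p in pas:
        if (
            p.verb in CREATE_VERBS
            and p.subject.sem_type == "author"
            and p.obj.sem_type == "composition"
        ):
            yield p.subject.lemma, p.obj.lemma
```

In Prolog the same condition would be stated declaratively as a single clause, and unification would do the matching that the explicit loop and `if` test do here.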
This decision can be somewhat subjective according to our definition of LAT. Examples of disagreements were “singer” versus “lead singer” and “body” versus “legislative body”.
The Jeopardy! domain includes a wide variety of kinds of questions, and we have found that a one-size-fits-all approach to answering them is not ideal. In addition, some parts of a question may play special roles and can benefit from specialized handling.
The QClasses PUZZLE, BOND, FITB (fill-in-the-blank), and MULTIPLE-CHOICE have fairly standard representations in Jeopardy! and are detected primarily by regular expressions.
The rule-based recognizer includes regular expression patterns that capture canonical ways that abbreviation questions may be expressed in Jeopardy!
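A minimal sketch of regex-based QClass detection follows. The patterns are illustrative assumptions and not Watson's actual expressions; the `ABBREVIATION` label, the `QCLASS_PATTERNS` table, and the `detect_qclasses` function are hypothetical names chosen for this example.

```python
import re
from typing import List

# Illustrative patterns only; the real patterns are not given in these notes.
QCLASS_PATTERNS = {
    # A run of three or more underscores marks a blank to be filled in.
    "FITB": re.compile(r"_{3,}"),
    # A few canonical phrasings that ask what an abbreviation expands to.
    "ABBREVIATION": re.compile(
        r"\b(?:stands for|is short for|abbreviation for)\b", re.IGNORECASE
    ),
}


def detect_qclasses(clue: str) -> List[str]:
    """Return the labels of all QClasses whose pattern matches the clue."""
    return [name for name, pattern in QCLASS_PATTERNS.items() if pattern.search(clue)]
```

Because a clue can match more than one pattern, the function returns a list rather than a single label.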
It is common in question-answering systems to represent a question as a graph either of syntactic relations in a parse or PAS [18–20] or of deep semantic relations in a handcrafted ontology [21–23]. Watson uses both approaches.
Most other question-answering systems use question analysis to identify a semantic answer type from a fixed ontology of known types [19, 26–29]. Because of the very broad domain that Jeopardy! questions cover, this is not practical.
CAS: common analysis structure
ESG: English Slot Grammar, a Slot Grammar parser
LATs: lexical answer types
NER: named entity recognizer
PAS: predicate-argument structure (e.g., PAS builder)
UIMA: Unstructured Information Management Architecture
Watson is powered by 10 racks of IBM Power 750 servers running Linux; it uses 15 terabytes of RAM and 2,880 processor cores and can operate at 80 teraflops.
Watson is written mostly in Java, with significant portions in C++ and Prolog; all components are deployed and integrated using UIMA.
Introduction to Natural Language Processing
University of Michigan
Coursera, October 5 – December 27, 2015