IBM Watson (2013)

Semantic Technologies in IBM Watson


Question analysis: How Watson reads a clue
IBM J. RES. & DEV. VOL. 56 NO. 3/4 PAPER 2 MAY/JULY 2012
A. LALLY ET AL. 2 : 1

… Although these components are largely domain-independent, some tuning to the special locutions of Jeopardy! questions has been done

… These Question Classes (QClasses) are used to tune the question-answering process by invoking different answering techniques [3], different machine learning models [4], or both.
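One way to picture this routing is a dispatch table keyed by QClass. The sketch below is illustrative only: the handler functions, their return strings, and the fallback behavior are hypothetical and are not taken from the paper, which does not describe the mechanism at this level.

```python
from typing import Callable, Dict, List


def answer_default(clue: str) -> str:
    # Hypothetical stand-in for the general-purpose answering pipeline.
    return "default pipeline"


def answer_fitb(clue: str) -> str:
    # Hypothetical stand-in for a technique specialized for FITB clues.
    return "fill-in-the-blank pipeline"


# Assumed mapping from QClass name to a specialized handler.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "FITB": answer_fitb,
}


def route(clue: str, qclasses: List[str]) -> str:
    """Use the first QClass that has a specialized handler, else the default."""
    for qclass in qclasses:
        handler = HANDLERS.get(qclass)
        if handler is not None:
            return handler(clue)
    return answer_default(clue)
```

A clue tagged with a QClass that has no entry in the table simply falls through to the default handler.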

Most of our rule-based question analysis components are implemented in Prolog [6, 7], a well-established standard for representing pattern-matching rules.

… we explain how we implemented rule-based portions of question analysis using Prolog.

a named entity recognizer (NER), a co-reference resolution component, and a relation extraction component [12].

ESG has been adapted in several ways to the special locutions of Jeopardy! questions. In place of “wh” pronouns …

In spite of these adaptations, care was taken not to degrade parsing of normal English. This is done in part by use of switches for the parser that are turned on only when parsing Jeopardy! questions.

Most of the question analysis tasks in the Watson project are implemented as rules over the PAS and various external databases such as WordNet [16].
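To make "rules over the PAS" concrete, here is a toy sketch. The paper's rules are written in Prolog; Python is used here only for illustration. The Node schema, the slot name "det", and the rule itself (treat a noun whose determiner is "this" or "these" as a candidate LAT) are assumptions made for the example, not Watson's actual rule set.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of a toy predicate-argument structure (hypothetical schema)."""
    idx: int
    lemma: str
    pos: str  # coarse part of speech, e.g. "noun", "det", "adj", "verb"
    args: Dict[str, int] = field(default_factory=dict)  # slot name -> filler index


def find_lats(pas: List[Node]) -> List[str]:
    """Return lemmas of nouns whose determiner slot is filled by this/these."""
    by_idx = {node.idx: node for node in pas}
    lats = []
    for node in pas:
        if node.pos != "noun":
            continue
        det = by_idx.get(node.args.get("det", -1))
        if det is not None and det.lemma in {"this", "these"}:
            lats.append(node.lemma)
    return lats


# Toy analysis of "This lead singer fronted the band" (hand-built, not ESG output).
example = [
    Node(0, "this", "det"),
    Node(1, "lead", "adj"),
    Node(2, "singer", "noun", {"det": 0, "mod": 1}),
    Node(3, "front", "verb", {"subj": 2, "obj": 5}),
    Node(4, "the", "det"),
    Node(5, "band", "noun", {"det": 4}),
]
```

On this example the rule picks out "singer" and ignores "band", whose determiner is "the". Returning only the head noun, rather than the modified phrase "lead singer", is a choice made by this sketch.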

… In all, these rule sets consist of more than 6,000 Prolog clauses.

This decision can be somewhat subjective according to our definition of LAT. Examples of disagreements were “singer” versus “lead singer” and “body” versus “legislative body”.

The Jeopardy! domain includes a wide variety of kinds of questions, and we have found that a one-size-fits-all approach to answering them is not ideal. In addition, some parts of a question may play special roles and can benefit from specialized handling.

The QClasses PUZZLE, BOND, FITB (Fill-in-the-blank), and MULTIPLE-CHOICE have fairly standard representations in Jeopardy! and are detected primarily by regular expressions.

The rule-based recognizer includes regular expression patterns that capture canonical ways that abbreviation questions may be expressed in Jeopardy!
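A rough sketch of regular-expression QClass detection follows. The three patterns are hypothetical and deliberately crude; they are not the patterns used in Watson, which the excerpts above do not list.

```python
import re
from typing import List

# Hypothetical patterns, for illustration only.
QCLASS_PATTERNS = {
    # A run of three or more underscores marks the blank to fill in.
    "FITB": re.compile(r"_{3,}"),
    # A comma-separated list whose last item is introduced by "or".
    "MULTIPLE-CHOICE": re.compile(r",\s*[^,]+,?\s+or\s+"),
    # Phrases that commonly signal an abbreviation being expanded.
    "ABBREVIATION": re.compile(r"\b(?:stands? for|is short for|abbreviat\w*)\b",
                               re.IGNORECASE),
}


def detect_qclasses(clue: str) -> List[str]:
    """Return the names of all QClasses whose pattern matches the clue."""
    return [name for name, pattern in QCLASS_PATTERNS.items()
            if pattern.search(clue)]
```

Patterns this simple produce false positives (any sentence containing a list ending in "or" would be tagged MULTIPLE-CHOICE), so a real recognizer would need tighter patterns or additional checks.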

It is common in question-answering systems to represent a question as a graph either of syntactic relations in a parse or PAS [18–20] or of deep semantic relations in a handcrafted ontology [21–23]. Watson uses both approaches.

Most other question-answering systems use question analysis to identify a semantic answer type from a fixed ontology of known types [19, 26–29]. Because of the very broad domain that Jeopardy! questions cover, this is not practical.

CAS: common analysis structure
ESG: English Slot Grammar, a Slot Grammar parser
LATs: lexical answer types
NER: named entity recognizer
PAS: predicate-argument structure (e.g., PAS builder)
UIMA: Unstructured Information Management Architecture


Watson is powered by 10 racks of IBM Power 750 servers running Linux. It uses 15 terabytes of RAM and 2,880 processor cores, and can operate at 80 teraflops.
Watson was written mostly in Java, with significant portions in C++ and Prolog; all components are deployed and integrated using UIMA.


Introduction to Natural Language Processing
University of Michigan
Coursera, October 5 – December 27, 2015

Building Watson
December 2010

The Robot Will See You Now

Feb 20 2013

IBM’s Watson—the same machine that beat Ken Jennings at Jeopardy—is now churning through case histories at Memorial Sloan-Kettering, learning to make diagnoses and treatment recommendations. This is one in a series of developments suggesting that technology may be about to disrupt health care in the same way it has disrupted so many other industries. Are doctors necessary? Just how far might the automation of medicine go?

processing up to 60 million pages of text per second, even when that text is in the form of plain old prose, or what scientists call “natural language.”

something like 80 percent of all information is “unstructured.” In medicine, it consists of physician notes dictated into medical records, long-winded sentences published in academic journals, and raw numbers stored online by public-health departments.

Watson even has the ability to convey doubt. When it makes diagnoses and recommends treatments, it usually issues a series of possibilities, each with its own level of confidence attached.
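The idea of returning several possibilities, each with a confidence, can be sketched as a ranked list. This is a minimal illustration, not Watson's scoring method: the function name, the top_k cutoff, and the assumption that confidences lie between 0 and 1 are all choices made for the example.

```python
from typing import List, Tuple


def ranked_hypotheses(scored: List[Tuple[str, float]],
                      top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (label, confidence) pairs, highest confidence first."""
    for label, confidence in scored:
        # Assumption of this sketch: confidences are probabilities in [0, 1].
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for {label!r}: {confidence}")
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Placeholder labels and made-up scores, purely to show the call shape.
candidates = [("hypothesis A", 0.2), ("hypothesis B", 0.7),
              ("hypothesis C", 0.1), ("hypothesis D", 0.5)]
top_two = ranked_hypotheses(candidates, top_k=2)
```

Keeping the confidence next to each label lets the reader see how far the top candidate is ahead of the alternatives, instead of receiving a single unqualified answer.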

see also: