Using a research supercomputing cluster provided by Yahoo, and supported by grants from Google and the Defense Advanced Research Projects Agency, a team of researchers at Carnegie Mellon University in Pittsburgh has been improving a computer system called NELL, the Never-Ending Language Learning system. Housed in a basement at the university, NELL was "taught" some basic knowledge in a variety of categories and then turned loose on the Web with instructions to teach itself. By analyzing immense quantities of human-created text, NELL learns to detect the patterns that define how language is used.
Calculating 24 hours a day, seven days a week, NELL scans hundreds of millions of Web pages for text patterns that it uses to learn facts. So far, it has learned 390,000 facts, with an estimated accuracy of 87 percent. These facts are grouped into some 280 categories, among them cities, sports teams, actors, plants and animals, and the list keeps growing. The facts in each category are statements like “Minneapolis is a city” and “Daffodil is a plant.”
NELL also learns facts that are relations between members of two categories. For example, Peyton Manning belongs to the category of football players, and the Indianapolis Colts to the category of football teams. By scanning text patterns, NELL can infer with high probability that Peyton Manning plays for the Indianapolis Colts, even if it has never read that Mr. Manning plays for the Colts. “Plays for” is a relation, and NELL tracks 280 kinds of relations. The number of categories and relations has more than doubled since earlier this year and will steadily expand.

The learned facts are continuously added to NELL’s growing database, which the researchers call a “knowledge base.” A large pool of facts, like the one the Web conveniently provides, helps refine NELL’s learning algorithms, so that over time it finds facts more accurately and more efficiently. In short, just as with humans, the more NELL knows, the easier it is to learn more.
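The interplay of categories, relations and text patterns can be sketched as a toy program. This is an illustrative simplification, not NELL's actual code: the hard-coded pattern, the function names and the seed facts are all invented for the example.

```python
import re

# Seed category facts, as described in the article.
categories = {
    "Peyton Manning": "football player",
    "Indianapolis Colts": "football team",
}

def extract_plays_for(sentence):
    """Return (player, team) if the sentence matches 'X plays for (the) Y'.

    NELL learns many such text patterns; here one is hard-coded
    purely for illustration.
    """
    m = re.match(r"(.+?) plays for (the )?(.+?)\.?$", sentence)
    if not m:
        return None
    return m.group(1), m.group(3)

def propose_fact(sentence):
    """Propose a 'plays for' relation fact, filtered by category membership."""
    pair = extract_plays_for(sentence)
    if pair is None:
        return None
    player, team = pair
    # Accept the relation only if both arguments have the right categories,
    # mirroring how category facts constrain candidate relation facts.
    if (categories.get(player) == "football player"
            and categories.get(team) == "football team"):
        return ("plays_for", player, team)
    return None

print(propose_fact("Peyton Manning plays for the Indianapolis Colts."))
# ('plays_for', 'Peyton Manning', 'Indianapolis Colts')
```

Note how the category facts act as a filter: a sentence matching the pattern only yields a relation fact when both noun phrases already carry the expected categories, which is one way a growing knowledge base makes further learning easier.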
Though NELL learns mostly on its own with impressive accuracy, it sometimes gets its facts wrong. Every two weeks, the researchers scan its categories for inaccuracies, and sometimes they have to step in and correct a misunderstanding.
When Dr. Mitchell scanned the “baked goods” category recently, he noticed a clear pattern. NELL was at first quite accurate, easily identifying all kinds of pies, breads, cakes and cookies as baked goods. But things went awry after NELL’s noun-phrase classifier decided “Internet cookies” was a baked good. (Its database apparently lacked the knowledge, about either baked goods or the Internet, to correct the mistake.)

While that might seem like a hitch in the plan, it actually makes NELL seem more human. Humans don't learn entirely on their own, either: when we are young, adults help us learn, and as adults we learn from one another. NELL, however, needs considerably less attention and guidance than a human student.
NELL had read the sentence “I deleted my Internet cookies.” So when it read “I deleted my files,” it decided “files” was probably a baked good, too. “It started this whole avalanche of mistakes,” Dr. Mitchell said. He corrected the Internet cookies error and restarted NELL’s bakery education.
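The avalanche Dr. Mitchell describes can be illustrated with a toy bootstrapping loop. Again, this is not NELL's algorithm, just a minimal sketch showing how one mislabeled fact can teach a bad pattern, which then mislabels further phrases; the prefix pattern and helper functions are invented for the example.

```python
# Known baked goods, plus one classifier mistake.
baked_goods = {"pie", "bread", "cake", "cookies"}
baked_goods.add("Internet cookies")  # the wrong seed fact

def learn_pattern(sentence, known_facts):
    """Learn a text pattern from a sentence mentioning a known fact.

    Because "Internet cookies" is (wrongly) in the baked-goods set,
    the prefix "I deleted my " gets learned as a baked-good indicator.
    """
    prefix = "I deleted my "
    if sentence.startswith(prefix) and sentence[len(prefix):] in known_facts:
        return prefix
    return None

def extract(sentence, pattern):
    """Apply a learned pattern to extract a new candidate fact."""
    if pattern and sentence.startswith(pattern):
        return sentence[len(pattern):]
    return None

# Step 1: the bad fact teaches a bad pattern.
pattern = learn_pattern("I deleted my Internet cookies", baked_goods)

# Step 2: the bad pattern extracts a new wrong fact.
new_fact = extract("I deleted my files", pattern)
if new_fact:
    baked_goods.add(new_fact)  # "files" is now (wrongly) a baked good

print("files" in baked_goods)  # True
```

Removing the original wrong fact before the pattern is learned breaks the chain, which is essentially what the researchers' periodic corrections accomplish.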
NELL is one project in a widening field of research and investment aimed at enabling computers to better understand the meaning of language. Computers that understand language promise a big payoff someday. The potential applications range from smarter Internet searches to virtual personal assistants that can answer questions in specific domains like health, education, travel and shopping.