A New York Times article reported on NELL – the Never-Ending Language Learning system – a research project under way at Carnegie Mellon University.

NELL works 24 hours a day, seven days a week, scanning hundreds of millions of Web pages for text patterns that it uses to learn facts, 390,000 to date, with an estimated accuracy of 87 percent. These facts are grouped into semantic categories — cities, companies, sports teams, actors, universities, plants and 274 others. The category facts are things like “San Francisco is a city” and “sunflower is a plant.”

NELL also learns more like a human by making connections between facts that belong to different categories. For example, Peyton Manning is a football player (category). The Indianapolis Colts is a football team (category). By scanning text patterns, NELL can infer with high probability that Peyton Manning plays for the Indianapolis Colts — even if it has never read that Mr. Manning plays for the Colts. “Plays for” is a relation, and there are 280 kinds of relations. The number of categories and relations has more than doubled since earlier this year, and will steadily expand.
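To make the idea concrete, here is a toy sketch of that kind of inference. It is not NELL's actual algorithm — the entity names, pattern counts, and threshold are all illustrative — but it shows the general shape: category facts constrain which entities can fill a relation, and text-pattern evidence scores candidate relation instances.

```python
# Toy illustration of category-typed relation inference, loosely modeled
# on the description of NELL above. All data and thresholds are made up.

# Category facts: entity -> category ("Peyton Manning is a football player")
categories = {
    "Peyton Manning": "football_player",
    "Indianapolis Colts": "football_team",
}

# Candidate relation instances, scored by how many supporting text
# patterns (e.g. "<player> led the <team> to ...") were observed.
pattern_counts = {
    ("Peyton Manning", "plays_for", "Indianapolis Colts"): 42,
    ("Peyton Manning", "plays_for", "Chicago Bears"): 1,
}

def infer(relation, subj_cat, obj_cat, threshold=10):
    """Accept a relation instance when the arguments match the relation's
    category signature and enough text patterns support it."""
    accepted = []
    for (subj, rel, obj), count in pattern_counts.items():
        if (rel == relation
                and categories.get(subj) == subj_cat
                and categories.get(obj) == obj_cat
                and count >= threshold):
            accepted.append((subj, rel, obj))
    return accepted

print(infer("plays_for", "football_player", "football_team"))
```

Here the Colts instance is accepted because both arguments have the right categories and the pattern evidence clears the threshold, while the weakly supported Bears candidate is rejected.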

The problem is this: if NELL is learning everything off the internet, what exactly is it being taught? I dread to think about some of the sites it visits.

Source: http://www.nytimes.com/2010/10/05/science/05compute.html?scp=1&sq=NELL&st=cse