This robot can recognize what's going on in photos ... sort of
Computer scientist Stephen Wolfram has released a new tool, the Wolfram Image Identification Project, that allows users to upload or link to an image and then see how well the computer can recognize what's going on in the picture.
In a blog post, Wolfram describes the technology behind the project. Like many computer vision programs, Wolfram's project is built around an "artificial neural network": a software framework inspired by biological brains that excels at the kind of pattern recognition needed for computer vision. In Wolfram's case, the neural network was "trained" by being exposed to tens of millions of labeled images. As Wolfram puts it in the blog post,
We decided to try the algorithm out on a few images that were on the front page of Business Insider around 3:30 p.m. Eastern time on Tuesday.
In many cases, the image identifier was able to at least get the overall gist of the pictures. It classified the Twin Peaks restaurant in Texas that was the site of a grisly shootout between rival biker gangs as a "store":
It also correctly classified Hillary Clinton and Marissa Mayer as "people," although it wasn't able to identify them by name:
The algorithm also correctly, if vaguely, identified Paris cafe Le Comptoir as a building:
In a few situations, the algorithm completely ignored the people in an image, instead focusing on particular inanimate objects. Rather than noticing boxer Gennady Golovkin, the algorithm locked on to the glove on the boxer's hand, helpfully pulling up some extra info on boxing gloves:
Similarly, in this still from an upcoming KFC commercial, the algorithm ignored former "Saturday Night Live" actor Darrell Hammond's portrayal of Colonel Sanders and instead noticed the cars around him, identifying them as "transport":
In other cases, the algorithm got tantalizingly close but was slightly off. It classified this Samsung smartphone as a "remote control," and as with the boxing glove, gave us some context:
In an image of Tesla Motors CEO Elon Musk, the image identifier correctly noted that he was standing in front of a car but misclassified the car as a two-door coupe rather than a four-door sedan. Still, pretty impressive:
Some images completely threw the algorithm off. The grey background and dark chyron on this NFL Network screenshot appear to have convinced the image classifier that New England Patriots owner Robert Kraft is in fact a clapperboard:
The algorithm also had trouble with more abstract items. The Yo app logo was parsed as "instrumentation":
And this screenshot of leaked footage from the upcoming video game "Doom 4" showing a soldier in a desolate wasteland was interpreted as a "spider":
While image recognition and classification are hard, and the algorithm is still a work in progress, it is fun to play with. Read more about the technology behind the app on Wolfram's blog, or test it out with your own pictures on the Wolfram Image Identification Project site.