
Business Insider India

Google Is Teaching Machines To 'See' Pictures And Describe Them With Startling Accuracy

Nov 19, 2014, 00:34 IST

Google wants to be able to create accurate, automatic captions for complex photos, and, according to a recent blog post, it's getting closer to achieving that goal.


Google's machine-learning system can "see" a photo and automatically produce a descriptive, relevant caption. To do that, it has to build a deeper representation of what's going on in the picture, recognizing how the different objects in it relate to one another, and then translate that into a natural-sounding description.

For example, the system automatically captioned one photo "Two pizzas sitting on top of a stove top oven."



"This kind of system could eventually help visually impaired people understand pictures, provide alternate text for images in parts of the world where mobile connections are slow, and make it easier for everyone to search on Google for images," Google research scientists Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan write.


The key innovation is that Google's team has merged a computer vision system, which classifies objects in images, and a natural language processing model into a single system that can take an image and directly produce a sentence describing it.

In Google's words, the model "combines a vision CNN with a language-generating RNN so it can take in an image and generate a fitting natural-language caption."
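To make the "CNN feeds an RNN" idea concrete, here is a toy sketch in plain Python. It is not Google's model: the weights are random rather than trained, so the captions it emits are meaningless, and the tiny vocabulary, layer sizes, and names (`ToyCaptioner`, `conv_features`, `VOCAB`) are all hypothetical. It also simplifies things by using a plain tanh RNN and by injecting the image through the RNN's initial hidden state, which is one common design choice and not necessarily the one Google used. What it does show is the data flow: convolutions turn pixels into a feature vector, that vector conditions a recurrent network, and the network emits one word at a time until it produces an end token.

```python
import math
import random

# Hypothetical toy vocabulary; a trained captioner would learn thousands of words.
VOCAB = ["<start>", "<end>", "two", "pizzas", "sitting", "on", "top", "of", "a", "stove", "oven"]
HIDDEN = 8   # RNN hidden-state size (arbitrary for this sketch)
KERNELS = 4  # number of 3x3 filters in the toy "CNN"


def rand_matrix(rows, cols, rng):
    """Random weights; a real model would learn these from captioned images."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def conv_features(image, kernels):
    """Toy CNN encoder: 3x3 convolution + ReLU + global average pooling.

    Returns one number per filter. The image must be at least 3x3.
    """
    h, w = len(image), len(image[0])
    feats = []
    for k in kernels:
        total, count = 0.0, 0
        for i in range(h - 2):
            for j in range(w - 2):
                s = sum(k[a][b] * image[i + a][j + b] for a in range(3) for b in range(3))
                total += max(0.0, s)  # ReLU
                count += 1
        feats.append(total / count)
    return feats


class ToyCaptioner:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.kernels = [rand_matrix(3, 3, rng) for _ in range(KERNELS)]
        self.w_img = rand_matrix(HIDDEN, KERNELS, rng)     # image features -> initial state
        self.embed = rand_matrix(len(VOCAB), HIDDEN, rng)  # one embedding row per word
        self.w_xh = rand_matrix(HIDDEN, HIDDEN, rng)       # word input -> hidden
        self.w_hh = rand_matrix(HIDDEN, HIDDEN, rng)       # hidden -> hidden (recurrence)
        self.w_out = rand_matrix(len(VOCAB), HIDDEN, rng)  # hidden -> score per word

    def caption(self, image, max_len=10):
        # 1. The CNN turns the image into a feature vector...
        feats = conv_features(image, self.kernels)
        # 2. ...which sets the RNN's starting hidden state.
        h = [math.tanh(x) for x in matvec(self.w_img, feats)]
        word = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            # 3. One RNN step: mix the previous word with the running state.
            x = self.embed[word]
            h = [math.tanh(a + b) for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
            scores = matvec(self.w_out, h)
            # 4. Greedy decoding: take the highest-scoring word, never "<start>".
            scores[VOCAB.index("<start>")] = -math.inf
            word = max(range(len(VOCAB)), key=lambda i: scores[i])
            if VOCAB[word] == "<end>":
                break
            words.append(VOCAB[word])
        return words


if __name__ == "__main__":
    # A 6x6 grayscale "image" with values in [0, 1].
    image = [[(i * j) % 5 / 4 for j in range(6)] for i in range(6)]
    print(ToyCaptioner(seed=0).caption(image))
```

Training such a system means adjusting all of these weights together so that, for each training photo, the words the RNN emits match a human-written caption. That end-to-end coupling is what lets a single model go directly from pixels to a sentence.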


That's not to say the new approach is infallible, as an interesting look at some of its results shows.


Google's the first to admit there's still work to be done.

"We look forward to continuing developments in systems that can read images and generate good natural-language descriptions," the team writes.
