Google VP teases Gemini's multimodal future: 'I've seen some pretty amazing things.'

Hugh Langley

Google is moving fast to launch its new AI model and put it in Bard.
The head of the company's Bard and Assistant talks about working with a "new magic ingredient."

There's a lot of pressure on Google right now.

The company is on the verge of releasing Gemini, its highly anticipated new large language model, which will be closely compared to OpenAI's GPT-4.

Gemini will be multimodal, meaning it will be able to understand and produce text, images, and other types of content. CEO Sundar Pichai has hinted it will also be better at planning, while DeepMind CEO Demis Hassabis told Wired that Gemini is being trained using techniques that powered its AlphaGo program, which beat the best human Go player in 2016.

A key person in the middle of all this is Sissie Hsiao, Google's VP and general manager of Bard and Google Assistant. She's also a member of Insider's inaugural AI 100.

Hsiao isn't part of the team building Gemini – a newly-formed coalition of DeepMind and Google's Brain unit – but she is responsible for some of the major products that will give users access to these new AI systems.

"I've seen some pretty amazing things," said Hsiao. "Like, I'm trying to bake a cake, draw me 3 pictures of the steps to how to ice a three layer cake, and Gemini will actually create those images."

"These are completely novel pictures. These are not pictures from the internet," she added. "It's able to speak in imagery with humans now, not just text."

To say Google needs Gemini to be a success is an understatement. OpenAI recently announced the third iteration of its visual art generator, DALL-E, and updated ChatGPT so it can access more up-to-date information (until now, it didn't have access to data about anything that happened after September 2021).

If Gemini dazzles, it will help Google change the narrative that it was blindsided by Microsoft and OpenAI. If it disappoints, it will embolden critics who say Google has fallen behind.

The search giant has been upfront about Bard's limitations from day one, and still refers to it as an "experiment." The chatbot has come under fire for producing misinformation. Hsiao's team recently introduced a feature that highlights information Bard suspects may not be accurate.

Hsiao, like most Google executives right now, loves to say Google is being "bold and responsible," but she also acknowledged things inside the company are moving "incredibly fast" and said the Bard team feels like a "startup" right now.

"It feels like the first one or two years I was at Google," she said. "It feels like the beginning all over again."

"People ask me, 'Is the hype real?' I don't think it's hype, I think it's real," she said. "Because I work with the technology every day."

"It's like a new magic ingredient showed up"

Hsiao joined Google in 2006 as a product manager on image search and Google Docs. From there she moved through various jobs on Google's advertising products.

"After I had done that for a few years, I really looked and said, 'What is the most cutting edge product at Google to work on, that is sort of unsolved,'" she said. Google Assistant was the answer, and in 2021 Google reshuffled its search team to put Hsiao in charge of its voice assistant.

Hsiao said she wanted to work on something that "marries cutting edge research with a cutting edge vision of a future that isn't here yet."

The Bard and Assistant teams were merged earlier this year under Hsiao, and we're starting to learn why: Google recently announced that Bard is getting integrated into the Assistant on mobile devices, right on the heels of Amazon revealing a new, more powerful Alexa.

Assistant with Bard, as Google is calling it, will fuse the generative AI powers of Bard with the helpful abilities of its voice assistant. Its arrival also suggests that the company is rethinking what the Assistant – which has stagnated in recent years – actually is in the era of powerful LLMs.

"We're moving away from voice as the primary modality," said Hsiao of the recent Assistant update, which will let users interact with it using text, voice, and images.

Whether Google is planning to overhaul the experience on smart speakers remains to be seen. "We're still exploring," said Hsiao. "It's too early to say definitively whether it will be useful or not."

While chatbots and large language models have wowed users, there are still questions about whether these AI systems are actually going to be useful in the long term. Google's advantage here is its array of already popular apps, such as Gmail, Docs, and Maps, that it can inject the underlying AI abilities into.

Hsiao thinks Bard itself has value as a product in the long term. "It is the most unconstrained expression of the large language model," she told Insider. "There is beauty in it being a simple box because in that box you can actually ask it to do anything."

She also believes this new technology can be both a standalone AI product and something that is infused into other products.

Assistant with Bard may go some way toward proving this out, but it's still too early to tell. Hsiao's team, like many within Google right now, is moving quickly to experiment and find ways to inject AI that aren't just flashy, but genuinely helpful.

"It's like a new magic ingredient showed up," she said, "and you're trying to figure out what it can do."
