AI technology has advanced significantly beyond the text-based chatbots of just a year ago. Today we are seeing the rise of multimodal models, which can take in and generate several kinds of content, such as images, audio, and text, at the same time. A prime example of this shift is Google’s NotebookLM, which launched quietly a year ago as a research tool.
Recently, the tool added an AI podcasting feature called Audio Overview, which lets users generate a podcast about almost anything, even personal topics like their own LinkedIn profile. The feature has unexpectedly gone viral, showcasing the powerful and surprising applications of multimodal AI.
Multimodal AI has improved quickly, with each new generation of tools clearly outperforming the last. Meta’s first text-to-video model, Make-A-Video, was seen as a novelty when it launched in 2022, but it already looks outdated next to newer models such as Meta’s Movie Gen. Movie Gen, which generates custom videos, sounds, and images from text prompts, shows how far generative AI for video has come.
These advances show how quickly multimodal AI tools are becoming part of everyday creative work.
The way people interact with AI is also shifting. OpenAI’s Canvas interface, for example, lets users collaborate with ChatGPT on a project by editing text or code directly, reducing the need for repeated rounds of prompting. This contrasts with the traditional chat model, in which users keep prompting the AI to regenerate text until they get the result they want. The result is a more streamlined, interactive way of working with AI that encourages users to engage with their work alongside the model rather than simply issuing requests.
Even traditional tools like search engines are being transformed by AI. Google has introduced multimodal search capabilities that let users upload a video and ask questions about its content.
In one demo, for instance, a user recorded a video of fish swimming in an aquarium and then asked the AI for specific information about them. The AI analyzed the video and returned a relevant summary, showing how search is expanding beyond simple text queries to more complex, interactive ones.
Despite these exciting developments, the AI industry is still searching for its “killer app.” While there’s been tremendous progress in multimodal generative content, many companies are still experimenting with different AI tools to see what resonates with users.
Google’s NotebookLM, for instance, began as a little-noticed research tool before its Audio Overview feature became a viral hit, illustrating how unpredictable AI innovation can be. These developments are part of the broader generative AI boom, driven by massive investment and by competition among tech companies to deliver impactful and profitable AI products.