
Welcome to "The AI Economy," a weekly newsletter by Ken Yeung on how AI is influencing business, work, society, and technology. Subscribe now to stay ahead with expert insights and curated updates—delivered straight to your inbox.
Google is doubling down on enterprise AI. At Cloud Next 2025, the company unveiled new cloud capabilities designed to help businesses turn AI potential into real-world impact. That includes the arrival of Gemini 2.5 Flash on the Vertex AI developer platform and upgrades to its generative media models. And, as AI agents begin taking on more complex tasks in the enterprise, the company is working to help organizations accelerate agent development through its new Agent Development Kit while also promoting a new agentic interoperability standard to ensure bots can communicate across different frameworks and ecosystems.
Table of Contents
- Here Comes Gemini 2.5 Flash
- One-Stop Gen AI Shop
- Create Brand-Friendly Music in Seconds
- Veo 2 Gets Professional Editing
- Instant Custom Voices and Diarization in Chirp 3
- Imagen 3 Quality and Editing Improvements
- Introducing Google's Agent Development Kit
- Google Launches Agent2Agent Protocol
Here Comes Gemini 2.5 Flash
In March, Google debuted Gemini 2.5, its latest-generation thinking model for resolving complex problems. At the time, there was only one variant, Gemini 2.5 Pro Experimental, which offers a one-million-token context window, enhanced reasoning, and advanced coding capabilities. The model is available on Vertex AI in public preview. As Google Cloud Chief Executive Thomas Kurian noted, “Pro is optimized for precision and is great for writing and debugging intricate code or extracting critical information in medical documents.”
Today, Google announced it will soon release another variant, Gemini 2.5 Flash. This model is described as best suited for low-latency, cost-sensitive use cases, such as high-volume customer interactions. “Gemini 2.5 Flash adjusts the depth of reasoning based on the complexity of prompts, and you can control performance based on customers’ budgets,” Kurian wrote. “These new features make powerful AI easier to use and more affordable for everyday use cases, enabling our customers to build AI that solves complex problems and understands nuance.”
No specific timeline for Gemini 2.5 Flash’s release was given.
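Google didn’t detail the developer surface at the announcement, but the google-genai Python SDK exposes a “thinking budget” that caps how many tokens a Gemini 2.5 model spends reasoning before it answers, which is the kind of knob Kurian’s quote describes. Here’s a minimal sketch, assuming Gemini 2.5 Flash honors this setting on Vertex AI; the project ID and prompt are placeholders:

```python
# Hedged sketch: capping Gemini 2.5 Flash's reasoning depth per request
# with the google-genai SDK's thinking budget. Whether Vertex AI exposes
# this exact knob for 2.5 Flash is an assumption on our part; the project
# ID and prompt below are placeholders.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this support ticket in one sentence: ...",
    config=types.GenerateContentConfig(
        # Fewer thinking tokens: lower latency and cost. More: deeper reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=256),
    ),
)
print(response.text)
```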
To help developers choose the model best suited for their needs, Google is introducing Vertex AI Model Optimizer, an experimental tool it claims will automatically generate high-quality responses for each prompt based on the “desired balance of quality and cost.” In addition, the company is releasing its Vertex AI Global Endpoint, which provides capacity-aware routing for Gemini models across multiple regions for workloads that aren’t bound to processing information in a specific location.
One-Stop Gen AI Shop

Google Cloud is also updating Vertex AI’s generative media models. The company disclosed that companies like digital travel platform Agoda, app developer Bending Spoons, and Kraft Heinz are using these tools to create captivating visuals and videos, process millions of photos daily, and accelerate marketing campaign creation.
Before today, the platform offered generative models for images, video, and speech. Now it’s adding a fourth modality: music, courtesy of Google’s text-to-music model, Lyria.
Create Brand-Friendly Music in Seconds
Initially released by Google DeepMind in 2023, Lyria is an enterprise-ready model that transforms text prompts into 30-second music clips. The company boasts that the model produces high-fidelity audio and is capable of “capturing subtle nuances and delivering rich, detailed compositions across a range of musical genres.” It cites use cases such as marketing campaigns, product launches, or in-store experiences where quickly created soundtracks that match a brand’s identity are needed. Google also highlights Lyria as an alternative to royalty-free music, which can be time-intensive to search for; the model can generate custom tracks in minutes that sync with the content’s mood, pacing, and narrative.
“This is significant because it makes Vertex AI the only platform to bring together generative media models across all modalities…this means you can build a complete production-ready asset, starting from a text prompt to an image to a complete video asset with music and speech,” Nenshad Bardoliwalla, Google’s director of product management for Vertex AI, remarked during a press conference earlier this week.
Is Lyria trained on copyrighted material? Amin Vahdat, Google’s vice president and general manager of ML systems and Cloud AI, stated, “We work with all copyright providers for any data that’s used to train our models, including our gen media models. So, in other words, you can be assured that it is in the sort of open space and fully authorized or that we’ve worked with the copyright providers before using anything as input to our models.”
Veo 2 Gets Professional Editing

Google’s video generation model, Veo 2, also receives new capabilities. Along with helping create videos, it can now edit and apply visual effects—almost like a light version of Adobe Premiere. The company claims this feature set turns Veo from “a generation tool to a comprehensive video creation and editing platform.”
This transformation starts with a Magic Eraser-like feature in Veo 2. Called Inpaint, it lets creators remove unwanted items—Bardoliwalla labels them “video bombs”—from footage, including background images, logos, or distractions, without the need for manual retouching.
Another feature is Outpaint, which extends the frame of the video so it’s optimized for the preferred screen size or platform. This is helpful when you have a landscape video and want to make it portrait for Instagram Reels or YouTube Shorts.
Veo 2 also allows creators to direct shot composition, camera angles, and pacing. With greater control over the footage, teams won’t need to spend time repeatedly regenerating videos to find the right take, and getting the shot you want no longer requires specialized prompt-engineering expertise.
Lastly, Veo 2 supports interpolation, which is the ability to define the beginning and end of a video sequence when connecting two existing assets. This allows for smooth transitions and ensures visual continuity.
Instant Custom Voices and Diarization in Chirp 3
Chirp 3, Google’s audio understanding and generation model, receives two new features designed to give enterprise teams more accurate and personalized conversational experiences.
With Instant Custom Voice, Chirp 3 enables organizations to create brand-aligned voice experiences for call centers and content creation. It’s perhaps Google’s equivalent of what ElevenLabs is doing with voice cloning. The feature generates “realistic” custom voices from just 10 seconds of audio input, and Google has included safety features and an allowlist to prevent misuse.
And then there’s Transcription with Diarization, a feature for recordings where multiple people are speaking. It can separate and identify who’s talking, improving the clarity and usability of transcriptions for applications like meeting summaries, podcast analysis, and multi-party call recordings.
Imagen 3 Quality and Editing Improvements
Google is also updating its text-to-image generation model, Imagen 3. As with Veo 2, it now supports inpainting capabilities to help reconstruct missing or damaged portions of an image. The company states this update improves the model’s object removal quality, providing a “natural and seamless editing experience.”
Introducing Google’s Agent Development Kit
Improvements are also being made to Vertex AI to support the adoption of agents across the enterprise. “2025 will be a transition year where generative AI shifts from answering single questions to solving complex problems through agentic systems,” Vahdat declared. “These agents will be able to carry out a range of tasks, from planning trips to monitoring customer issues and managing complex workflows.”
To accommodate this future, Google is unveiling its Agent Development Kit (ADK), a unified development environment to build, test, and operate sophisticated bots seamlessly. Vahdat explained that customers can assemble a multi-agent system “in under 100 lines of code” guided by creative reasoning and strict guardrails.
Its release follows a playbook developer platforms ran years ago when the mobile app era hit its stride. In addition to APIs, those companies created software development kits (SDKs), collections of tools to help developers build on top of their services. Google’s ADK is no different: it’s a collection of resources developers need not only to get started in the agentic era but also to make their agents collaborate with bot peers.
Google also isn’t the only one offering something like a dev kit for agents; OpenAI, LangChain, Microsoft, and IBM are just a few of the others with comparable tooling.
With this package, developers should be able to control how their agents think, reason, and interact with other bots. It enables natural, human-like interactions through bidirectional audio and video streaming, shifting away from text-based interfaces. The ADK can also access Google’s new Agent Garden, a library of pre-built patterns and components that can be repurposed to speed up development. At launch, agents can connect to over 100 pre-built connectors, an organization’s APIs, integration workflows, or data in cloud systems like BigQuery and AlloyDB.
Finally, the ADK supports the Model Context Protocol (MCP) to ensure “a simple, open, and interoperable mechanism to connect models to data and tools.”
Google’s ADK currently supports Python, with more coding languages to be added later this year. In addition, although it’s optimized to work with Google Gemini and Vertex AI, Google affirms that the ADK will work with your preferred tool.
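To make the “under 100 lines” claim concrete, here’s a minimal sketch of the kind of two-level agent hierarchy Vahdat describes, following the conventions of the ADK’s Python quickstart. The agent names, tool function, and model ID are illustrative stand-ins, not Google’s example, and class and parameter names may change as the kit evolves:

```python
# A minimal multi-agent sketch using Google's Agent Development Kit
# (pip install google-adk). The tool, agent names, and model ID are
# illustrative; class and parameter names follow the ADK's Python
# quickstart at the time of writing and may change.
from google.adk.agents import Agent

def get_order_status(order_id: str) -> dict:
    """Hypothetical tool: look up an order in your own backend."""
    return {"order_id": order_id, "status": "shipped"}

# A specialist agent that answers order questions using the tool above.
support_agent = Agent(
    name="support_agent",
    model="gemini-2.5-flash",
    description="Answers customer questions about order status.",
    instruction="Use get_order_status to answer questions about orders.",
    tools=[get_order_status],
)

# A root agent that routes incoming requests to its sub-agents.
root_agent = Agent(
    name="root_agent",
    model="gemini-2.5-flash",
    description="Routes customer requests to the right specialist.",
    instruction="Delegate order-related questions to support_agent.",
    sub_agents=[support_agent],
)
```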
Google Launches Agent2Agent Protocol
Another initiative Google has launched to advance its open approach to AI is its new Agent2Agent (A2A) interoperability protocol. Vahdat says the company has partnered with more than 50 industry leaders who “share our vision that one of the biggest challenges businesses face with AI adoption is getting agents built on different frameworks and vendors working together.”

He adds, “A2A connects agents across the entire enterprise ecosystem, giving them a common language to collaborate, irrespective of which framework or vendor they are built on, working with companies like Salesforce, ServiceNow, and SAP. A2A ensures successful communication between agents. Because our protocol isn’t specific to any one API or implementation, it’s flexible. It supports abstract requests and complex agent interactions.”
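In practice, that common language starts with discovery: under the A2A spec, each agent advertises itself through a machine-readable “Agent Card,” conventionally served at /.well-known/agent.json, which peers fetch to learn what the agent can do and how to reach it. Here’s a sketch with a hypothetical agent and endpoint; field names follow the draft spec and may evolve:

```python
# Sketch of an A2A "Agent Card," the JSON document an agent publishes so
# other agents can discover its skills. The agent, endpoint, and skill
# here are hypothetical; field names follow the draft A2A spec.
agent_card = {
    "name": "expense-approval-agent",
    "description": "Reviews and approves employee expense reports.",
    "url": "https://agents.example.com/a2a",  # hypothetical A2A endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "defaultInputModes": ["text"],
    "defaultOutputModes": ["text"],
    "skills": [
        {
            "id": "approve_expense",
            "name": "Approve expense report",
            "description": "Approves or rejects a submitted expense report.",
        }
    ],
}
```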
Nearly 60 partners have signed up to support A2A, including Accenture, Atlassian, Box, Capgemini, Cohere, Datadog, Glean, Intuit, KPMG, LangChain, MongoDB, New Relic, Oracle, Typeface, Workday, and Writer.
Google may be trying to establish the agentic equivalent of Matter for the smart home. However, it’s not alone in working on an interoperability standard—there’s AGNTCY, an open-source initiative founded by a coalition of tech companies, including Cisco, LangChain, LlamaIndex, Galileo, and Glean. Vahdat said that there’s growing momentum in this space, and Google is “looking to deliver the best benefits for our customers based on having the best platform and support for a range of different standards.”
When asked if the company might work with other standard groups, he replied, “We will go to what our customers need in providing the right level of support.”