Microsoft Has a Faster, Cheaper AI Image Model

You’re reading an issue of “The AI Economy,” my newsletter exploring the forces shaping the AI era—tracking how AI is rewriting business, work, technology, and culture. Subscribe to get expert insights and curated updates delivered straight to your inbox.

Microsoft isn’t slowing down in its push to develop its own lineup of homegrown AI models. On Tuesday, the company announced MAI-Image-2-Efficient, a variant of its MAI-Image-2 text-to-image model. Both models produce photorealistic and expressive images with reliable in-image text, but the new one is designed to be 22 percent faster and four times more efficient.

MAI-Image-2 was introduced in March as part of Microsoft Copilot, Bing Image Creator, and the MAI Playground. According to Arena.ai, it ranks among the top three text-to-image models. More recently, the company expanded access by adding it to Microsoft Foundry alongside its other first-party models, MAI-Voice-1 and MAI-Transcribe-1.

Compared to its sibling, MAI-Image-2-Efficient is built for jobs that need to be fast and scalable without running up costs. It’s the one to choose for workflows where speed and volume matter more than pixel-perfect precision, such as social media ads, placeholder mockups, and product thumbnails.

However, if you want the AI to generate a hero portrait, a cinematic scene, or a polished final graphic where quality and care are more important than speed, that’s where Microsoft’s MAI-Image-2 shines.

MAI-Image-2-Efficient doesn’t just fare well against the company’s flagship image model. It’s also 40 percent faster than other hyperscaler models, such as Google’s Gemini 3.1 Flash (high reasoning), Gemini 3.1 Flash Image, and Gemini 3 Pro Image.

Developers can start using MAI-Image-2-Efficient today in the Microsoft Foundry and MAI Playground. The model is priced at $5 per million text input tokens and $19.50 per million image output tokens. By comparison, MAI-Image-2 is $5 per million tokens for text input and $33 per million tokens for image output.
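To put those prices in perspective, here is a rough back-of-the-envelope sketch in Python. The per-million-token rates come from the figures above; the function names and the sample workload (200,000 text input tokens and 4 million image output tokens) are made up for illustration and are not Microsoft numbers. Actual token counts per image will vary.

```python
# Published rates, in US dollars per million tokens (from the article).
PRICING_USD_PER_MILLION = {
    "MAI-Image-2-Efficient": {"text_input": 5.00, "image_output": 19.50},
    "MAI-Image-2": {"text_input": 5.00, "image_output": 33.00},
}


def estimate_cost(model: str, text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimate the bill in USD for a given number of input and output tokens."""
    rates = PRICING_USD_PER_MILLION[model]
    total = (
        text_input_tokens * rates["text_input"]
        + image_output_tokens * rates["image_output"]
    )
    return total / 1_000_000


def image_output_discount() -> float:
    """Fraction saved on image output tokens by choosing the Efficient model."""
    efficient = PRICING_USD_PER_MILLION["MAI-Image-2-Efficient"]["image_output"]
    flagship = PRICING_USD_PER_MILLION["MAI-Image-2"]["image_output"]
    return 1 - efficient / flagship


if __name__ == "__main__":
    # Hypothetical workload, chosen only to illustrate the gap.
    text_tokens, image_tokens = 200_000, 4_000_000
    for model in PRICING_USD_PER_MILLION:
        cost = estimate_cost(model, text_tokens, image_tokens)
        print(f"{model}: ${cost:.2f}")
    print(f"Image output discount: {image_output_discount():.0%}")
```

For this made-up workload, the estimate comes to $79.00 for MAI-Image-2-Efficient versus $133.00 for MAI-Image-2. Because text input costs the same for both models, all of the savings come from image output, where the Efficient rate is roughly 41 percent lower.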

If you’d rather try MAI-Image-2-Efficient inside Microsoft’s own products, the model is scheduled to roll out to Copilot and Bing, with additional surfaces planned.

Featured Image: Credit: Ken Yeung
