Microsoft’s New Phi-4 Variants Show Just How Far Small AI Can Go

Microsoft CEO Satya Nadella speaks on stage at the company's Build conference in Seattle, Washington on May 21, 2024. Photo credit: Ken Yeung

Microsoft is doubling down on small language models with new Phi-4 variants that aim to prove a bold idea: small AI can think big. The new Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning models are optimized for complex tasks, such as math and coding, and outperform much larger models while running on devices with limited resources.

While new large language models from OpenAI, DeepSeek, Anthropic, Ai2, Cohere, and others dominate the headlines, there is a growing movement to develop smaller versions. Unlike LLMs, small language models (SLMs) don’t depend on the cloud; they can run locally on a device or at the edge, including mobile devices, smartwatches, wearables, and smart home gadgets.

This new set of Phi-4 models combines distillation, reinforcement learning, and high-quality training data. Microsoft asserts they’re “small enough for low-latency environments, yet maintain strong reasoning capabilities that rival much bigger models. This blend allows even resource-limited devices to perform complex reasoning tasks efficiently.”

Phi-4-Reasoning

Phi-4-reasoning is a 14-billion-parameter open-weight reasoning model, matching the parameter count of the base Phi-4, trained using reasoning demonstrations from OpenAI’s o3-mini. Its fine-tuning dataset is composed of synthetic prompts and high-quality, filtered data focused on math, science, and coding skills. Microsoft says it wanted to develop an SLM trained on data emphasizing high-quality, advanced reasoning.

Microsoft positions the model as a way to further research on language models and “for use as a building block for generative AI-powered features.” It also claims Phi-4-reasoning demonstrates “that meticulous data curation and high-quality synthetic datasets allow smaller models to compete with larger counterparts.”

In a demonstration of the model’s potential, the company had it solve a wordplay riddle by recognizing patterns and applying logical reasoning. When asked, “How many strawberries for nine r’s?” Phi-4-reasoning deduced the correct answer of three: because the word “strawberries” contains three “r’s,” three of the word would supply nine. Microsoft argued this showed the model could not only identify patterns in words and understand riddles, but also apply a simple arithmetic step to generate the answer.
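The arithmetic behind the riddle can be sketched in a few lines of Python. This is a toy illustration of the letter-counting logic Microsoft describes, not how the model itself computes the answer:

```python
# Toy illustration of the riddle's arithmetic: "strawberries" contains
# three r's, so the strawberries needed is the target count of r's
# divided by the r's per word.
def strawberries_needed(target_rs: int) -> int:
    rs_per_word = "strawberries".count("r")  # 3
    return target_rs // rs_per_word

print(strawberries_needed(9))  # prints 3
```

The point of the demonstration is that the model chains these two steps (counting letters, then dividing) in natural language rather than code.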

Phi-4-reasoning performance across representative reasoning benchmarks spanning mathematical and scientific reasoning. Image credit: Microsoft
Accuracy of models across general-purpose benchmarks for: long input context QA (FlenQA), instruction following (IFEval), Coding (HumanEvalPlus), knowledge & language understanding (MMLUPro), safety detection (ToxiGen), and other general skills (ArenaHard and PhiBench). Image credit: Microsoft

Phi-4-Reasoning-Plus

As for Phi-4-reasoning-plus, it’s described as a state-of-the-art open-weight reasoning model with 14 billion parameters, fine-tuned from the base Phi-4 model through supervised fine-tuning on a dataset of chain-of-thought traces, followed by reinforcement learning. The company wanted Phi-4-reasoning-plus to prove that a small, capable model could be trained on data focused on “high quality and advanced reasoning,” and says it achieves greater accuracy as a result. That accuracy comes at a cost, however: the model generates about 50 percent more tokens than Phi-4-reasoning, which means greater latency.
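To see why more tokens means more latency, note that autoregressive decoding time grows roughly linearly with output length. A back-of-envelope sketch, where the throughput figure is an assumed placeholder rather than a Microsoft number:

```python
# Rough model: decode latency ≈ tokens / throughput. The 40 tokens/sec
# figure is an assumed placeholder; real throughput depends on hardware.
def decode_seconds(tokens: int, tokens_per_second: float = 40.0) -> float:
    return tokens / tokens_per_second

base = decode_seconds(1_000)   # hypothetical Phi-4-reasoning response length
plus = decode_seconds(1_500)   # 50 percent more tokens from reasoning-plus
print(f"{plus / base:.2f}x the decode time")  # prints 1.50x the decode time
```

Whatever the hardware, a 50 percent longer chain of thought costs roughly 50 percent more wall-clock time to generate.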

Microsoft doesn’t expect this variant to see broad everyday use; it’s optimized to accelerate research on language models. Phi-4-reasoning-plus has also been designed and tested specifically for math reasoning, not for other tasks.

When compared against OpenAI’s o1-mini and DeepSeek-R1-Distill-Llama-70B, Microsoft says both Phi-4-reasoning and Phi-4-reasoning-plus outperformed those leading models despite their smaller sizes.

“Phi-4-reasoning models introduce a major improvement over Phi-4, surpass larger models like DeepSeek-R1-Distill-70B and approach DeepSeek-R1 across various reasoning and general capabilities, including math, coding, algorithmic problem solving, and planning,” Weizhu Chen, Microsoft’s corporate vice president for generative AI, and Ece Kamar, Microsoft Research’s managing director of the AI Frontiers Lab, write in a blog post.

Phi-4-Mini-Reasoning

Lastly, Phi-4-mini-reasoning is a lightweight open model built on synthetic data from DeepSeek-R1, unlike its reasoning siblings. With 3.8 billion parameters, it falls into a similar category to DeepSeek’s R1-Distill-Qwen-7B and R1-Distill-Llama-8B models, as well as Meta’s Llama-3.2-3B-Instruct, and it’s built for multi-step, logic-intensive mathematical problem-solving tasks.

Microsoft explains that this model is suited for educational applications, embedded tutoring, and lightweight deployment on edge or mobile systems. Phi-4-mini-reasoning was trained on more than one million math problems spanning middle school through Ph.D.-level difficulty.

A Growing Phi-4 Family

The launch of Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning means the Phi-4 family is rapidly expanding since its debut six months ago. In addition to these three and the base Phi-4 model, six other variations of the small language model are currently available on Hugging Face.

Microsoft appears to be iterating quickly on its Phi models. The inaugural family was released in 2023 and was soon succeeded by Phi-1.5 and Phi-2. The company made the third generation generally available at its 2024 Build conference. And now, with its signature event coming up later this month, there are new Phi models in town.
