Archive
Tag: Large Language Models
Ai2’s FlexOlmo Lets You Co-Develop AI Language Models Without Sharing Data
Ai2 has unveiled FlexOlmo, a new framework that lets multiple organizations jointly develop language models without requiring centralized data pooling. Ideal for those in regulated industries, it provides developers with granular control over how and when their data influences the model, addressing long-standing concerns regarding privacy, IP protection, and data sovereignty in collaborative AI projects. […]
Ai2’s SciArena Benchmarks AI for Science, Inspired by ChatBot Arena
Nonprofit AI lab Ai2 has launched a new platform to help researchers evaluate which AI models perform best on scientific literature tasks. Called SciArena, it’s an open, collaborative service that enables head-to-head comparisons of large language models. Think ChatBot Arena—but built for the scientific research community. “Measuring progress in using AI agents for literature-grounded scientific […]
Ai2’s RewardBench 2 Is a Tougher Benchmark for Testing How Well AI Models Reflect Human Judgment
Ai2 has released an update to its RewardBench benchmark, making it more capable of evaluating reward models. The next-generation test is built using new, more complex examples to assess how well AI models can produce answers that are as accurate as those of a human. In its first round of testing, RewardBench 2 ranks Google’s […]
Microsoft Adds Model Tuning to M365 Copilot to Build Domain-Specific AI
Microsoft is adding a new feature to its Microsoft 365 (M365) Copilot assistant that transforms it into a domain expert. Announced at Microsoft Build 2025, the feature is called Copilot Tuning and is a low-code service allowing developers to train models on an organization’s own data without needing the assistance of data scientists. Imagine a […]
ServiceNow and Nvidia Debut Apriel Nemotron 15B, an Open-Source Reasoning Model Built for Faster, Cheaper Agentic AI
ServiceNow and Nvidia have a long-standing partnership building generative AI solutions for the enterprise. This week, at ServiceNow’s Knowledge customer conference, the two are introducing the latest fruits of their labor, a new large language model called Apriel Nemotron 15B with reasoning capabilities. The companies believe it performs as well as OpenAI’s o1-mini, Alibaba’s […]
Microsoft’s New Phi-4 Variants Show Just How Far Small AI Can Go
Microsoft is doubling down on small language models with new Phi-4 variants that aim to prove a bold idea: small AI can think big. The new Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning models are optimized for complex tasks, such as math and coding, and outperform much larger models while running on devices with limited resources. While we […]
AWS Releases Amazon Nova Premier, Its ‘Most Capable Model’ for Complex Tasks
Months after Amazon Chief Executive Andy Jassy first introduced the world to the Nova foundation model family, the company has released what might be its top-of-the-line variant, Amazon Nova Premier. Billed as Amazon’s “most capable model,” it can process text, images, and video (though not audio yet) and tackle complex, multi-step tasks that demand precise […]
ServiceNow Expands Its AI Ambitions With New Apriel Small Language Models
ServiceNow, the enterprise automation platform, has released a family of small language models called Apriel, available in two variants: a 5B base model and an instruct-tuned version. The company says they are designed to handle general-purpose tasks, from answering questions and retrieving information to content generation, code assistance, reasoning, and creative writing. Both models […]
Ai2’s OLMoTrace Tool Reveals the Origins of AI Model Training Data
Ai2 has launched a tool to help developers answer a critical question: Where do AI models get their data? Called OLMoTrace, it’s an open-source application offering fact-checking of information provided in prompt replies. This data traceability is vital for those interested in governance, regulation, and auditing. This feature is currently available on Ai2’s flagship model, […]
Meta Launches Llama 4 Scout and Maverick, Open-Weight Multimodal Models That Outperform GPT-4 and Gemini
Meta has released the first two models in its new Llama 4 family, which it claims will power “more personalized multimodal experiences.” On Saturday, the company introduced Llama 4 Scout and Maverick, both with 17 billion active parameters. Their debut follows a report that Meta delayed Llama 4’s launch multiple times after failing to meet […]
