Vision Language Models

Artificial Intelligence December 16, 2025

Ai2 Releases Molmo 2, an Open Video Model That Outperforms Qwen 3, GPT-5, and Gemini 2.5 Pro While Knowing Where the Action Happens

Ai2 has unveiled Molmo 2, the latest iteration of its open-source vision-language model (VLM). Arriving over a year after the original, this state-of-the-art update brings the most notable upgrades yet: support for multiple images and video, and grounding. The next-generation Molmo can now count and track objects or actions within videos. And just like its […]

Artificial Intelligence August 12, 2025

Ai2’s MolmoAct Trains Robots to Think in 3D Before Acting in Reality

Ai2, or the Allen Institute for AI, is unveiling a new class of models designed to help robots move through the world with greater spatial awareness. Known as Action Reasoning Models (ARM), they’re billed as giving machines spatial awareness that text-based inputs alone can’t deliver. The first in this ARM family is MolmoAct, an open-source […]

Artificial Intelligence December 4, 2024

Luma AI Introduces Ray 2 Model, Enabling Video Creation from Text and Images in Seconds

Weeks after releasing its Photon text-to-image model, Luma AI shows no signs of slowing down. The creator of the popular generative AI platform Dream Machine announced at AWS’ re:Invent conference the launch of its Ray 2 model, which it boasts can produce videos using text and images, all in under 10 seconds. High-Quality Video From […]

Artificial Intelligence November 26, 2024

Hugging Face Introduces SmolVLM, a 2B Model for Multimodal AI on Edge Devices

AI startup Hugging Face has released an open-source family of compact visual language models named SmolVLM. With two billion parameters, it’s built for on-device inference, and the company claims it outperforms similar models with comparable GPU RAM usage and token throughput. Three models are available at launch: SmolVLM-Base, which offers downstream fine-tuning; SmolVLM-Synthetic, a fine-tuned variant […]