Tag: Vision Language Models
Ai2 Releases Molmo 2, an Open Video Model That Outperforms Qwen 3, GPT-5, and Gemini 2.5 Pro While Knowing Where the Action Happens
Ai2 has unveiled Molmo 2, the latest iteration of its open-source vision-language model (VLM). Arriving over a year after the original, this state-of-the-art update brings the most notable upgrades yet: support for multiple images and video, and grounding. The next-generation Molmo can now count and track objects or actions within videos. And just like its […]
Ai2’s MolmoAct Trains Robots to Think in 3D Before Acting in Reality
Ai2, or the Allen Institute for AI, is unveiling a new class of models designed to help robots move through the world with greater spatial awareness. Known as Action Reasoning Models (ARM), the class is billed as giving machines a spatial understanding that text-based inputs alone can’t deliver. The first in this ARM family is MolmoAct, an open-source […]
Luma AI Introduces Ray 2 Model, Enabling Video Creation from Text and Images in Seconds
Weeks after releasing its Photon text-to-image model, Luma AI shows no signs of slowing down. The creator of the popular generative AI platform Dream Machine announced at AWS’ re:Invent conference the launch of its Ray 2 model, which it boasts can produce videos using text and images, all in under 10 seconds. High-Quality Video From […]
Hugging Face Introduces SmolVLM, a 2B Model for Multimodal AI on Edge Devices
AI startup Hugging Face has released an open-source family of compact visual language models named SmolVLM. With two billion parameters, it’s built for on-device inference, and the company claims it outperforms similar models with comparable GPU RAM usage and token throughput. Three models are available at launch: SmolVLM-Base, which offers downstream fine-tuning; SmolVLM-Synthetic, a fine-tuned variant […]
