Ai2 Unveils OLMo 2, Calling It The Most Advanced Fully Open Language Model Yet


In February, Ai2 released OLMo, its “truly open-source” large language model, as an alternative to closed and restrictive models. Nine months later, the nonprofit research institute introduced its successor, OLMo 2, describing it as the “best fully open language model to date,” with performance comparable to Meta’s Llama 3.1. The family comes in two sizes, a 7B and a 13B model, each trained on up to 5 trillion tokens.

Ai2 isn’t just making OLMo 2 available: it’s also releasing the model weights, training data, code, recipes, and intermediate checkpoints for anyone to play with.

OLMo 2 caps off a milestone-filled 2024 for Ai2. Building on the initial release of OLMo, the organization introduced spin-off models like OLMo-0424 and OLMoE, unveiled the multimodal AI model Molmo, and launched Tulu 3 to advance open-source post-training techniques.

How Ai2 Created OLMo 2

The next-generation OLMo has an architecture similar to its predecessor’s, but with several modifications designed to improve training stability. Ai2 also chose to train the model in two stages: OLMo 2 begins by pretraining on OLMo-Mix-1124, a collection of 3.9 trillion tokens drawn from the DCLM, Dolma, Starcoder, and Proof Pile II datasets. This first stage grounds the model in a wide range of general knowledge.

In the second stage, OLMo 2 is refined on a curated blend of web content, academic papers, Q&A forums, instructional guides, and math problems. This comes from the Dolmino-Mix-1124 dataset, which consists of 843 billion tokens and which Ai2 samples to create three mixes of 50 billion, 100 billion, and 300 billion tokens each.
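To make that sampling step concrete, here’s a minimal Python sketch of building fixed-budget mixes by proportional downsampling. The source names and sizes are hypothetical stand-ins; only the 843-billion-token total comes from Ai2, which hasn’t published Dolmino-Mix-1124’s exact composition here.

```python
# Hypothetical source sizes (in tokens) for a Dolmino-style mid-training mix.
# Only the 843B total comes from Ai2; the breakdown below is invented.
sources = {
    "web_content":      400e9,
    "academic_papers":  150e9,
    "qa_forums":        100e9,
    "instruction_data": 100e9,
    "math_problems":     93e9,
}

def sample_mix(sources: dict[str, float], budget: float) -> dict[str, float]:
    """Scale each source down proportionally so the mix totals `budget` tokens."""
    total = sum(sources.values())
    return {name: size / total * budget for name, size in sources.items()}

# Ai2 sampled three mixes of 50B, 100B, and 300B tokens from the 843B corpus.
for budget in (50e9, 100e9, 300e9):
    mix = sample_mix(sources, budget)
    print(f"{budget / 1e9:.0f}B mix:", {k: f"{v / 1e9:.1f}B" for k, v in mix.items()})
```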

OLMo 2 Instruct

But that’s not all: the company is also introducing Instruct variants of OLMo 2’s 7B and 13B models. These were built with the help of Tulu 3, whose recipes Ai2 applied for supervised fine-tuning and reinforcement learning with verifiable rewards (RLVR). The company then assessed the models’ performance using Tulu 3’s evaluation suite.

To its surprise, Ai2 discovered Tulu 3’s recipes could easily be applied to OLMo 2 without needing expensive customizations. “We removed models from our completions pool to remove any restrictions on the use of model outputs for derivative models,” the company writes. “Additionally, we updated the preference data to incorporate on-policy completions generated by our OLMo 2 models. Otherwise, the supervised finetuning (SFT) mix and preference tuning process remain largely unchanged. Most of the changes at these first two stages are differences in the learning rates. For the final stage, Reinforcement Learning with Verifiable Rewards (RLVR), we also saw consistent improvements across key evaluations such as GSM8K and MATH for both the 7B and 13B models.”
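The “verifiable” part of RLVR means the reward comes from a programmatic check rather than a learned reward model. Below is a minimal sketch of what such a reward could look like for a GSM8K-style math problem, where the final numeric answer can be compared against a known ground truth; it illustrates the idea only and is not Ai2’s actual reward code.

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the completion's final number matches the
    gold answer, else 0.0. A simplified GSM8K-style checker, not Ai2's."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    try:
        return 1.0 if float(numbers[-1]) == float(gold_answer) else 0.0
    except ValueError:
        return 0.0

print(verifiable_reward("Two dozen plus 18 more makes 42 apples.", "42"))  # 1.0
print(verifiable_reward("So the answer is 41.", "42"))                     # 0.0
```

During RL training, rewards like this score on-policy completions, so the policy is pushed toward answers that actually check out rather than answers a reward model merely prefers.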

Benchmark Evaluations

A graph illustrating how Ai2’s OLMo 2 models compare to other open models. Image credit: Ai2

So, how does OLMo 2 compare to the competition? First, it’s important to remember that Ai2’s new model is fully open, meaning its entire development process and outputs are transparent, accessible, and reproducible by anyone. The company embraces this philosophy. Chief Executive Ali Farhadi says, “AI is born and raised in the open,” though he acknowledges that the community disagrees about what “open” means.

“The hallmark, the spirit of openness, has always been, ‘I need to understand your work to the extent that I could change it to do my work.’ This change sometimes means that I need to make a change to your work early in the pipeline. Sometimes, I need to fork out of it. Sometimes, I need to grab your end product and have a right to use it. All of those things. So anything that doesn’t give me that spirit, it’s just not open source. Whatever we’re going to call it, just call it, but just not open source.”

That being said, Ai2 benchmarked OLMo 2 against the original OLMo as well as other fully open models such as DCLM-7B, MAP-Neo-7B, and Amber-7B. Its evaluations showed the OLMo 2 7B and 13B models outperforming “open weight models of equivalent size.”

“Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo 0424 model but, notably, OLMo 2 7B outperforms Llama-3.1 8B and OLMo 2 13B outperforms Qwen 2.5 7B despite its lower total training FLOPs. The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.”

How Ai2’s OLMo 2 models performed against other open-source models across ten tasks. Image credit: Ai2

As for OLMo 2’s Instruct variants, Ai2 reports they outperformed several leading competitors, including Qwen 2.5 14B Instruct, Llama 3.1 8B Instruct, and, interestingly, even Ai2’s own Tulu 3 8B model.

How Ai2’s OLMo 2 Instruct variants performed against other similar models. Image credit: Ai2

Developers interested in OLMo 2 can test it in Ai2’s dedicated playground. The models can also be downloaded from Ai2’s website (7B and 13B) or from Hugging Face.
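For those who would rather skip the playground, here’s a minimal sketch of loading the 7B model with the Hugging Face transformers library. The repo id below is assumed from Ai2’s allenai organization, so confirm the exact name on Hugging Face before running, and note that OLMo 2 support requires a recent transformers release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id under Ai2's "allenai" Hugging Face org; verify before use.
model_id = "allenai/OLMo-2-1124-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```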

Featured Image: Ai2 logo featured at the company's Seattle headquarters. Photo credit: Ken Yeung
