AMD Unveils OLMo, Its First Fully Open 1B-Parameter LLM Series

The AMD logo on top of an abstract image of an AI model. Logo credit: AMD

AMD has introduced OLMo, a new series of large language models trained in-house on trillions of tokens using a cluster of its Instinct MI250 GPUs. Though its specific purpose isn’t explicitly stated, the company appears to have created OLMo to showcase its Instinct GPUs’ ability to run “large-scale multi-node LM training jobs with trillions of tokens” while “achieving improved reasoning and instruction-following performance compared to other fully open similar size LMs.”

This is the second language model AMD has created from scratch, joining the AMD-135M small language model the company released in September, which was designed for edge devices and resource-constrained environments. The new series is based on the OLMo architecture first developed by AI2, formerly the Allen Institute for Artificial Intelligence.

Similar to AI2’s OLMo, AMD’s model is open-source and state-of-the-art. Unlike AI2’s, however, it is smaller: 1 billion parameters versus 7 billion. The new OLMo was pre-trained with “1.3 trillion tokens on 16 nodes, each with four AMD Instinct MI250 GPUs.” It comes in three variations:

  • AMD OLMo 1B: Pre-trained on a subset of Dolma v1.7 with 1.3 trillion tokens
  • AMD OLMo 1B SFT: Supervised fine-tuned (SFT) model trained first on the Tulu V2 dataset, then on the OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets
  • AMD OLMo 1B SFT DPO: Further fine-tuned to better reflect human preferences via Direct Preference Optimization (DPO, sketched below) on the UltraFeedback dataset
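AMD hasn’t published its exact training loop in this announcement, but DPO itself is a well-known technique (Rafailov et al., 2023): the model is nudged to assign higher probability to a human-preferred response than to a rejected one, measured against a frozen reference model. A minimal PyTorch sketch of the loss, with all tensor names illustrative:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: how far the policy has shifted probability mass
    # toward each response, relative to the frozen reference (SFT) model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the preferred response's reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```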

AMD states the LLMs are “decoder-only transformer models that are trained using next-token prediction,” meaning they predict the word that comes next in a sequence (as in autocomplete). This makes the models suitable for chatbots.
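As a concrete illustration, here’s a minimal sketch of prompting one of the checkpoints with Hugging Face’s transformers library. The amd/AMD-OLMo-1B-SFT model ID is assumed from AMD’s Hugging Face release; verify it before running:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face ID; the base 1B and SFT-DPO checkpoints should load the same way.
model_id = "amd/AMD-OLMo-1B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What is so special about AMD Instinct GPUs?"
inputs = tokenizer(prompt, return_tensors="pt")

# Decoder-only generation is next-token prediction in a loop: the model predicts
# a distribution over the next token, one token is chosen and appended, repeat.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```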

AMD trained its OLMo series of models in three stages. Image credit: AMD

Here’s how AMD’s OLMo 1B stacked up against other publicly available, similarly sized, fully open-source models:

  • Its average accuracy across general reasoning tasks (48.77%) is comparable to that of the latest OLMo-0724-hf model (49.3%), reached with less than half of that model’s pre-training compute budget, and better than all the other baseline models.
  • It posted accuracy gains over the next-best models on the ARC-Easy (+6.36%), ARC-Challenge (+1.02%), and SciQ (+0.50%) benchmarks.
AMD OLMo 1B model pre-training results on standard benchmarks. Image credit: AMD

When compared against other instruction-tuned baselines:

  • Two-phase SFT raised model accuracy over the pre-trained checkpoint across almost all benchmarks on average, including MMLU by +5.09% and GSM8k by +15.32%.
  • AMD OLMo 1B SFT’s performance on GSM8k (18.2%) is significantly better (+15.39%) than the next-best baseline model (TinyLlama-1.1B-Chat-v1.0 at 2.81%).
  • The SFT model’s average accuracy over standard benchmarks beats baseline chat models by at least +2.65%, and DPO alignment adds a further +0.46%.
AMD OLMo 1B model instruction tuning results on standard benchmarks for general reasoning capabilities and multi-task understanding. Image credit: AMD
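AMD’s post doesn’t spell out its exact evaluation setup here, but benchmark numbers like these are typically produced with EleutherAI’s lm-evaluation-harness. A hedged sketch, with the task names assumed to match the benchmarks above:

```python
import lm_eval

# Run the harness against the (assumed) Hugging Face checkpoint.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=amd/AMD-OLMo-1B-SFT",
    tasks=["arc_easy", "arc_challenge", "sciq", "mmlu", "gsm8k"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```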

AMD OLMo is the latest in a line of AI innovations from the company. Standalone model releases are rare for AMD, as most of its AI work is built into its CPUs and GPUs. In October, for example, it unveiled AI-infused Ryzen, Instinct, and Epyc chips aimed at bringing AI closer to edge devices, such as Microsoft’s new Copilot+ PCs, which are now equipped with AMD silicon.

The company states that open-sourcing OLMo’s data, weights, training recipes, and code is intended to help developers reproduce its work and further innovation “on top.” Releasing models like this could also promote AMD’s processors, showcase their power, and boost their reputation against chips from Nvidia and Intel.

You can check out AMD’s OLMo models on Hugging Face.
