Llama, Llama, Llama: Everything About Meta’s New AI Model

What you need to know about Meta’s new set of large language models.
"The AI Economy," a newsletter exploring AI's impact on business, work, society and tech.
This is “The AI Economy,” a weekly LinkedIn-first newsletter written by Ken Yeung about AI’s influence on business, work, society, and tech. Sign up here.

As I’m writing this week’s newsletter and listening to Taylor Swift’s latest album—which has its own AI controversy—I’m astonished at how jam-packed the news cycle has been. Stanford University released a massive and informative report that gives us a pulse on where AI stands today. Google is undergoing a reorganization to align its AI efforts better. And AI is all the talk at this year’s TED conference.

But Meta is what everyone’s talking about. The company released its next-generation large language model, Llama 3, dominating the conversation before OpenAI announces GPT-5 and Microsoft and Google hold their annual developer conferences.

Let’s take a look at what makes Llama 3 special. And stick around for nearly 50 AI headlines you may have missed!

The Prompt

Part of Meta’s growing line of AI tools, Llama 3 comes in two sizes: Llama 3 8B, which features eight billion parameters, and the more powerful Llama 3 70B, with 70 billion parameters. Both versions are available now, or soon will be, on Amazon Web Services, Databricks, Google Cloud, Hugging Face, Kaggle, IBM watsonx, Microsoft Azure, Nvidia NIM, and Snowflake.

Like its predecessors, Llama 3 is open-sourced. “We believe these are the best open-source models of their class, period,” Meta claims. “In support of our longstanding open approach, we’re putting Llama 3 in the hands of the community. We want to kickstart the next wave of innovation in AI across the stack—from applications to developer tools to [evaluations] to inference optimizations and more.”

Experience Llama 3 on Meta AI

While the models are available for download, Llama 3 is already in use, powering the Meta AI experience on Facebook, Instagram, WhatsApp, Messenger, and the web. Eventually, the company says, it will test a multimodal Meta AI on its Ray-Ban smart glasses.

Meta AI is the company’s chatbot—the thing you see embedded into the search bar of all Meta apps. It was introduced last September and is positioned as a competitor to ChatGPT. Chief Executive Mark Zuckerberg tells The Verge that Meta AI aims to be “the most intelligent AI assistant that people can freely use across the world. With Llama 3, we basically feel like we’re there.”

Meta AI is now available outside the U.S., with English-language rollouts to Australia, Canada, Ghana, Jamaica, Malawi, New Zealand, Nigeria, Pakistan, Singapore, South Africa, Uganda, Zambia, and Zimbabwe.

Llama 3 Architecture

The new model uses a tokenizer with a vocabulary of 128,000 tokens (up from 32,000 in Llama 2), allowing it to encode language more efficiently. It also improves inference efficiency through grouped query attention (GQA), a technique in which groups of query heads share a single set of key-value heads to cut memory use during inference.
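If you’re curious about the new tokenizer, here’s a minimal sketch for inspecting it yourself. It assumes you have the Hugging Face transformers library installed and an access token approved for the gated meta-llama/Meta-Llama-3-8B repository:

```python
# Minimal sketch: inspect the Llama 3 tokenizer via Hugging Face transformers.
# Assumes `pip install transformers`, approved access to the gated repo,
# and a `huggingface-cli login` session.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

print(tokenizer.vocab_size)  # 128,000 base entries, matching Meta's announcement
print(tokenizer.tokenize("Grouped query attention speeds up inference."))
```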

Meta pre-trained Llama 3 on over 15 trillion tokens, which it says were collected from publicly available sources. That dataset is seven times larger than the one used for Llama 2 and includes four times more code. Over five percent of it consists of non-English data covering more than 30 languages, a step the company took to prepare the model for multilingual use cases. However, Meta cautions that performance in those languages isn’t expected to be on par with English.

The company says training Llama 3’s largest models required combining three types of parallelization: data, model, and pipeline parallelism. These training runs were performed on two custom-built clusters of roughly 24,000 GPUs each.
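To make that scale concrete, here’s a rough, hypothetical sketch of how the three parallelism axes multiply out to a cluster of that size. The per-axis split below is invented for illustration; Meta hasn’t disclosed its exact configuration.

```python
# Hypothetical illustration of stacking the three parallelism types.
# Only the ~24K cluster size comes from Meta; the split below is made up.
data_parallel = 128     # full model replicas, each fed a different slice of the data
model_parallel = 8      # each layer's weights sharded across 8 GPUs
pipeline_parallel = 24  # consecutive layer groups staged across 24 groups of GPUs

gpus_per_cluster = data_parallel * model_parallel * pipeline_parallel
print(gpus_per_cluster)  # 24576, i.e., one of Meta's two "24K" clusters
```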

Benchmarking

Meta claims its new LLMs offer performance comparable to or better than that of Google Gemini, Anthropic’s Claude 3 Sonnet, and Mistral’s 7B Instruct. As VentureBeat notes, Llama 3 “does well at multiple-choice questions (MMLU) and coding (HumanEval), but the 70B is not as strong as Gemini Pro 1.5 at solving math word problems (MATH), nor at graduate-student-level multiple-choice questions (GPQA).”

It also notes that Llama 3 8B outperforms Google’s Gemma 7B and Mistral 7B across many benchmarks, including grade-school math questions (GSM8K).

Take a look at the following charts to see more about how Llama 3 compares to other leading AI models:

How Llama 3 performs compared to Gemma, Mistral, Gemini, and Claude 3. Image credit: Meta
How Meta Llama 3 performs in human evaluations. Image credit: Meta
Meta Llama 3 pre-trained model performance versus Mistral and Gemma. Image credit: Meta

Questions surrounding data sources

What specific sources did Meta use to train Llama 3? The company hasn’t disclosed them, saying only that it pulled from “publicly available sources.” That could include, among other things, the content you share across the Meta family of apps, from Facebook and Instagram to WhatsApp and Messenger. So double-check those privacy settings to make sure you’re sharing only what you intend to share.

Not stopping with Llama 3

“The Llama 3 8B and 70B models mark the beginning of what we plan to release for Llama 3,” Meta reveals. It plans to debut more models that are multimodal, can converse in multiple languages, have a longer context window, and deliver stronger overall capabilities.

If you think 70 billion parameters is big, wait until Meta releases its 400-billion-parameter model, which is still being trained.

Today’s Visual Snapshot

The Stanford Institute for Human-Centered Artificial Intelligence released its 2024 AI Index Report this week. Now in its seventh year, the report tracks the technology’s influence on society. This year, it broadened its research to probe trends such as technical advancements in AI, public perceptions, and the geopolitical dynamics surrounding AI development.

It’s impossible to summarize everything in the study, which spans over 500 pages. However, IEEE Spectrum has replicated 15 featured charts summarizing the current state of AI. Below are several of note:

The number of foundation models released in 2023, sorted by organization. Image credit: IEEE Spectrum

Google is the market leader in creating foundation models, surpassing Meta, Microsoft, and OpenAI. These models serve as the backbone for AI apps; OpenAI’s GPT-4, for example, powers ChatGPT. As IEEE Spectrum notes, many of these foundation models are owned by “industry,” or commercial entities. A few were created by academic institutions such as Stanford University and the University of California, Berkeley.

How much it costs to train select AI models, 2017 to 2023. Image credit: IEEE Spectrum

Building and training AI models is expensive, and it shouldn’t surprise anyone that only a handful of companies can afford to develop them. But how much is being spent? Google invested over $191 million in its most powerful LLM, Gemini Ultra. By comparison, OpenAI is believed to have invested over $78 million in GPT-4. And the 2017 transformer model from Google that helped kickstart the LLM era? Training it cost an estimated $930.
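For a sense of the trajectory, here’s a quick back-of-the-envelope calculation using the AI Index cost estimates quoted above:

```python
# Back-of-the-envelope comparison of the AI Index training-cost estimates.
gemini_ultra = 191_000_000   # Google's Gemini Ultra (2023)
gpt4 = 78_000_000            # OpenAI's GPT-4 (2023)
transformer_2017 = 930       # Google's original transformer (2017)

print(f"Gemini Ultra vs. 2017 transformer: {gemini_ultra / transformer_2017:,.0f}x")
# -> Gemini Ultra vs. 2017 transformer: 205,376x
print(f"Gemini Ultra vs. GPT-4: {gemini_ultra / gpt4:.1f}x")
# -> Gemini Ultra vs. GPT-4: 2.4x
```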

How closed models fared against open models in performance scoring. Image credit: IEEE Spectrum

The last chart I’m highlighting compares open and closed models. Which type performs best? IEEE Spectrum notes that the AI Index tracks releases of both open and closed models, and the chart above suggests that closed models outperform open ones across multiple standard benchmarks. However, the report doesn’t address the core debate around these model types: which is better for security and innovation?


Quote This

“We thought it was going to be something that had to do with training large models. At the time I thought it was probably going to be something that had to do with content. It’s just the pattern matching of running the company, there’s always another thing. At that time I was so deep into trying to get the recommendations working for Reels and other content. That’s just such a big unlock for Instagram and Facebook now, being able to show people content that’s interesting to them from people that they’re not even following.

But that ended up being a very good decision in retrospect. And it came from being behind. It wasn’t like “oh, I was so far ahead.” Actually, most of the times where we make some decision that ends up seeming good is because we messed something up before and just didn’t want to repeat the mistake.”

Mark Zuckerberg, explaining that Meta purchased 350,000 Nvidia H100 GPUs this year to power Reels, only to find them even more useful for its AI efforts (Dwarkesh Patel Podcast)


This Week’s AI News

🏭 Industry Insights

🤖 Machine Learning

✏️ Generative AI

☁️ Enterprise

⚙️ Hardware and Robotics

🔬 Science and Breakthroughs

💼 Business and Marketing

📺 Media and Entertainment

💰 Funding

⚖️ Copyright and Regulatory Issues

💥 Disruption and Misinformation

🎧 Podcasts


End Output

Thanks for reading. Be sure to subscribe so you don’t miss any future issues of this newsletter.

Did you miss any AI articles this week? Fret not; I’m curating the big stories in my Flipboard Magazine, “The AI Economy.”

Follow my Flipboard Magazine for all the latest AI news I curate for “The AI Economy” newsletter.

Connect with me on LinkedIn and check out my blog to read more insights and thoughts on business and technology. 

Do you have a story you think would be a great fit for “The AI Economy”? Awesome! Shoot me a message – I’m all ears!

Until next time, stay curious!

Subscribe to “The AI Economy”

New issues published on Fridays, exclusively on LinkedIn
