
Meta's Generative AI Head: How We Trained Llama 3

Alex Kantrowitz

Ahmad AlDahle is Meta's VP of generative AI. He joins Big Tech War Stories to discuss how the company built its Llama 3 model, releasing today. Training Llama 3 took ten times more data, one hundred times more computing resources, and personality tweaks to make it more willing to answer questions it otherwise would've refused.

Takeaways

Llama 3 includes an updated 8 billion parameter model and a 70 billion parameter model, both state-of-the-art and high-performing.
The models are trained in two phases: pre-training, where the model consumes general knowledge, and post-training, where human supervision is involved.
Scalability, infrastructure, and data work are crucial in building these models.
Weights in the models represent knowledge and require hardware to run.
The models' personality and behavior are carefully engineered to balance usefulness and safety.
The process of deploying the models in products involves close collaboration with application teams and rigorous quality checks.
Open sourcing the models is a consideration, but safety and cybersecurity are important factors to address.
Meta AI is being made more prominent in Meta's products, with integration into messaging platforms and search.
The long-term goal is to achieve artificial general intelligence, but the timeline and specifics are uncertain.



Chapters

00:00 Ahmad's Background and Tech Journey
02:50 Training AI Models: Data and Compute Resources
09:22 Scaling from Llama 2 to Llama 3
26:03 Advancing Reasoning Capabilities in AI


You can get the full show as a paid subscriber at https://www.bigtechnology.com
