

Bigger models or more data? The new scaling laws for LLMs

(xtream)
Language: English
Time: 11:30 - 12:15

The incredibly famous Chinchilla paper changed the way we train LLMs. The authors - including the current Mistral CEO - outlined scaling laws for maximising model performance under a fixed compute budget by balancing the number of parameters against the number of training tokens.
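As a rough illustration of that trade-off (a minimal sketch, not the paper's exact fitted law), the widely cited Chinchilla rule of thumb approximates training compute as C ≈ 6·N·D FLOPs and finds roughly 20 training tokens per parameter to be compute-optimal:

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a compute budget into parameters (N) and training tokens (D).

    Assumes the common approximations C ~ 6 * N * D and D ~ 20 * N
    (Hoffmann et al., 2022); the paper itself fits the exponents from data.
    """
    # C = 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla itself (~70B parameters, ~1.4T tokens) used roughly 5.9e23 FLOPs:
n, d = chinchilla_optimal(5.9e23)
print(f"optimal params ~ {n:.2e}, optimal tokens ~ {d:.2e}")
# -> optimal params ~ 7.0e10, optimal tokens ~ 1.4e12
```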

Today, these heuristics are in jeopardy. LLaMA-3, for one, was trained on far more tokens than these laws would prescribe - and that is precisely why it's so good. How much data do we actually need to train LLMs? This talk will shed light on the latest trends in model training and perhaps suggest newer scaling laws.
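To give a sense of the gap (a back-of-the-envelope comparison, not a figure from the talk): Meta reports that LLaMA-3 was pretrained on over 15 trillion tokens, far beyond what the 20-tokens-per-parameter heuristic suggests for its 8B variant.

```python
# Rough comparison of the reported LLaMA-3 8B training data (~15T tokens,
# approximate public figure) against the Chinchilla ~20 tokens/parameter heuristic.
llama3_params = 8e9
llama3_tokens = 15e12                      # reported pretraining tokens (approximate)
chinchilla_tokens = 20 * llama3_params     # compute-optimal heuristic

print(f"Chinchilla-optimal tokens for 8B params: {chinchilla_tokens:.1e}")
print(f"Reported LLaMA-3 tokens:                 {llama3_tokens:.1e}")
print(f"Overshoot factor: ~{llama3_tokens / chinchilla_tokens:.0f}x")
# -> roughly 94x more data than the compute-optimal heuristic suggests
```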