Scaling laws, data distributions, and learning dynamics: simulated high-energy physics data as a benchmark for data in the wild

Physics-based data generation for making and testing predictions about how AI learns, generalizes and scales.

April 11, 2026

Current theoretical AI research often relies on overly simple models of data, which makes it difficult to answer fundamental questions about scaling laws or about whether models truly learn underlying latent parameters. University of Toronto Professor Yonatan Kahn proposes a method of physics-based data generation that will provide ‘ground-truth information’ to researchers, allowing them to make and test predictions about how AI learns, generalizes and scales.

Collaborators

  • Yonatan Kahn

    University of Toronto