I tend to work on one thing at a time, and I always get my hands dirty implementing, debugging, and scaling the models myself.
I started my journey with continual learning and did my PhD on it. In hindsight, I was focused on academic benchmarks (e.g. incrementally learning new classes on ImageNet) that were not realistic enough.
When I joined DeepMind in late 2022 to work on continual learning, I realized that current deep learning needs more modularity: to learn and unlearn specific knowledge and skills, but also to scale models massively without increasing their inference cost. To train massive modular systems, I developed DiLoCo, a new way to distribute the training of LLMs across the world with two orders of magnitude less bandwidth. Several startups are now built on that technology. Using DiLoCo, I made DiPaCo, a new kind of modular architecture whose weights are distributed worldwide and trained semi-independently.
I’m still working on distributed training, fighting the tyranny of centralized synchronous training. My dream is to do compute arbitrage across all the GPUs and TPUs in the world: no device should ever be idle, and everything must be put to work training better AIs.
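To make the DiLoCo idea concrete: each worker runs many local AdamW steps on its own data shard, and only the resulting parameter deltas ("pseudo-gradients") are averaged and applied by an outer Nesterov-momentum step, so communication happens once per outer round instead of once per gradient step. Below is a minimal single-process sketch of that loop; the tiny linear model, synthetic shards, and hyperparameters are illustrative placeholders, not the actual training setup.

```python
# Minimal single-process sketch of a DiLoCo-style training loop.
# Each worker does H inner AdamW steps locally; the averaged parameter delta
# (pseudo-gradient) is then applied by an outer Nesterov-momentum SGD step.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)                       # stand-in for an LLM
outer_opt = torch.optim.SGD(model.parameters(), lr=0.7, momentum=0.9, nesterov=True)

K, H, T = 4, 50, 20                            # workers, inner steps, outer rounds
shards = [(torch.randn(256, 16), torch.randn(256, 1)) for _ in range(K)]  # synthetic data

for outer_round in range(T):
    global_params = [p.detach().clone() for p in model.parameters()]
    avg_delta = [torch.zeros_like(p) for p in global_params]
    for x, y in shards:                        # in DiLoCo these run in parallel
        replica = copy.deepcopy(model)
        inner_opt = torch.optim.AdamW(replica.parameters(), lr=1e-2)
        for _ in range(H):                     # H local steps, no communication
            inner_opt.zero_grad()
            nn.functional.mse_loss(replica(x), y).backward()
            inner_opt.step()
        # Accumulate this worker's pseudo-gradient: old params minus new params.
        for d, p_new, p_old in zip(avg_delta, replica.parameters(), global_params):
            d += (p_old - p_new.detach()) / K
    # One outer step: the only point where workers need to exchange tensors.
    outer_opt.zero_grad()
    for p, d in zip(model.parameters(), avg_delta):
        p.grad = d
    outer_opt.step()
```

The bandwidth saving comes from exchanging tensors only once every H inner steps rather than at every gradient step.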
Distributed Training
- 2025:
- Streaming DiLoCo with overlapping quantized communication, covered in Import AI #398
- Scaling Laws for DiLoCo, covered in Import AI #404
- Eager async updates for DiLoCo
- Coverage by The Economist of my research on distributed training
- 2024:
- Async DiLoCo with heterogeneous devices
- DiPaCo, a distributed modular system, covered in Import AI #367
- 2023:
- DiLoCo: large-scale distributed training of LLMs
- It kickstarted the startup PrimeIntellect (with their first projects OpenDiLoCo and Intellect-1) and FlowerLabs (with FlowerLM)
- Covered in Import AI #349
Continual Learning
Work done mostly during my PhD thesis (2019-2022).
- 2024:
- Interview in the French media outlet ActuIA on continual learning
- 2022:
- DyTox: quick adaptation to new tasks with learned tokens for vision transformers
- MuHDi: unsupervised continual domain adaptation
- Latent replay for foundation models
- PhD thesis
- CoMFormer: panoptic segmentation
- Compute-optimal transfer learning
- 2020:
- PODNet: feature distillation to reduce catastrophic forgetting with an extremely large number of tasks