I tend to work on one thing at a time, and I always get my hands dirty implementing, debugging, and scaling the models myself.
I started my journey with continual learning and did my PhD on it. In hindsight, I was focused on academic benchmarks (e.g. incrementally learning new classes on ImageNet) that were not realistic enough.
When I joined DeepMind in late 2022 to work on continual learning, I realized that current deep learning needs more modularity: to learn and unlearn specific knowledge and skills, but also to scale models massively without increasing their inference cost. To train massive modular systems, I developed DiLoCo, a new way to distribute the training of LLMs across the world with two orders of magnitude less bandwidth. Several startups are now built on that technology. Using DiLoCo, I made DiPaCo, a new kind of modular architecture whose weights are distributed worldwide and trained semi-independently.
I’m still working on distributed training, fighting the tyranny of centralized synchronous training. My dream is to do compute arbitrage across all the GPUs and TPUs in the world: no device should ever be idle, and everything must be put to work training better AIs.
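To make the DiLoCo idea concrete: each worker runs many local AdamW steps on its own data shard, and only the resulting parameter deltas ("pseudo-gradients") are averaged and applied by an outer Nesterov-momentum step, so communication happens once per outer round instead of once per gradient step. Below is a minimal single-process sketch of that loop; the tiny linear model, synthetic shards, and hyperparameters are illustrative placeholders, not the actual training setup.

```python
# Minimal single-process sketch of a DiLoCo-style training loop.
# Each worker does H inner AdamW steps locally; the averaged parameter delta
# (pseudo-gradient) is then applied by an outer Nesterov-momentum SGD step.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)                       # stand-in for an LLM
outer_opt = torch.optim.SGD(model.parameters(), lr=0.7, momentum=0.9, nesterov=True)

K, H, T = 4, 50, 20                            # workers, inner steps, outer rounds
shards = [(torch.randn(256, 16), torch.randn(256, 1)) for _ in range(K)]  # synthetic data

for outer_round in range(T):
    global_params = [p.detach().clone() for p in model.parameters()]
    avg_delta = [torch.zeros_like(p) for p in global_params]
    for x, y in shards:                        # in DiLoCo these run in parallel
        replica = copy.deepcopy(model)
        inner_opt = torch.optim.AdamW(replica.parameters(), lr=1e-2)
        for _ in range(H):                     # H local steps, no communication
            inner_opt.zero_grad()
            nn.functional.mse_loss(replica(x), y).backward()
            inner_opt.step()
        # Accumulate this worker's pseudo-gradient: old params minus new params.
        for d, p_new, p_old in zip(avg_delta, replica.parameters(), global_params):
            d += (p_old - p_new.detach()) / K
    # One outer step: the only point where workers need to exchange tensors.
    outer_opt.zero_grad()
    for p, d in zip(model.parameters(), avg_delta):
        p.grad = d
    outer_opt.step()
```

The bandwidth saving comes from exchanging tensors only once every H inner steps rather than at every gradient step.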
Distributed Training
- 2025:
- Streaming DiLoCo with overlapping quantized communication, covered in Import AI #398
- Scaling Laws for DiLoCo, covered in Import AI #404
- Eager async updates for DiLoCo
- Coverage by The Economist of my research on distributed training
- 2024:
- Async DiLoCo with heterogeneous devices
- DiPaCo, a distributed modular system, covered in Import AI #367
- 2023:
- DiLoCo: large-scale distributed training of LLMs
- It kickstarted the startup PrimeIntellect (with their first projects OpenDiLoCo and Intellect-1) and FlowerLabs (with FlowerLM)
- Covered in Import AI #349
Continual Learning
Work done mostly during my PhD thesis (2019-2022).
- 2024:
- Interview in the French media outlet ActuIA on continual learning
- 2022:
- DyTox: quick adaptation to new tasks with learned tokens for vision transformers
- MuHDi: unsupervised continual domain adaptation
- Latent replay for foundation models
- PhD thesis
- CoMFormer: panoptic segmentation
- Compute-optimal transfer learning
- 2020:
- PODNet: feature distillation to reduce catastrophic forgetting with an extremely large number of tasks