I tend to work on one thing at a time, and I always get my hands dirty implementing, debugging, and scaling the models myself.

I started my journey with continual learning and did my PhD on it. In hindsight, I was focused on academic benchmarks (e.g., incrementally learning new classes on ImageNet) that were not realistic enough.

When I joined DeepMind in late 2022 to work on continual learning, I realized that current deep learning needs more modularity: to learn and unlearn specific knowledge and skills, but also to scale models massively without increasing their inference cost. To train massive modular systems, I developed DiLoCo, a new way to distribute the training of LLMs across the world with two orders of magnitude less bandwidth. Several startups are now built on that technology. Using DiLoCo, I made DiPaCo, a new kind of modular architecture whose weights are distributed worldwide and trained semi-independently.
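The low-communication idea, in a minimal sketch: each worker trains a local replica for many steps on its own data shard and only occasionally exchanges a parameter delta, so communication happens once per round instead of at every step. Everything below (the linear-regression toy, plain SGD inner steps, the specific hyperparameters) is illustrative, not the actual recipe.

```python
# Toy sketch of a low-communication training loop in the spirit of DiLoCo.
# Hypothetical setup: linear regression, plain SGD inner steps, heavy-ball
# outer update. The real recipe and hyperparameters differ.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, split into shards: one shard per "worker" (island of devices).
true_w = rng.normal(size=5)
X = rng.normal(size=(4000, 5))
y = X @ true_w + 0.1 * rng.normal(size=4000)
num_workers = 4
shards = np.array_split(np.arange(len(X)), num_workers)

def local_steps(w, idx, steps=100, lr=0.01, batch=32):
    """Run `steps` SGD steps on one worker's shard, with no communication."""
    w = w.copy()
    for _ in range(steps):
        b = rng.choice(idx, size=batch, replace=False)
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / batch
        w -= lr * grad
    return w

# Outer loop: workers synchronize once per round, exchanging a single
# parameter delta instead of a gradient at every step.
w_global = np.zeros(5)
momentum = np.zeros(5)
outer_lr, outer_beta = 0.4, 0.6  # illustrative values
for round_ in range(20):
    deltas = [local_steps(w_global, idx) - w_global for idx in shards]
    outer_grad = np.mean(deltas, axis=0)        # averaged "pseudo-gradient"
    momentum = outer_beta * momentum + outer_grad
    w_global = w_global + outer_lr * momentum   # outer (server-side) update
    mse = np.mean((X @ w_global - y) ** 2)
    print(f"round {round_:2d}  mse={mse:.4f}")
```

With hundreds of inner steps per synchronization round, the communication volume drops by roughly the number of inner steps, which is where the orders-of-magnitude bandwidth saving comes from.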

I’m still working on distributed training, fighting the tyranny of device colocation. My dream is to do compute arbitrage across all the GPUs and TPUs in the world: no device should ever be idle; everything must be used towards training better AIs.

Distributed Training

Continual Learning

Work done mostly during my PhD thesis (2019-2022).

Miscellaneous