I tend to work on one thing at a time, and I always get my hands dirty implementing, debugging, and scaling the models myself.
I started my journey with continual learning, and did my PhD on it. In hindsight, I was focused on academic benchmarks (e.g. incrementally learning new classes on ImageNet) that were not realistic enough.
When I joined DeepMind in late 2022 to work on continual learning, I realized that current deep learning needs more modularity: to learn and unlearn specific knowledge and skills, but also to scale models massively without also increasing their inference cost. To train massive modular systems, I developed DiLoCo, a new way to distribute the training of LLMs across the world with two orders of magnitude less bandwidth (sketched below). Several startups are now built on that technology. Building on DiLoCo, I made DiPaCo, a new kind of modular architecture whose weights are distributed world-wide and trained semi-independently.
I’m still working on distributed training, fighting the tyranny of device colocation. My dream is to do compute arbitrage across all the GPUs and TPUs in the world: no device should ever be idle, everything must be used towards training better AIs.
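To make the bandwidth claim concrete, here is a minimal, single-process sketch of a DiLoCo-style inner/outer optimization loop, assuming the recipe described in the paper: each worker runs H local AdamW steps on its own data shard, then the workers average their parameter deltas (the "outer gradients") and apply them with an SGD outer optimizer using Nesterov momentum. The model, data, and hyperparameters below are toy placeholders, and the workers are simulated sequentially rather than running on separate islands of accelerators.

```python
# Minimal sketch of a DiLoCo-style inner/outer optimization loop.
# Workers are simulated sequentially; a real deployment would run them on
# separate islands of devices and only exchange the outer gradients.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_WORKERS = 4    # independent "islands" of devices
INNER_STEPS = 50   # H: local steps between communications (hundreds in practice)
OUTER_ROUNDS = 10

def make_batch():
    # Synthetic regression data standing in for each worker's data shard.
    x = torch.randn(32, 16)
    return x, x.sum(dim=1, keepdim=True)

global_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
# Outer optimizer: SGD with Nesterov momentum applied to the averaged deltas.
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7,
                            momentum=0.9, nesterov=True)

for outer_round in range(OUTER_ROUNDS):
    global_params = [p.detach().clone() for p in global_model.parameters()]
    # Accumulate the "outer gradient": average of (initial - final) parameters.
    outer_grads = [torch.zeros_like(p) for p in global_params]

    for _ in range(NUM_WORKERS):
        worker = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(worker.parameters(), lr=1e-3)
        for _ in range(INNER_STEPS):  # purely local, no communication
            x, y = make_batch()
            loss = nn.functional.mse_loss(worker(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        for g, p0, p1 in zip(outer_grads, global_params, worker.parameters()):
            g += (p0 - p1.detach()) / NUM_WORKERS  # all-reduce in a real setup

    # One outer step per round: parameters are exchanged once every H steps.
    outer_opt.zero_grad()
    for p, g in zip(global_model.parameters(), outer_grads):
        p.grad = g
    outer_opt.step()
    print(f"round {outer_round}: last inner loss {loss.item():.3f}")
```

Because the workers only synchronize once every H inner steps, communication volume drops roughly by a factor of H compared to exchanging gradients at every step.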
Distributed Training
- 2025:
  - Streaming DiLoCo with overlapping quantized communication: streaming diloco, covered in Import AI #398
  - Coverage by The Economist of my research on distributed training
- 2024:
  - Async DiLoCo with heterogeneous devices: arxiv [ICML Workshop 2024]
  - DiPaCo, a distributed modular system: arxiv, covered in Import AI #367
- 2023:
  - DiLoCo: large-scale distributed training of LLMs: arxiv [ICML Workshop 2024]
  - It kickstarted the startup PrimeIntellect, with their first projects OpenDiLoCo and Intellect-1, as well as FlowerLabs with FlowerLM
  - Covered in Import AI #349
Continual Learning
Work done mostly during my PhD thesis (2019-2022).
- 2024:
  - Interview in the French newspaper ActuIA on continual learning
- 2022:
  - DyTox: Quick adaptation to new tasks with learned tokens for vision transformers: arxiv [CVPR 2022]
  - MuHDi: Unsupervised continual domain adaptation: arxiv [CVPR Workshop 2022]
  - Latent replay for foundation models: arxiv [CoLLas 2023]
  - PhD thesis: thesis
  - CoMFormer: panoptic segmentation: arxiv [CVPR 2023]
  - Compute-optimal transfer learning: arxiv [ICLR Workshop 2023]
- 2020:
  - PLOP: Multi-scale feature distillation for segmentation: arxiv [CVPR 2021]
  - Ghost: Zero-shot learning using ghost features: arxiv [CVPR Workshop 2021]
  - Continuum: a data framework for continual learning: arxiv [CVPR Workshop 2021]
  - Synthetic visual data for rehearsal learning: synthetic data [TPAMI 2021]
- 2019:
  - PODNet: Feature distillation to reduce catastrophic forgetting with an extremely large number of tasks: PODNet [ECCV 2020]