- Kaplan, J. et al. (2020). Scaling Laws for Neural Language Models. https://arxiv.org/abs/2001.08361
- Delétang, G. et al. (2023/2024). Language Modeling Is Compression. ICLR 2024. https://arxiv.org/abs/2309.10668
- Tishby, N. & Zaslavsky, N. (2015). Deep Learning and the Information Bottleneck Principle. https://arxiv.org/abs/1503.02406
- Shwartz-Ziv, R. & Tishby, N. (2017). Opening the Black Box of Deep Neural Networks via Information. https://arxiv.org/abs/1703.00810
- (2025). A Generalized Information Bottleneck Theory of Deep Learning. https://arxiv.org/abs/2509.26327