A Practical Guide to Large Models in Just 100 Pages
I recommend a practical, 100-page guide to large language models: The Hundred-Page Language Models Book: hands-on with PyTorch.
The author, Andriy Burkov, holds a PhD in artificial intelligence with a primary research focus on natural language processing (NLP). Drawing on 20 years of machine learning experience, he succinctly covers language models from the fundamentals up to the Transformer architecture, helping readers grasp the concepts quickly.

Key Topics:
- Fundamentals: preprocessing (tokenization, vocabulary construction, embeddings) and how to convert words into machine-understandable vectors.
- Language Models: starting from the simplest N-gram model, then implementing RNNs and LSTMs in PyTorch.
- Core Model Concepts: self-attention mechanisms and the Transformer architecture.
- Practical Applications: writing loss functions, optimizers, training loops, text generation, pretraining, fine-tuning, and other techniques for applying models.
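To give a flavor of the book's starting point, here is a minimal sketch of the simplest model it begins with: a bigram (2-gram) language model built from raw counts. The corpus, variable names, and `next_word_probs` function are illustrative assumptions, not code from the book.

```python
from collections import defaultdict

# Hypothetical toy corpus for illustration only.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each preceding word.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    """Maximum-likelihood estimate of P(next | prev) from the counts."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

# After "the", the model prefers words that followed "the" most often.
print(next_word_probs("the"))
```

Generating text then amounts to repeatedly sampling the next word from these conditional distributions; the book's later chapters replace the count table with learned neural networks (RNNs, LSTMs, Transformers) in PyTorch.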
The book is concise at just 100 pages; the author's goal is to teach readers how to train a language model as quickly as possible. Theory is kept to a minimum, with the focus on model implementations in PyTorch, currently the most popular deep learning framework for building language models.
To get the most out of this book, it is recommended that readers have some Python coding experience and a basic understanding of deep learning. Many readers highly praise this book, calling it the most concise resource on large language models. Additionally, it frequently ranks among bestsellers, has been translated into over a dozen languages, and is used as a textbook at many universities.
All code from the book is available on GitHub and supports direct execution in Jupyter notebooks. Readers can download it here: github.com/aburkov/theLMbook