Transformer built from first principles
I built and trained a small transformer from scratch to understand the language-model pipeline below the framework gloss: tokenizer, attention stack, training loop, loss curves, and the small decisions that make a model actually learn.
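The core of that attention stack is scaled dot-product self-attention with a causal mask. As a minimal sketch of the mechanism (illustrative names and sizes, not the project's actual code):

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq, d_model); projection weights are (d_model, d_head)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # similarity scores, scaled by sqrt(head dim) to keep softmax well-behaved
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # causal mask: each position attends only to itself and earlier positions
    mask = torch.triu(torch.ones(scores.shape[-2:], dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(1, 5, 16)          # one sequence of 5 tokens, d_model=16
proj = lambda: torch.randn(16, 8) / math.sqrt(16)
out = self_attention(x, proj(), proj(), proj())
print(out.shape)  # torch.Size([1, 5, 8])
```

A full block wraps this in multiple heads plus a feed-forward layer, residual connections, and layer norm, but this single head is the piece that has to be right before anything else learns.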
Role: Builder
System Surface: Tokenizer, attention, training
Stack: Python and PyTorch
Training Setup: Mixed precision on 2M samples
Once the tokenizer and attention stack stabilized, training loss fell from 9.4 to 2.4 and the curve cleaned up noticeably. The point was not to compete with frontier systems; it was to build the machinery closely enough to reason about it.
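The mixed-precision setup can be sketched with PyTorch's autocast and gradient scaler. This is a toy stand-in (a linear model on random data) for the shape of the training loop, not the project's actual code:

```python
import torch
from torch import nn

# toy model and optimizer standing in for the real transformer and corpus
model = nn.Linear(32, 10)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# GradScaler is a no-op when mixed precision is disabled (CPU fallback here)
scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")

for step in range(3):
    x = torch.randn(8, 32, device=device)
    y = torch.randint(0, 10, (8,), device=device)
    opt.zero_grad(set_to_none=True)
    # autocast runs the forward pass in half precision where it is safe
    with torch.autocast(device_type=device, enabled=device == "cuda"):
        loss = nn.functional.cross_entropy(model(x), y)
    # the scaler rescales the loss so fp16 gradients do not underflow
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```

The scaler/autocast pair is what makes half-precision training numerically stable: activations run in fp16, while the loss is scaled up before backward so small gradients survive the reduced dynamic range.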
Build Period: 2025 Project Cycle
Project Status: Public project
Primary Focus: Model mechanics
Output Format: Transformer pipeline