Small Language Model

Transformer built from first principles

Aug 2025

PyTorch

Custom Tokenizer

Mixed Precision

I built and trained a small transformer from scratch to understand the language-model pipeline below the framework gloss: tokenizer, attention stack, training loop, loss curves, and the small decisions that make a model actually learn.
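The core of that attention stack is scaled dot-product attention. A minimal sketch in plain Python (the project itself uses PyTorch; shapes and names here are illustrative only):

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d = len(K[0])                            # key dimension for the 1/sqrt(d) scale
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)                  # attention weights over the keys
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out
```

Each output row is a weight-averaged mix of the value rows, with weights set by how well the query matches each key.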

Role

Builder



System Surface

Tokenizer, attention, training

Stack

Python and PyTorch



Training Setup

Mixed precision on 2M samples

Once the tokenizer and attention stack stabilized, the training curve cleaned up considerably, with loss falling from 9.4 to 2.4. The point was never to compete with frontier systems; it was to build the machinery closely enough to reason about it.
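A useful sanity check on a starting loss of 9.4: a randomly initialized model predicts roughly uniformly over the vocabulary, so its cross-entropy sits near ln(V). The vocabulary size below is an illustrative assumption, not the project's actual figure:

```python
import math

def uniform_ce(vocab_size):
    # Cross-entropy when every token gets probability 1/V: -ln(1/V) = ln(V)
    return math.log(vocab_size)

# A vocabulary of ~12k tokens puts the uniform baseline near 9.4,
# consistent with a loss of 9.4 at random initialization.
print(round(uniform_ce(12_000), 2))  # → 9.39
```

Dropping from that baseline to 2.4 means the model assigns the true next token roughly e^7 times more probability than chance.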

Build Period

2025 Project Cycle



Project Status

Public project

Primary Focus

Model mechanics



Output Format

Transformer pipeline