Mini-Seek
Language Model

DeepSeek-style model from scratch

Apr 2026

PyTorch

MLA / RoPE

MoE Routing

I implemented a compact DeepSeek-style language model in PyTorch with Multi-Head Latent Attention, decoupled RoPE, Mixture-of-Experts routing, Multi-Token Prediction heads, simulated FP8 utilities, and config-driven training and checkpointing.
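As a rough illustration of the Mixture-of-Experts routing mentioned above, here is a minimal sketch of top-k gating for a single token. It assumes a plain softmax router with renormalized top-k weights, which is one common scheme and not necessarily the one used in this project. The names `route_top_k`, `moe_forward`, and the toy scalar `experts` are hypothetical, and the code uses only the Python standard library, not PyTorch, to stay self-contained.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs for a scalar input x."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Toy experts (hypothetical): each is just a scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]

routes = route_top_k(logits, k=2)
output = moe_forward(3.0, logits, experts, k=2)
```

Only the selected experts are evaluated, which is the property that keeps per-token compute low as the expert count grows. A full implementation would batch this over tokens and usually add a load-balancing mechanism, both omitted here.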

Role

Model Builder



System Surface

Attention, routing, training

Stack

Python and PyTorch



Model Surface

MLA, MoE, MTP heads

The build is a compact lab for modern language-model mechanics: latent attention, positional encoding choices, expert routing, multi-token prediction, and training checkpoints that make the model easier to inspect and iterate on.
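To make the positional-encoding piece concrete, the sketch below shows the basic rotary (RoPE) rotation on a small vector and checks its defining property: the query-key dot product depends only on the relative offset between positions. It is an illustrative assumption, not the project's code. It uses the common pairing of adjacent dimensions with base 10000, it does not show the decoupled query/key split used with latent attention, and the helper names `rope` and `dot` are hypothetical.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by position * base**(-i/d)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions are 2 apart, so the attention scores should match.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
```

Because each pair is only rotated, vector norms are unchanged, and shifting both positions by the same amount leaves the score unchanged.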

Build Period

2026 Project Cycle



Project Status

Public repo

Primary Focus

Language-model mechanics



Output Format

DeepSeek-style pipeline