# Building my own LLM

> [!info]
> A version of this post was published on my LinkedIn page as well.

**I built my own LLM during the Christmas break! 🎄🤖**

If you know me, you're probably familiar with my habit of picking up something new and obsessively learning everything I can about it over the span of a few weeks. While I've been an active user of tools like ChatGPT and Claude for a while now, I wanted to go beyond usage and develop a deeper understanding of the underlying technology. I decided the only way to do this would be to start from a first-principles approach.

I started by revisiting Linear Algebra and Calculus (even working through gradient descent calculations by hand!), followed by deep learning lectures and research papers on LSTMs and Transformers. Along the way, I used tools like Claude and Mathstral to assist with learning and experimentation, which proved invaluable in building a strong theoretical base.

Once I felt confident in my understanding of the fundamentals, I wanted to start building something. After about a week of learning PyTorch, I was comfortable enough to experiment with small projects. I came across Andrej Karpathy's NanoGPT project, and I decided to build and train my own GPT-like model. Using a subset of Jane Austen's public domain works as training data, I developed ✨ 𝗯𝗲𝗻𝗻𝗲𝘁𝘁-𝗻𝗮𝗻𝗼𝗴𝗽𝘁 ✨.

#### Training & Performance Details

- 65,744 tokens (around 5K lines of text)
- ~1.2 million parameters (see the quick sanity check at the end of this post)
- 6 layers, 6 attention heads, and an embedding size of 128
- Final training loss: 0.9384
- Final validation loss: 1.2475
- Trained on an M1 MacBook Pro using the Apple Silicon GPU
- Overfitting seen beyond 3,400 epochs (see chart below)

I've added some sample outputs in the images below.

#### Conclusion

The outputs are basically nonsensical gibberish, but they are made up of (mostly) real English words! The model starts out producing what might as well be secure passwords and works its way up to generating real English words (see image below for the progression in output). However, the model is nowhere near large enough to learn grammar or semantic meaning, and I don't have enough compute to train one that large locally.

Still, I really enjoyed building this and learning from it, and I already have lots of ideas for what to build next to deepen my understanding.

If you want to check out the model and maybe even run it yourself, here you go: https://github.com/shkp/bennett-nanogpt

P.S. If it's not already clear from my post above, I'm a complete noob to the AI/LLM field and still have a lot to learn. If you think this can be improved or if you have ideas for other cooler/better projects, just [hit me up](mailto:[email protected]). :)
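If you're wondering how ~1.2 million parameters falls out of 6 layers and a 128-dimensional embedding, here's a rough back-of-the-envelope check. It's a minimal sketch assuming a nanoGPT-style decoder-only transformer; the `vocab_size` and `block_size` values are my guesses for a character-level setup (borrowed from Karpathy's Shakespeare example), not numbers pulled from bennett-nanogpt itself.

```python
# Rough parameter count for a nanoGPT-style decoder-only transformer.
# n_layer and n_embd come from the post; vocab_size and block_size are
# assumed values for a character-level model, not from bennett-nanogpt.
n_layer, n_embd = 6, 128
vocab_size, block_size = 65, 256  # assumptions

# Per transformer block: attention (QKV projections, 3*n_embd^2, plus an
# output projection, n_embd^2) and a 4x-wide MLP (two weight matrices of
# 4*n_embd^2 each). Biases and layernorms are negligible at this scale.
per_block = 4 * n_embd**2 + 8 * n_embd**2  # = 12 * n_embd^2

# Token and position embedding tables.
embeddings = vocab_size * n_embd + block_size * n_embd

total = n_layer * per_block + embeddings
print(f"~{total / 1e6:.2f}M parameters")  # ~1.22M, in line with the post
```

*Published: 04/01/2025*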