Patrick Burris

Software Developer

Learning in Plain Sight

The idea that you can set something up, flick a switch, and watch it run has always captivated me. A computer is a Rube Goldberg machine made of silicon and metal, powered by electrons. That captivation turned into a career, and years after becoming a software developer I grew interested in machine learning and natural language processing. Then AI models like ChatGPT hit the scene and deepened that interest; now I want to understand how these models work at a fundamental level.

I recently realized that I don't just want to learn and implement these models from scratch; I want to contribute to the world of computer science. I want to learn how to read and write academic papers, even though I have no degree and no experience doing so. Over the next 52 weeks I am going to immerse myself in the academic world of AI.

What I can't do

I know how to build software that uses modern AI models: agentic workflows, chatbots, RAG memory, and all of those buzzwords. I have over 10 years of experience as a software engineer, have architected systems, and have built plenty of things I am genuinely proud of.

Where I struggle the most is with the math. I have spent quite a bit of time studying both linear algebra and basic calculus, but I always have a hard time following the math in the papers I try to read. I also have no intuition about the underlying models themselves. In the past I have implemented simple models from scratch, but that knowledge only sticks for a while. This time is different: I am going to keep going until that intuition builds. I am not only going to learn how to read papers effectively, but how to write them myself.

The 52 week plan

Phase 1 (May–Jul): Foundations.

Karpathy's Zero-to-Hero series as the spine: Micrograd, Makemore, GPT from scratch. My overarching goal here is a working transformer in PyTorch, written line by line, by the end of July. I also plan to study probability and statistics in parallel.
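To give a flavor of what the Micrograd part of the series covers, here is a minimal sketch of a scalar autograd node, in the spirit of micrograd. This is my own illustration, not Karpathy's actual code: a `Value` class that records the computation graph and backpropagates gradients with the chain rule.

```python
# A minimal scalar autograd node in the spirit of micrograd.
# Illustrative sketch only, not Karpathy's actual implementation.
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # how to push gradients to children
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# y = a*b + a  =>  dy/da = b + 1 = 4, dy/db = a = 2
a, b = Value(2.0), Value(3.0)
y = a * b + a
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

PyTorch's autograd does the same bookkeeping over tensors instead of scalars, which is why starting here builds intuition for the real framework.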

Phase 2 (Aug–Oct): Broadening.

This is where I really start diving into papers: read a paper a week and implement at least one per month. I also want to write about what I'm learning in a more rigorous manner.

Phase 3 (Nov–Jan): Specializing.

I am really interested in going deep on agent architectures and interpretability. I want to understand what's actually happening inside these models, and how to make them less opaque to others.

Phase 4 (Feb–May): Original work.

This is where I will attempt to create something of value: my first paper. I want to try my hand at writing a workshop paper and get it in front of real researchers. Afterward, I will spend some time reflecting on the year and figuring out my next 52-week plan.

Why publish this?

Well, accountability. I don't want to look back and see that I gave up 3 weeks in. Last year I planned 24 weeks of intense low-level programming, writing every week plus a larger post every month, and it really helped me keep going. I published those on GitHub, and even though I only made it to week 19, I learned a lot and am glad I got that far.

I plan to write a post each week recapping what I studied and how I feel about the learning journey. I want to stay committed and on track, and when I'm done I will have a clear path back to the beginning: a record of how I thought and how I felt throughout.

Next up

Week 1 is in the books. Next Sunday: Micrograd built from scratch, the start of Makemore, the first two lectures of MIT 6.012, and the opening chapter of Bishop. The beginning of a transformer, one piece at a time.