About This Class
The Canvas link for CS 4961 is
HERE
and the one for CS 6961 is
HERE.
We will maintain only the latter, as the requirements are pretty much the same.
The official syllabus page at the UofU is
HERE.
It describes the grading scale, etc. (Basically: 20% for assignments; 30% for participation, which
includes attendance; and 50% for projects.)
This course
is intended as a reality check: do we really
understand how today's AI and ML systems are built, up to
and including the design of the underlying GPU codes?
We hear stories of GPUs being the central currency of
today's computing; wouldn't it be nice to understand GPUs
and how one can code up simple AI and ML systems from
scratch to run on them?
Other examples of software we would like to study
include various scientific computing
software systems, such as lossy data compression systems
and the software realizing irregular graph algorithms.
Even these systems run fast(er) on GPUs, calling for
a similar understanding.
Such software systems are difficult to create, understand, debug,
and repair. These systems have many
interfaces, follow
a plethora of conventions regarding number representations, and
require a basic set of parallel programming skills to implement.
I myself know many things well in this area.
But I don't know a ton of things, which is precisely
the reason for teaching. Also, I know how to
learn about them (with your help and interactions)!
And I know that many experts out there lack this
closer-to-the-metal knowledge (no offense meant; they are
doing great work, but this sort of low-level foray is
cathartic for all).
That is why I am eager (as you also better be) to build and learn.
Historically, I have liked taking
artifacts apart and reassembling them. I don't have a huge
recent track record of doing this (I've done tons of it
over my lifetime, especially before I got busy chasing money, including
filing down epoxy-encrusted
circuits to their pins and reassembling them). But it is time for me
to change course too! I feel I'll emerge a bit more "genuine"
if I engage in such "ablution" at least for a little while.
Remember, "it is never too late", even for me! :-)
Why don't we engage in a grand adventure over a semester where
we teach each other? We will present a few real GPU codes to you
on day one (more will be added as we go along). To help understand these
codes better, we will also be presenting some Python codes that either
mirror these codes or call upon the GPU codes to then export
efficient higher-level functions. We will try to
understand the precautions
taken in erecting the numeric structures and parallelism structures
of these codes. Then we will all build different systems, treating
the disassembly results as a "kit of parts".
How to approach correctness? This is DALL·E's depiction, following my prompts via ChatGPT :-)


I may need to cap this class and choose students who are oriented a bit more toward research. Why? We will need to access GPUs. Also, this class requires the right kind of background. For UGs, we will teach you what you don't know. For Grads, I will prefer students with a strong interest in publication (and I may talk 1-on-1 to assess that). Please show up on day one, and of course meet me beforehand as you wish and as we find common times to meet.
One may say "I am scared; I hardly know 80% of what you said. How can I even do this?" Fear not!
- My TM (Harvey Dam), Staff Researcher (Dr. Mark Baranowski), and I will meet with you regularly.
- If you lack the basic GPU chops, you'll be given study material (see, for example, Mark Harris's Blog with a Colab notebook to explore the above here). Then you come and present it in class. This is not your traditional class. You do the talking, mostly! There will be plenty of 1-on-1s.
- How each week will go: We will meet with you both during class and outside of class. We will teach you stuff 1-on-1. Then you follow up, come prepared, and present. Your slide deck may run longer than your talk; you may get about 10 minutes to present the gist. The rest is your learning to do a project!
You may ask "What is in it for me?" A lot.
- Confidence of deconstructing and reconstructing something serious that powers the world and our lives.
- A realization that things just don't automatically work correctly! Journey this arduous path.
- Learning in context: GPU programming takes years and years to master; here, we will see some aspects of it rather well, a different and focus-inducing experience.
- Something publishable. The idea is to ride the rough road a little, and then switch to a cool project. For instance, reconstituting some parts of a system you tear down in a new language (Rust?) and/or proving properties; all fair game.
- GRADES: We will give you a passing grade for good participation and nothing else. For a strong grade, do a project, break/recreate something, and write a good 10-12 page report. A paper coming out of the classwork, after a ton of participation and teaching us a lot (!), is grounds for a solid A. The goals of this course are not set in stone. If you push us in the right direction, this will be a fun class. We promise utmost hard work on our side, especially given that I have not offered this course before. I do plan to write a book that captures the lessons from this class, hoping to create and prototype some chapters during the semester.
- What you can and cannot do: Please set your goals early with respect to your starting point. We will negotiate that. If you achieve your own personal goals, we are done. If, however, you are not planning to show up to class and/or participate, this class is not for you. You are allowed to take time off, etc., but only after proper communication. In other words, treat this class as you would treat your workplace, where "playing hooky" is not allowed.
Class Schedule and Material (done for Weeks 1 and 2; later entries are TBD and not accurate as listed)
The class meets Tue and Thu from 2pm to 3:20pm in WEB 1230. Here is a more specific schedule.
Day | Topic |
---|---|
Week 1 | |
1/7 | Overview. Lecture slides are HERE. Video Here. |
1/9 | Simple 2-layer NNs (multi-layer perceptrons). Slides are here. HW-1 (modify the NN into a classifier) issued. |
Week 2 | |
1/14 | Motivating Floating-Point for ML Applications. Slides are here. |
1/16 | Classifiers; Softmax; Meta's work on scaling Softmax to a million tokens; Universality; HW-1 info. Slides are HERE. |
Week 3 | |
1/21 | FP Reasoning: AutoGrad (ML) and FP Error Analysis (Rigorous). Google Slides are here. PDF Slides are here. |
1/23 | Begin lecturing on GPTs. Google Slides are here. |
Week 4 | |
1/28 | Slides are here. Attention intuitions, coding, HW-2 |
1/30 | More GPTs. Slides are here. |
Week 5 | |
2/4 | Slides are here. |
2/6 | Jack Dongarra Talk (SCI DLS). Abstract is here. |
Week 6 | |
2/11 | Student Presentations + Basic GPU Programming (GG). Slides are here. |
2/13 | Project option discussions |
Week 7 | |
2/18 | Student presentations (at least 4) |
2/20 | Slides are here. |
Week 8 | |
2/25 | Student presentations (at least 4) |
2/27 | Student presentations (at least 4) |
Week 9 | |
3/4 | Slides are here. |
3/6 | Final tie-together lecture |
SPRING BREAK | |
PROJECT MEETINGS DURING CLASS TIME, as well as a generous number of outside hours. The idea is to stay busy, help you debug, and learn stuff together even outside of class.
We will come to class and discuss projects and other topics till the end of the semester. There could also be guest lectures and other events.
Resources
Here are a few examples of bugs likely to arise (these "nails" will seek the right hammers!); a small illustrative sketch follows this list:
- Rounding errors.
- Floating-point NaNs.
- Data races.
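To make the first two bullets concrete, here is a minimal, hedged sketch in plain NumPy (not taken from the course codes; the function names are illustrative) showing how rounding errors and NaNs can surface. Data races are a concurrency phenomenon (e.g., two GPU threads updating the same memory location without synchronization) and are not easily reproduced in sequential Python, so no snippet is given for them.

```python
import numpy as np

# Rounding error: at float32 magnitude ~1e8, adding 1.0 is rounded away entirely.
a = np.float32(1e8)
b = np.float32(1.0)
print((a + b) - a)                      # prints 0.0, not 1.0

# Floating-point NaNs: a softmax written without the usual max-subtraction
# trick overflows to inf and then yields NaN (inf / inf).
def softmax_unsafe(z):
    e = np.exp(z)                       # overflows for large z
    return e / e.sum()

def softmax_safe(z):
    e = np.exp(z - z.max())             # standard stabilization
    return e / e.sum()

z = np.array([1000.0, 1001.0, 1002.0], dtype=np.float32)
print(softmax_unsafe(z))                # [nan nan nan] (with overflow warnings)
print(softmax_safe(z))                  # a proper probability distribution
```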
Here are a few examples of where the bugs are likely to arise:
- The autograd function of some simple neural networks, including baby LLMs (expert-written).
- Various inner layers of neural networks (designed by stalwarts such as Karpathy).
- Various lossy compressors and graph algorithm codes (written by pioneers in this area).
Here are some reference materials for learning about these topics; more will be given:
- A good survey: the history of sequence-to-sequence models from the early days to attention, by Prof. Graham Neubig.
- Transformers: a Primer, by Justin S.Y. Lee, which builds on the Neubig survey.
- A comprehensive textbook authored by Dr. Smola and team, with detailed descriptions accompanied by Jupyter notebooks.
- A course with embedded quizzes (and answers) by TM Harvey Dam, PhD student.
- An inexpensive but very good book. Its Jupyter notebooks are great!
- Grant Sanderson's (3Blue1Brown) LLM video (the best).
- Karpathy's videos MicroGrad, GPT-2 from scratch, GPT2 Repro, and (at least) LLM.c
- Tensor Cores (also see blog)
- Vectorization synthesis: LLM-assisted code generation guarded by formal verification, a nice convergence.
- Faults: what they can do to networks. This is an ISCA 2023 paper on it, and this is a forthcoming PPoPP'25 paper.
- The LC Compression Framework from Texas State University.
- The PyBlaz Lossy Compressor developed at the University of Utah.
- The Indigo-3 Graph Algorithm Benchmark Suite, allowing programmable bug injection.
- The GPU-FPX Framework for floating-point exception detection.
- A Feature-Targeted Testing Method for Tensor Cores.
What shall we study? Here are a few examples:
- How to do backprop through all sorts of layers including auto-encoders.
- How forward and reverse-mode differentiation is realized (see the sketch after this list).
- How numerical precision is handled.
- How threads and warps are organized.
- How special units such as tensor cores are brought in.
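Touching the first two bullets, here is a minimal, hedged sketch of how reverse-mode differentiation (backprop) can be realized for scalars, in the spirit of Karpathy's micrograd listed above; the class and method names are illustrative and not taken from any of the course codes.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, parents=(), backward_fn=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward_fn

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad            # d(a+b)/da = 1
            other.grad += out.grad           # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the recorded graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# Usage: differentiate f(x, w) = w*x + x through the recorded graph.
x, w = Value(3.0), Value(2.0)
y = w * x + x
y.backward()
print(y.data, x.grad, w.grad)   # 9.0, 3.0 (= w + 1), 3.0 (= x)
```

Full frameworks follow the same record-then-reverse structure, extended to tensors, mixed numerical precision, and GPU kernels, which is exactly where the other bullets above come in.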