About This Class
The Canvas link for CS 4961 is
HERE
and the one for CS 6961 is
HERE.
We will maintain only the latter, as the requirements are pretty much the same.
The official syllabus page at the UofU is
HERE.
It describes the grading scale, etc. (Basically: 20% for assignments; 30% for participation, which
includes attendance; and 50% for projects.)
This course
is intended as a reality check: do we really
understand how today's AI and ML systems are built, up to
and including the design of the underlying GPU codes?
We hear stories of GPUs being the central currency of
today's computing; wouldn't it be nice to understand GPUs
and how one can code up simple AI and ML systems from
scratch to run on them?
Other examples of software we would like to study
include various scientific computing
software systems, such as lossy data compression systems
and the software realizing irregular graph algorithms.
Even these systems run fast(er) on GPUs, calling for
a similar understanding.
Such software systems are difficult to create, understand, debug,
and repair. These systems have many
interfaces, follow
a plethora of conventions regarding number representations, and
require a basic set of parallel programming skills to implement.
I myself know many things well in this area.
But I don't know a ton of things, which is precisely
the reason for teaching. Also, I know how to
learn about them (with your help and interactions)!
And I know that many experts out there lack this
closer-to-the-metal knowledge (no offense meant; they are
doing great work, but this sort of low-level foray is
cathartic for all).
That is why I am eager (as you also better be) to build and learn.
Historically, I have liked taking
artifacts apart and reassembling them. I don't have a huge
recent track record of doing this (I've done tons of it
over my lifetime, especially before I got busy chasing money, including
filing down epoxy-encrusted
circuits to their pins and reassembling them). But it is time for me
to change course too! I feel I'll emerge a bit more "genuine"
if I engage in such "ablution" at least for a little while.
Remember, "it is never too late", even for me! :-)
Why don't we engage in a grand adventure over a semester where
we teach each other? We will present a few real GPU codes to you
on day one (more will be added as we go along). To help understand these
codes better, we will also be presenting some Python codes that either
mirror these codes or call upon the GPU codes to then export
efficient higher-level functions. We will try to
understand the precautions
taken in erecting the numeric structures and parallelism structures
of these codes. Then we will all build different systems, treating
the disassembly results as a "kit of parts".
How to approach correctness? This is DALL·E's depiction, following my prompts via ChatGPT :-)


I may need to cap this class and choose students who are oriented a bit more toward research. Why? We will need to access GPUs. Also, this class requires the right kind of background. For UGs, we will teach you what you don't know. For Grads, I will prefer students with a strong interest in publication (and I may talk 1-on-1 to assess that). Please show up on day one, and of course meet me beforehand as you wish and as we find common times to meet.
One may say "I am scared; I hardly know 80% of what you said. How can I even do this?" Fear not!
- My TM (Harvey Dam), Staff Researcher (Dr. Mark Baranowski), and I will meet with you regularly.
- If you lack the basic GPU chops, you'll be given study material (see, for example, Mark Harris's Blog with a Colab notebook to explore the above here). Then you come and present it in class. This is not your traditional class. You do the talking, mostly! There will be plenty of 1-on-1s.
- How each week will go: We will meet with you both during class and outside of class. We will teach you stuff 1-on-1. Then you follow up, come prepared, and present. Your slide deck may run longer than your talk; you may get about 10 minutes to present the gist. The rest is your learning to do a project!
You may ask "What is in it for me?" A lot.
- Confidence of deconstructing and reconstructing something serious that powers the world and our lives.
- A realization that things just don't automatically work correctly! Journey this arduous path.
- Learning in context: GPU programming takes years and years to master; here, we will see some aspects of it rather well, a different and focus-inducing experience.
- Something publishable. The idea is to ride the rough road a little, and then switch to a cool project. For instance, reconstituting some parts of a system you tear down in a new language (Rust?) and/or proving properties; all fair game.
- GRADES: We will give you a passing grade for good participation and nothing else. For a strong grade, do a project, break/recreate something, and write a good 10-12 page report. A paper coming out of the classwork, after a ton of participation and teaching us a lot (!), is grounds for a solid A. The goals of this course are not set in stone. If you push us in the right direction, this will be a fun class. We promise utmost hard work on our side, especially given that I have not offered this course before. I do plan to write a book that captures the lessons from this class, hoping to create and prototype some chapters during the semester.
- What you can and cannot do: Please set your goals early with respect to your starting point. We will negotiate that. If you achieve your own personal goals, we are done. If, however, you are not planning to show up to class and/or participate, this class is not for you. You are allowed to take time off, etc., but only after proper communication. In other words, treat this class as you would treat your workplace, where "playing hooky" is not allowed.
Class Schedule and Material (done for Weeks 1 and 2; later entries are TBD and not accurate as listed)
The class meets Tue and Thu from 2pm to 3:20pm in WEB 1230. Here is a more specific schedule.
Day | Topic |
---|---|
Week 1 | |
1/7 | Overview. Lecture slides are HERE. Video Here. |
1/9 | Simple 2-layer NNs (multi-layer perceptrons). Slides are here. HW-1 (modify the NN into a classifier) issued. |
Week 2 | |
1/14 | Motivating Floating-Point for ML Applications. Slides are here. |
1/16 | Classifiers; Softmax; Meta's work on scaling Softmax to a million tokens; Universality; HW-1 info. Slides are HERE. |
Week 3 | |
1/21 | FP Reasoning: AutoGrad (ML) and FP Error Analysis (Rigorous). Google Slides are here. PDF Slides are here. |
1/23 | Begin lecturing on GPTs. Google Slides are here. |
Week 4 | |
1/28 | Slides are here. Attention intuitions, coding, HW-2 |
1/30 | More GPTs. Slides are here. |
Week 5 | |
2/4 | Slides are here. |
2/6 | Jack Dongarra Talk (SCI DLS). Abstract is here. |
Week 6 | |
2/11 | Student Presentations + Basic GPU Programming (GG). Slides are here. |
2/13 | Project option discussions |
Week 7 | |
2/18 | Student presentations (at least 4) |
2/20 | Slides are here. |
Week 8 | |
2/25 | Student presentations (at least 4) |
2/27 | Student presentations (at least 4) |
Week 9 | |
3/4 | Slides are here. |
3/6 | Final tie-together lecture |
SPRING BREAK | |
PROJECT MEETINGS DURING CLASS TIME, as well as a generous number of outside hours. The idea is to stay busy, help you debug, and learn stuff together even outside of class.
We will come to class and discuss projects and other topics till the end of the semester. There could also be guest lectures and other events.
Resources
Here are a few examples of bugs likely to arise (these "nails" will seek the right hammers!); a small illustrative sketch follows this list:
- Rounding errors.
- Floating-point NaNs.
- Data races.
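To make the first two bullets concrete, here is a minimal, hedged sketch in plain NumPy (not taken from the course codes; the function names are illustrative) showing how rounding errors and NaNs can surface. Data races are a concurrency phenomenon (e.g., two GPU threads updating the same memory location without synchronization) and are not easily reproduced in sequential Python, so no snippet is given for them.

```python
import numpy as np

# Rounding error: at float32 magnitude ~1e8, adding 1.0 is rounded away entirely.
a = np.float32(1e8)
b = np.float32(1.0)
print((a + b) - a)                      # prints 0.0, not 1.0

# Floating-point NaNs: a softmax written without the usual max-subtraction
# trick overflows to inf and then yields NaN (inf / inf).
def softmax_unsafe(z):
    e = np.exp(z)                       # overflows for large z
    return e / e.sum()

def softmax_safe(z):
    e = np.exp(z - z.max())             # standard stabilization
    return e / e.sum()

z = np.array([1000.0, 1001.0, 1002.0], dtype=np.float32)
print(softmax_unsafe(z))                # [nan nan nan] (with overflow warnings)
print(softmax_safe(z))                  # a proper probability distribution
```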
Here are a few examples of where the bugs are likely to arise:
- The autograd function of some simple neural networks, including baby LLMs (expert-written).
- Various inner layers of neural networks (designed by stalwarts such as Karpathy).
- Various lossy compressors and graph algorithm codes (written by pioneers in this area).
Here are some reference materials for learning about these topics; more will be given:
- A good survey: the history of sequence-to-sequence models from the early days to attention, by Prof. Graham Neubig.
- Transformers: a Primer, by Justin S.Y. Lee, which builds on the Neubig survey.
- A comprehensive textbook authored by Dr. Smola and team, with detailed descriptions accompanied by Jupyter notebooks.
- A course with embedded quizzes (and answers) by TM Harvey Dam, PhD student.
- An inexpensive but very good book. Its Jupyter notebooks are great!
- Grant Sanderson's (3Blue1Brown) LLM video (the best).
- Karpathy's videos MicroGrad, GPT-2 from scratch, GPT2 Repro, and (at least) LLM.c
- Tensor Cores (also see blog)
- Vectorization synthesis: LLM-assisted code generation guarded by formal verification, a nice convergence.
- Faults: what they can do to networks. This is an ISCA 2023 paper on it, and this is a forthcoming PPoPP'25 paper.
- The LC Compression Framework from Texas State University.
- The PyBlaz Lossy Compressor developed at the University of Utah.
- The Indigo-3 Graph Algorithm Benchmark Suite, allowing programmable bug injection.
- The GPU-FPX Framework for floating-point exception detection.
- A Feature-Targeted Testing Method for Tensor Cores.
What shall we study? Here are a few examples:
- How to do backprop through all sorts of layers including auto-encoders.
- How forward and reverse-mode differentiation is realized (see the sketch after this list).
- How numerical precision is handled.
- How threads and warps are organized.
- How special units such as tensor cores are brought in.
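Touching the first two bullets, here is a minimal, hedged sketch of how reverse-mode differentiation (backprop) can be realized for scalars, in the spirit of Karpathy's micrograd listed above; the class and method names are illustrative and not taken from any of the course codes.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, parents=(), backward_fn=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward_fn

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad            # d(a+b)/da = 1
            other.grad += out.grad           # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the recorded graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# Usage: differentiate f(x, w) = w*x + x through the recorded graph.
x, w = Value(3.0), Value(2.0)
y = w * x + x
y.backward()
print(y.data, x.grad, w.grad)   # 9.0, 3.0 (= w + 1), 3.0 (= x)
```

Full frameworks follow the same record-then-reverse structure, extended to tensors, mixed numerical precision, and GPU kernels, which is exactly where the other bullets above come in.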