Welcome to CS 4961/6961: Parallel Numerical Software Correctness

Your quest to understand and deal with how parallelism and floating-point can go wrong -- when you build HPC and AI/ML systems -- starts here! Join us for an engaging learning experience.

(Parallelism and floating-point as imagined by DALL.E; more DALL.E images below.)

About This Class

The Canvas link for CS 4961 is HERE and the one for CS 6961 is HERE. We will maintain only the latter, as the requirements are pretty much the same. The official syllabus page at the UofU is HERE. It describes the grading scale, etc. (Basically: 20% for assignments; 30% for participation, which includes attendance; and 50% for projects.)

This course is intended as a reality check: do we really understand how today's AI and ML systems are built, up to and including the design of the underlying GPU codes? We hear stories of GPUs being the central currency of today's computing; wouldn't it be nice to understand GPUs and how one can code up simple AI and ML systems from scratch to run on them?

Other examples of software we like to study include various scientific computing software systems such as lossy data compression systems and the software realizing irregular graph algorithms. Even these systems run fast(er) on GPUs---calling for a similar understanding.

Such software systems are difficult to create, understand, debug and repair. These systems have many interfaces, follow a plethora of conventions regarding number representations, and require a basic set of parallel programming skills to implement.
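To make this concrete, here is a minimal sketch (an illustration added for this page, not one of the course codes) of how number representation and parallelism interact. Floating-point addition is not associative, so a sequential left-to-right sum and a tree-shaped sum, the order a parallel reduction would typically use, can disagree on the same data. The function names `sequential_sum` and `pairwise_sum` and the input values are hypothetical, chosen only to make the effect visible in IEEE 754 double precision.

```python
import math


def sequential_sum(values):
    """Add the values left to right, as a simple serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add the values in a tree shape, mimicking a parallel reduction order."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Two huge values that cancel, plus two small values that should survive.
values = [1e16, 1.0, -1e16, 1.0]

seq = sequential_sum(values)   # serial order
tree = pairwise_sum(values)    # reduction-tree order
exact = math.fsum(values)      # correctly rounded reference sum

print("sequential:", seq)
print("pairwise:  ", tree)
print("math.fsum: ", exact)
```

The three printed results need not agree: near 1e16 the spacing between adjacent doubles is 2, so adding 1.0 can be rounded away depending on the order of operations. Neither order is "the right one"; the point is only that changing the reduction order, which parallel code does routinely, can change the answer.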

I myself know many things well in this area. But there are a ton of things I don't know, which is precisely the reason for teaching. Also, I know how to learn about them (with your help and interactions)! And I know that many experts out there lack this closer-to-the-metal knowledge (no offense meant; they are doing great work, but this sort of low-level foray is cathartic to all). That is why I am eager (as you had better be, too) to build and learn.

Historically, I have liked taking artifacts apart and reassembling them. I don't have a huge recent track record of doing this (I've done tons of this over my lifetime, especially before I got busy chasing money, including filing down epoxy-encrusted circuits to their pins and reassembling them). But it is time for me to change course too! I feel I'll emerge a bit more "genuine" if I engage in such "ablution" at least for a little while. Remember, "it is never too late", even for me! :-)

Why don't we engage in a grand adventure over a semester where we teach each other? We will present a few real GPU codes to you on day-1 (more will be added as we go along). To help understand these codes better, we will also present some Python codes that either mirror these codes or call upon the GPU codes to export efficient higher-level functions. We will try to understand the precautions taken in erecting the numeric structures and parallelism structures of these codes. Then we will all build different systems, treating the disassembly results as a "kit of parts".

How to approach correctness? This is DALL.E's depiction following my prompts via ChatGPT :-)

Don't be a hammer looking for any nail... but be nails seeking the right set of hammers
What bugs shall we understand to locate and mitigate? So what systems will we tear down? See Resources

I may need to cap this class and choose students who are oriented toward research a bit more. Why? We will need to access GPUs. Also this class requires the right kind of background. For UGs, we will teach you what you don't know. For Grads, I will prefer students with strong interest in publication (and I may talk 1-1 to assess that). Please show up on day-1 and of course meet me beforehand as you wish and as we find common times to meet.

One may say "I am scared; I hardly know 80% of what you said. How can I even do this?" Fear not!

You may ask "What is in it for me?" A lot.

Class Schedule and Material (Done for Weeks 1, 2; later ones TBD and not accurate as listed)

The class meets Tue and Thu from 2pm to 3:20pm in WEB 1230. Here is a more specific schedule.

Day Topic
Week 1
1/7 Overview. Lecture slides are HERE. Video Here.
1/9 Simple 2-layer NNs (multi-layer perceptrons). Slides. HW-1 (modify the NN into a classifier) issued.
Week 2
1/14 Motivating Floating-Point for ML Applications. Slides are here.
1/16 Classifiers; Softmax; Meta's work on scaling Softmax to a million tokens; Universality; HW-1 info. These are here.
Week 3
1/21 FP Reasoning: AutoGrad (ML) and FP Error Analysis (Rigorous). Google Slides are here. PDF Slides are here.
1/23 Begin lecturing on GPTs. Google Slides are here.
Week 4
1/28 Slides are here. Attention intuitions, coding, HW-2
1/30 More GPTs. Slides are here.
Week 5
2/4 Slides are here.
2/6 Jack Dongarra Talk (SCI DLS) Abstract is here
Week 6
2/11 Student Presentations + Basic GPU Programming (GG). Slides are here.
2/13 Project option discussions
Week 7
2/18 Student presentations (at least 4)
2/20 Slides are here.
Week 8
2/25 Student presentations (at least 4)
2/27 Student presentations (at least 4)
Week 9
3/4 Slides are here.
3/6 Final tie-together lecture
PROJECT MEETINGS DURING CLASS TIME as well as a generous number of outside hours. The idea is to stay busy, help you debug, and together learn stuff even outside of class.
We will come to class and discuss projects and other topics till the end of the semester. There could also be guest-lectures and other events.


Here are a few examples of bugs likely to arise (these "nails" will seek the right hammers!)

Here are a few examples of where the bugs are likely to arise:
Here are some reference materials for learning about these topics; more will be given:


Get email from here, please.