About This Catalog
This page collects the public links embedded throughout the shared course planning material. It intentionally excludes a few non-public or unsafe entries, such as localhost-only URLs, one-time tokenized raw links, and machine-local pseudo-links that are not meaningful on a public course website.
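The exclusion rule above can be approximated with a small filter. This is a minimal sketch, not the procedure actually used to build this page: the function name `is_public_link`, the set of local host names, and the query-parameter names treated as one-time tokens are all assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hosts treated as "localhost-only" (assumed list, extend as needed).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}

# Query parameter names treated as one-time tokens (hypothetical names).
TOKEN_PARAMS = {"token", "access_token", "signature", "sig"}


def is_public_link(url: str) -> bool:
    """Return True if the URL looks safe to list on a public page."""
    parts = urlparse(url)

    # Machine-local pseudo-links (file paths, editor schemes, etc.)
    # are rejected by accepting only web schemes.
    if parts.scheme not in ("http", "https"):
        return False

    # Localhost-only URLs are rejected by host name.
    host = (parts.hostname or "").lower()
    if not host or host in LOCAL_HOSTS:
        return False

    # Tokenized links are rejected when a token-like parameter appears,
    # even if its value is blank.
    query_keys = {k.lower() for k in parse_qs(parts.query, keep_blank_values=True)}
    if query_keys & TOKEN_PARAMS:
        return False

    return True
```

A catalog builder could then keep only the entries for which `is_public_link(url)` is true, and review the rejected ones by hand before discarding them.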
Course Documents and Administration
Slides, Notes, and Shared Class Documents
- Course organization notes
- Ganesh introductory slides
- Sree intro slides
- Number systems and tools slides
- Performance model slides
- GPU performance slides
- GPU execution and schedule slides
- AWS training slides
- Neuron architecture slides
- Project and discussion slides
- Student or guest presentation deck
- MLIR / MLIR-AIR slides
- Faial / GKLEE slides
- NVBit and barrier instrumentation slides
- SLEEK / later-semester slides
- Transform dialect / late-semester slides
- Hoare logic supplemental file
- AWS or profiling supplemental video/file
- AWS or profiling supplemental file mirror
- Additional AWS training video/file
- Software pipelining notes
- Assignment Overleaf workspace
- Assignment 3 Overleaf workspace
- AWS/reporting Overleaf workspace
Software, Repositories, and Tooling
- Course software repository
- Tilus course checkout
- AIR2CUDA course checkout
- MLIR transform tutorial checkout
- Barrier NVBit tool checkout
- Barrier NVBit execution notes
- Nixnan LU demo
- Nixnan histogram results
- TVM FFI repository
- TVM FFI documentation
- NVBit instrumentation tutorial
- SLEEK repository
- GEAK repository
- TileGym repository
- Sebastian Raschka repositories
- AWS Neuron NKI samples
- NKI attention kernels source
- Stanford CS149 Trainium-2 assignment
- AWS Trainium starter guide
- Stephen Diehl GPU offload notebook
- Stanford CS336 LLM material
Papers and Reading Links
- Scalable SMT-based verification of GPU kernel functions
- Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction
- uiCA throughput prediction paper
- Faial modular static cost analysis PDF
- Memory Access Protocols / Faial-related paper PDF
- Tilus paper PDF
- Mojo in HPC paper
- RenderMan XPU paper/page
- tritonBLAS paper
- ParallelKittens paper
- Hoare Logic of GPU Programs
- LLVM/IR or GPU optimization reading
- OpenReview paper linked in syllabus
- Recent paper linked in syllabus
- Additional accelerator/compiler reading
- From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR
- MLIR paper
- GKLEE paper PDF
- Another correctness/performance paper linked in syllabus
- Goldberg floating-point classic
- Recent linked paper PDF
- SLEEK IPDPS 2026 paper
- LLVM developer meeting slides
- Recent arXiv paper linked in syllabus
- CuFuzz NVIDIA research page
- Transform dialect or related compiler paper
- Tilus paper HTML view
- Equivalence Checking of GPU Kernels
- GEAK paper
- ProofWright paper
- KernelBench paper
- Dynamic determinacy race detection paper PDF
- Optimal software pipelining and warp specialization for tensor core GPUs
- Aiken 1995 software pipelining paper
- CIVL model checker
- You don't know jack about shared variables or memory models
- FastTrack race detection
- GPUVerify paper
- VerCors tool
- Saeed Darabi thesis
- Scalable yet rigorous floating-point error analysis
- SMT Colab notebook
- FP analysis tools
- Herbie floating-point rewriting
- GPU Computing Gems
- Recent performance-model paper
- Shared-memory atomic bottlenecks paper DOI
- Analytical performance models for GPUs