This website, written by Shilaan Alzahawi, contains learnings from Statistics 285: Massive Computational Experiments, Painlessly. Statistics 285 was taught at Stanford University in Spring 2021 by David Donoho, Mahsa Lotfi, and Alon Kipnis. More information on the class can be found on the course website.
Ambitious Data Science requires massive computational experimentation; the entry ticket for a solid PhD in some fields is now to conduct experiments involving 1 million CPU hours. Recently, several groups have created efficient computational environments that make it painless to run such massive experiments. This course reviews state-of-the-art practices for running massive computational experiments on compute clusters in a painless and reproducible manner.
Students will learn how to automate their computing experiments, first using nuts-and-bolts tools such as Perl and Bash, and later using comprehensive frameworks such as ClusterJob and CodaLab, which enable them to take on ambitious Data Science projects. The course also features a few guest lectures by renowned scientists in the field of Data Science. Students should be familiar with computational experiments and be facile in a high-level programming language such as R, Matlab, or Python.
This website contains a step-by-step guide, written by Shilaan Alzahawi, to running a massive computational experiment on a High Performance Computing (HPC) cluster – Stanford’s Sherlock cluster – using the ClusterJob automation system.