“Data science is one of the most in-demand skillsets today — with the least supply of people who can do the job,” said William Monroe, a scientist in UAB IT Research Computing. Programs such as UAB’s new Master’s in Data Science are working to fill the gap. Meanwhile, researchers, especially those working in neuroscience and biostatistics at UAB, grasp the potential of techniques such as machine-learning but “don’t know where to start,” Monroe said. “Machine-learning is now where HTML was in 1997. Unless you had had an HTML course back then, you didn’t know how to build a website and it was all a mystery. These are the hot tools right now that everyone is figuring out how to use.”
During the past year, researchers with questions have ended up at Monroe’s door. That’s because Research Computing has many graphics processing units (GPUs), the hardware that has fueled the current artificial intelligence and machine-learning revolutions. “We have 72 Nvidia GPUs” in the Cheaha supercomputer cluster, Monroe explained. Data-science queries make up a growing share of the weekly office-hours sessions that Research Computing offers, he added.
Join the club
So Monroe and colleague Ravi Tripathi, a software developer, found a way to scale up their training efforts. In April, they launched the Data Science Club. This isn’t a traditional lab journal club — it’s more like the Dollar Shave Club for machine-learning and biostats. For free.
The Data Science Club is open to “anybody who might be interested,” Monroe said. Each week, members get an online notebook of code that covers a fundamental data-science concept, along with a YouTube Live screencast, streamed on Fridays from 10:30 to 11:30 a.m., of Monroe and Tripathi working through the lesson’s steps in real time. This is decidedly hands-on instruction of the type you would get if you came in to Research Computing’s office hours, except that the schedule is entirely up to the user, Monroe said. “This work is best done when you’re sitting at your computer trying to do it.”
Monroe and Tripathi don’t have to worry about whether their students have the right hardware. Club members get their own accounts on UAB’s Cheaha supercomputer — the state’s fastest — and interact with it using any web browser through the new UAB Research Computing on Demand system. The first Data Science Club video walks participants through setting up the necessary Python and R programming languages in their user accounts.
Getting over the hump
The videos don’t skip steps or delete operator errors. “Part of the experience is getting to see Ravi and I struggle through,” Monroe explained. “We forget to put in semicolons and make other mistakes, just like every user. That will hopefully make it more approachable.”
The ultimate goal is to build a community, he said. As the participant list grows, Monroe hopes that users will form collaborations. “We want to create a space where conversations can happen, where an undergraduate in electrical and computer engineering can come alongside a neurobiologist,” he said. “We want to generate that activation energy to push people over the hump of learning a new skill.”