June 2020

Byron Jaeger, PhD, Assistant Professor, Biostatistics

Broad research focus?

Statistical programming and machine learning, developing R packages and projects, blood pressure and cardiovascular disease.

Year joined SOPH?

I joined the Department of Biostatistics in 2017 after completing my PhD at UNC.

What drew you to biostatistics?

When I was younger, I was interested in using what I knew (mostly math) to positively impact people's health. The professor of my intro to probability class recommended that I look into biostatistics and graduate school, so I went for it.

An exciting area you are working in right now?

I live in both the methodological and applied research worlds, and there are cool things happening in both places. Lately, I have spent a lot of time engaging with data from the Jackson Heart Study to find which behaviors are associated with lower risk for cardiovascular disease among African Americans with different family income levels. I am also getting ready to submit a statistical manuscript that introduces and assesses a faster version of cross-validation for analyses that involve missing data.

Favorite (self-authored) manuscript?

I had a paper published last year that introduced a new kind of random forest. A random forest is a machine learning algorithm that grows a large collection of decision trees, giving each one its own subset of data to learn from.

Individually, each decision tree is pretty bad at predicting an outcome. However, the average of the predictions from all of the trees (i.e., the forest) is much more accurate than any individual tree's prediction. Aside from accurate predictions, random forests are also a nice metaphor for teamwork and healthy discourse. The idea for this paper sort of fell into my head while I was walking back to my office from the UAB hypertension symposium. I had to write a bunch of C++ to make something that would fit the particular kind of random forest I wanted, and I was nervous the entire time that the idea wouldn't pan out. Luckily, the idea was solid and eventually published along with an R package (obliqueRSF).
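To make the averaging idea concrete, here is a minimal Python sketch. It is not the obliqueRSF method and not how any particular package works: it assumes a toy one-feature regression problem, uses one-split trees ("stumps") as deliberately weak learners, and gives each tree a bootstrap sample of the data. All names (fit_stump, fit_forest, and so on) and the simulated data are hypothetical and chosen only for illustration.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    # Dropping the largest x keeps both sides of every split non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: always predict the overall mean.
        overall = statistics.fmean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees, rng):
    """Grow n_trees stumps, each on its own bootstrap sample of the data."""
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """The forest's prediction is the average of its trees' predictions."""
    return statistics.fmean([predict_stump(tree, x) for tree in forest])


def mse(predict, xs, ys):
    return statistics.fmean([(predict(x) - y) ** 2 for x, y in zip(xs, ys)])


# Simulated data (an assumption for illustration): y = x^2 plus noise.
rng = random.Random(0)
train_x = [rng.uniform(0, 1) for _ in range(200)]
train_y = [x ** 2 + rng.gauss(0, 0.05) for x in train_x]
test_x = [rng.uniform(0, 1) for _ in range(200)]
test_y = [x ** 2 + rng.gauss(0, 0.05) for x in test_x]

forest = fit_forest(train_x, train_y, n_trees=50, rng=rng)

tree_mses = [mse(lambda x, t=tree: predict_stump(t, x), test_x, test_y)
             for tree in forest]
forest_mse = mse(lambda x: predict_forest(forest, x), test_x, test_y)

print("average single-tree test MSE:", statistics.fmean(tree_mses))
print("forest test MSE:", forest_mse)
```

For squared error, the averaged prediction can never do worse than the trees do on average, which follows from the convexity of the squared loss. How much better it does depends on how much the individual trees disagree with one another.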

Jaeger B, Long LD, Long DM, Sims M, Szychowski J, Min Y, McClure L, Howard G, and Simon N. (2019). “Oblique random survival forests”, Annals of Applied Statistics, Vol. 13 (3), pp. 1847-1883.

Best conference you've attended?

At my first Joint Statistical Meetings, I enrolled in an introductory machine learning course taught by Noah Simon. It was my first year at the SOPH and I didn't feel confident in the basics yet. Noah can explain technical and tedious things as if he were explaining tic-tac-toe. I kept in touch with Noah after the course and we are now collaborators. He was the senior author on the "Oblique random survival forests" paper.

Any research questions that are on your wish list?

I enjoy writing statistical programs and packages and wish I had more time to do so. I'm looking forward to learning natural language processing, which can be used to organize unstructured data such as physician notes in the electronic health record. It can also obviate the need for manual chart review, enabling us to search medical records and text fields on a much broader scale.