Is U-BRITE 2.0 right for you?

Editor's Note: The information published in this story is accurate at the time of publication. Always refer to for UAB's current guidelines and recommendations relating to COVID-19.

Research, as many scientists have observed, is like solving a jigsaw puzzle.

In that analogy, U-BRITE is a nice, big table with space to spread out the pieces and see what you have to work with. You could also think of it as one of those fancy cases that let you store your puzzle-in-progress while you plug away at it over time. And, to bend the analogy further, you could also think of U-BRITE as an invisibility cloak that keeps unauthorized observers from snooping on the pieces you have collected.

To be precise, this is U-BRITE 2.0, the new and improved version of the UAB Biomedical Research Information Technology Enhancement tool. Version 2.0 made its public debut in late January 2020.

“U-BRITE is designed to manage the puzzle of team-based research,” said James Cimino, M.D., director of the UAB Informatics Institute, which built U-BRITE. “It is a system to bring all your project pieces and team members together.”

Like the original, U-BRITE 2.0 gives investigators access to a secure, online workspace with high-volume storage and an analysis gateway that allows users to run data pipelines on UAB’s Cheaha supercomputer. New features take advantage of the fact that Cheaha is now certified as compliant with patient privacy laws, opening up rapid access to anonymized data on more than a million patients through UAB’s i2b2 database for hypothesis generation and other research questions. U-BRITE also includes these features:

  • access to anonymized data from thousands of participants in the Alabama Genomic Health Initiative;
  • UAB GitLab for version control of project code; and
  • built-in application programming interfaces (APIs) to programmatically pull clinical and research data from projects for analysis.

“We imagine we have three main types of U-BRITE users: investigators, data scientists and collaborators,” said Jake Chen, Ph.D., chief bioinformatics officer and associate director of the Informatics Institute.

What, exactly, might these groups use U-BRITE to do? Here are several use cases based on presentations at the Informatics Institute’s standing-room-only U-BRITE 2.0 forum Jan. 29.

1. You are already working on a big data project and need a place to bring it all together

“As a potential U-BRITE user, one of the first things you might think about doing is bringing in your siloed data — getting those big -omics flat files into our -omics repository and getting your source code out of your desktop and email and into version control with GitLab,” said Jelai Wang, an informatics architect with the Informatics Institute. “By joining U-BRITE, you can co-locate your data side by side with the biggest computational and storage resources here at UAB.”

U-BRITE manages more than 184,000 files and 39 analysis pipelines for 22 users. One 2.0 innovation, still in the prototype phase, is DataLENS (Linking Exploration, Navigations and Search), which "can link together datasets to allow for flexible search" across them all, Chen said.

Radiation oncologist Christopher Willey, M.D., Ph.D., has been involved with the U-BRITE project since its beginning. His work with patient-derived xenografts (PDX) for glioblastoma has produced comprehensive molecular and phenotypic information for more than 27 distinct tumors, and his team has developed eight acquired radiation-resistant PDX, the largest such panel in the United States. "We have a great deal of information," Willey said. "Our goal is an integrated -omics platform to reveal common molecular pathways that we can one day use to associate a patient's tumor with the best method of care." The project takes advantage of many U-BRITE features, Willey said. "We use GitLab as our version control system, we store files in Box, we do team-based live interactions using Slack and Microsoft Teams."

As part of U-BRITE 2.0, Willey's former graduate student, Alex Dussaq, M.D., Ph.D., spearheaded an effort, with help from Informatics Institute staff, to build a tool called the Xenoline Data Display, which keeps track of the myriad animal-model samples and human-matched tumor tissue that Willey's lab has collected. Using U-BRITE's new unified web services, research teams like Willey's can programmatically pull clinical and research data from the platform's databases through API calls. That lets them build interfaces that allow non-programmers to interact with the data "live" on Cheaha and generate new hypotheses.

U-BRITE takes on COVID-19


When U-BRITE 2.0 debuted at the beginning of 2020, UAB researchers did not anticipate that it could be rapidly deployed against what has turned out to be the challenge of our time: stopping the spread of COVID-19. Since early March, the Informatics Institute has responded by creating a new COVID-19 biomedical data science web portal. The portal gathers large COVID-19 datasets (including anonymized data on 4,807 COVID-tested patients at UAB as of April 6, viral genome sequences, aggregated worldwide epidemiology data, patient CT-scan images and host genomic data from the Alabama Genomic Health Initiative, among others); interactive data analysis tools; and virtual workspaces hosted on the U-BRITE computing platform, where researchers can collaborate and practice "data-driven medicine," said Jake Chen, Ph.D. The goals, Chen said, are to integrate a wide range of data, provide online training and accelerate knowledge discovery.


2. You want to identify cohorts to build new hypotheses for research

Cheaha received its HIPAA-compliant certification in December 2019, which means U-BRITE now can give any researcher access to a massive Clinical Data Repository in UAB’s i2b2 instance. Users can perform enterprisewide searches on de-identified clinical information from UAB’s Enterprise Data Warehouse, clinical registries for transplant and cancer, census data and information on study protocols from IRAP and OnCore.

Researchers also can search anonymized data on genetic variants and other clinical information from more than 6,000 participants in the Alabama Genomic Health Initiative. Four iterations of annotated AGHI data, totaling more than a billion facts from variant annotation, have been loaded into the i2b2 databases in U-BRITE's Clinical Data Repository, said Bruce Korf, M.D., Ph.D., UAB's chief genomics officer.

“If you have a gene variant of interest, you can query the dataset to see how many participants have that variant, for instance,” Korf said. “Then you can recruit these individuals to another study with IRB approval.”

Ninety-two percent of AGHI participants have consented to share data in the AGHI biobank, which now holds more than 18,600 plasma aliquots and 5,890 buffy coats (the portion of an anticoagulated blood sample containing primarily white blood cells and platelets).

The Clinical Data Repository includes data from 1.07 million patients and 22.4 million patient visits, encompassing:

  • 796,000 concepts
  • 1.14 billion facts

To obtain i2b2 access, users must have up-to-date HIPAA and IRB Human Subjects Protection training. And with an approved protocol from UAB's Institutional Review Board, researchers can obtain identifying information to contact patients for new studies.

3. You don’t want to reinvent the wheel

U-BRITE contains a directory of reusable data analysis pipelines, including pipelines for microbiome analysis and single-cell RNA sequencing analysis. "All of the CCTS-developed pipelines are in there," Wang said. "The whole idea is to reuse these tools so our investigators don't have to reinvent the wheel with each new project."

4. You want to save your hard drives

U-BRITE maintains a local repository hosting cached versions of massive public datasets that researchers can search instead of downloading their own copies. These include the National Center for Biotechnology Information's ClinVar database on the clinical significance of genetic variants (100 gigabytes) and the NIH Human Microbiome Project (17 terabytes). Currently available datasets include:

  • Genotype-Tissue Expression (GTEx) V7
  • Genotype-Tissue Expression (GTEx) V6p
  • NIH NCBI ClinVar
  • Ensembl
  • Library of Integrated Network-based Cellular Signatures (LINCS) L1000
  • NIH NCBI HomoloGene
  • NIH Human Microbiome Project (HMP)