Data Science and Bioinformatics

David W. Craig, Ph.D.
Professor of Translational Genomics
Co-Director, Institute of Translational Genomics

“By developing new tools for integrative analysis of genomic, epigenetic, proteomic, and clinical data and assimilating them into the clinic, we can enhance treatment decisions and make a real impact on the future of medicine.”

David W. Craig, Ph.D.
Co-Director, Institute of Translational Genomics

Empowering through genomics and bioinformatics

Advancing precision medicine means building the informatics systems that gather, integrate, and analyze patient data at massive scale, across multiple dimensions and decision-making time points, to deliver clinical value and utility. The volume of data our scientists sift through for each patient requires integrating bioinformatics, statistics, genetics, epidemiology, clinical medicine, and public and global health reports. Working with David W. Craig, Enrique I. Velazquez-Villarreal, M.D., Ph.D., M.P.H., M.S., is integrating clinical and genomic data from a variety of technologies to assemble a more complete reference library, with the goal of developing machine-learning tools that help physicians more rapidly access the critical data points they need to make decisions.

Master of Science in Translational Biomedical Informatics

This program takes scientists’ knowledge of bioinformatics to the next level, enabling them to analyze, apply, and integrate the latest data tools in the laboratory. Students learn to extract information to better understand biomedical problems and to design experiments that address those problems. In preparation for careers that span from research to the clinic, students will understand their critical role in working with data under the oversight of multiple regulatory bodies. Graduates of the two-year program will be well suited to work as applied bioinformaticians in academic and clinical research laboratories, pharmaceutical companies, and biotechnology companies.

High-Performance Computing. USC's Institute of Translational Genomics is built as a hybrid, with both dedicated local computing and fast connectivity to on-demand cloud computing through both Amazon and Google cloud services. A high-performance computing cluster at the USC ITS colocation facility consists of approximately 800 cores on 50 nodes, with 600 terabytes of storage, backed up through an automated pipeline for 3-year storage on Amazon Glacier.

USC's Institute of Translational Genomics has end-to-end integration of specimen handling, genomic sequencing, and analysis. The system gives informaticians, lab technicians, and collaborators complete transparency into quality control metrics. In practice, the bioinformatics pipeline is monitored and maintained through both queuing systems and a NoSQL database layer that extracts key metrics (such as read counts and Q30 scores) and joins them with laboratory variables (such as operator and concentrations). Additionally, raw data can be accessed in real time, giving all sites full access to data within a structured, secure framework.
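
To illustrate the kind of join described above, here is a minimal sketch in Python. It is not the Institute's actual implementation: the field names (sample_id, reads, pct_q30, operator, concentration_ng_ul), the sample values, and the 80% Q30 threshold are all assumptions made for illustration, and plain dictionaries stand in for documents in a NoSQL store.

```python
# Hypothetical sequencing metrics, as a pipeline step might emit them.
# All field names and values are illustrative assumptions.
sequencing_metrics = [
    {"sample_id": "S001", "reads": 410_000_000, "pct_q30": 92.4},
    {"sample_id": "S002", "reads": 385_000_000, "pct_q30": 78.9},
]

# Hypothetical laboratory variables recorded at the bench.
lab_variables = [
    {"sample_id": "S001", "operator": "tech_a", "concentration_ng_ul": 24.1},
    {"sample_id": "S002", "operator": "tech_b", "concentration_ng_ul": 6.3},
]


def join_by_sample(metrics, lab):
    """Merge metric and lab documents that share the same sample_id."""
    lab_by_id = {doc["sample_id"]: doc for doc in lab}
    joined = []
    for metric_doc in metrics:
        # A sample with no lab record keeps only its sequencing metrics.
        lab_doc = lab_by_id.get(metric_doc["sample_id"], {})
        joined.append({**lab_doc, **metric_doc})
    return joined


def flag_low_quality(docs, min_pct_q30=80.0):
    """Return sample IDs whose Q30 percentage is below the assumed threshold."""
    return [doc["sample_id"] for doc in docs if doc["pct_q30"] < min_pct_q30]


qc_documents = join_by_sample(sequencing_metrics, lab_variables)
flagged = flag_low_quality(qc_documents)

for doc in qc_documents:
    status = "REVIEW" if doc["sample_id"] in flagged else "ok"
    print(doc["sample_id"], doc["operator"], doc["pct_q30"], status)
```

Keeping the sequencing metrics and the laboratory variables in one joined record per sample is what lets a reviewer see, for example, whether low Q30 scores coincide with a low input concentration or a particular operator.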