Basic Foundations in Translational Biomedical Informatics

Overview

The objective is to introduce individuals with strong backgrounds in biology and/or medicine to the foundations, basic principles, and core concepts in scripting and computing that are necessary in biomedical informatics. The abbreviated program will introduce concepts in data science, the command line, and scripting through Python and R. Upon successful completion of this abbreviated program, individuals will be able to: interface with the command line; create basic scripts in Python and R; and understand project management principles with strong underpinnings in data science. These introductory skills will serve as a foundational framework for individuals to build on in future courses or programs in biomedical data science.

Joining Data

Venn

Analyzing & Interpreting Data

Wordcloud

Visualizing Data

Graph Builder

Our Learning Environment: Jupyter Labs and Notebooks

These modules do not require you to have a Linux/Unix-based computer or to have particular programs installed. All learning is done in a web browser, which runs an environment where you can use Python, R, the command line, and other tools: JupyterLab and Jupyter Notebooks.

JupyterLab and Notebooks

JupyterLab and Jupyter Notebooks are graphical interfaces for developing and writing Python code, and they can be used with many other languages such as R and Scala. A notebook usually means a single analysis or collection of scripts/code. JupyterLab is a place for storing and managing many notebooks.

Using Browser-based Jupyter Notebook

You are welcome to install Jupyter on your computer, but we will generally presume the browser-based version, which is here.

More Advanced Jupyter Lab!

JupyterLab provides more functionality and is eventually where you will want to work. In many parts of the tutorial, we just use the notebook. However, if you want full functionality and the ability to save your work, you will want to install JupyterLab locally. There is also a version that can be run within your browser.

https://jupyter.org/try

A snapshot is shown below. In this view, you can see all of your files, edit and create what are called Markdown documents in the middle frame, and see the rendered Markdown in the right frame. We have discussed Markdown before; in short, it is a way of putting code and text together so that the result looks like a web page.

 

Code is run in cells: click on a cell, then press the forward arrow to run it.
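As a minimal sketch of what a single code cell might contain, consider the following. The values and names here are hypothetical and chosen only for illustration; they are not data from these modules. Typing this into a cell and running it shows the printed result directly beneath the cell.

```python
# Hypothetical example values, used only to illustrate running a cell.
heart_rates = [72, 85, 90, 66, 78]


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


average = mean(heart_rates)

# In a notebook, this output appears right below the cell.
print(f"Average heart rate: {average:.1f} bpm")
```

Editing the list and running the cell again updates the output, which is the basic loop of interactive work in a notebook.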

You can see that there are many different types of files that can be created, and we discuss these in other modules. For this exercise, we largely focus on the notebook.

Markdowns

Markdown is a lightweight markup language: it is written as formatted plain text but is rendered to look like a web page. It has a way of creating code blocks. For example, on the left we see Markdown, and on the right we see the rendered view that could be published.

Code blocks are magical cells – creating interactive computing.

Those code blocks are a bit magical in JupyterLab, as you’ll see. You’ll be able to edit and run them, seeing the results in a live, interactive view. Documents of this kind, which are meant to be run and to create a reproducible analysis, are actually notebooks, and in JupyterLab their file names end in .ipynb. Notice that I can hit play or ‘run’ and see the code being executed.
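To make the idea of a notebook file more concrete, here is a rough sketch of what sits inside one. A notebook is stored as JSON text holding a list of cells, some Markdown and some code. The cell contents, the file name example.ipynb, and the helper functions below are hypothetical; the field names follow version 4 of the Jupyter notebook format as I understand it, so treat this as an illustration and not a complete specification.

```python
import json
import tempfile
from pathlib import Path

# A minimal notebook: one Markdown cell (text) and one code cell (Python).
# Cell contents are hypothetical and exist only for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first analysis\n", "Text that explains the code."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # None means the cell has not been run
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}


def save_notebook(nb, path):
    """Write the notebook dictionary to disk as JSON text."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(nb, handle, indent=1)


def cell_types(path):
    """Read a notebook file and list the type of each cell, in order."""
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    return [cell["cell_type"] for cell in loaded["cells"]]


# Write to a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    example_path = Path(folder) / "example.ipynb"
    save_notebook(notebook, example_path)
    print(cell_types(example_path))  # prints ['markdown', 'code']
```

In practice you would rarely write this file by hand; JupyterLab creates and updates it for you each time you save.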

We only scratch the surface.

There is quite a bit to learn, and to keep the materials accessible, we only touch the surface. I encourage you to learn more about JupyterLab, Jupyter Notebooks, and so forth. It's an amazing data science vehicle, and also a great learning tool.

Resources to Learn Outside of these Modules

Learning Flavors of Unix and Command-line

UNIX is an operating system that originated at AT&T's Bell Labs at the end of the 1960s. Strictly speaking, it was a specific operating system owned by AT&T; these days, however, the name broadly refers to a family of operating systems whose programs all work the same way at the command line. They have the same feel and the same philosophy of design. There are many flavors of Unix, such as macOS (formerly Mac OS X), Linux, and Solaris, each of which is essentially a different code base, owned by a different company or group, that provides the common Unix framework. macOS is owned and developed by Apple. Solaris was developed by Sun Microsystems and is now owned by Oracle. Linux is open source, is built by a community led by Linus Torvalds, and was originally meant to work on x86 PCs. The x86 refers to a type of CPU architecture that has long been used across personal computers (PCs and, for many years, Macs). If I log into a Unix machine in 1980, 1990, 2000, 2010, or 2017, it will often feel and work the same.
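As a small, hedged illustration of these flavors, the sketch below asks Python which system it is running on and reports whether it belongs to the Unix family named above. The function name describe_system and the lookup table are hypothetical; platform.system() is part of Python's standard library, and the names in the table are the ones I understand it to report for Linux, macOS, and Solaris.

```python
import platform

# Names reported by platform.system() for some Unix-family systems
# (assumed mapping for this sketch; the list is not exhaustive).
UNIX_FAMILY = {
    "Linux": "Linux",
    "Darwin": "macOS",
    "SunOS": "Solaris",
}


def describe_system(system_name):
    """Return a short description for a platform.system() style name."""
    if system_name in UNIX_FAMILY:
        return f"{UNIX_FAMILY[system_name]} (Unix family)"
    # platform.system() may return an empty string if it cannot tell.
    return f"{system_name or 'Unknown'} (not in this sketch's Unix list)"


# Report on the machine that is running this code.
print(describe_system(platform.system()))
```

Running this in a browser-based notebook reports the system of the remote machine hosting the notebook, not necessarily the computer in front of you.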

R

R is a scripting language, and RStudio enables and expands how it is used. There is the base R environment and the RStudio environment, which provides a graphical user interface (GUI) for editing and building reproducible analyses.