Unix, Linux, and Command-line For Bioinformaticians

Let's start broadly and understand the command-line from the perspective of Microsoft Windows - the operating system behind the majority of desktop computers.

Unix is an operating system that dates from the end of the 1960s, and the name now broadly refers to a family of systems whose programs all work the same way at the command-line. They have the same feel. They have the same philosophy of design. Strictly speaking, Unix was a specific operating system owned by AT&T; these days, however, the name refers to systems that all follow a common framework. There are many types of Unix, such as macOS (formerly Mac OS X), Linux, and Solaris, where each is essentially a different code base, owned by a different company or group, that implements the common Unix framework. macOS is owned and developed by Apple. Solaris was developed by Sun and is now owned by Oracle. Linux is open-source, built by a community led by Linus Torvalds, and was originally meant to work on x86 PCs. The x86 refers to a type of CPU architecture long used across most personal computers, including Intel-based Macs and PCs. If I log into a Unix machine in 1980, 1990, 2000, 2010, or 2017, it will often feel and work the same. By comparison, Windows is not Unix. Even at the command-line everything is different, and it has changed over the years.

Windows over 20 years

Windows has seen many different Graphical User Interfaces (GUIs) over the years. The key innovation of Windows was to remove the need for shell-level computing. However, much of the scripting and the ability to manipulate, wrangle, edit, and work with text was lost. Bioinformatics tends to need these tools, and they are generally found in the various flavors of *nix. There are ways to get a *nix environment within Windows, such as Cygwin or a virtual machine (VM), but we generally don't discuss them in this material.

Command-line shells

Command-line shells are started up from a terminal program. Every Mac computer has Terminal preloaded. Start that up and you'll see a prompt from the shell. The shell is actually a program that responds to you, and you can change its look and feel. Most people like the shell called bash. Starting with macOS Catalina, Apple recommends zsh and makes it the default. However, 15 years ago, tcsh was more common. There are others, like the C shell (csh) and the Korn shell (ksh).

Bash is pretty handy in that the up-arrow takes you to the previous command and you can press 'tab' to autocomplete. An important thing to know is that when bash starts, .bash_profile is executed for login shells, while .bashrc is executed for interactive non-login shells. We can store a lot of settings here. Settings for the shell are also called environment variables. You can see some examples by typing echo $HOME, where echo simply prints the variable. $PATH is really important because normally you need to give the full path of a program to run it. However, programs in directories listed in the path don't need it. For example, let's say the program 'ls' is stored in /usr/bin/ls. To run it, you'd have to type /usr/bin/ls. However, if /usr/bin is in our path, then we only have to type ls. We can have a lot of directories in our path, separated by colons. If you like, type echo $PATH to see what's in your current path. Your favorite startup settings usually go in your .bashrc file. In some cases, a default setup only reads .bash_profile. Here, people usually have only one command in it: to source .bashrc.
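As a rough sketch of how the shell uses these variables, the commands below print HOME, list each directory in PATH on its own line, and ask the shell where it finds ls. The directories you see will differ from machine to machine, and ls_location is just a variable name chosen for this example.

```shell
# Print the home directory stored in the HOME environment variable.
echo "$HOME"

# PATH is a colon-separated list; show one directory per line.
echo "$PATH" | tr ':' '\n'

# Ask the shell which file it would run when we type "ls".
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```

Because the directory holding ls is listed in PATH, typing ls and typing the full location printed above run the same program.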

Directories are something we've touched on, but it's important to know that every file is within a directory. In Unix, the directory names in a path are separated by "/". To change to the top level, type cd /. The character ~ has a special meaning: it means the home directory. Typing cd ~ changes directory to our user's home. You can follow that with pwd to find out what your home directory is.
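The short session below is a sketch of moving between the top-level directory and the home directory. The variable names (start_dir, root_dir, home_dir) are made up for this example, and the home directory printed will depend on your account.

```shell
# Remember where we started so we can come back at the end.
start_dir=$(pwd)

# Go to the top-level (root) directory and confirm the location.
cd /
root_dir=$(pwd)
echo "$root_dir"

# Go to the home directory; ~ is shorthand for it.
cd ~
home_dir=$(pwd)
echo "$home_dir"

# Return to the starting directory.
cd "$start_dir"
```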

There are some conventions. The bin directory is typically where you put executable programs. So the first good thing to do is to create a bin in your home directory. The programs in it are often called local executable files. To actually make the bin meaningful, you'd have to add it to your path: in bash, export PATH=$PATH:$HOME/bin adds that local bin to your existing path. Some programs are installed by the superuser, or root (who can read and write anywhere). These are typically in /usr/bin. The directory /etc is where system settings are kept, so don't change anything there unless you know what you are doing. Don't worry, you shouldn't be able to change anything without becoming superuser.
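Here is a minimal sketch of the local bin idea, assuming a bash-like shell. To keep it harmless, it uses a scratch directory (demo_home) as a stand-in for your home directory, and hello_bin is a made-up script name; on your own machine you would use $HOME/bin instead. The chmod step makes the script executable, which is explained in the permissions material further on.

```shell
# Hypothetical stand-in for your home directory, so nothing real is touched.
demo_home=$(mktemp -d)

# Create a local bin directory.
mkdir -p "$demo_home/bin"

# Write a tiny script into it and make it executable.
printf '#!/bin/sh\necho "hello from my local bin"\n' > "$demo_home/bin/hello_bin"
chmod 755 "$demo_home/bin/hello_bin"

# Append the directory to PATH for this shell session only.
export PATH="$PATH:$demo_home/bin"

# The shell can now find the script by name alone...
command -v hello_bin

# ...and run it without typing the full path.
hello_bin
```

To make the change permanent, the export line would go in a startup file such as .bashrc.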

There are many different resources for learning the command line on Linux/Unix-based systems. Typically, a user may need to know 20 to 40 commands, with cd, ls, and less being common. We have provided a cheat sheet below, and we link to some provided by others. All of the commands have lots of options, and one can learn about them by typing 'man command' ('man grep', for example). However, most people just google Linux command options. It is important to know that there are thousands of Linux commands, but most people only remember a small subset that is specific to their field.

https://learncodethehardway.org/unix/bash_cheat_sheet.pdf

https://files.fosswire.com/2007/08/fwunixref.pdf

https://www.cheatography.com/davechild/cheat-sheets/linux-command-line/

https://www.tjhsst.edu/~dhyatt/superap/unixcmd.html

One of the most important parts of Unix is pipe and redirect. Piping is done using the | symbol, and sends the output of the command on the left of the pipe to the command on the right of the pipe. Redirect, written >, puts the output of the command on the left into the file named on the right.

An example of a pipe that takes our history and sends it to grep, which prints only the lines that match:

history | grep ssh

An example of redirecting that output to a file

history | grep ssh > myhistory.txt
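Since history only has content in an interactive session, here is a self-contained sketch of the same two ideas that can also be run from a script. The file names (commands.txt, ssh_only.txt) and the three sample lines are made up for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Three made-up command lines standing in for the output of history.
printf 'ssh me@server\nls -l\nssh you@server\n' > commands.txt

# Pipe: cat sends its output to grep, which keeps only lines containing "ssh".
cat commands.txt | grep ssh

# Redirect: the same filtered output goes into a file instead of the screen.
cat commands.txt | grep ssh > ssh_only.txt

# Pipes can be chained; wc -l counts the lines that grep lets through.
cat commands.txt | grep ssh | wc -l
```

Note that > replaces the contents of the file on the right if it already exists.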

A few important examples. First, go up a directory, and then list the contents, including permissions.

cd ..

ls -l

Create a directory in the user's home directory and change to it

mkdir ~/mydir

cd ~/mydir

Permissions are an important early concept. A simple explanation is that files can be readable (+4), writable (+2), and executable (+1) for yourself, your group, and the world (everyone who can log in), in that order. If something is read-only, it's 4. If something is readable and writable, it's 6. If something is readable and executable, it's 5. A script needs to be executed, so we make it executable using the chmod command, with 7 for the user, 5 for the group, and 5 for the world.

chmod 755 myscript.sh
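The following sketch walks through that command on a throwaway script. The scratch directory and the script contents are made up for the example; the first ten characters of ls -l show the permissions before and after.

```shell
# Scratch directory and a small script, used only for illustration.
workdir=$(mktemp -d)
script="$workdir/myscript.sh"

# A two-line script that prints a message.
printf '#!/bin/sh\necho "script ran"\n' > "$script"

# A new file is normally not executable; show its permission string first.
ls -l "$script" | cut -c1-10

# 7 (4+2+1) for the user, 5 (4+1) for the group, 5 (4+1) for the world.
chmod 755 "$script"
perms_755=$(ls -l "$script" | cut -c1-10)
echo "$perms_755"

# Now the script can be run directly by giving its path.
"$script"
```

In the permission string, the first character is the file type and the next nine are read, write, and execute for user, group, and world.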

Concept: File permissions

You are a user. A Linux computer expects multiple users, and together they form the world of users. You can be assigned to a group or groups. Other users on the computer may be in your group, and there are some groups you don't belong to.

Level 1: You the user, and your permissions. Perhaps your username is john_doe. If you want to be able to view a file, give yourself four points (+4). If you want to be able to write to a file or change a file, give yourself two points (+2). If you want to be able to run a file, such as a script, as a program, give yourself one point (+1). Want to do all 3? 4+2+1=7.

Level 2: Groups. Perhaps you belong to the group bioinformaticians, and jane_doe is also a bioinformatician. Do you want Jane to be able to execute a file? Well, to run a script she must also be able to read it. So she needs 4 + 1 = 5 points.

Level 3:  World.  What about others on the computer?  What if you don't want others to be able to read or execute - they get 0 points.

So to set permissions for User, Group, World, that would be 750.

chmod is the command; chmod 750 myfile.txt would do it!
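The point-counting can be written out as shell arithmetic. This is only a sketch of the idea: the variable names are invented for the example, and the file lives in a scratch directory so nothing of yours is changed.

```shell
# Scratch file standing in for myfile.txt.
workdir=$(mktemp -d)
target="$workdir/myfile.txt"
touch "$target"

# Add up the points for each level: read=4, write=2, execute=1.
user_points=$((4 + 2 + 1))   # read + write + execute
group_points=$((4 + 0 + 1))  # read + execute
world_points=$((0 + 0 + 0))  # nothing
mode="${user_points}${group_points}${world_points}"
echo "$mode"

# Apply the mode and look at the resulting permission string.
chmod "$mode" "$target"
perms_750=$(ls -l "$target" | cut -c1-10)
echo "$perms_750"
```

Each digit of the mode corresponds to one group of three letters in the permission string: user, then group, then world.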

Concept: servers

When you get on the network, you are assigned an internet (IP) address. Most often we think of IPv4, where an address is a combination of 4 numbers that each range from 0 to 255. There is also a newer address system, IPv6. For example, I can look up my internet address from my home computer. Typically, we just grab an address for the moment, almost like staying at a hotel; this is called a dynamic address. We can also have a fixed address, also called static. There are only so many numbers, so a lot of places where people log in and out like to hand out the numbers that are currently unused.
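To make the "4 numbers, each from 0 to 255" rule concrete, here is a small sketch of a shell function that checks whether a string has that shape. The function name is_ipv4 and the sample addresses are made up for this example; it only checks the format and does not contact the network.

```shell
# Succeed if the argument looks like an IPv4 address:
# four dot-separated numbers, each between 0 and 255.
# The body runs in a subshell, so changing IFS here affects nothing else.
is_ipv4() (
    # A trailing dot would otherwise be silently ignored by the split below.
    case $1 in
        *.) return 1 ;;
    esac
    IFS=.      # split on dots
    set -f     # do not expand wildcards while splitting
    set -- $1  # the pieces become $1, $2, ...
    [ "$#" -eq 4 ] || return 1
    for part in "$@"; do
        # Each piece must be non-empty and contain digits only.
        case $part in
            ''|*[!0-9]*) return 1 ;;
        esac
        [ "$part" -le 255 ] || return 1
    done
)

# Try it on a few made-up candidates.
for candidate in 192.168.1.10 256.1.1.1 1.2.3; do
    if is_ipv4 "$candidate"; then
        echo "$candidate looks like an IPv4 address"
    else
        echo "$candidate does not look like an IPv4 address"
    fi
done
```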

Numbers are hard to remember, so as the internet grew, standards for registering and looking up names came out. The map from names to IP addresses is provided by domain name servers. Basically, these are the places your computer asks to find out the IP address of "www.cnn.com". As we have seen in the news, if you take these out, the internet grinds to a halt. People have taken them out by having a bunch of zombie machines make lots of requests to them.

A server is just a computer you can log into using a protocol. The best and most common way we do this is with the secure shell protocol, or ssh. Thus I can log into a remote computer by opening the command-line shell and typing

ssh myname@mycomputer.com

A list of Unix commands that are helpful:

Exercise: Login to a server

ssh yourname@aserveraddress.com

This should connect you to the server, and you will be asked for a password. Type a few Unix commands to verify. For example, whoami should match yourname. Other commands to try:

ls

pwd

date

last

The final command in this list, last, is important because it shows who has logged into the computer recently, including anyone who is still logged in; the who command lists just the users logged in right now. Another effective way to see what the computer is currently doing is top.

Now before we leave, let's change your password with the passwd command.

Exercise: Learning how to edit text with vim

VIM Editor Commands

Vim is an editor to create or edit a text file. There are two modes in vim. One is the command mode and another is the insert mode. In the command mode, the user can move around the file, delete text, etc, whereas in the insert mode, the user can insert text.

From command mode to insert mode type a/A/i/I/o/O ( see details below)

From insert mode to command mode type Esc (escape key)

Text Entry Commands (Used to start text entry)

  • a Append text following the current cursor position
  • A Append text to the end of the current line
  • i Insert text before the current cursor position
  • I Insert text at the beginning of the cursor line
  • o Open up a new line following the current line and add text there
  • O Open up a new line in front of the current line and add text there

The following commands are used only in command mode.

  • ^F (CTRL F, same as ^f) Forward one screenful (page)
  • ^B (CTRL B, same as ^b) Backward one screenful (page)
  • ^U Up half screenful
  • ^D Down half screenful
  • $ Move the cursor to the end of the current line
  • 0 (zero) Move the cursor to the beginning of the current line
  • w Forward one word
  • b Backward one word

Exit Commands

  • :wq Write file to disk and quit the editor
  • :q! Quit (no warning)
  • :q Quit (a warning is printed if a modified file has not been saved)
  • ZZ Save workspace and quit the editor (same as :wq)

Text Deletion Commands

  • x Delete character
  • dw Delete word from cursor on
  • db Delete word backward
  • dd Delete line
  • d$ Delete to end of line
  • d^ (d caret, not CTRL d) Delete back to the first non-blank character of the line (d0 deletes to the very beginning)

Yank (has most of the options of delete)-- VI's copy command

  • yy yank current line
  • y$ yank to end of the current line from cursor
  • yw yank from the cursor to end of the current word
  • 5yy yank, for example, 5 lines

Paste (used after delete or yank to recover lines.)

  • p paste below cursor
  • P paste above cursor
  • u Undo last change
  • U Restore line
  • J Join next line down to the end of the current line

File Manipulation Commands

  • :w Write workspace to original file
  • :w file Write workspace to named file
  • . Repeat last command
  • r Replace one character at the cursor position
  • R Begin overstrike or replace mode; use ESC key to exit
  • :g/pat1/s//pat2/g replace every occurrence of pattern1 (pat1) with pat2

Examples

Opening a New File

  • Step 1 type vim filename (create a file named filename)
  • Step 2 type i ( switch to insert mode)
  • Step 3 enter text (type the contents of your file)
  • Step 4 hit Esc key (switch back to command mode)
  • Step 5 type :wq (write file and exit vim)

Exercise 2:  Install .bashrc and .bash_profile using vim

Open your terminal, then open .bash_profile in vi/vim. If you don't see a line that sources .bashrc, or something similar, insert one and save your file. Remember that the leading '.' hides the file from a simple ls; use ls -a to see it.
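A common form of the line that sources .bashrc is sketched here. It is an assumption about a typical setup rather than the exact contents your system ships with: it checks that .bashrc exists and then reads it.

```shell
# Contents for ~/.bash_profile: if a .bashrc exists in the home directory,
# read it so that login shells get the same settings as other interactive shells.
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
```

The single dot is the portable spelling of the source command; in bash, source ~/.bashrc does the same thing. After saving, open a new terminal or log in again to pick up the change.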