Homework 3
Please create a GitHub repository called `TRGN510_Assignment3`. It should contain three scripts called `circumference.py`, `area_code.py`, and `gtf2json.py`. Please submit the repository URL to the Blackboard homework.
- Create a basic Python script called `circumference.py` that assigns `pi` to `3.14159` and prints the circumference of a circle given a second variable, `radius`, with the initial value of 3. In this example, the radius should be assigned within the script. The output should print `The circumference of a circle of radius 3 is ???`, where `???` is the answer given by `2*pi*r`.
- Create a script called `area_code.py` that reads a file of phone numbers such as `(602)-232-2322` and prints out the area code (such as `602`) for each line of the file.
- Create a script called `gtf2json.py`. For this script, you'll need access to a dataset, which I would like you to put in a different directory. First, download the file `Homo_sapiens.GRCh37.75.gtf.gz` from http://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens using `wget`, and place it in a directory called `data` within your home directory. Unzip the file with gunzip: `gunzip Homo_sapiens.GRCh37.75.gtf.gz`. The command `head ~/data/Homo_sapiens.GRCh37.75.gtf` should then show the start of the file (`#!genome-build GRCh37.p13`). Next, create a Python script called `gtf2json.py` that takes a GTF file (such as the one you downloaded) as an argument and outputs the gene_name, the chromosome (the first column), the starting position (the fourth column), and the ending position (the fifth column) for only those lines where the third column is “gene”. Columns within the file are tab-delimited. The result should be in JSON format. For example, `./gtf2json.py ~/data/Homo_sapiens.GRCh37.75.gtf` should print `[ {"geneName":"OR4G4P","chr":"1","startPos":52473, "endPos":54936}, ... ]`.
You do not need to have the GTF within your repository. Only the link should be in your README.
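
As a starting point for `circumference.py`, the following is a minimal sketch of one way the script could look. The variable names `circumference` and `message` are illustrative choices, not requirements of the assignment.

```python
# Sketch of circumference.py: both values are assigned inside the script.
pi = 3.14159
radius = 3

# The circumference follows 2*pi*r.
circumference = 2 * pi * radius

# `message` is a hypothetical helper name used only for this illustration.
message = f"The circumference of a circle of radius {radius} is {circumference}"
print(message)
```

The printed number may show floating-point rounding in its last digits; the assignment does not ask for any particular rounding.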
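
For `area_code.py`, one possible approach is a regular expression that captures the three digits inside the parentheses. This sketch assumes the phone-number file is passed as the first command-line argument, which the assignment does not specify, and the function names `extract_area_code` and `print_area_codes` are hypothetical.

```python
import re
import sys

# Matches three digits inside parentheses at the start of a line, e.g. "(602)".
AREA_CODE_PATTERN = re.compile(r"^\s*\((\d{3})\)")


def extract_area_code(line):
    """Return the area code from a line like '(602)-232-2322', or None."""
    match = AREA_CODE_PATTERN.match(line)
    return match.group(1) if match else None


def print_area_codes(path):
    """Print the area code found on each line of the file at `path`."""
    with open(path) as handle:
        for line in handle:
            code = extract_area_code(line)
            if code is not None:
                print(code)


# Assumption: the phone-number file is given as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    print_area_codes(sys.argv[1])
```

Lines that do not begin with a parenthesized three-digit code are skipped silently in this sketch; a stricter script might report them instead.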
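
The parsing step of `gtf2json.py` could be sketched as follows. This is an illustration under stated assumptions rather than the expected solution. It assumes that gene_name appears in the ninth (attributes) column in the form `gene_name "OR4G4P";`, and that header lines start with `#`. The function names `parse_gene_line` and `gtf_to_json` are hypothetical.

```python
#!/usr/bin/env python3
import json
import re
import sys

# Assumption: the attributes column holds entries such as: gene_name "OR4G4P";
GENE_NAME_PATTERN = re.compile(r'gene_name "([^"]+)"')


def parse_gene_line(line):
    """Return a dict for a 'gene' record, or None for any other line."""
    if line.startswith("#"):  # header lines such as #!genome-build
        return None
    columns = line.rstrip("\n").split("\t")  # columns are tab-delimited
    if len(columns) < 9 or columns[2] != "gene":
        return None
    match = GENE_NAME_PATTERN.search(columns[8])
    if match is None:
        return None
    return {
        "geneName": match.group(1),
        "chr": columns[0],            # first column
        "startPos": int(columns[3]),  # fourth column
        "endPos": int(columns[4]),    # fifth column
    }


def gtf_to_json(path):
    """Read the GTF file at `path` and return a JSON array as a string."""
    genes = []
    with open(path) as handle:
        for line in handle:
            gene = parse_gene_line(line)
            if gene is not None:
                genes.append(gene)
    return json.dumps(genes)


# The GTF path is taken from the first command-line argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(gtf_to_json(sys.argv[1]))
```

To run the script as `./gtf2json.py ~/data/Homo_sapiens.GRCh37.75.gtf`, the file also needs to be made executable, for example with `chmod +x gtf2json.py`.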