Helpers:

pandas is very different from plain Python scripting: you work with whole columns and rows at once and rarely, if ever, write loops.

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
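To illustrate the column-at-a-time style described above, here is a minimal sketch that computes the same difference twice: once with an explicit row loop and once as a single column expression. The data values are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical expression values, used only for illustration
df = pd.DataFrame({"ExpressionDay1": [5.3, 51.3, 3.3],
                   "ExpressionDay2": [2.3, 2.3, 0.0]},
                  index=["KRAS", "PTEN", "APOE"])

# Loop style: walk the rows one at a time (works, but not idiomatic pandas)
looped = []
for gene, row in df.iterrows():
    looped.append(row["ExpressionDay1"] - row["ExpressionDay2"])

# Column style: one vectorized expression over whole columns
vectorized = df["ExpressionDay1"] - df["ExpressionDay2"]

print(vectorized.tolist())
```

Both give the same numbers; the vectorized form is shorter and lets pandas do the iteration internally.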

Series

1-dimensional labeled array that can hold any data type.

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
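A short sketch of how the index labels are used once the Series exists: label-based lookup, position-based lookup, a boolean filter, and element-wise arithmetic. It reuses the Series defined above.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

print(s['b'])      # label-based access -> -5
print(s.iloc[0])   # position-based access -> 3
print(s[s > 0])    # boolean mask keeps labels a, c, d
print(s * 2)       # arithmetic applies to every element at once
```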

DataFrame

2-dimensional labeled data structure with rows and columns, where each column can hold a different data type.

df = pd.DataFrame(
    [
        {'ExpressionDay1': 5.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 51.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 3.3, 'ExpressionDay2': 0},
        {'ExpressionDay1': 1.3},  # no ExpressionDay2 key, so pandas stores NaN
    ],
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'])
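A quick way to inspect the DataFrame built above: its shape, its column names, and where values are missing. Because the MTFMT row is built from a dict without an 'ExpressionDay2' key, pandas fills that cell with NaN.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'ExpressionDay1': 5.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 51.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 3.3, 'ExpressionDay2': 0},
        {'ExpressionDay1': 1.3},
    ],
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'])

print(df.shape)             # (rows, columns) -> (4, 2)
print(df.columns.tolist())  # ['ExpressionDay1', 'ExpressionDay2']
print(df.isna().sum())      # missing values per column; ExpressionDay2 has 1
```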

Importing

import numpy as np
import pandas as pd

I/O

In

cancer_county = pd.read_csv("cancer_county.csv")
med_count = pd.read_csv("med_county.csv")

Out

df.to_csv('myDataFrame.csv')
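One detail worth knowing when pairing to_csv with read_csv: to_csv writes the row index as the first column by default, so reading the file back without index_col leaves the labels as an extra "Unnamed: 0" column. This sketch writes to an in-memory buffer (io.StringIO) instead of myDataFrame.csv so it runs without touching the disk; the small frame is hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({'ExpressionDay1': [5.3, 1.3]}, index=['KRAS', 'MTFMT'])

# Write to an in-memory buffer instead of a file so the sketch runs anywhere
buf = io.StringIO()
df.to_csv(buf)        # the index is written as the first column by default
buf.seek(0)

# index_col=0 turns that first column back into the row labels
back = pd.read_csv(buf, index_col=0)
print(back)

buf.seek(0)
no_index = pd.read_csv(buf)        # without index_col the labels become a column
print(no_index.columns.tolist())   # ['Unnamed: 0', 'ExpressionDay1']
```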

Subsetting

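df.iloc[:, [0]]                                       # first column by position (as a DataFrame)
df.iat[0, 0]                                          # single value by row/column position
df.loc[['KRAS'], ['ExpressionDay1']]                  # rows/columns by label
df.loc[df['ExpressionDay1'] > 5, ['ExpressionDay2']]  # rows where the condition holds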
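A runnable sketch of the selectors above on the gene table: iloc/iat work by position, loc/at work by label, and a boolean Series inside loc filters rows. Passing a single column name (rather than a list) returns a Series instead of a DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {'ExpressionDay1': [5.3, 51.3, 3.3, 1.3],
     'ExpressionDay2': [2.3, 2.3, 0.0, None]},
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'])

first_col = df.iloc[:, [0]]                  # by position, returns a DataFrame
first_cell = df.iat[0, 0]                    # single value by position -> 5.3
kras_cell = df.at['KRAS', 'ExpressionDay1']  # single value by label -> 5.3

# Boolean row mask: keep rows whose Day1 value exceeds 5 (KRAS, PTEN)
high = df.loc[df['ExpressionDay1'] > 5, 'ExpressionDay2']
print(high)
```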

Assign

df['ExpressionDay3'] = df.ExpressionDay1 + df.ExpressionDay2
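Column arithmetic like the assignment above propagates NaN: a row missing either operand gets NaN in the new column. This sketch shows that behavior and one alternative, Series.add with fill_value=0, which treats a missing operand as 0. The column name ExpressionDay3_filled is hypothetical, chosen only for the comparison.

```python
import pandas as pd

df = pd.DataFrame(
    {'ExpressionDay1': [5.3, 1.3],
     'ExpressionDay2': [2.3, None]},
    index=['KRAS', 'MTFMT'])

# Plain + propagates NaN: MTFMT has no Day2 value, so its sum is NaN
df['ExpressionDay3'] = df['ExpressionDay1'] + df['ExpressionDay2']

# Hypothetical extra column: .add with fill_value treats a missing operand as 0
df['ExpressionDay3_filled'] = df['ExpressionDay1'].add(df['ExpressionDay2'], fill_value=0)

print(df)
```

Whether filling with 0 is appropriate depends on what a missing measurement means in your data.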

Grouping
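A minimal grouping sketch using groupby, which splits rows by the values of one column and summarizes each group. The 'Tissue' column and its values are invented for this illustration; they are not part of the gene table above.

```python
import pandas as pd

# Hypothetical data: the 'Tissue' column is invented for this sketch
df = pd.DataFrame(
    {'Tissue': ['lung', 'lung', 'brain', 'brain'],
     'ExpressionDay1': [5.3, 51.3, 3.3, 1.3]},
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'])

# Split rows by Tissue, then average each group's values
means = df.groupby('Tissue')['ExpressionDay1'].mean()
print(means)

# Several summaries at once with agg
summary = df.groupby('Tissue')['ExpressionDay1'].agg(['mean', 'max', 'count'])
print(summary)
```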

Joins
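A minimal join sketch with pd.merge, assuming two tables that share a key column. The frames below are hypothetical stand-ins for cancer_county and med_county from the I/O section; the 'county', 'cases', and 'clinics' columns and their values are invented.

```python
import pandas as pd

# Hypothetical stand-ins; column names and values are invented
cancer_county = pd.DataFrame({'county': ['A', 'B', 'C'], 'cases': [10, 4, 7]})
med_county = pd.DataFrame({'county': ['A', 'C', 'D'], 'clinics': [2, 5, 1]})

# Inner join: keep only counties present in both tables
inner = pd.merge(cancer_county, med_county, on='county', how='inner')

# Left join: keep every cancer_county row; unmatched clinics become NaN
left = pd.merge(cancer_county, med_county, on='county', how='left')

print(inner)
print(left)
```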

Functions

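len(df)                 # number of rows
df.sum()                # column sums
df.count()              # count non-NA/null values in each column
df.median()
df.quantile([0.25, 0.75])
df.apply(function)      # apply a function to each column (axis=1 for rows)
df.min()
df.max()
df.mean()
df.var()
df.std()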
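A sketch applying several of these summary methods to the gene table. By default they work column by column and skip NaN; apply takes any function, here a lambda computing each column's range (a hypothetical example function).

```python
import pandas as pd

df = pd.DataFrame(
    {'ExpressionDay1': [5.3, 51.3, 3.3, 1.3],
     'ExpressionDay2': [2.3, 2.3, 0.0, None]},
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'])

print(len(df))                    # number of rows -> 4
print(df.count())                 # non-NA values per column: 4 and 3
print(df.max())                   # column-wise maximum; NaN is skipped
print(df.quantile([0.25, 0.75]))  # one row per requested quantile

# apply runs the function on each column (axis=0) by default
ranges = df.apply(lambda col: col.max() - col.min())
print(ranges)
```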