PANDAS
Helpers:
Pandas works very differently from ordinary Python scripting. You operate on whole columns and rows at once, and rarely, if ever, write explicit loops.
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
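To make the "columns instead of loops" idea concrete, here is a small sketch that computes the same result both ways. The column names `day1` and `day2` and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical values, for illustration only.
df = pd.DataFrame({"day1": [5.3, 51.3, 3.3], "day2": [2.3, 2.3, 0.0]})

# Loop style: works, but verbose and slow on large tables.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["day1"] + row["day2"])

# Column style: one expression operates on whole columns at once.
totals_vectorized = df["day1"] + df["day2"]
```

Both approaches give the same numbers; the column version is the one to reach for by default.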
Series
1-dimensional labeled array – can hold any data type.
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
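A short sketch of what the labels buy you, using the same Series as above. The variable names on the left are just for illustration.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

by_label = s['b']          # look up a value by its label
by_position = s.iloc[0]    # look up a value by its position
positives = s[s > 0]       # boolean mask; matching labels are kept
doubled = s * 2            # arithmetic applies to every element
```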
DataFrame
2-dimensional labeled data structure with rows and columns; each column can hold a different type.
df = pd.DataFrame(
    [
        {'ExpressionDay1': 5.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 51.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 3.3, 'ExpressionDay2': 0},
        {'ExpressionDay1': 1.3},
    ],
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'],
)
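Note that the last record above has no `ExpressionDay2` entry. A quick sketch of how pandas handles that: the missing cell becomes NaN, which can be checked with `isna()`. The names `shape` and `missing` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'ExpressionDay1': 5.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 51.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 3.3, 'ExpressionDay2': 0},
        {'ExpressionDay1': 1.3},   # no ExpressionDay2 supplied
    ],
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'],
)

shape = df.shape                        # (number of rows, number of columns)
missing = df['ExpressionDay2'].isna()   # True where no value was supplied
```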
Importing
import numpy as np
import pandas as pd
I/O
In
cancer_county = pd.read_csv("cancer_county.csv")
med_count = pd.read_csv("med_county.csv")
Out
df.to_csv('myDataFrame.csv')
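One thing that tends to trip people up: `to_csv` writes the row labels as the first column, so reading the file back needs `index_col=0` to restore them. A minimal round-trip sketch, using an in-memory buffer in place of a real file (a file path works the same way):

```python
import io
import pandas as pd

df = pd.DataFrame(
    {'ExpressionDay1': [5.3, 51.3], 'ExpressionDay2': [2.3, 2.3]},
    index=['KRAS', 'PTEN'],
)

buffer = io.StringIO()       # stands in for a file such as 'myDataFrame.csv'
df.to_csv(buffer)            # row labels are written as the first column
buffer.seek(0)               # rewind before reading
restored = pd.read_csv(buffer, index_col=0)   # first column becomes the index again
```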
Subsetting
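df.iloc[:, [0]]                                        # first column, by position
df.iat[0, 0]                                           # single value, by position (square brackets, not a call)
df.loc[['KRAS'], ['ExpressionDay1']]                   # by row label and column label
df.loc[df['ExpressionDay1'] > 5, ['ExpressionDay2']]   # rows matching a condition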
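A runnable sketch of the four selection styles on the gene-expression DataFrame from the DataFrame section. Because that frame is indexed by gene name, `.loc` needs a label such as `'KRAS'`, while `.iloc` and `.iat` take integer positions. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'ExpressionDay1': 5.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 51.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 3.3, 'ExpressionDay2': 0},
        {'ExpressionDay1': 1.3},
    ],
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'],
)

first_column = df.iloc[:, [0]]                      # DataFrame holding column 0
top_left = df.iat[0, 0]                             # scalar at row 0, column 0
kras_day1 = df.loc[['KRAS'], ['ExpressionDay1']]    # 1x1 DataFrame, selected by label
high_day1 = df.loc[df['ExpressionDay1'] > 5, ['ExpressionDay2']]  # filter rows, pick a column
```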
Assign
df['ExpressionDay3'] = df.ExpressionDay1 + df.ExpressionDay2
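Adding columns that contain missing values gives NaN for those rows. A sketch of that behavior, plus one possible workaround using `Series.add` with `fill_value=0`. The column name `ExpressionDay3Filled` is made up here, and treating a missing value as 0 is an assumption that may not suit real data.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'ExpressionDay1': 5.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 51.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 3.3, 'ExpressionDay2': 0},
        {'ExpressionDay1': 1.3},   # ExpressionDay2 is missing for MTFMT
    ],
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'],
)

# Plain addition: NaN in either operand gives NaN in the result.
df['ExpressionDay3'] = df['ExpressionDay1'] + df['ExpressionDay2']

# Hypothetical alternative: treat a missing operand as 0 before adding.
df['ExpressionDay3Filled'] = df['ExpressionDay1'].add(df['ExpressionDay2'], fill_value=0)
```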
Grouping
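A minimal sketch of grouping, assuming a label column to group on. The `Group` column and its values are hypothetical and only there to have something to split by.

```python
import pandas as pd

df = pd.DataFrame({
    'Gene': ['KRAS', 'PTEN', 'APOE', 'MTFMT'],
    'Group': ['A', 'A', 'B', 'B'],              # hypothetical grouping label
    'ExpressionDay1': [5.3, 51.3, 3.3, 1.3],
})

# Split rows by 'Group', then aggregate one column per group.
group_means = df.groupby('Group')['ExpressionDay1'].mean()

# Number of rows in each group.
group_sizes = df.groupby('Group').size()
```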
Joins
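A minimal sketch of joining two tables with `pd.merge`. The frames, the `county` key, and the other column names are all made up for illustration; real files would need their own key column.

```python
import pandas as pd

# Hypothetical tables sharing a 'county' key.
cancer = pd.DataFrame({'county': ['A', 'B', 'C'], 'cases': [10, 20, 30]})
income = pd.DataFrame({'county': ['A', 'B', 'D'], 'median_income': [50, 60, 70]})

# Inner join: keep only counties present in both tables.
inner = pd.merge(cancer, income, on='county', how='inner')

# Left join: keep every row of 'cancer'; unmatched income values become NaN.
left = pd.merge(cancer, income, on='county', how='left')
```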
Functions
len(df)                      # number of rows
df.sum()
df.count()                   # Count non-NA/null values of each object.
df.median()
df.quantile([0.25, 0.75])
df.apply(function)
df.min()
df.max()
df.mean()
df.var()
df.std()
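A few of these in action on the gene-expression DataFrame from the DataFrame section. The lambda passed to `apply` is just an example function (max minus min for each column), and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'ExpressionDay1': 5.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 51.3, 'ExpressionDay2': 2.3},
        {'ExpressionDay1': 3.3, 'ExpressionDay2': 0},
        {'ExpressionDay1': 1.3},
    ],
    index=['KRAS', 'PTEN', 'APOE', 'MTFMT'],
)

n_rows = len(df)                                          # number of rows
non_null = df.count()                                     # non-missing values per column
col_means = df.mean()                                     # per-column mean; NaN is skipped
quartiles = df['ExpressionDay1'].quantile([0.25, 0.75])   # Series indexed by quantile
col_range = df.apply(lambda col: col.max() - col.min())   # custom function, run once per column
```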