9.2. Hypothesis Tests
9.2.1. Zero-Correlation Tests
A zero-correlation test is often used as a test of independence: if two variables are independent, their correlation is zero (though the converse does not hold in general).
Correlation tests:

- scipy.stats.pearsonr()
- scipy.stats.spearmanr()
- scipy.stats.kendalltau()
import scipy.stats as ss
import numpy as np

nobs = 100
x1 = np.random.random(nobs)   # x1, x2, x3 are mutually independent uniforms
x2 = np.random.random(nobs)
x3 = np.random.random(nobs)
y1 = x1 + x2                  # y1 and y2 share x1, so they are dependent
y2 = x1 + x3

ss.pearsonr(y1, y2)           # Pearson (linear) correlation of the dependent y's
ss.spearmanr(x1, x2)          # Spearman (rank) correlation of the independent x's
ss.kendalltau(y1, x2)         # Kendall's tau; y1 contains x2, so they are dependent
KendalltauResult(correlation=0.5204040404040405, pvalue=1.698257675229554e-14)
ss.pearsonr(x1, x2)           # independent x's: correlation should be near zero
(0.028212121155297663, 0.7805290104114769)
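Each of these functions returns the estimated correlation together with a two-sided p-value for the null hypothesis of zero correlation. As a minimal sketch (the 5% level is an arbitrary choice for illustration), the result can be unpacked and compared to a threshold:

# Minimal sketch: unpack the (correlation, p-value) pair and decide at a 5% level.
# The significance level alpha = 0.05 is assumed here for illustration only.
r, p = ss.pearsonr(x1, x2)
alpha = 0.05
if p < alpha:
    print(f"r = {r:.3f}: reject the null hypothesis of zero correlation")
else:
    print(f"r = {r:.3f}: no evidence against zero correlation")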
9.2.2. Association Test for Categorical Variables
- Pearson's chi-squared test: scipy.stats.chi2_contingency()
- Fisher's exact test: scipy.stats.fisher_exact()
Let's create categorical variables by binning the continuous variables.
import pandas as pd  # for pd.cut

## y's are dependent
ybins = [0, .5, 1, 1.5, 2]      # y1 and y2 take values in (0, 2)
labels = ['A', 'B', 'C', 'D']
y1b = pd.cut(y1, bins=ybins, labels=labels)   # 4-level categorical versions
y2b = pd.cut(y2, bins=ybins, labels=labels)
We know y1 and y2 are dependent. Let's test it.
ytab = pd.crosstab(y1b, y2b)                   # 4x4 contingency table of counts
chi2, p, dof, ex = ss.chi2_contingency(ytab)   # ex holds the expected counts
print(p)
1.966114999636321e-06
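The p-value is far below any conventional significance level, so the test rejects independence, as expected. chi2_contingency() also returns the expected counts under independence (ex above); as a small sketch (wrapping them in a DataFrame with the crosstab's labels is only for readability), they can be compared with the observed table:

# Sketch: compare the observed table with the expected counts under independence.
# The DataFrame wrapper is only for readability; `ex` is a plain NumPy array.
expected = pd.DataFrame(ex, index=ytab.index, columns=ytab.columns)
print(ytab)
print(expected.round(1))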
We know x1 and x2 are independent. Let's test it by dichotomizing them.
## x's are independent
xbins = [0, 0.5, 1]                            # split at the midpoint of (0, 1)
x1b = pd.cut(x1, bins=xbins, labels=['A', 'B'])
x2b = pd.cut(x2, bins=xbins, labels=['A', 'B'])
xtab = pd.crosstab(x1b, x2b)                   # 2x2 contingency table
chi2, p, dof, ex = ss.chi2_contingency(xtab)
Since this is a \(2\times 2\) table, we can use Fisher’s exact test.
## only for 2x2 tables
oddsratio, p = ss.fisher_exact(xtab)
print(p)
0.6874678860903651
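The large p-value gives no evidence against independence, which matches how x1 and x2 were generated. For completeness, fisher_exact() also accepts an alternative argument for one-sided tests; the sketch below uses 'greater' purely for illustration:

# Sketch: one-sided Fisher's exact test; alternative='greater' is chosen only
# to illustrate the argument, not because a one-sided test is called for here.
oddsratio, p_one_sided = ss.fisher_exact(xtab, alternative='greater')
print(p_one_sided)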