9.2. Hypothesis Tests

9.2.1. Zero-Correlation Tests

This is essentially a test of independence: zero correlation does not in general imply independence, but under joint normality the two are equivalent, and a significant nonzero correlation is evidence against independence.

Correlation tests: scipy.stats.pearsonr() (linear), scipy.stats.spearmanr() (rank-based), scipy.stats.kendalltau() (rank-based)

import scipy.stats as ss
import numpy as np

nobs = 100
x1 = np.random.random(nobs)  ## x's are mutually independent
x2 = np.random.random(nobs)
x3 = np.random.random(nobs)
y1 = x1 + x2  ## y1 and y2 share x1, so they are dependent
y2 = x1 + x3

ss.pearsonr(y1, y2)
ss.spearmanr(x1, x2)
ss.kendalltau(y1, x2)
## KendalltauResult(correlation=0.5204040404040405, pvalue=1.698257675229554e-14)
ss.pearsonr(x1, x2)
## (0.028212121155297663, 0.7805290104114769)
## (no seed is set, so these numbers will vary from run to run)
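Rank-based statistics (Spearman, Kendall) measure monotonic rather than linear association, so they are invariant under any strictly increasing transformation. A minimal sketch of this point, using freshly generated data rather than the arrays above:

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(0)
y = rng.random(100)

## A strictly increasing transform (exp) preserves the ranks exactly,
## so the Spearman correlation is 1 (up to floating-point precision).
rho, p_s = ss.spearmanr(y, np.exp(y))
print(rho)

## Pearson correlation drops below 1 because exp() is nonlinear.
r, p_r = ss.pearsonr(y, np.exp(y))
print(r < 1.0)  ## True
```

This is why the rank-based tests can detect nonlinear but monotone dependence that Pearson's test understates.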

9.2.2. Association Test for Categorical Variables

Pearson’s chi-squared test: scipy.stats.chi2_contingency()

Fisher’s exact test: scipy.stats.fisher_exact()

Let’s create categorical variables by cutting continuous variables.

import pandas as pd # for pd.cut
## y's are dependent
ybins = [0, .5, 1, 1.5, 2]
labels = ['A', 'B', 'C', 'D']
y1b = pd.cut(y1, bins = ybins, labels = labels)
y2b = pd.cut(y2, bins = ybins, labels = labels)

We know y1 and y2 are dependent. Let’s test it.

ytab = pd.crosstab(y1b, y2b)
chi2, p, dof, ex = ss.chi2_contingency(ytab)
print(p)
## 1.966114999636321e-06 -- tiny p-value: reject the null of independence
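Besides the statistic and p-value, chi2_contingency() also returns the degrees of freedom, (rows - 1) * (cols - 1), and the table of expected counts under independence, where each cell equals (row total * column total) / n. A small sketch with a made-up 2x3 table (the counts here are hypothetical, not from the data above):

```python
import numpy as np
import scipy.stats as ss

## hypothetical 2x3 contingency table
tab = np.array([[10, 20, 30],
                [20, 20, 20]])

chi2, p, dof, ex = ss.chi2_contingency(tab)

## dof = (2 - 1) * (3 - 1)
print(dof)  ## 2

## expected counts rebuilt from the margins: outer(row sums, col sums) / n
manual_ex = np.outer(tab.sum(axis=1), tab.sum(axis=0)) / tab.sum()
print(np.allclose(ex, manual_ex))  ## True
```

Inspecting `ex` is also a quick way to check the usual rule of thumb that expected counts should not be too small for the chi-squared approximation to be reliable.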

We know x1 and x2 are independent. Let’s test it by dichotomizing them.

## x's are independent
xbins = [0, 0.5, 1]
x1b = pd.cut(x1, bins = xbins, labels = ['A', 'B'])
x2b = pd.cut(x2, bins = xbins, labels = ['A', 'B'])
xtab = pd.crosstab(x1b, x2b)
chi2, p, dof, ex = ss.chi2_contingency(xtab)
print(p)  ## a large p-value is expected, since x1 and x2 are independent

Since this is a \(2\times 2\) table, we can use Fisher’s exact test.

## only for 2x2 tables
oddsratio, p = ss.fisher_exact(xtab)
print(p)
## 0.6874678860903651 -- large p-value: no evidence against independence
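The first value returned by fisher_exact() is the sample odds ratio (ad)/(bc) of the 2x2 table; the p-value comes from the hypergeometric distribution, conditioning on the table margins. A sketch with made-up counts (not the random data above), also comparing against the chi-squared test on the same table:

```python
import numpy as np
import scipy.stats as ss

## hypothetical 2x2 table
tab = np.array([[8, 2],
                [1, 5]])

oddsratio, p = ss.fisher_exact(tab)

## sample odds ratio: (8 * 5) / (2 * 1) = 20
print(oddsratio)  ## 20.0

## chi-squared test on the same table for comparison;
## correction=False disables Yates' continuity correction
chi2, p_chi2, dof, ex = ss.chi2_contingency(tab, correction=False)
print(p, p_chi2)
```

With counts this small, the exact test is generally preferred over the chi-squared approximation.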