Q&A 11 How do you use a chi-squared test to determine if two categorical variables are independent?

11.1 Explanation

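This question demonstrates how to perform a chi-squared test of independence to assess whether two categorical variables are related. The test starts from a contingency table of observed counts and compares them with the counts that would be expected if the two variables were independent (the null hypothesis). A large chi-squared statistic, and therefore a small p-value, is evidence against independence.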
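To make the comparison of observed and expected counts concrete, the following is a minimal sketch that computes the statistic by hand with NumPy for the 2x2 table used in this section's examples. The variable names are illustrative, and the sketch deliberately omits any continuity correction, so its statistic is the plain Pearson value rather than the corrected one that library routines may report for 2x2 tables.

```python
import numpy as np

# Observed counts (rows: Yes/No, columns: A/B), same table as the examples
observed = np.array([[20, 30],
                     [15, 35]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
n = observed.sum()

# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / n

# Pearson chi-squared statistic, no continuity correction
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected)
print(f"chi2 = {chi2_stat:.4f}, dof = {dof}")
```

Each expected count depends only on the row and column totals, which is why the test has (rows - 1) * (columns - 1) degrees of freedom.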

11.2 Python Code

import pandas as pd
import scipy.stats as stats

# Sample contingency table
data = pd.DataFrame({
    "A": [20, 15],
    "B": [30, 35]
}, index=["Yes", "No"])

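# Chi-squared test of independence; for a 2x2 table (dof = 1),
# chi2_contingency applies Yates' continuity correction by default
chi2, p, dof, expected = stats.chi2_contingency(data)
print(f"Chi2: {chi2:.2f}, p-value: {p:.4f}")

Chi2: 0.70, p-value: 0.4017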
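To turn the p-value into a decision, one option is to compare it with a significance level chosen in advance. The sketch below assumes alpha = 0.05, a conventional choice that is not stated in the original example. It also inspects the expected counts, since the chi-squared approximation is usually considered reliable only when they are not too small (a common rule of thumb is at least 5 per cell), and it passes `correction=False` to show the uncorrected Pearson statistic for comparison.

```python
import pandas as pd
from scipy import stats

# Same contingency table as above
data = pd.DataFrame({"A": [20, 15], "B": [30, 35]}, index=["Yes", "No"])

alpha = 0.05  # assumed significance level, not part of the original example

# Default call: Yates' continuity correction is applied when dof == 1
chi2, p, dof, expected = stats.chi2_contingency(data)

# Uncorrected Pearson statistic for comparison
chi2_raw, p_raw, _, _ = stats.chi2_contingency(data, correction=False)

# Smallest expected cell count, used for the rule-of-thumb check
min_expected = expected.min()

decision = "reject" if p < alpha else "fail to reject"

print(f"Corrected:   chi2 = {chi2:.4f}, p = {p:.4f}")
print(f"Uncorrected: chi2 = {chi2_raw:.4f}, p = {p_raw:.4f}")
print(f"Smallest expected count: {min_expected}")
print(f"At alpha = {alpha}: {decision} the null hypothesis of independence")
```

For this table both versions of the test lead to the same conclusion: the data do not provide evidence that the two variables are associated.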

11.3 R Code

# Create a contingency table
data <- matrix(c(20, 30, 15, 35), nrow = 2, byrow = TRUE)
colnames(data) <- c("A", "B")
rownames(data) <- c("Yes", "No")

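# Perform chi-squared test (for a 2x2 table, chisq.test applies
# Yates' continuity correction by default, i.e. correct = TRUE)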
test <- chisq.test(data)
test

    Pearson's Chi-squared test with Yates' continuity correction

data:  data
X-squared = 0.7033, df = 1, p-value = 0.4017