Q&A 2 How do you test whether the means of two groups are significantly different?
2.1 Explanation
You can use a two-sample t-test to compare the means of two independent groups. The null hypothesis is that the two population means are equal; a p-value below your chosen significance level (commonly 0.05) is evidence that the means differ.
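To make the test statistic concrete, here is a minimal sketch of the Welch form of the two-sample t statistic and its approximate degrees of freedom, using only the Python standard library. The helper name `welch_t` and the sample values are made up for illustration; they are not taken from the iris data.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test (hypothetical helper)."""
    na, nb = len(a), len(b)
    # Each group's contribution to the squared standard error: s^2 / n
    va, vb = variance(a) / na, variance(b) / nb
    # Difference in sample means divided by its standard error
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Illustrative values only
a = [5.1, 4.9, 4.7, 4.6, 5.0]
b = [7.0, 6.4, 6.9, 5.5, 6.5]
t, df = welch_t(a, b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The p-value then comes from comparing `t` against a t-distribution with `df` degrees of freedom, which is the step that library routines such as `ttest_ind` or `t.test` handle for you.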
2.2 Python Code
import pandas as pd
from scipy.stats import ttest_ind
# Load sample data
df = pd.read_csv("data/iris.csv")
# t-test between two species
group1 = df[df['species'] == 'setosa']['sepal_length']
group2 = df[df['species'] == 'versicolor']['sepal_length']
# equal_var defaults to True, so this is the pooled-variance (Student's) t-test
t_stat, p_val = ttest_ind(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_val}")
T-statistic: -10.52098626754911, P-value: 8.985235037487079e-18
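SciPy's `ttest_ind` pools the two variances by default, whereas R's `t.test` defaults to Welch's test, which does not assume equal variances. The sketch below shows how Welch's test could be requested from SciPy with `equal_var=False`. The values are made up and stand in for the two groups; they are not the iris measurements.

```python
from scipy.stats import ttest_ind

# Made-up measurements, not the iris data
group_a = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4]
group_b = [7.0, 6.4, 6.9, 5.5, 6.5, 5.7]

# Default: pooled-variance (Student's) t-test
student = ttest_ind(group_a, group_b)
# Welch's t-test: does not assume equal variances
welch = ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3g}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3g}")
```

When the two groups have the same size, the two t statistics coincide; only the degrees of freedom, and therefore the p-value, differ. That is consistent with the Python and R outputs in this section both reporting a t of about -10.521.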
2.3 R Code
library(readr)
library(dplyr)
# Load data
df <- read_csv("data/iris.csv")
# t-test between two species
setosa <- df %>% filter(species == "setosa") %>% pull(sepal_length)
versicolor <- df %>% filter(species == "versicolor") %>% pull(sepal_length)
t.test(setosa, versicolor)
Welch Two Sample t-test
data: setosa and versicolor
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.1057074 -0.7542926
sample estimates:
mean of x mean of y
5.006 5.936