Q&A 10 What is a confidence interval and how do you calculate it?

10.1 Explanation

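A confidence interval is a range of values, computed from a sample, that is likely to contain the true population parameter. The confidence level describes the procedure rather than any single interval: at the 95% level, about 95% of intervals built this way from repeated samples would contain the true value. For a mean, the interval is usually the sample mean plus or minus a t critical value times the standard error of the mean.

Python Code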

import pandas as pd
from scipy import stats
import numpy as np

# Load sample data
df = pd.read_csv("data/iris.csv")
data = df["sepal_length"]

# Compute 95% confidence interval
mean = np.mean(data)
sem = stats.sem(data)
conf_int = stats.t.interval(0.95, len(data)-1, loc=mean, scale=sem)

# Unpack the bounds and round them for a readable printout
lower, upper = conf_int
print(f"95% Confidence Interval: ({lower:.4f}, {upper:.4f})")
95% Confidence Interval: (5.7097, 5.9769)
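
To show what the interval call is doing, here is a minimal sketch that builds the same kind of interval by hand as mean ± t* × s / sqrt(n), where s is the sample standard deviation and t* is the two-sided critical value with n - 1 degrees of freedom. The helper name `t_confidence_interval` and the short `sample` list are assumptions made for illustration only; they are not part of the original example, and the sketch assumes the data are roughly normal or the sample is reasonably large.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds for the mean using the Student t distribution.

    Hypothetical helper for illustration; not part of SciPy.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    mean = x.mean()
    # Standard error of the mean, based on the sample standard deviation (ddof=1)
    sem = x.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value with n - 1 degrees of freedom
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_crit * sem
    return float(mean - margin), float(mean + margin)


# Illustrative measurements only (assumed values, not the full dataset)
sample = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]
lower, upper = t_confidence_interval(sample)
print(f"95% Confidence Interval: ({lower:.4f}, {upper:.4f})")
```

Passing the same data to `stats.t.interval` with `loc` set to the mean and `scale` set to `stats.sem(sample)` should give matching bounds, since both routes use the same critical value and standard error.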

10.2 R Code

library(readr)

# Load sample data
df <- read_csv("data/iris.csv")
data <- df$sepal_length

# 95% confidence interval
t.test(data)$conf.int
[1] 5.709732 5.976934
attr(,"conf.level")
[1] 0.95