The t-test is a parametric test. In the two-sample form described here, it assumes that the data in each group are normally distributed and that the variances of the two groups are equal. The t-test calculates the t-value by comparing the difference between the means of the two groups to the variation within the groups.
The resulting t-value is then compared to a t-distribution to determine the p-value, which is the probability of obtaining a t-value at least as extreme as the one observed, assuming the null hypothesis that the means of the two groups are equal. If the p-value is less than the chosen significance level (commonly 0.05), we reject the null hypothesis and conclude that the difference between the means of the two groups is statistically significant.
The t-test formula is:

$$ t = \frac{m_A - m_B}{\sqrt{\frac{S^2}{n_A} + \frac{S^2}{n_B}}} $$

where:
- A and B represent the two groups to compare
- $m_A$ and $m_B$ represent the means of groups A and B, respectively
- $n_A$ and $n_B$ represent the sizes of groups A and B, respectively
$S^2$ is an estimator of the common variance of the two samples:

$$ S^2 = \frac{\sum_{x \in A}{(x-m_A)^2}+\sum_{x \in B}{(x-m_B)^2}}{n_A+n_B-2} $$

Here’s an example of a two-sample t-test:
Python
import numpy as np
from scipy.stats import ttest_ind
# Generate two samples of data
np.random.seed(123)  # for reproducibility
sample1 = np.random.normal(10, 2, 100)  # Mean of 10, standard deviation of 2, 100 data points
sample2 = np.random.normal(12, 2, 100)  # Mean of 12, standard deviation of 2, 100 data points
# Perform t-test (equal_var=True is the default and uses the pooled variance)
t, p = ttest_ind(sample1, sample2, equal_var=True)
# Output results
print(f"t = {t}")
print(f"p = {p}")
First, we generate two random samples of data with different means and equal variances using NumPy’s random.normal() function. We then use SciPy’s ttest_ind() function to perform a two-sample t-test on the two samples. Finally, we print out the t-value and the p-value returned by the function. The t-value expresses the difference between the means of the two samples in standard error units, while the p-value is the probability of observing a difference at least as extreme as this one if the null hypothesis (that the means are equal) is true.
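To connect the formulas above to the library call, the following is a minimal sketch that computes $S^2$, the t-value, and a two-sided p-value by hand and compares them with ttest_ind(). The variable names (a, b, s2, t_manual, p_manual) and the seed are illustrative choices, and the degrees of freedom are taken as $n_A + n_B - 2$, matching the denominator of $S^2$.

```python
import numpy as np
from scipy import stats

# Illustrative data: same means, standard deviation, and size as the example above.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
a = rng.normal(10, 2, 100)
b = rng.normal(12, 2, 100)

n_a, n_b = len(a), len(b)
m_a, m_b = a.mean(), b.mean()

# Pooled variance S^2: summed squared deviations from each group's own mean,
# divided by n_A + n_B - 2.
s2 = (np.sum((a - m_a) ** 2) + np.sum((b - m_b) ** 2)) / (n_a + n_b - 2)

# t statistic: difference in means divided by its standard error.
t_manual = (m_a - m_b) / np.sqrt(s2 / n_a + s2 / n_b)

# Two-sided p-value from a t-distribution with n_A + n_B - 2 degrees of freedom.
df = n_a + n_b - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# The same test via SciPy; equal_var=True selects the pooled-variance form.
t_scipy, p_scipy = stats.ttest_ind(a, b, equal_var=True)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.3g}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.3g}")
```

The two printed lines should agree up to floating-point rounding, which is a quick way to check that the hand computation matches the pooled-variance test.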
R
# The stats package (which provides rnorm and t.test) is attached by default, so this call is optional
library(stats)
# Generate two samples of data
set.seed(123) # for reproducibility
sample1 <- rnorm(100, mean = 10, sd = 2) # Mean of 10, Standard deviation of 2, 100 data points
sample2 <- rnorm(100, mean = 12, sd = 2) # Mean of 12, Standard deviation of 2, 100 data points
# Perform t-test with pooled variance (t.test() defaults to Welch's test, var.equal = FALSE)
result <- t.test(sample1, sample2, var.equal = TRUE)
# Output results
cat("t = ", result$statistic, "\n")
cat("p = ", result$p.value, "\n")
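Note that in R, we use the rnorm() function to generate normally distributed random values, where Python uses np.random.normal(). Note also that R’s t.test() performs Welch’s unequal-variance test unless var.equal = TRUE is passed, whereas SciPy’s ttest_ind() uses the pooled-variance test by default.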
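Since the t-test described here assumes normally distributed data and equal variances, it can be useful to check those assumptions before relying on the result. The sketch below is one possible way to do this in Python. The choice of the Shapiro-Wilk test (scipy.stats.shapiro) for normality and Levene's test (scipy.stats.levene) for equal variances is an assumption of this example, not something prescribed by the text above, and the seed and variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative data with the same parameters as the examples above.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
sample1 = rng.normal(10, 2, 100)
sample2 = rng.normal(12, 2, 100)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from
# a normal distribution, so a small p-value suggests non-normality.
normality_p = {}
for name, sample in (("sample1", sample1), ("sample2", sample2)):
    w_stat, p_norm = stats.shapiro(sample)
    normality_p[name] = p_norm
    print(f"{name}: Shapiro-Wilk W = {w_stat:.4f}, p = {p_norm:.4f}")

# Levene's test: the null hypothesis is that the groups have equal
# variances, so a small p-value suggests the variances differ.
levene_stat, p_var = stats.levene(sample1, sample2)
print(f"Levene statistic = {levene_stat:.4f}, p = {p_var:.4f}")
```

If the equal-variance assumption looks doubtful, ttest_ind(sample1, sample2, equal_var=False) runs Welch's t-test instead of the pooled-variance test.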