t-Test

The t-test is a parametric test. The two-sample version described here assumes that the data in each group are normally distributed and that the variances of the two groups are equal. It calculates a t-value by comparing the difference between the means of the two groups to the variation within the groups.

The resulting t-value is then compared to a t-distribution to determine the p-value, which is the probability of obtaining a t-value at least as extreme as the one observed, assuming the null hypothesis that the means of the two groups are equal. If the p-value is less than the chosen significance level (usually 0.05), we reject the null hypothesis and conclude that the difference between the means of the two groups is statistically significant.
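The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a full test: the t-value and the degrees of freedom below are hypothetical numbers chosen for the example, and the helper name `two_sided_p_value` is made up here. It assumes a two-sided test, where "extreme" means far from zero in either direction.

```python
from scipy import stats

def two_sided_p_value(t_value, df):
    """Two-sided p-value for a t statistic with df degrees of freedom."""
    # sf is the survival function (1 - cdf); doubling it covers both tails.
    return 2 * stats.t.sf(abs(t_value), df)

alpha = 0.05     # chosen significance level
t_value = 2.0    # hypothetical t-value, for illustration only
df = 198         # hypothetical degrees of freedom (two groups of 100 points)

p = two_sided_p_value(t_value, df)
reject_null = p < alpha
print(f"p = {p:.4f}, reject null hypothesis: {reject_null}")
```

Because the t-distribution is symmetric, a t-value of -2.0 would give the same p-value.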

- t-test formula
  - A and B represent the two groups to compare
  - $m_A$ and $m_B$ represent the means of groups A and B, respectively
  - $n_A$ and $n_B$ represent the sizes of groups A and B, respectively

$$ t = \frac{m_A - m_B}{\sqrt{ \frac{S^2}{n_A} + \frac{S^2}{n_B} }} $$

$S^2$ is an estimator of the common variance of the two samples. It pools the squared deviations of each observation from its own group mean:

$$ S^2 = \frac{\sum_{x \in A}{(x-m_A)^2}+\sum_{x \in B}{(x-m_B)^2}}{n_A+n_B-2} $$
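To make the two formulas concrete, here is a small sketch that computes $S^2$ and $t$ by hand and compares the result with SciPy. The function name `pooled_t_test`, the seed, and the sample parameters are illustrative choices. The p-value step assumes a two-sided test with $n_A + n_B - 2$ degrees of freedom.

```python
import numpy as np
from scipy import stats

def pooled_t_test(a, b):
    """Two-sample t-test with pooled variance; returns (t, p, df)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    m_a, m_b = a.mean(), b.mean()
    df = n_a + n_b - 2
    # Pooled variance S^2: squared deviations of each group from its own mean
    s2 = (np.sum((a - m_a) ** 2) + np.sum((b - m_b) ** 2)) / df
    # t = difference of means divided by its standard error
    t = (m_a - m_b) / np.sqrt(s2 / n_a + s2 / n_b)
    # Two-sided p-value from the t-distribution
    p = 2 * stats.t.sf(abs(t), df)
    return t, p, df

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility
group_a = rng.normal(10, 2, 100)    # illustrative data
group_b = rng.normal(12, 2, 100)

t_manual, p_manual, df = pooled_t_test(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)  # equal_var=True by default

print(f"manual: t = {t_manual:.6f}, p = {p_manual:.3g}, df = {df}")
print(f"scipy:  t = {t_scipy:.6f}, p = {p_scipy:.3g}")
```

The two t-values should agree up to floating-point rounding, since `ttest_ind` with its default `equal_var=True` uses the same pooled-variance formula.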

Here’s an example of a two-sample t-test:

Python

import numpy as np
from scipy.stats import ttest_ind

# Generate two samples of data
np.random.seed(123) # for reproducibility
sample1 = np.random.normal(10, 2, 100) # Mean of 10, Standard deviation of 2, 100 data points
sample2 = np.random.normal(12, 2, 100) # Mean of 12, Standard deviation of 2, 100 data points

# Perform a two-sample t-test (equal_var=True is the default, i.e. pooled variance)
t, p = ttest_ind(sample1, sample2)

# Output results
print(f"t = {t}")
print(f"p = {p}")

First, we generate two random samples with different means and equal variances using NumPy’s random.normal() function. We then use SciPy’s ttest_ind() function to perform a two-sample t-test on the two samples. Finally, we print the t-value and the p-value returned by the function. The t-value expresses the difference between the two sample means in units of standard error, while the p-value is the probability of observing a difference at least this extreme if the null hypothesis (that the means are equal) is true.

R

# Load necessary libraries (stats is attached by default in R)
library(stats)

# Generate two samples of data
set.seed(123) # for reproducibility
sample1 <- rnorm(100, mean = 10, sd = 2) # Mean of 10, Standard deviation of 2, 100 data points
sample2 <- rnorm(100, mean = 12, sd = 2) # Mean of 12, Standard deviation of 2, 100 data points

# Perform t-test
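result <- t.test(sample1, sample2, var.equal = TRUE) # pooled-variance t-test

# Output results
cat("t = ", result$statistic, "\n")
cat("p = ", result$p.value, "\n")

Note that in R, the rnorm() function generates normally distributed random values, playing the role of np.random.normal() in Python. Also note that t.test() performs Welch’s unequal-variance test by default, so var.equal = TRUE is needed to obtain the pooled-variance test described by the formula above.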
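One practical difference between the two languages is the default variance assumption: SciPy's ttest_ind() pools the variances unless told otherwise, while R's t.test() uses Welch's unequal-variance test unless var.equal = TRUE is passed. The sketch below shows the corresponding switch on the Python side through the `equal_var` parameter. The seed and sample parameters are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(123)    # arbitrary seed, for reproducibility
sample1 = rng.normal(10, 2, 100)
sample2 = rng.normal(12, 2, 100)

# Pooled-variance test (SciPy's default)
pooled = ttest_ind(sample1, sample2, equal_var=True)
# Welch's test (what R's t.test() does by default)
welch = ttest_ind(sample1, sample2, equal_var=False)

print(f"pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.3g}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.3g}")
```

With equal group sizes, as here, the two t-values coincide and only the degrees of freedom (and hence the p-values) differ slightly. With unequal group sizes and unequal variances the two tests can diverge noticeably.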
