The Goodness-of-Fit (GOF) test is a statistical technique used to assess whether a sample of data comes from a specific probability distribution or not. In simpler terms, it is a way to check if the data you have collected fits a certain pattern that you expected or assumed.
For example, suppose we are running quality control at a candy factory and want to make sure that the distribution of colors in a bag of candies matches the expected distribution. We expect the candies to be equally distributed among four colors: red, green, blue, and yellow. However, when we sample a bag of candies, we find 60 red, 30 green, 5 blue, and 5 yellow candies.
To check whether this distribution fits our expected distribution, we can perform a GOF test. We start by calculating the expected frequency of each color if the candies were evenly distributed:
Expected frequency = Total number of candies / Number of colors
Expected frequency = (60 + 30 + 5 + 5) / 4 = 25
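The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation above; the variable names are chosen here for the example and the color order (red, green, blue, yellow) is an assumption about how the counts are stored.

```python
# Observed counts from the sampled bag, assumed order: red, green, blue, yellow
observed = [60, 30, 5, 5]

# Under the equal-distribution assumption, every color gets the same share
total = sum(observed)
expected_per_color = total / len(observed)

# One expected count per color
expected = [expected_per_color] * len(observed)

print("Total candies:", total)
print("Expected frequency per color:", expected_per_color)
```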
Next, we can use a statistical test, such as the chi-squared test, to calculate a p-value: the probability of observing a distribution at least as different from the expected one as the one in our sample, assuming the expected distribution is true. If the p-value is less than a chosen significance level, conventionally 0.05, we can conclude that the distribution of colors in the bag of candies is significantly different from what we expected.
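To make the test less of a black box, here is a minimal sketch that computes the chi-squared statistic by hand for the candy counts. It sums (observed - expected)^2 / expected over the four colors and compares the result against a chi-squared distribution with (number of colors - 1) degrees of freedom. The variable names are illustrative, and using chi2.sf from scipy.stats for the tail probability is one possible choice, not the only one.

```python
from scipy.stats import chi2

observed = [60, 30, 5, 5]
expected = [25, 25, 25, 25]

# Chi-squared statistic: sum of squared deviations, each scaled by the expected count
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With all expected counts fixed in advance, degrees of freedom = categories - 1
degrees_of_freedom = len(observed) - 1

# Survival function gives the upper-tail probability P(X >= statistic)
p_value = chi2.sf(statistic, degrees_of_freedom)

print("Chi-squared statistic:", statistic)
print("Degrees of freedom:", degrees_of_freedom)
print("P-value:", p_value)
```

For these counts the deviations are 35, 5, -20, and -20, so the statistic is (1225 + 25 + 400 + 400) / 25 = 82, far out in the tail of a chi-squared distribution with 3 degrees of freedom.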
To perform a GOF test in Python on this candy example, we can use the SciPy library, which contains statistical functions for hypothesis testing, including the chi-square test for GOF. We first define the observed values as the frequency of each color in the bag of candies. We also define the expected values as an equal distribution among the four colors. Next, we perform the chi-squared GOF test using the chisquare function from the scipy.stats module, which takes as input the observed values and the expected values. This function returns the chi-squared statistic and the associated p-value. Finally, we print the results and check the p-value to determine whether the observed distribution of colors fits the expected distribution. In this case, since the p-value is less than 0.05, we can conclude that the distribution of colors in the bag of candies is significantly different from what we expected.
Python code for GOF test:
import numpy as np
from scipy.stats import chisquare
# Define the observed values
observed_values = np.array([60, 30, 5, 5])
# Define the expected values (assuming an equal distribution)
expected_values = np.array([25, 25, 25, 25])
# Perform the chi-squared GOF test
statistic, p_value = chisquare(observed_values, f_exp=expected_values)
# Print the results
print("Chi-squared statistic:", statistic)
print("P-value:", p_value)
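# Compare the p-value against the 0.05 significance level
if p_value < 0.05:
    print("The distribution does not fit the expected distribution.")
else:
    # A large p-value means no significant evidence of a difference, not proof of a fit
    print("The distribution is consistent with the expected distribution.")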
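As a small aside on the call itself: when the expected distribution is uniform, as in this candy example, chisquare can also be called without f_exp, because SciPy then assumes all categories are equally likely. The sketch below compares the two calls; the result variable names are illustrative only.

```python
from scipy.stats import chisquare

observed_values = [60, 30, 5, 5]

# f_exp omitted: SciPy assumes equal expected frequencies across categories
result_default = chisquare(observed_values)

# f_exp given explicitly, as in the example above
result_explicit = chisquare(observed_values, f_exp=[25, 25, 25, 25])

print("Default f_exp: ", result_default.statistic, result_default.pvalue)
print("Explicit f_exp:", result_explicit.statistic, result_explicit.pvalue)
```

Passing f_exp explicitly is still the clearer choice when the expected counts are not equal, in which case they should sum to the same total as the observed counts.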