Example Of Chi Square Test For Goodness Of Fit


penangjazz

Nov 27, 2025 · 11 min read


    The Chi-Square Goodness-of-Fit test is a powerful statistical tool used to determine whether observed sample data fits a hypothesized distribution. It's particularly useful when dealing with categorical data, allowing us to assess if the proportions of different categories in a sample match a pre-defined expectation. This test relies on comparing the observed frequencies of each category with the frequencies we would expect if the null hypothesis (the hypothesized distribution) were true. A significant difference between these observed and expected frequencies suggests that the sample data does not support the null hypothesis.

    Understanding the Core Concepts

    Before diving into examples, let's solidify our understanding of the key components of the Chi-Square Goodness-of-Fit test:

    • Null Hypothesis (H0): The population from which the sample was drawn follows the hypothesized distribution. Any difference between the observed and expected frequencies is due to sampling variation alone.

    • Alternative Hypothesis (H1): The population does not follow the hypothesized distribution, so at least one category's true proportion differs from its hypothesized value.

    • Observed Frequencies (O): These are the actual counts of observations in each category from the sample data.

    • Expected Frequencies (E): These are the counts we expect to see in each category if the null hypothesis were true. They are calculated based on the hypothesized distribution and the total sample size.

    • Chi-Square Statistic (χ²): This statistic measures the discrepancy between the observed and expected frequencies. It's calculated using the formula:

      χ² = Σ [(O - E)² / E]

      where Σ represents the sum across all categories.

    • Degrees of Freedom (df): This value reflects the number of categories that are free to vary. It's calculated as:

      df = (number of categories) - (number of estimated parameters) - 1

      In the simplest cases where we aren't estimating parameters from the data, df = (number of categories) - 1.

    • Significance Level (α): This is the probability of rejecting the null hypothesis when it is actually true (Type I error). Common significance levels are 0.05 (5%) and 0.01 (1%).

    • P-value: This is the probability of obtaining a chi-square statistic as extreme as, or more extreme than, the one calculated from the sample data, assuming the null hypothesis is true. A small p-value (typically less than the significance level) provides evidence against the null hypothesis.
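
    The components above can be tied together in a short Python sketch. The function names chi_square_sf and chi_square_gof and the parameter n_estimated are illustrative choices, not part of any library. The p-value helper relies on the closed-form tail probability that holds when df is a whole number, and only the standard library is assumed. The sample call uses the die counts from Example 1.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)  # k = 1 term
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)  # step from the k-th term to the (k+1)-th
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def chi_square_gof(observed, probabilities, n_estimated=0):
    """Return (statistic, df, p-value) for observed counts against hypothesized probabilities."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - n_estimated - 1
    return statistic, df, chi_square_sf(statistic, df)


# Die counts from Example 1, tested against equal probabilities of 1/6.
stat, df, p = chi_square_gof([8, 11, 9, 14, 9, 9], [1 / 6] * 6)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

    In practice a statistics library would usually be used instead of a hand-written tail function; SciPy, for instance, offers scipy.stats.chisquare for this test.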

    Example 1: Testing a Fair Die

    Scenario: You suspect that a six-sided die is not fair. To test this, you roll the die 60 times and observe the following frequencies:

    • 1: 8 times
    • 2: 11 times
    • 3: 9 times
    • 4: 14 times
    • 5: 9 times
    • 6: 9 times

    Hypotheses:

    • H0: The die is fair (each number has an equal probability of being rolled).
    • H1: The die is not fair (the probabilities of rolling each number are not equal).

    Calculations:

    1. Expected Frequencies: If the die is fair, we would expect each number to appear approximately 60/6 = 10 times. So, E = 10 for each category.

    2. Chi-Square Statistic: We calculate the contribution to the chi-square statistic for each category:

      • (8 - 10)² / 10 = 0.4
      • (11 - 10)² / 10 = 0.1
      • (9 - 10)² / 10 = 0.1
      • (14 - 10)² / 10 = 1.6
      • (9 - 10)² / 10 = 0.1
      • (9 - 10)² / 10 = 0.1

      Summing these values, we get χ² = 0.4 + 0.1 + 0.1 + 1.6 + 0.1 + 0.1 = 2.4

    3. Degrees of Freedom: df = (number of categories) - 1 = 6 - 1 = 5

    4. P-value: Using a chi-square distribution table or a statistical calculator with χ² = 2.4 and df = 5, we find a p-value of approximately 0.79.

    Conclusion:

    Since the p-value (0.79) is greater than a typical significance level of 0.05, we fail to reject the null hypothesis. There is not enough evidence to conclude that the die is unfair. The observed frequencies are reasonably consistent with what we would expect from a fair die.
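
    One way to sanity-check a table-based p-value is to simulate the experiment. The sketch below rolls a fair virtual die 60 times, repeats that many times, and counts how often the simulated discrepancy is at least as large as the observed one. The function name simulated_p_value, the number of simulations, and the seed are arbitrary assumptions made for illustration; the estimate should land in the neighborhood of the table value rather than match it exactly, because the true distribution of the statistic is discrete.

```python
import random


def simulated_p_value(observed, n_sims=10_000, seed=0):
    """Estimate P(statistic >= observed statistic) under a fair die by simulation."""
    rng = random.Random(seed)
    n_rolls = sum(observed)
    sides = len(observed)

    def scaled_stat(counts):
        # With equal expected counts, chi-square = sum (k*O - n)^2 / (k*n),
        # so comparing sum (k*O - n)^2 is equivalent and stays in exact integers.
        return sum((sides * c - n_rolls) ** 2 for c in counts)

    target = scaled_stat(observed)
    hits = 0
    for _ in range(n_sims):
        counts = [0] * sides
        for face in rng.choices(range(sides), k=n_rolls):
            counts[face] += 1
        if scaled_stat(counts) >= target:
            hits += 1
    return hits / n_sims


p_sim = simulated_p_value([8, 11, 9, 14, 9, 9])
print(f"simulated p-value ~ {p_sim:.3f}")
```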

    Example 2: Mendel's Pea Experiment

    Scenario: Gregor Mendel, the father of genetics, conducted experiments with pea plants. In one experiment, he crossed pea plants with round, yellow seeds with plants with wrinkled, green seeds. He predicted that the F2 generation would have the following phenotypic ratio:

    • Round, Yellow: 9/16
    • Round, Green: 3/16
    • Wrinkled, Yellow: 3/16
    • Wrinkled, Green: 1/16

    The counts below are the ones usually quoted from Mendel's 1866 paper for the F2 generation:

    • Round, Yellow: 315
    • Round, Green: 108
    • Wrinkled, Yellow: 101
    • Wrinkled, Green: 32

    Hypotheses:

    • H0: The observed phenotypic ratios match Mendel's predicted ratios.
    • H1: The observed phenotypic ratios do not match Mendel's predicted ratios.

    Calculations:

    1. Total Number of Peas: 315 + 108 + 101 + 32 = 556

    2. Expected Frequencies: We calculate the expected frequencies for each phenotype based on Mendel's predicted ratios and the total number of peas:

      • Round, Yellow: (9/16) * 556 = 312.75
      • Round, Green: (3/16) * 556 = 104.25
      • Wrinkled, Yellow: (3/16) * 556 = 104.25
      • Wrinkled, Green: (1/16) * 556 = 34.75

    3. Chi-Square Statistic:

      • (315 - 312.75)² / 312.75 = 0.016
      • (108 - 104.25)² / 104.25 = 0.135
      • (101 - 104.25)² / 104.25 = 0.101
      • (32 - 34.75)² / 34.75 = 0.218

      Summing these values, we get χ² = 0.016 + 0.135 + 0.101 + 0.218 = 0.470

    4. Degrees of Freedom: df = (number of categories) - 1 = 4 - 1 = 3

    5. P-value: Using a chi-square distribution table or a statistical calculator with χ² = 0.470 and df = 3, we find a p-value of approximately 0.925.

    Conclusion:

    Since the p-value (0.925) is much greater than a typical significance level of 0.05, we fail to reject the null hypothesis. The observed counts are consistent with Mendel's predicted ratios. Note that a large p-value does not prove the null hypothesis; it only shows that the data give no evidence against it.
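
    When expected counts come from a ratio such as 9:3:3:1, exact fractions avoid accumulating rounding error in the intermediate steps. The following sketch recomputes the expected counts and the per-category contributions for the pea data using Python's fractions module; the variable names are illustrative.

```python
from fractions import Fraction

observed = {
    "Round, Yellow": 315,
    "Round, Green": 108,
    "Wrinkled, Yellow": 101,
    "Wrinkled, Green": 32,
}
ratio = {
    "Round, Yellow": 9,
    "Round, Green": 3,
    "Wrinkled, Yellow": 3,
    "Wrinkled, Green": 1,
}

total = sum(observed.values())  # total number of peas
parts = sum(ratio.values())     # 9 + 3 + 3 + 1 = 16

chi_square = Fraction(0)
for phenotype, count in observed.items():
    expected = Fraction(ratio[phenotype] * total, parts)
    contribution = (count - expected) ** 2 / expected
    chi_square += contribution
    print(f"{phenotype:17s} E = {float(expected):7.2f}  contribution = {float(contribution):.3f}")

print(f"chi-square = {float(chi_square):.3f}")
```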

    Example 3: Testing for Uniform Distribution of Birthdays

    Scenario: You want to investigate if birthdays are uniformly distributed throughout the year. You collect data on the number of births in each month for a particular year. Let's say you have the following (hypothetical) data:

    • January: 260
    • February: 230
    • March: 280
    • April: 250
    • May: 270
    • June: 240
    • July: 290
    • August: 300
    • September: 260
    • October: 280
    • November: 240
    • December: 280

    Hypotheses:

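    • H0: Birthdays are uniformly distributed throughout the year, taken here to mean that a birth is equally likely to fall in any of the 12 months (a simplification that ignores the differing lengths of the months).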
    • H1: Birthdays are not uniformly distributed throughout the year.

    Calculations:

    1. Total Number of Births: 260 + 230 + 280 + 250 + 270 + 240 + 290 + 300 + 260 + 280 + 240 + 280 = 3180

    2. Expected Frequencies: If birthdays are uniformly distributed, we would expect approximately the same number of births in each month. So, E = 3180 / 12 = 265 for each month.

    3. Chi-Square Statistic:

      • (260 - 265)² / 265 = 0.094
      • (230 - 265)² / 265 = 4.623
      • (280 - 265)² / 265 = 0.849
      • (250 - 265)² / 265 = 0.849
      • (270 - 265)² / 265 = 0.094
      • (240 - 265)² / 265 = 2.358
      • (290 - 265)² / 265 = 2.358
      • (300 - 265)² / 265 = 4.623
      • (260 - 265)² / 265 = 0.094
      • (280 - 265)² / 265 = 0.849
      • (240 - 265)² / 265 = 2.358
      • (280 - 265)² / 265 = 0.849

      Summing these values, we get χ² = 0.094 + 4.623 + 0.849 + 0.849 + 0.094 + 2.358 + 2.358 + 4.623 + 0.094 + 0.849 + 2.358 + 0.849 ≈ 20.00 (the unrounded sum is exactly 5300 / 265 = 20)

    4. Degrees of Freedom: df = (number of categories) - 1 = 12 - 1 = 11

    5. P-value: Using a chi-square distribution table or a statistical calculator with χ² = 20.00 and df = 11, we find a p-value of approximately 0.045.

    Conclusion:

    Since the p-value (0.045) is slightly below a typical significance level of 0.05, we reject the null hypothesis at the 5% level, although the margin is narrow and the result would not be significant at the 1% level. There is some evidence that birthdays are not uniformly distributed across the months. The test itself does not identify which months are responsible; the largest departures from the expected 265 are February (230) and August (300). Possible explanations include seasonal trends in conception rates.
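
    Because the chi-square test is an overall test, a common follow-up is to look at the Pearson residual (O - E) / √E for each category, whose squares sum to the chi-square statistic. The sketch below computes these residuals for the monthly counts and flags months whose residual exceeds 2 in absolute value. That cutoff is only a rough rule of thumb assumed here for illustration, not a formal test.

```python
import math

births = {
    "January": 260, "February": 230, "March": 280, "April": 250,
    "May": 270, "June": 240, "July": 290, "August": 300,
    "September": 260, "October": 280, "November": 240, "December": 280,
}

total = sum(births.values())
expected = total / len(births)  # equal expected count per month

# Pearson residual for each month: (O - E) / sqrt(E)
residuals = {m: (c - expected) / math.sqrt(expected) for m, c in births.items()}

# Squared residuals add up to the chi-square statistic.
chi_square = sum(r ** 2 for r in residuals.values())

flagged = [m for m, r in residuals.items() if abs(r) > 2]  # rough cutoff, see note above

print(f"total = {total}, expected per month = {expected:.2f}, chi-square = {chi_square:.2f}")
for month, r in sorted(residuals.items(), key=lambda kv: abs(kv[1]), reverse=True):
    marker = "  <-- |residual| > 2" if month in flagged else ""
    print(f"{month:10s} {r:+.2f}{marker}")
```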

    Example 4: Preference for Colors

    Scenario: A marketing company wants to know if there's a preference for certain colors in packaging. They survey 200 consumers and ask them to choose their favorite color from a list of four options: Red, Blue, Green, and Yellow. The observed results are:

    • Red: 60
    • Blue: 55
    • Green: 45
    • Yellow: 40

    Hypotheses:

    • H0: There is no preference for any of the colors (i.e., each color is equally likely to be chosen).
    • H1: There is a preference for at least one of the colors (i.e., the colors are not equally likely to be chosen).

    Calculations:

    1. Total Number of Consumers: 200

    2. Expected Frequencies: If there is no preference, we would expect each color to be chosen approximately 200/4 = 50 times. So, E = 50 for each category.

    3. Chi-Square Statistic:

      • (60 - 50)² / 50 = 2
      • (55 - 50)² / 50 = 0.5
      • (45 - 50)² / 50 = 0.5
      • (40 - 50)² / 50 = 2

      Summing these values, we get χ² = 2 + 0.5 + 0.5 + 2 = 5

    4. Degrees of Freedom: df = (number of categories) - 1 = 4 - 1 = 3

    5. P-value: Using a chi-square distribution table or a statistical calculator with χ² = 5 and df = 3, we find a p-value of approximately 0.172.

    Conclusion:

    Since the p-value (0.172) is greater than a typical significance level of 0.05, we fail to reject the null hypothesis. There is not enough evidence to conclude that there is a significant preference for any of the colors. The observed differences in frequencies could be due to random chance.
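
    The same decision can be reached without a p-value by comparing the statistic with a critical value from a chi-square table. The sketch below does this for the color survey at three common significance levels. The critical values for df = 3 are copied from a standard table rather than computed, and the name CRITICAL_DF3 is just an illustrative label.

```python
# Upper-tail critical values of the chi-square distribution for df = 3,
# taken from a standard table (assumed here, not computed).
CRITICAL_DF3 = {0.10: 6.251, 0.05: 7.815, 0.01: 11.345}

observed = [60, 55, 45, 40]               # Red, Blue, Green, Yellow
expected = sum(observed) / len(observed)  # 200 / 4 = 50 under "no preference"
chi_square = sum((o - expected) ** 2 / expected for o in observed)

for alpha, critical in CRITICAL_DF3.items():
    decision = "reject H0" if chi_square > critical else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: chi-square {chi_square:.1f} vs critical {critical} -> {decision}")
```

    Since the statistic falls below even the 10% critical value, the decision does not depend on which of these levels is chosen.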

    Example 5: Testing Genetic Ratios (More Complex)

    Scenario: In a dihybrid cross involving two genes, each with two alleles (A/a and B/b), the expected phenotypic ratio in the F2 generation is 9:3:3:1. Suppose you observe the following phenotypes in a sample of 400 offspring:

    • A_B_ (Dominant for both traits): 210
    • A_bb (Dominant for A, recessive for B): 70
    • aaB_ (Recessive for A, dominant for B): 80
    • aabb (Recessive for both traits): 40

    Hypotheses:

    • H0: The observed phenotypic ratios match the expected 9:3:3:1 ratio.
    • H1: The observed phenotypic ratios do not match the expected 9:3:3:1 ratio.

    Calculations:

    1. Total Number of Offspring: 400

    2. Expected Frequencies: Calculate the expected frequencies based on the 9:3:3:1 ratio:

      • A_B_: (9/16) * 400 = 225
      • A_bb: (3/16) * 400 = 75
      • aaB_: (3/16) * 400 = 75
      • aabb: (1/16) * 400 = 25

    3. Chi-Square Statistic:

      • (210 - 225)² / 225 = 1
      • (70 - 75)² / 75 = 0.333
      • (80 - 75)² / 75 = 0.333
      • (40 - 25)² / 25 = 9

      Summing these values, we get χ² = 1 + 0.333 + 0.333 + 9 = 10.667

    4. Degrees of Freedom: df = (number of categories) - 1 = 4 - 1 = 3

    5. P-value: Using a chi-square distribution table or a statistical calculator with χ² = 10.667 and df = 3, we find a p-value of approximately 0.014.

    Conclusion:

    Since the p-value (0.014) is less than a typical significance level of 0.05, we reject the null hypothesis. There is evidence to suggest that the observed phenotypic ratios do not match the expected 9:3:3:1 ratio. This could indicate linkage between the genes, epistasis, or other factors influencing the inheritance patterns.
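
    To see where the discrepancy comes from, it can help to express each category's contribution as a share of the total statistic and to note whether the observed count is above or below expectation. A minimal sketch for the dihybrid data, with illustrative variable names:

```python
observed = {"A_B_": 210, "A_bb": 70, "aaB_": 80, "aabb": 40}
ratio = {"A_B_": 9, "A_bb": 3, "aaB_": 3, "aabb": 1}

total = sum(observed.values())
parts = sum(ratio.values())

contributions = {}
directions = {}
for phenotype, count in observed.items():
    expected = total * ratio[phenotype] / parts
    contributions[phenotype] = (count - expected) ** 2 / expected
    directions[phenotype] = "above" if count > expected else "below"

chi_square = sum(contributions.values())
largest = max(contributions, key=contributions.get)

for phenotype, value in contributions.items():
    share = value / chi_square
    print(f"{phenotype}: {value:.3f} ({share:.0%} of chi-square, observed {directions[phenotype]} expected)")
print(f"largest contribution: {largest}")
```

    In this data set the double-recessive class accounts for most of the statistic, which is the kind of pattern that would prompt a closer look at that phenotype.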

    Key Considerations and Cautions

    • Sample Size: The chi-square distribution is only an approximation to the sampling distribution of the statistic, and the approximation becomes unreliable when expected counts are small. A general rule of thumb is that the expected frequency in each category should be at least 5. If this condition is not met, consider combining categories or using an exact test (such as an exact multinomial test).
    • Independence: The observations must be independent of each other. This means that one observation should not influence another.
    • Mutually Exclusive Categories: The categories must be mutually exclusive, meaning that an observation can only belong to one category.
    • Interpretation: A statistically significant result (small p-value) is evidence that the data do not fit the hypothesized distribution, but it does not tell you why or which categories are responsible. Conversely, a non-significant result means only that the data are compatible with the hypothesized distribution; it does not prove that the distribution is correct.
    • Alternatives: If the assumptions of the Chi-Square Goodness-of-Fit test are not met, consider alternatives such as the Kolmogorov-Smirnov test (for continuous data) or an exact multinomial test (for categorical data with small samples).
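
    The expected-count rule of thumb is easy to check before running the test. The helper below, whose name small_expected_cells and default threshold of 5 are assumptions made for illustration, lists every category whose expected count falls short. It is applied to the 9:3:3:1 ratio at the sample size from Example 5 and at a hypothetical smaller sample of 40.

```python
def small_expected_cells(n, probabilities, minimum=5):
    """Return (index, expected count) for each category with expected count below `minimum`."""
    return [(i, n * p) for i, p in enumerate(probabilities) if n * p < minimum]


ratio_9331 = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

print(small_expected_cells(400, ratio_9331))  # sample size used in Example 5
print(small_expected_cells(40, ratio_9331))   # hypothetical smaller sample
```

    An empty list means every category meets the threshold; any entries returned point to categories that might be combined or handled with an exact test.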

    Advantages of the Chi-Square Goodness-of-Fit Test

    • Versatility: Can be used with various types of categorical data.
    • Ease of Calculation: The formula is relatively simple to apply.
    • Widely Available: Supported by most statistical software packages.

    Disadvantages of the Chi-Square Goodness-of-Fit Test

    • Sensitivity to Sample Size: Requires sufficiently large sample sizes.
    • Limited Information: Only indicates whether the data fits the hypothesized distribution, not why.
    • Assumptions: Requires independent observations and mutually exclusive categories.

    In conclusion, the Chi-Square Goodness-of-Fit test is a valuable tool for analyzing categorical data and assessing how well observed frequencies align with expected frequencies. By understanding the underlying principles, calculations, and limitations of this test, you can effectively apply it to a wide range of research and practical applications. Always remember to check the assumptions of the test and interpret the results carefully.
