Hypothesis Testing With Two Samples Examples

Hypothesis testing with two samples allows us to determine if there's a significant difference between the means or proportions of two independent groups. It's a powerful tool in statistics, enabling us to make informed decisions based on data. This method is frequently used in various fields like healthcare, marketing, and social sciences to compare different treatments, strategies, or characteristics.

Understanding the Fundamentals

Before diving into examples, it's crucial to grasp the core concepts. In hypothesis testing, we formulate two opposing statements: the null hypothesis (H₀) and the alternative hypothesis (H₁).

Null Hypothesis (H₀): This is the statement of no effect or no difference. It assumes that any observed difference is due to random chance.
Alternative Hypothesis (H₁): This statement contradicts the null hypothesis. It proposes that a real difference exists between the two populations being compared.

We then collect data, calculate a test statistic, and determine the p-value. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. We compare the p-value to a pre-determined significance level (alpha, typically 0.05).

If the p-value is less than alpha, we reject the null hypothesis and conclude that there is statistically significant evidence to support the alternative hypothesis.
If the p-value is greater than alpha, we fail to reject the null hypothesis. This doesn't mean the null hypothesis is true, just that we don't have enough evidence to reject it.
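The decision rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes the test statistic follows a standard normal distribution; the helper names `p_value_from_z` and `decide` and the statistic -2.5 are hypothetical and chosen only for the example.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str = "two-sided") -> float:
    """Return the p-value for a z statistic under the standard normal.

    tail: "two-sided", "left" (H1: smaller), or "right" (H1: larger).
    """
    cdf = NormalDist().cdf  # standard normal CDF
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1.0 - cdf(z)
    if tail == "two-sided":
        return 2.0 * (1.0 - cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p = p_value_from_z(-2.5)  # hypothetical test statistic
print(f"p = {p:.4f} -> {decide(p)}")
```

The `tail` argument mirrors the one-tailed and two-tailed alternatives used in the examples that follow.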

Choosing the right test is crucial. The most common tests for two samples are:

Independent Samples t-test: Used to compare the means of two independent groups when the data is continuous and approximately normally distributed.
Paired Samples t-test: Used when comparing the means of two related groups (e.g., pre-test and post-test scores for the same individuals). Since we're focusing on two independent samples, we won't delve into paired samples t-tests in this article.
Z-test: Used to compare the means of two independent groups when the population standard deviations are known, or when the samples are large enough that the sample standard deviations are reliable estimates of them.
Two-Proportion Z-test: Used to compare the proportions of two independent groups.

The choice depends on the type of data (continuous vs. categorical), whether the data is paired or independent, and whether population standard deviations are known or unknown.
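As a rough sketch, the three questions above can be written as a small lookup. The function `choose_two_sample_test` and its argument names are hypothetical, and it covers only the four tests listed in this article.

```python
def choose_two_sample_test(data_type: str, paired: bool, sigma_known: bool) -> str:
    """Map the three questions from the text to a test name (illustrative only)."""
    if data_type == "categorical":
        return "two-proportion z-test"
    if data_type != "continuous":
        raise ValueError("data_type must be 'continuous' or 'categorical'")
    if paired:
        return "paired samples t-test"
    if sigma_known:
        return "two-sample z-test"
    return "independent samples t-test"


print(choose_two_sample_test("continuous", paired=False, sigma_known=False))
```

In practice the choice also depends on the assumptions discussed later in the article, so treat this as a first-pass guide rather than a rule.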

Example 1: Comparing Exam Scores of Two Different Teaching Methods

Scenario: A professor wants to compare the effectiveness of two different teaching methods: a traditional lecture-based approach and an interactive, project-based approach. She randomly divides her students into two groups: Group A (traditional) and Group B (interactive). After the semester, she collects their final exam scores.

Data:

Group A (Traditional): n₁ = 35, Mean₁ = 75, Standard Deviation₁ = 8
Group B (Interactive): n₂ = 40, Mean₂ = 82, Standard Deviation₂ = 6

Steps for Hypothesis Testing:

State the Hypotheses:
- H₀: μ₁ = μ₂ (The two teaching methods have the same population mean exam score.)
- H₁: μ₁ ≠ μ₂ (The population mean exam scores differ between the two teaching methods.) This is a two-tailed test.
Choose the Significance Level (Alpha): Let's use α = 0.05.
Select the Appropriate Test: Since we are comparing the means of two independent groups and the population standard deviations are unknown (but we have sample standard deviations), we can use an independent samples t-test.
Calculate the Test Statistic: The formula for the t-statistic is:

t = (Mean₁ - Mean₂) / √((Standard Deviation₁²/n₁) + (Standard Deviation₂²/n₂))

t = (75 - 82) / √((8²/35) + (6²/40))

t = -7 / √(1.8286 + 0.9)

t = -7 / √2.7286

t = -7 / 1.6518

t ≈ -4.238
Determine the Degrees of Freedom: For an independent samples t-test that does not assume equal variances (Welch's t-test), the degrees of freedom (df) are approximated with the Welch-Satterthwaite formula:

df ≈ ((Standard Deviation₁²/n₁) + (Standard Deviation₂²/n₂))² / (((Standard Deviation₁²/n₁)² / (n₁ - 1)) + ((Standard Deviation₂²/n₂)² / (n₂ - 1)))

df ≈ (2.7286)² / (((1.8286)² / 34) + ((0.9)² / 39))

df ≈ 7.4453 / ((3.3438 / 34) + (0.81 / 39))

df ≈ 7.4453 / (0.0983 + 0.0208)

df ≈ 7.4453 / 0.1191

df ≈ 62.5

We can round this down to df = 62.
Find the p-value: Using a t-table or statistical software with df = 62, we find the p-value associated with t = -4.238. Because this is a two-tailed test, the p-value is the combined area in both tails, and it is well below 0.001 (statistical software gives a more precise value).
Make a Decision: Since the p-value (< 0.001) is less than the significance level (α = 0.05), we reject the null hypothesis.
Conclusion: There is statistically significant evidence to suggest that there is a difference in the mean exam scores between the two teaching methods. The interactive, project-based teaching method (Group B) appears to be more effective.
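The test statistic and degrees of freedom from the steps above can be reproduced from the summary statistics alone. The following is a minimal sketch using only the Python standard library; the function name `welch_t_from_summary` is an assumption made for illustration.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error contribution of group 1
    v2 = sd2 ** 2 / n2  # squared standard error contribution of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Group A (traditional) vs. Group B (interactive) from Example 1
t, df = welch_t_from_summary(75, 8, 35, 82, 6, 40)
print(f"t = {t:.3f}, df = {df:.2f}")
```

Because the code carries full precision through every step, its output agrees with the hand calculation only up to the rounding of intermediate values.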

Example 2: Comparing Customer Satisfaction Between Two Different Website Designs

Scenario: A company wants to determine if a new website design leads to higher customer satisfaction compared to the old design. They randomly assign website visitors to either the old design (Group A) or the new design (Group B) and then survey them about their satisfaction level.

Data:

Group A (Old Design): n₁ = 150, Mean Satisfaction Score₁ = 6.8, Standard Deviation₁ = 1.5
Group B (New Design): n₂ = 180, Mean Satisfaction Score₂ = 7.5, Standard Deviation₂ = 1.2

Steps for Hypothesis Testing:

State the Hypotheses:
- H₀: μ₁ = μ₂ (The two website designs have the same population mean satisfaction score.)
- H₁: μ₁ < μ₂ (The new design has a higher population mean satisfaction score than the old design.) This is a one-tailed test.
Choose the Significance Level (Alpha): Let's use α = 0.01.
Select the Appropriate Test: Since we are comparing the means of two independent groups, the data is continuous (satisfaction scores), and we have sample standard deviations, we can use an independent samples t-test. Because the sample sizes are relatively large, we could also consider using a z-test, assuming the sample standard deviations are good estimates of the population standard deviations. For this example, we'll stick with the t-test.
Calculate the Test Statistic:

t = (Mean₁ - Mean₂) / √((Standard Deviation₁²/n₁) + (Standard Deviation₂²/n₂))

t = (6.8 - 7.5) / √((1.5²/150) + (1.2²/180))

t = -0.7 / √(0.015 + 0.008)

t = -0.7 / √0.023

t = -0.7 / 0.15166

t ≈ -4.616
Determine the Degrees of Freedom: Using Welch's t-test formula again:

df ≈ ((Standard Deviation₁²/n₁) + (Standard Deviation₂²/n₂))² / (((Standard Deviation₁²/n₁)² / (n₁ - 1)) + ((Standard Deviation₂²/n₂)² / (n₂ - 1)))

df ≈ (0.023)² / (((0.015)² / 149) + ((0.008)² / 179))

df ≈ 0.000529 / ((0.000225 / 149) + (0.000064 / 179))

df ≈ 0.000529 / (0.0000015101 + 0.0000003575)

df ≈ 0.000529 / 0.0000018676

df ≈ 283.25

We can round this down to df = 283.
Find the p-value: Using a t-table or statistical software with df = 283, we find the p-value associated with t = -4.616. Since it's a one-tailed test (we're only interested in whether the new design is better), we only look at the left tail. The p-value is extremely small, much less than 0.001.
Make a Decision: Since the p-value (< 0.001) is less than the significance level (α = 0.01), we reject the null hypothesis.
Conclusion: There is statistically significant evidence to suggest that the new website design leads to a higher mean customer satisfaction score compared to the old design.
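Step 3 of this example notes that a z-test could also be considered because the samples are large. The sketch below takes that route as an assumption: it computes the same statistic and reads the left-tail p-value from the standard normal distribution instead of the t distribution. With roughly 280 degrees of freedom the two are very close, and the exact t-based p-value would be slightly larger.

```python
import math
from statistics import NormalDist

# Summary statistics from Example 2
mean_old, sd_old, n_old = 6.8, 1.5, 150
mean_new, sd_new, n_new = 7.5, 1.2, 180
alpha = 0.01

# Standard error of the difference in sample means
se = math.sqrt(sd_old ** 2 / n_old + sd_new ** 2 / n_new)
stat = (mean_old - mean_new) / se

# Left-tail p-value for H1: mu_old < mu_new, using the normal approximation
p_left = NormalDist().cdf(stat)

print(f"statistic = {stat:.3f}, one-tailed p = {p_left:.2e}")
print("reject H0" if p_left < alpha else "fail to reject H0")
```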

Example 3: Comparing Success Rates of Two Different Marketing Campaigns

Scenario: A marketing company runs two different online advertising campaigns (Campaign A and Campaign B) to promote a new product. They want to determine which campaign is more effective in generating sales.

Data:

Campaign A: n₁ = 500, Number of Sales₁ = 50, Proportion of Sales₁ = 50/500 = 0.10
Campaign B: n₂ = 600, Number of Sales₂ = 78, Proportion of Sales₂ = 78/600 = 0.13

Steps for Hypothesis Testing:

State the Hypotheses:
- H₀: p₁ = p₂ (The two marketing campaigns have the same population proportion of sales.)
- H₁: p₁ ≠ p₂ (The population proportions of sales differ between the two marketing campaigns.) This is a two-tailed test.
Choose the Significance Level (Alpha): Let's use α = 0.05.
Select the Appropriate Test: Since we are comparing the proportions of two independent groups, we use a two-proportion z-test.
Calculate the Test Statistic: The formula for the z-statistic is:

z = (p₁ - p₂) / √(p̂(1 - p̂)(1/n₁ + 1/n₂))

Where p₁ and p₂ in the formula are the observed sample proportions and p̂ is the pooled sample proportion:

p̂ = (Number of Sales₁ + Number of Sales₂) / (n₁ + n₂) = (50 + 78) / (500 + 600) = 128 / 1100 ≈ 0.1164

Now we can calculate the z-statistic:

z = (0.10 - 0.13) / √(0.1164(1 - 0.1164)(1/500 + 1/600))

z = -0.03 / √(0.1164 * 0.8836 * (0.002 + 0.001667))

z = -0.03 / √(0.1028 * 0.003667)

z = -0.03 / √0.000377

z = -0.03 / 0.01942

z ≈ -1.545
Find the p-value: Using a z-table or statistical software, we find the p-value associated with z = -1.545. Since it's a two-tailed test, we consider both tails. The p-value is approximately 2 * 0.061 = 0.122.
Make a Decision: Since the p-value (0.122) is greater than the significance level (α = 0.05), we fail to reject the null hypothesis.
Conclusion: At the 5% significance level, there is not enough evidence to conclude that the proportion of sales differs between the two marketing campaigns.
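The pooled two-proportion z-test from this example can be sketched with the Python standard library. The function name `two_proportion_z_test` is hypothetical; the p-value comes from `statistics.NormalDist`, which provides the standard normal CDF.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-tailed p-value) for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Campaign A: 50 sales out of 500; Campaign B: 78 sales out of 600
z, p_value = two_proportion_z_test(50, 500, 78, 600)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

The function takes raw counts rather than proportions so that the pooled proportion is computed without intermediate rounding.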

Example 4: Comparing the Heights of Two Different Plant Species

Scenario: A biologist wants to investigate whether there's a difference in the average height of two different species of plants (Species A and Species B) grown under the same environmental conditions.

Data:

Species A: n₁ = 45, Mean Height₁ = 25 cm, Standard Deviation₁ = 4 cm
Species B: n₂ = 50, Mean Height₂ = 28 cm, Standard Deviation₂ = 5 cm

Steps for Hypothesis Testing:

State the Hypotheses:
- H₀: μ₁ = μ₂ (The two plant species have the same population mean height.)
- H₁: μ₁ ≠ μ₂ (The population mean heights of the two plant species differ.) This is a two-tailed test.
Choose the Significance Level (Alpha): Let's use α = 0.05.
Select the Appropriate Test: Since we are comparing the means of two independent groups and the population standard deviations are unknown (but we have sample standard deviations), we can use an independent samples t-test.
Calculate the Test Statistic:

t = (Mean₁ - Mean₂) / √((Standard Deviation₁²/n₁) + (Standard Deviation₂²/n₂))

t = (25 - 28) / √((4²/45) + (5²/50))

t = -3 / √(0.3556 + 0.5)

t = -3 / √0.8556

t = -3 / 0.925

t ≈ -3.243
Determine the Degrees of Freedom: Using Welch's t-test formula again:

df ≈ ((Standard Deviation₁²/n₁) + (Standard Deviation₂²/n₂))² / (((Standard Deviation₁²/n₁)² / (n₁ - 1)) + ((Standard Deviation₂²/n₂)² / (n₂ - 1)))

df ≈ (0.8556)² / (((0.3556)² / 44) + ((0.5)² / 49))

df ≈ 0.7320 / ((0.1264 / 44) + (0.25 / 49))

df ≈ 0.7320 / (0.002873 + 0.005102)

df ≈ 0.7320 / 0.007975

df ≈ 91.8

We can round this down to df = 91.
Find the p-value: Using a t-table or statistical software with df = 91, we find the p-value associated with t = -3.243. Since it's a two-tailed test, we consider both tails. The p-value is approximately 0.0016.
Make a Decision: Since the p-value (0.0016) is less than the significance level (α = 0.05), we reject the null hypothesis.
Conclusion: There is statistically significant evidence to suggest that there is a difference in the mean height between the two plant species. Species B appears to be taller on average.
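When no t-table or statistics package is at hand, a two-tailed p-value for a t statistic can be approximated by numerically integrating the Student's t density. The sketch below does this with Simpson's rule using only the standard library; the function names and the choice of 2000 integration steps are assumptions for illustration, and a dedicated statistics library would normally be the better tool.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 2 * (0.5 - central_area)


# Test statistic and rounded-down degrees of freedom from Example 4
p = t_two_tailed_p(-3.243, 91)
print(f"two-tailed p = {p:.4f}")
```

The result should be small and well below α = 0.05, in line with the decision reached in this example.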

Key Considerations and Assumptions

Independence: The samples must be independent of each other. This means that the data points in one sample should not be related to the data points in the other sample.
Normality: The t-test assumes that the data in each group is approximately normally distributed. If the sample sizes are large enough (a common rule of thumb is n > 30 per group), the Central Limit Theorem makes the test robust to moderate non-normality. If the data is severely non-normal and the sample sizes are small, non-parametric tests like the Mann-Whitney U test might be more appropriate.
Equal Variances (Homogeneity of Variance): The standard (pooled-variance) t-test assumes that the variances of the two populations are equal. Welch's t-test, which we used in the examples, does not make this assumption and should be used when it is violated. Levene's test can be used to formally test for equal variances.
Sample Size: Larger sample sizes provide more statistical power, increasing the likelihood of detecting a true difference if one exists.
Type I and Type II Errors:
- Type I Error (False Positive): Rejecting the null hypothesis when it is actually true. The probability of a Type I error is equal to the significance level (α).
- Type II Error (False Negative): Failing to reject the null hypothesis when it is actually false. The probability of a Type II error is denoted by β. The power of a test is 1 - β, which represents the probability of correctly rejecting a false null hypothesis.
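The claim that the Type I error rate equals α can be checked with a small simulation. The sketch below is an illustration under stated assumptions: both groups are drawn from the same normal population (so the null hypothesis is true by construction), the population standard deviation is treated as known so a z-test applies, and the sample size, number of simulations, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(n_sims=4000, n=30, sigma=1.0, alpha=0.05, seed=0):
    """Estimate how often a two-sample z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se = sigma * math.sqrt(2 / n)  # known-sigma standard error, equal group sizes
    rejections = 0
    for _ in range(n_sims):
        # Both groups come from the same population, so H0 holds by construction
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (fmean(a) - fmean(b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


rate = simulate_type_i_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land near 0.05, with some simulation noise. Shifting the mean of one group away from zero would turn the same loop into an estimate of the test's power (1 - β).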

Conclusion

Hypothesis testing with two samples is a valuable statistical method for comparing groups and making data-driven decisions. By understanding the fundamental principles, choosing the appropriate test, and carefully interpreting the results, you can use this technique to answer a wide range of research questions. Always check the assumptions of the test and the limitations of your data before drawing conclusions. The examples above provide a starting point for applying these techniques in other contexts.
