Hypothesis testing with a population proportion
Setting up a hypothesis test for a population proportion
Up to now we’ve been focused mostly on hypothesis testing for the mean, but we can also perform hypothesis tests for a proportion.
In order for the test to work, we’ll need ???np\geq10??? and ???n(1-p)\geq10???, where ???n??? is the sample size, ???p??? is the probability of a success (in a hypothesis test, the hypothesized proportion ???p_0???), and ???1-p??? is the probability of failure.
We want the expected number of successes and the expected number of failures to each be at least ???10???, because at that threshold the sampling distribution of the sample proportion is approximately normal. With fewer than ???10??? expected successes or ???10??? expected failures, the normal approximation becomes unreliable.
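As a quick illustration, the two conditions can be checked with a few lines of Python. This is only a sketch: the function name `normal_approximation_ok` and the sample values are made up for the example, and the threshold of 10 is the rule of thumb stated above.

```python
def normal_approximation_ok(n: int, p: float, threshold: float = 10) -> bool:
    """Return True when both np and n(1 - p) reach the threshold."""
    expected_successes = n * p          # np
    expected_failures = n * (1 - p)     # n(1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# Illustrative values only (not taken from the article's example).
print(normal_approximation_ok(50, 0.3))   # 15 expected successes, 35 expected failures
print(normal_approximation_ok(20, 0.3))   # only 6 expected successes
```

The first call satisfies both conditions, while the second fails because the expected number of successes falls below 10.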
Keep in mind that we have to be careful about distinguishing between the proportion ???p??? and the ???p???-value of the test.
One-tail test
Just like for the mean, a one-tail test for the population proportion indicates directionality, so our hypothesis statements will either be
???H_0???: ???p\leq k???
???H_a???: ???p>k???
if we believe that ???p>k???, or
???H_0???: ???p\geq k???
???H_a???: ???p<k???
if we believe that ???p<k???, for some assumed value of ???k???. Once you set the hypothesis statements, you’ll pick a significance level, calculate the test statistic, and state the conclusion.
Working through a full hypothesis test with a population proportion
Hypothesis testing with population proportions
Example
We want to test the hypothesis that more than ???32\%??? of Americans watch the Super Bowl, so we collect a random sample of ???1,000??? Americans and find that ???350??? of them watched the game. What can you conclude at a significance level of ???\alpha=0.05????
First build the hypothesis statements.
???H_0???: At most ???32\%??? of Americans watched the Super Bowl, ???p\leq0.32???
???H_a???: More than ???32\%??? of Americans watched the Super Bowl, ???p>0.32???
The sample proportion is
???\hat p=\frac{x}{n}=\frac{350}{1,000}=0.35???
Then find the standard error of the proportion.
???\sigma_{\hat p}=\sqrt{\frac{p_0(1-p_0)}{n}}=\sqrt{\frac{0.32(1-0.32)}{1,000}}=\sqrt{\frac{0.2176}{1,000}}\approx0.0148???
Now we have enough to find the ???z???-value of the test statistic. Carrying the unrounded standard error through the division gives
???z=\frac{\hat p-p_0}{\sigma_{\hat p}}=\frac{0.35-0.32}{0.0148}\approx2.0337???
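The arithmetic above can be reproduced with a short Python sketch. The function name `proportion_z_statistic` is hypothetical, chosen for this illustration; the inputs are the numbers from the example (350 successes out of 1,000, hypothesized proportion 0.32).

```python
import math


def proportion_z_statistic(successes: int, n: int, p0: float):
    """Return the sample proportion, standard error, and z test statistic."""
    p_hat = successes / n                           # sample proportion
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # uses the hypothesized p0
    z = (p_hat - p0) / standard_error
    return p_hat, standard_error, z


p_hat, se, z = proportion_z_statistic(350, 1000, 0.32)
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}, z = {z:.4f}")
```

Note that the standard error is built from the hypothesized proportion rather than the sample proportion, and that the code keeps the standard error unrounded, so its result can differ in the last digits from a hand calculation that divides by a rounded 0.0148.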
The critical ???z???-value for an upper-tail test at ???\alpha=0.05??? is ???z=1.65??? (more precisely, about ???1.645???).
Our ???z???-value exceeds ???z=1.65??? and therefore falls in the region of rejection, which means we’ll reject the null hypothesis and conclude that more than ???32\%??? of Americans watch the Super Bowl.
We know our findings are significant at ???\alpha=0.05???, but we can find the ???p???-value to state the smallest significance level at which the result still holds, using our test statistic ???z\approx2.03??? and not just the critical value ???z=1.65???. The test statistic ???z\approx2.03??? gives a value of ???0.9788??? in the ???z???-table.
Which means the conclusion isn’t only significant at ???\alpha=0.05???, but it’s actually significant at
???1-0.9788=0.0212???
The ???p???-value is ???0.0212???, so the result is significant at the ???0.0212??? level. As long as ???\alpha\geq0.0212???, we’ll be able to reject ???H_0???.
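Instead of reading a ???z???-table, the upper-tail critical value and ???p???-value can be computed directly. The sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the variable names are illustrative. Because it works from the unrounded test statistic, the exact figure may differ slightly from a table lookup on a rounded ???z???.

```python
import math
from statistics import NormalDist

# Numbers from the Super Bowl example.
n, successes, p0, alpha = 1000, 350, 0.32, 0.05

p_hat = successes / n
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

standard_normal = NormalDist()                    # mean 0, standard deviation 1
critical_z = standard_normal.inv_cdf(1 - alpha)   # upper-tail critical value
p_value = 1 - standard_normal.cdf(z)              # area to the right of z
reject_null = p_value <= alpha

print(f"critical z = {critical_z:.3f}")
print(f"p-value = {p_value:.4f}")
print(f"reject H0: {reject_null}")
```

Comparing the ???p???-value to ???\alpha??? and comparing ???z??? to the critical value are equivalent decisions; the ???p???-value simply reports how small ???\alpha??? could be while still rejecting ???H_0???.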
Two-tail test
The two-tailed test for the population proportion follows the same steps as the one-tailed test, other than the fact that we split the alpha value into both tails.
Let’s continue with the same Super Bowl example we were using, but this time we’ll say that we don’t have a guess about directionality, and instead will simply hypothesize that the proportion of Americans who watch the Super Bowl isn’t ???32\%???.
Example (cont’d)
We want to test the hypothesis that ???32\%??? of Americans watch the Super Bowl, so we collect a random sample of ???1,000??? Americans and find that ???350??? of them watched the game. What can you conclude at a significance level of ???\alpha=0.05????
First build the hypothesis statements.
???H_0???: ???32\%??? of Americans watched the Super Bowl, ???p=0.32???
???H_a???: The proportion of Americans who watched the Super Bowl was not ???32\%???, ???p\neq0.32???
We already calculated that the standard error of the proportion is ???\sigma_{\hat p}\approx0.0148???, the sample proportion is ???\hat p=0.35???, and the ???z???-value of the test statistic is ???z\approx2.0337???.
Because we’re doing a two-tail test, ???\alpha=0.05??? needs to be split as ???0.025??? in the lower tail and ???0.025??? in the upper tail, which means we’re looking for the value in the ???z???-table that corresponds to ???1-0.025=0.975???.
So ???z=\pm1.96??? will be the critical values. Our ???z???-value exceeds ???z=1.96??? and therefore still falls in the region of rejection (even though we’ve switched from a one-tail test to a two-tail test), which means we’ll again reject the null hypothesis and conclude that the proportion of Americans who watch the Super Bowl is not ???32\%???.
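The two-tail version can be sketched the same way, again assuming Python's standard `statistics.NormalDist` and illustrative variable names. The only changes from the one-tail calculation are that ???\alpha??? is halved when finding the critical value and the tail area is doubled when finding the ???p???-value.

```python
import math
from statistics import NormalDist

# Numbers from the Super Bowl example.
n, successes, p0, alpha = 1000, 350, 0.32, 0.05

p_hat = successes / n
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

standard_normal = NormalDist()
critical_z = standard_normal.inv_cdf(1 - alpha / 2)   # alpha split across both tails
p_value = 2 * (1 - standard_normal.cdf(abs(z)))       # area in both tails beyond |z|
reject_null = abs(z) > critical_z

print(f"critical values = +/-{critical_z:.2f}")
print(f"two-tail p-value = {p_value:.4f}")
print(f"reject H0: {reject_null}")
```

The two-tail ???p???-value is twice the one-tail value, so a result that clears ???\alpha=0.05??? comfortably in a one-tail test sits closer to the threshold here, even though the null hypothesis is still rejected for this sample.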