Confidence interval for the population proportion
The confidence interval for a population proportion tells us how confident we can be that a sample proportion represents the actual population proportion.
In real life, we usually won’t know the population proportion ???p???, because we won’t be able to survey or test every subject within our population.
Instead, we’ll have to take a smaller sample of our larger population, and then compute the sample proportion ???\hat p???. Once we find ???\hat p???, we can use it to make inferences about the value of the population proportion ???p???.
We’ll find the sample proportion ???\hat p??? by taking a sample of ???n??? subjects and counting the number of those subjects that meet our criteria. Out of that sample, the fraction that meet our criteria will be ???\hat p???.
???\hat p=\frac{\text{number of subjects that meet our criteria}}{n}???
The standard deviation of the sampling distribution will be
???\sigma_{\hat p}=\sqrt{\frac{p(1-p)}{n}}???
If we don’t know the population proportion ???p??? (which we usually don’t), we’ll substitute the sample proportion ???\hat p??? for the population proportion ???p??? and use this formula instead for standard deviation:
???SE_{\hat p}=\sigma_{\hat p}=\sqrt{\frac{\hat p(1-\hat p)}{n}}???
To distinguish this sample-based estimate from the standard deviation computed with the population proportion, we call it the standard error (which is why we wrote ???SE_{\hat p}???), and we show in the formula that we’re using ???\hat p??? instead of ???p???.
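As a quick illustration of the sample proportion and standard error formulas, here is a minimal Python sketch. The function names and the counts (45 successes out of 150 subjects) are made up for illustration and don't come from any particular study.

```python
import math

def sample_proportion(successes, n):
    """Fraction of the n sampled subjects that meet our criteria."""
    return successes / n

def standard_error(p_hat, n):
    """Standard error of p-hat: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample: 45 of 150 subjects meet the criteria
p_hat = sample_proportion(45, 150)
se = standard_error(p_hat, 150)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
```

Because ???n??? sits in the denominator under the square root, quadrupling the sample size only cuts the standard error in half.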
Then if we’re sampling from an infinite population, or if we’re sampling with replacement, we can say that the confidence interval for the population proportion is
???(a,b)=\hat p\pm z^*\cdot \sqrt{\frac{\hat p(1-\hat p)}{n}}???
If we’re sampling without replacement from a population of finite size ???N???, then the confidence interval for the population proportion is
???(a,b)=\hat p\pm z^*\cdot \sqrt{\frac{\hat p(1-\hat p)}{n}}\sqrt{\frac{N-n}{N-1}}???
So if we know how we’re sampling, what confidence level we want to use, and we know the sample proportion and standard error, then we can plug these values into the correct formula, find the critical value associated with the confidence level, and then calculate the confidence interval directly.
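The two interval formulas can be sketched as one Python function. This is only a sketch under stated assumptions: the name `proportion_ci` and its parameters are hypothetical, the critical value comes from the standard library's `statistics.NormalDist` rather than a ???z???-table, and the inputs in the usage lines are made up.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence, population_size=None):
    """Return (a, b) = p_hat -/+ z* * SE.

    The finite population correction sqrt((N - n) / (N - 1)) is applied
    only when population_size (N) is given.
    """
    # Two-sided critical value: e.g. 0.95 cumulative probability for 90%
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Made-up inputs: p_hat = 0.30 from n = 150 subjects, 95% confidence
a, b = proportion_ci(0.30, 150, 0.95)
a_fpc, b_fpc = proportion_ci(0.30, 150, 0.95, population_size=1000)
print(f"no correction:   ({a:.4f}, {b:.4f})")
print(f"with correction: ({a_fpc:.4f}, {b_fpc:.4f})")
```

The correction factor is always less than ???1??? when ???n>1???, so the corrected interval is the narrower of the two.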
Calculating confidence intervals for population proportions
Finding a 90% confidence interval around a proportion
Example
There are ???500??? sea turtles that live in a bay off of Maui, Hawaii, and we want to estimate the proportion that are male. Let’s say we take a random sample of ???50??? turtles and find that ???20??? of them are male.
Based on this sample, what is a ???90\%??? confidence interval for the proportion of male sea turtles in the bay?
As always, we have to first check for normality. We were told that the sample we took was random.
Based on the sample proportion ???\hat p=20/50=0.4???, we’ll get at least ???10??? “successes” (???50\cdot0.4=20???) and at least ???10??? “failures” (???50\cdot0.6=30???), so we’ve met the normal condition. And even though it looks like we’re sampling without replacement, our sample is only ???10\%??? of the total population (???50/500=10\%???), so we’ve met the independence condition as well.
We’ll use the formula ???\hat p\pm z^*SE_{\hat p}??? for the confidence interval. The proportion of male sea turtles in the sample is ???\hat p=20/50=0.4???, which means we can already say
???\hat p\pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}}???
???0.4\pm z^*\sqrt{\frac{0.4(0.6)}{50}}???
For a ???90\%??? confidence interval, we’re looking at the middle ???90\%??? of probability in a normal distribution. That leaves ???10\%??? of probability in the two little tails, or just ???5\%??? in the top tail, so we’re interested in the ???z???-score that puts us at ???95\%??? cumulative probability. If we look for approximately ???0.9500??? in the ???z???-table, we get about ???1.645???, which we’ll round to ???1.65???. So the critical value ???z^*??? is approximately ???1.65???, and we can say that our confidence interval is
???0.4\pm 1.65\sqrt{\frac{0.4(0.6)}{50}}???
???0.4\pm 1.65\sqrt{\frac{0.24}{50}}???
???0.4\pm 1.65\sqrt{0.0048}???
???0.2857??? to ???0.5143???
We interpret this to mean that about ???90\%??? of the confidence intervals we construct this way (with ???50???-turtle samples) will contain the actual population proportion ???p??? of male sea turtles in the bay.
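The arithmetic and the interpretation in this example can both be checked with a short Python sketch. The first part reproduces the interval using the rounded critical value ???1.65???. The second part is a simulation under an assumption that is not part of the example: it pretends that exactly ???200??? of the ???500??? turtles are male (so ???p=0.4???), draws many ???50???-turtle samples without replacement, and counts how often the interval contains ???0.4???. The names `wald_interval`, `bay`, and `coverage` are hypothetical.

```python
import math
import random

Z_STAR = 1.65  # rounded 90% critical value used in the example

def wald_interval(successes, n, z_star=Z_STAR):
    """Interval p_hat -/+ z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Part 1: the sample from the example, 20 males out of 50 turtles
low, high = wald_interval(20, 50)
print(f"{low:.4f} to {high:.4f}")  # 0.2857 to 0.5143

# Part 2: ASSUMPTION for illustration only: 200 of the 500 turtles are male
TRUE_P = 0.4
bay = [1] * 200 + [0] * 300
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000
hits = 0
for _ in range(trials):
    males = sum(rng.sample(bay, 50))  # one 50-turtle sample, no replacement
    lo, hi = wald_interval(males, 50)
    if lo <= TRUE_P <= hi:
        hits += 1
coverage = hits / trials
print(f"share of intervals containing p: {coverage:.3f}")
```

The share printed at the end should land near ???90\%???, though not exactly on it, because the sample counts are discrete and the interval relies on a normal approximation.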
Margin of error
Margin of error is simply the part of the confidence interval formula that follows the ???\pm??? sign:
???z^*\sqrt{\frac{\hat p(1-\hat p)}{n}}???
If we want to keep our margin of error under a certain value, then we can set up an inequality that will allow us to find the minimum possible sample size we’d need to use.
Example
We want the margin of error in our sea turtle study to be no more than ???\pm4\%??? at a ???90\%??? confidence level. Find the smallest possible sample size we can use to stay within that margin of error.
First, we’ll set up the inequality.
???z^*\sqrt{\frac{\hat p(1-\hat p)}{n}}\le0.04???
If we want to find the smallest sample size ???n??? that’s guaranteed to keep us within this margin of error, then we need to plan for the worst case, because we don’t know what ???\hat p??? will be until we’ve taken the sample. The margin of error is largest when ???\hat p(1-\hat p)??? is largest, since making the numerator of a fraction as large as possible makes the entire fraction, and therefore its square root, as large as possible. A value of ???n??? that satisfies the inequality at that maximum will satisfy it for every other value of ???\hat p??? as well.
We could prove this algebraically, but the value of ???\hat p??? that maximizes ???\hat p(1-\hat p)??? is always ???\hat p=0.5???. Therefore, we’ll plug everything we know into the margin of error inequality, remembering that a ???90\%??? confidence level has a critical value of approximately ???1.65???.
???1.65\sqrt{\frac{0.5(0.5)}{n}}\le0.04???
???\frac{\sqrt{0.5^2}}{\sqrt{n}}\le\frac{0.04}{1.65}???
???\frac{0.5}{\sqrt{n}}\le\frac{0.04}{1.65}???
Because both sides of the inequality are positive, we can invert both fractions if we flip the inequality sign.
???\frac{\sqrt{n}}{0.5}\ge\frac{1.65}{0.04}???
???\sqrt{n}\ge\frac{1.65}{0.04}(0.5)???
???n\ge\left(\frac{1.65}{0.04}(0.5)\right)^2???
???n\ge425.39???
If we need to sample at least ???425.39??? members of our population, that means we need to sample at least ???426??? of them, because only ???425??? wouldn’t meet our threshold.
???n\ge426???
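The claim that ???\hat p=0.5??? is the worst case can be checked numerically. The sketch below (the helper name `required_n` is hypothetical) computes the minimum sample size for several values of ???\hat p???, using the rounded critical value ???1.65??? and the ???0.04??? margin of error from this example.

```python
import math

def required_n(p_hat, z_star, margin):
    """Smallest whole n with z* * sqrt(p_hat * (1 - p_hat) / n) <= margin."""
    return math.ceil(z_star**2 * p_hat * (1 - p_hat) / margin**2)

# Try p_hat = 0.1, 0.2, ..., 0.9 at 90% confidence (z* ~ 1.65), margin 0.04
sizes = []
for pct in range(10, 100, 10):
    p_hat = pct / 100
    n = required_n(p_hat, 1.65, 0.04)
    sizes.append((p_hat, n))
    print(f"p_hat = {p_hat:.1f} -> n >= {n}")

# The largest requirement identifies the worst-case p_hat
worst_p, worst_n = max(sizes, key=lambda pair: pair[1])
print(f"largest requirement: n >= {worst_n} at p_hat = {worst_p}")
```

The required sample size rises toward ???\hat p=0.5??? and falls off symmetrically on either side, which is why using ???0.5??? is the safe choice when we have no prior estimate.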
Required sample size for fixed margin of error
Just like with the mean, we’ll want to determine the smallest possible sample we can take in order to stick to a specific margin of error when we’re working with confidence intervals for a population proportion. We can easily find the sample size by manipulating the margin of error formula and then plugging in a few values. The margin of error formula is
???ME=z^*\sqrt{\frac{\hat p(1-\hat p)}{n}}???
Since we want to find a sample size, solve this for ???n??? (which represents sample size).
???ME=z^*\frac{\sqrt{\hat p(1-\hat p)}}{\sqrt{n}}???
???ME\sqrt{n}=z^*\sqrt{\hat p(1-\hat p)}???
???\sqrt{n}=\frac{z^*\sqrt{\hat p(1-\hat p)}}{ME}???
???n=\left(\frac{z^*\sqrt{\hat p(1-\hat p)}}{ME}\right)^2???
Now, let’s say for instance that we want a ???99\%??? confidence interval (corresponding to a ???z???-score of ???2.58???) with a margin of error of ???\pm4??? percent (???0.04???), and that we use ???\hat p=0.50??? for the sample proportion (if you don’t have an estimate of the population proportion, you should use ???\hat p=0.50???). Then the smallest possible sample size we can take to ensure that margin of error is
???n=\left(\frac{2.58\sqrt{0.50(1-0.50)}}{0.04}\right)^2???
???n=\left(\frac{2.58\sqrt{0.25}}{0.04}\right)^2???
???n=32.25^2???
???n\approx1,040.06???
To meet that threshold, and keep a margin of error of ???\pm0.04??? at ???99\%??? confidence, we’d need to take a sample of at least ???n=1,041??? subjects.
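The solved formula for ???n??? translates directly into a few lines of Python. This is a minimal sketch: the function name `min_sample_size` is made up, and it uses the rounded critical value ???2.58??? from the text rather than a more precise one.

```python
import math

def min_sample_size(z_star, margin_of_error, p_hat=0.50):
    """n = (z* * sqrt(p_hat * (1 - p_hat)) / ME)^2, rounded up.

    p_hat defaults to 0.50, the conservative choice when no estimate
    of the population proportion is available.
    """
    n_exact = (z_star * math.sqrt(p_hat * (1 - p_hat)) / margin_of_error) ** 2
    # Always round up: rounding down would exceed the margin of error
    return math.ceil(n_exact)

# 99% confidence (z* ~ 2.58), margin of error 0.04, p_hat = 0.50
n = min_sample_size(2.58, 0.04)
print(n)  # 1041
```

Rounding up with `math.ceil` instead of rounding to the nearest integer matters here, since ???1,040??? subjects would leave the margin of error slightly above ???0.04???.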