Combinations of random variables
What happens when we combine linear random variables
Linear combinations of random variables
We just reviewed what happens when we shift or scale a data set by a constant value. But now we want to look at what happens when we combine two data sets, either by adding them or subtracting them.
For example, let’s say I have two variables: how much time I spend each day walking and biking, ???W??? and ???B??? respectively. And let’s say that I have data on my walking and biking habits for a full year, and I’ve already found the mean and standard deviation for both variables: ???\mu_W=1.1??? and ???\sigma_W=0.2??? for walking, and ???\mu_B=0.6??? and ???\sigma_B=0.1??? for biking.
But now I want to know the mean for the sum of my walking and biking time together. In other words, I spend some time walking and biking each day, so I’d like to get an average for my total daily activity time.
We’ll call the total activity ???A???, which means we’re looking for ???\mu_A??? (or the expected value ???E(A)???). We know that ???A=W+B???. Here’s the rule we need to remember: when we want to find the mean of the sum, we just find the sum of the means. So because ???A=W+B???,
???\mu_A=\mu_W+\mu_B???
???\mu_A=1.1+0.6???
???\mu_A=1.7???
But if we want to find the standard deviation of the sum of these two variables, we can’t simply add the standard deviations together. In other words, ???\sigma_A=\sigma_W+\sigma_B??? is not a valid equation.
Instead, to find the standard deviation of the total activity, we first need to square the two standard deviations. Remember that squaring a standard deviation gives us the variance, so this gives us the variance for both walking and biking.
???\sigma^2_W=0.2^2=0.04???
???\sigma^2_B=0.1^2=0.01???
Then we add these together to get the sum of the variances, which gives us the variance for total activity.
???\sigma^2_A=\sigma^2_W+\sigma^2_B???
???\sigma^2_A=0.04+0.01???
???\sigma^2_A=0.05???
Then to find standard deviation for total activity, we take the square root of both sides.
???\sqrt{\sigma^2_A}=\sqrt{0.05}???
???\sigma_A\approx0.22???
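The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the walking and biking values from the example; the function name `combine_sum` is made up for illustration, and the variance step assumes the two variables are independent.

```python
import math

# Values from the walking/biking example in the text.
mu_w, sigma_w = 1.1, 0.2   # walking
mu_b, sigma_b = 0.6, 0.1   # biking


def combine_sum(mu_x, sigma_x, mu_y, sigma_y):
    """Mean and standard deviation of X + Y, assuming X and Y are independent."""
    mean = mu_x + mu_y                      # means add directly
    variance = sigma_x**2 + sigma_y**2      # variances add, standard deviations do not
    return mean, math.sqrt(variance)


mu_a, sigma_a = combine_sum(mu_w, sigma_w, mu_b, sigma_b)
print(f"mean = {mu_a:.2f}, sd = {sigma_a:.2f}")  # mean = 1.70, sd = 0.22
```

Note that adding the standard deviations directly would give ???0.3???, which overstates the spread of the total.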
Instead of the sum, we could also find the difference in my walking and biking times. We could define a new variable for the difference and call it ???D???. Then the difference is ???D=W-B???, and the expected value of the difference would be
???E(D)=\mu_D=\mu_W-\mu_B???
???E(D)=\mu_D=1.1-0.6???
???E(D)=\mu_D=0.5???
And the standard deviation of the difference would be
???\sqrt{\sigma^2_D}=\sqrt{\sigma^2_W+\sigma^2_B}???
???\sqrt{\sigma^2_D}=\sqrt{0.04+0.01}???
???\sqrt{\sigma^2_D}=\sqrt{0.05}???
???\sigma_D\approx0.22???
One important thing to note is that, whether we’re finding the sum of the variables or the difference of the variables, we always add the variances, ???\sigma^2_W+\sigma^2_B???. We never subtract the variances for the difference, because subtracting ???B??? contributes just as much variability to the result as adding it does.
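A quick simulation makes this easier to believe. The sketch below assumes, purely for illustration, that walking and biking times are independent draws from normal distributions with the means and standard deviations used above; the text itself does not say how they are distributed.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable
n = 100_000

# Assumption for illustration only: independent normal draws.
walk = [random.gauss(1.1, 0.2) for _ in range(n)]
bike = [random.gauss(0.6, 0.1) for _ in range(n)]

sums = [w + b for w, b in zip(walk, bike)]
diffs = [w - b for w, b in zip(walk, bike)]

sd_sum = statistics.pstdev(sums)
sd_diff = statistics.pstdev(diffs)

# Both spreads should land near sqrt(0.04 + 0.01), not near sqrt(0.04 - 0.01).
print(f"sd of sum: {sd_sum:.3f}, sd of difference: {sd_diff:.3f}")
```

Both printed values should come out close to ???0.22???, while the means of `sums` and `diffs` stay near ???1.7??? and ???0.5???.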
When we find the mean of the sum or difference of variables, it doesn’t matter whether the variables are dependent or independent. Either way, the mean of the sum is the sum of the means, and the mean of the difference is the difference of the means.
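But the rule for adding variances only holds when the two variables are independent, so we can only use it to find the standard deviation of the sum or difference of independent variables. So we can summarize what we know about the formulas this way:
For any two variables ???X??? and ???Y???, dependent or independent,
???\mu_{X\pm Y}=\mu_X\pm\mu_Y???
For two independent variables ???X??? and ???Y???,
???\sigma^2_{X\pm Y}=\sigma^2_X+\sigma^2_Y???
???\sigma_{X\pm Y}=\sqrt{\sigma^2_X+\sigma^2_Y}???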
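To see why independence matters for the standard deviation but not for the mean, here is a small sketch with an extreme, hypothetical case: the second variable is an exact copy of the first, so the two are completely dependent. The data are simulated and are not part of the walking and biking example.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical data: y is a copy of x, so the two variables are fully dependent.
x = [random.gauss(1.0, 0.1) for _ in range(10_000)]
y = list(x)
total = [a + b for a, b in zip(x, y)]

# The mean rule still holds under dependence.
mean_rule = statistics.fmean(x) + statistics.fmean(y)
mean_actual = statistics.fmean(total)

# The add-the-variances rule does not: it predicts sqrt(2) * sd(x),
# but the sum here is just 2 * x, whose spread is 2 * sd(x).
sd_rule = math.sqrt(statistics.pvariance(x) + statistics.pvariance(y))
sd_actual = statistics.pstdev(total)

print(f"mean: rule {mean_rule:.3f}, actual {mean_actual:.3f}")
print(f"sd:   rule {sd_rule:.3f}, actual {sd_actual:.3f}")
```

The two means agree, but the actual standard deviation is larger than the rule predicts by a factor of ???\sqrt{2}???, which is why the variance formula is reserved for independent variables.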
Combinations of normally distributed variables
When we add or subtract independent variables that are both normally distributed, the combination will be normally distributed as well.
So if we’re given the mean and standard deviation of two normally distributed variables, we can calculate the mean and standard deviation of the new combination.
But then, since the combination is normally distributed, we can use what we know about the probability under normal distributions to answer probability questions about the combination.
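Python’s standard library happens to model this directly: `statistics.NormalDist` objects can be added and subtracted, and the result is the normal distribution of the sum or difference under the assumption of independence. The sketch below reuses the walking and biking numbers and assumes, for illustration only, that both are normally distributed; the threshold of ???2??? is also made up.

```python
from statistics import NormalDist

# Assumption for illustration: both variables are normal and independent.
walk = NormalDist(mu=1.1, sigma=0.2)
bike = NormalDist(mu=0.6, sigma=0.1)

total = walk + bike   # mean 1.1 + 0.6, variance 0.2**2 + 0.1**2
diff = walk - bike    # mean 1.1 - 0.6, same variance as the sum

print(f"total: mean {total.mean:.2f}, sd {total.stdev:.4f}")
print(f"diff:  mean {diff.mean:.2f}, sd {diff.stdev:.4f}")

# Hypothetical question: probability that total activity is below 2.
p_under_2 = total.cdf(2.0)
print(f"P(total < 2) = {p_under_2:.3f}")
```

Because the combination is itself a `NormalDist`, its `cdf` method plays the role of the ???z???-table lookup.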
Answering probability questions with random variable combinations
Example
A popcorn company fills each of its variety popcorn tins with three flavors of popcorn: white cheddar, caramel, and chocolate-covered. The amount of each flavor of popcorn that gets packed in the tin is normally distributed with a mean of ???1??? pound and a standard deviation of ???0.1??? pounds. The amount of each popcorn flavor is independent of the amounts of the other flavors.
If ???W??? is the total weight of popcorn in a randomly selected tin, find the probability that the tin contains less than ???3.25??? pounds.
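We have three normally distributed variables, one for each flavor. If we call the weights of the three flavors ???D???, ???M???, and ???C???, their means are
???\mu_D=1???
???\mu_M=1???
???\mu_C=1???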
Therefore, the mean of the combination (the mean weight of a full tin) is
???\mu_W=\mu_D+\mu_M+\mu_C???
???\mu_W=1+1+1???
???\mu_W=3???
The standard deviations of the three normally distributed variables are
???\sigma_D=0.1???
???\sigma_M=0.1???
???\sigma_C=0.1???
To find the standard deviation of the combination (the standard deviation of the weight of a full tin), we’ll find the variance of the combination.
???\sigma_W^2=\sigma_D^2+\sigma_M^2+\sigma_C^2???
???\sigma_W^2=0.1^2+0.1^2+0.1^2???
???\sigma_W^2=0.01+0.01+0.01???
???\sigma_W^2=0.03???
So the standard deviation is
???\sigma_W=\sqrt{0.03}???
???\sigma_W\approx0.1732???
Now that we have the mean ???\mu_W=3??? and standard deviation ???\sigma_W\approx0.1732??? of the normally distributed weight of the full tin, we can answer probability questions about the combined normal distribution. We want to find the probability that the tin contains less than ???3.25??? pounds.
The distance of ???3.25??? from the mean of ???3??? is
???3.25-3=0.25???
Expressed in standard deviations, that’s
???\frac{0.25}{0.1732}\approx1.44???
standard deviations above the mean. If we look up ???z=1.44??? in a ???z???-table, we find the value ???0.9251???, which means there’s an approximately ???93\%??? chance that the weight of the full tin is less than ???3.25??? pounds.
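The whole popcorn calculation can be reproduced in a few lines as a check. This sketch uses only the numbers given in the example; the variable names are made up, and `NormalDist.cdf` stands in for the ???z???-table.

```python
import math
from statistics import NormalDist

n_flavors = 3
mu_flavor, sigma_flavor = 1.0, 0.1   # pounds per flavor, from the example

# Means add; variances add because the flavors are independent.
mu_tin = n_flavors * mu_flavor
sigma_tin = math.sqrt(n_flavors * sigma_flavor**2)

z = (3.25 - mu_tin) / sigma_tin
p = NormalDist(mu_tin, sigma_tin).cdf(3.25)

print(f"sd = {sigma_tin:.4f}, z = {z:.2f}, P(W < 3.25) = {p:.4f}")
```

The computed probability comes out a little above the table value of ???0.9251???, because the code uses the unrounded ???z???-score rather than ???1.44???. Either way, the answer rounds to about ???93\%???.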