Analyzing the variances of dependent random variables, and the variances of their sums, is an essential aspect of statistics and actuarial science. The concept of covariance is an indispensable tool for such analysis.
Let us assume that there are two random variables, X and Y, with mathematical expectations E(X) and E(Y) and variances Var(X) and Var(Y), respectively. What do we do when we want to find the variance of their sum, X+Y? If X and Y are independent variables, the answer is easy; simple addition accomplishes the task: Var(X+Y) = Var(X) + Var(Y).
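As a quick numerical check (a minimal sketch in Python using NumPy, with arbitrarily chosen example distributions), we can simulate two independent random variables and confirm that the sample variance of their sum is close to the sum of their sample variances:

    import numpy as np

    rng = np.random.default_rng(seed=0)

    # Two independent random variables (example distributions chosen arbitrarily).
    x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)   # Var(X) = 3**2 = 9
    y = rng.exponential(scale=1.5, size=1_000_000)       # Var(Y) = 1.5**2 = 2.25

    # For independent X and Y, the two quantities below should agree
    # up to sampling error (both close to 11.25).
    print(np.var(x + y))
    print(np.var(x) + np.var(y))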
But what if X and Y are dependent? Then the variance of the sum generally does not equal the sum of the variances. Instead, the concept of covariance must be brought into the analysis. We shall denote the covariance of X and Y as Cov(X, Y).
Two crucial formulas are needed in order to deal effectively with the covariance concept:
Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y)
Cov(X, Y) = E(XY) – E(X)E(Y)
We note that these formulas hold for both independent and dependent variables. For independent variables, E(XY) = E(X)E(Y), so the second formula gives Cov(X, Y) = 0; substituting this into the first formula recovers the familiar result Var(X+Y) = Var(X) + Var(Y).
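The same kind of numerical check (again a sketch, with a dependence constructed purely for illustration) confirms both formulas in the dependent case:

    import numpy as np

    rng = np.random.default_rng(seed=1)

    x = rng.normal(size=1_000_000)
    y = 0.5 * x + rng.normal(size=1_000_000)   # Y depends on X

    # Cov(X, Y) = E(XY) - E(X)E(Y); here it should be close to 0.5.
    cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
    print(cov_xy)

    # Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y); both lines should be
    # close to 1 + 1.25 + 2*0.5 = 3.25.
    print(np.var(x + y))
    print(np.var(x) + np.var(y) + 2 * cov_xy)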
This leads us to the general insight that the covariance of independent variables is equal to zero. This makes conceptual sense as well: the covariance of two variables measures how much the variation in one variable is associated with variation in the other. If two variables are independent, what happens to one has no effect on the other, so the variables’ covariance must be zero. Note, however, that the converse does not hold: two dependent variables can still have a covariance of zero.
Covariances can be positive or negative, and the sign of the covariance gives useful information about the kind of relationship that exists between the random variables in question. If the covariance is positive, there is a direct relationship between the two variables: larger values of one tend to accompany larger values of the other. If the covariance is negative, there is an inverse relationship: larger values of one tend to accompany smaller values of the other, and vice versa.
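For instance (a small illustration with made-up paired data), a quantity that rises along with another yields a positive sample covariance, while one that falls yields a negative one:

    import numpy as np

    # Hypothetical paired observations, invented for illustration.
    hours  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    scores = np.array([55.0, 62.0, 70.0, 78.0, 85.0])  # rises with hours
    errors = np.array([9.0, 7.0, 5.0, 4.0, 2.0])       # falls with hours

    # np.cov returns a covariance matrix; the off-diagonal entry is Cov.
    print(np.cov(hours, scores)[0, 1])  # positive: direct relationship
    print(np.cov(hours, errors)[0, 1])  # negative: inverse relationship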
In some problems involving covariance, it is possible to work from even the most basic information to determine the solution. Given random variables X and Y, if one can compute E(X), E(Y), E(X²), E(Y²), and E(XY), one has all the data necessary to solve for Cov(X, Y) and Var(X+Y), since Var(X) = E(X²) - [E(X)]² and Var(Y) = E(Y²) - [E(Y)]². From the way each random variable is defined, one can derive the mathematical expectations above and use them to arrive at the covariance and the variance of the sum for the two variables.
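To make the procedure concrete, here is a sketch using a small joint distribution whose values and probabilities are invented purely for illustration:

    # Hypothetical joint distribution: (x, y, P(X=x, Y=y)) triples.
    joint = [
        (0, 0, 0.2),
        (0, 1, 0.1),
        (1, 0, 0.1),
        (1, 1, 0.3),
        (2, 1, 0.3),
    ]

    e_x  = sum(p * x     for x, y, p in joint)   # E(X)  = 1.0
    e_y  = sum(p * y     for x, y, p in joint)   # E(Y)  = 0.7
    e_x2 = sum(p * x * x for x, y, p in joint)   # E(X²) = 1.6
    e_y2 = sum(p * y * y for x, y, p in joint)   # E(Y²) = 0.7
    e_xy = sum(p * x * y for x, y, p in joint)   # E(XY) = 0.9

    var_x  = e_x2 - e_x ** 2    # Var(X) = E(X²) - [E(X)]² = 0.6
    var_y  = e_y2 - e_y ** 2    # Var(Y) = 0.21
    cov_xy = e_xy - e_x * e_y   # Cov(X, Y) = 0.9 - 0.7 = 0.2

    print(cov_xy)
    print(var_x + var_y + 2 * cov_xy)   # Var(X+Y) = 1.21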