For my brother who was
decades ahead of me
in terms of knowledge
albeit being only two years older.
In a previous blog post, I discussed the joint density of a random vector of two elements that are normally distributed. I was able to prove the expression for the joint probability, not without fighting against some nasty integrals. In the end, I introduced the expression of the joint probability for a random vector of m normally distributed elements and I left my four readers saying “I have no idea how it could be proved”. It was June, and I was in the North of Italy back then, hosted by friends but mainly alone with my books in a pleasant room with pink walls and some dolls listening to my speculations without a blink; a student of engineering was sharing with me, via chat, her difficulties with the surprisingly interesting task of analyzing the data from some accelerometers put in the mouths of fat people while they stand on an oscillating platform; my birthday was approaching and I was going to go back to Rome after a short stop in Florence, where I was for the first time fully aware of how astonishingly beautiful a woman in a dress can be (and where I saw the statues that Michelangelo crafted for the Tomb of the Medicis, the monument to Lorenzo in particular, which is sculpted in our culture more profoundly than we usually realize).
But on a day in late December, while I was planning my own cryopreservation (a thought I often indulge in when my health declines even further), I realized that the covariance matrix is a symmetrical one, so it can be diagonalized, and this is the main clue for proving the expression of this density. As obvious as it is, I couldn’t think of that when I first encountered the multivariate normal distribution, and the reason for this failure is my continual setbacks, the fact that for most of the last 20 years I have been unable not only to study but even to think and read. And this is also the reason why I write down these proofs in my blog: I fear that I will leave only silence after my existence, because I have not existed at all, due to my encephalopathy. I can’t make long-term plans, so as soon as I finish a small project, such as this proof, I need to share it because it might be the last product of my intellect for a long time. So, what follows is mainly a proof of my own existence, more than it is a demonstration of the multivariate normal distribution.
Before introducing the math, two words about the importance of the multivariate normal distribution. Many biological parameters have a normal distribution, so the normal density is the most important continuous distribution in biology (and in medicine). But what happens when we are considering more than one parameter at a time? Suppose you have ten metabolites that each follow a normal distribution, and that you want to calculate the probability that they are all below ten respective maximal values. Well, you have to know about the multivariate normal distribution! This is the reason why I believe that anyone who is interested in biology or medicine should, at least once in her lifetime, go through the following mathematical passages.
Can’t solve a problem? Change the variables!
In this paragraph, I present a bunch of properties that we need in order to carry out our demonstration. The first one derives directly from the theorem of change of variables in multiple integrals. The second and the third ones are sets of properties of symmetrical matrices in general, and of the covariance matrix in particular. Then, I collect a set of integrals that have been introduced or calculated in the already cited blog post about the bivariate normal distribution. The last proposition is not so obvious, but I won’t demonstrate it here, and those who are interested in its proof can contact me.
PROPOSITION 1 (change of variables). Given the continuous random vector X = (X_1, X_2, …, X_m) and the bijective function Y = Φ(X) (figure 1), where Y is a vector with m dimensions, then the joint density of Y can be expressed through the joint density of X:
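For reference, the change-of-variables statement is usually written as follows. This is a sketch in my own notation, not necessarily the one of the original figure: f_X and f_Y denote the two joint densities, J is the Jacobian matrix of the inverse map, and I am assuming that Φ is differentiable with a differentiable inverse.

```latex
f_Y(y) = f_X\!\left(\Phi^{-1}(y)\right)\,\left|\det J_{\Phi^{-1}}(y)\right|,
\qquad
J_{\Phi^{-1}}(y) = \left[\frac{\partial x_i}{\partial y_k}\right]_{i,k = 1,\dots,m},
\qquad x = \Phi^{-1}(y)
```

Only the absolute value of the determinant enters the formula, so the sign of the Jacobian determinant never matters.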
PROPOSITION 2 (symmetrical matrix). Given the symmetrical matrix C, we can always write:
where λ_1, λ_2, …, λ_m are the eigenvalues of matrix C and the columns of P are the respective eigenvectors. It is also easy to see that for the inverse matrix of C we have:
Moreover, the quadratic form associated with the inverse matrix is
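The properties of Prop. 2 can be checked numerically on a small example. What follows is only a sketch: the 3×3 matrix and the vector xi are made-up numbers, and I am assuming that NumPy is available and that the eigenvectors are normalized (which is what `np.linalg.eigh` returns).

```python
import numpy as np

# Hypothetical 3x3 symmetric, positive-definite matrix, used only for illustration.
C = np.array([[4.0, 1.2, 0.5],
              [1.2, 2.0, 0.3],
              [0.5, 0.3, 1.0]])

# eigh is NumPy's eigen-solver for symmetric matrices: it returns the
# eigenvalues lam and orthonormal eigenvectors as the columns of P.
lam, P = np.linalg.eigh(C)

# C = P diag(lam) P^T
C_rebuilt = P @ np.diag(lam) @ P.T

# C^{-1} = P diag(1/lam) P^T (this needs every eigenvalue to be nonzero)
C_inv = P @ np.diag(1.0 / lam) @ P.T

# Quadratic form: xi^T C^{-1} xi = sum_i y_i^2 / lam_i, with y = P^T xi
xi = np.array([0.7, -1.1, 2.0])  # arbitrary test vector
y = P.T @ xi
q_direct = xi @ np.linalg.inv(C) @ xi
q_diagonal = np.sum(y**2 / lam)

print(np.allclose(C_rebuilt, C), np.allclose(C_inv, np.linalg.inv(C)))
print(q_direct, q_diagonal)
```

The two printed quadratic forms should agree up to rounding: in the rotated coordinates y, the quadratic form has no mixed terms.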
PROPOSITION 3 (covariance matrix). If C is the covariance matrix of the random vector X = (X_1, X_2, …, X_m), which means that
then, with the positions made in Prop. 2, we have
where σ_j is the standard deviation of X_j and ρ_i,j is the correlation coefficient between X_i and X_j.
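Written component by component, the decomposition of Prop. 2 applied to a covariance matrix gives the identities below. This is a sketch in my own notation: p_{i,k} is the entry of P in row i and column k (a naming that I am assuming here), and ρ_{j,j} = 1.

```latex
\begin{align*}
C_{i,j} &= \operatorname{Cov}[X_i, X_j] = \sigma_i \sigma_j \rho_{i,j} \\
C = P \Lambda P^{T}
  \;&\Longrightarrow\;
  C_{i,j} = \sum_{k=1}^{m} \lambda_k \, p_{i,k} \, p_{j,k} \\
\sigma_j^{2} &= \sum_{k=1}^{m} \lambda_k \, p_{j,k}^{2},
\qquad
\sigma_i \sigma_j \rho_{i,j} = \sum_{k=1}^{m} \lambda_k \, p_{i,k} \, p_{j,k}
\end{align*}
```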
PROPOSITION 4 (some integrals). It is possible to calculate the integrals in the following table. Those who are interested in how to calculate the table can contact me.
PROPOSITION 5 (other integrals). It is possible to calculate the two following integrals from the table. Those who are interested in how to calculate them can contact me.
PROPOSITION 6 (sum of normal random variables). Given the random vector X = (X_1, X_2, …, X_m) whose components are independent and normally distributed, the density of the random variable Y = X_1 + X_2 + … + X_m is a normal law whose average and standard deviation are respectively given by:
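A quick simulation can make this proposition plausible. The sketch below assumes that the components are independent (the case in which the proposition is used later in the proof), that the average of the sum is the sum of the averages, and that its variance is the sum of the variances. The numbers are made up, and the check covers only the average and the standard deviation of the sum, not the shape of its density.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so that the run is repeatable

# Hypothetical averages and standard deviations of three independent normal variables.
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 1.5, 2.0])

# Draw independent samples, one column per component, then sum along each row.
samples = rng.normal(loc=mu, scale=sigma, size=(200_000, 3))
y = samples.sum(axis=1)

mean_theory = mu.sum()                   # sum of the averages
std_theory = np.sqrt(np.sum(sigma**2))   # square root of the sum of the variances

print(y.mean(), mean_theory)
print(y.std(), std_theory)
```

The sample values should agree with the theoretical ones to about two decimal places; the residual difference is sampling noise.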
Multivariate normal distribution
PROPOSITION 7. The joint probability density in the case of a random vector whose m components follow a normal distribution is:
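For readers who prefer code, here is a minimal sketch of this density. I am assuming the standard form of the multivariate normal density, that is exp(−ξᵀC⁻¹ξ/2) divided by the square root of (2π)^m det C, with ξ = x − μ. The function names `mvn_density` and `normal_density` are mine, and the numbers are made up.

```python
import numpy as np

def mvn_density(x, mu, C):
    """Standard multivariate normal density at the point x (average mu, covariance C)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    m = mu.size
    xi = x - mu
    # Solve C z = xi instead of forming the inverse matrix explicitly.
    quad = xi @ np.linalg.solve(C, xi)
    norm = np.sqrt((2.0 * np.pi) ** m * np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

def normal_density(x, mu, sigma):
    """Univariate normal density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Sanity check with hypothetical numbers: when all the correlations are zero,
# the joint density must factor into the product of the univariate densities.
mu = np.array([1.0, -0.5, 2.0])
sigma = np.array([0.8, 1.5, 0.6])
x = np.array([0.3, 0.1, 2.4])
joint = mvn_density(x, mu, np.diag(sigma**2))
product = np.prod(normal_density(x, mu, sigma))
print(joint, product)
```

The factorization check is a necessary condition only; it says nothing about the correlated case, which is what the demonstration is about.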
Demonstration, first part. The aim of this proof will be to demonstrate that if we calculate the marginal distribution of X_i from the given joint distribution, we obtain a normal distribution with an average given by μ_i. Moreover, we will prove that if we use this joint distribution to calculate the covariance between X_i and X_j, we obtain σ_iσ_jρ_i,j (I have to apologize to the reader for this weird way of writing subscripts, but WordPress doesn’t provide an equation editor). We start by performing the following change of variables:
whose Jacobian is the identity matrix. So we obtain for the joint density in Eq. 9 the expression:
We then consider the substitution
whose Jacobian matrix is P or its transpose, depending on the direction of the substitution; in either case the absolute value of its determinant is 1, since P is an orthogonal matrix (P is the matrix introduced in Prop. 2, whose columns are eigenvectors of the covariance matrix). Then we have
And, according to Prop. 1, we obtain for the joint distribution in Eq. 9 the expression:
So, the marginal distribution of the first random variable is
We recognize the integrals in Prop. 4, for n = 0. So we have for the marginal distribution:
while the joint distribution becomes
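The steps above can be written out explicitly. This is a sketch under two assumptions of mine: that the substitution has the form ξ = P x′ (with the transpose the computation is the same), and that all the eigenvalues λ_i are positive.

```latex
\begin{align*}
\xi^{T} C^{-1} \xi
  &= x'^{T} P^{T} \left( P \Lambda^{-1} P^{T} \right) P \, x'
   = \sum_{i=1}^{m} \frac{(x'_i)^{2}}{\lambda_i},
\qquad
\det C = \prod_{i=1}^{m} \lambda_i \\
f_{X'}(x')
  &= \frac{1}{\sqrt{(2\pi)^{m} \prod_{i=1}^{m} \lambda_i}}
     \exp\!\left( -\frac{1}{2} \sum_{i=1}^{m} \frac{(x'_i)^{2}}{\lambda_i} \right)
   = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\lambda_i}}
     \exp\!\left( -\frac{(x'_i)^{2}}{2\lambda_i} \right)
\end{align*}
```

Each factor of the product is a normal density with average zero and variance λ_i, which is why every marginal in the rotated variables is immediate.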
Let’s now consider another change of variable, the following one:
whose Jacobian is given by:
Then, according to Prop. 1, we have
This proves that the variables X_1”, X_2”, …, X_m” are independent. They are also normally distributed random variables whose average is zero and whose standard deviation is
for i that goes from 1 to m. Since we have
we can calculate the marginal distribution of ξ_j according to Prop. 6:
Remembering the very first substitution (Eq. 10) we then draw the following conclusion:
Now, if you remember Prop. 3, you can easily conclude that the marginal density of X_j is, in fact, a normal distribution with average given by μ_j and standard deviation given by σ_j. This concludes the first part of the demonstration. It is worth noting that we have calculated, in the previous lines, a very complex integral (the first collected in the following paragraph), and we can be proud of ourselves.
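The result of the first part can be checked numerically in the bivariate case. The sketch below uses made-up parameters and a plain Riemann sum (my choice, not the method of the proof) to integrate the second variable out of the joint density, and compares the result with the normal density of average μ_1 and standard deviation σ_1.

```python
import numpy as np

# Hypothetical parameters of a bivariate normal distribution.
mu = np.array([1.0, -0.5])
sigma = np.array([0.8, 1.5])
rho = 0.6
cov12 = rho * sigma[0] * sigma[1]
C = np.array([[sigma[0]**2, cov12],
              [cov12, sigma[1]**2]])
C_inv = np.linalg.inv(C)
norm = 2.0 * np.pi * np.sqrt(np.linalg.det(C))

def joint(x1, x2):
    """Bivariate normal density; x2 may be an array."""
    d1, d2 = x1 - mu[0], x2 - mu[1]
    quad = C_inv[0, 0] * d1**2 + 2.0 * C_inv[0, 1] * d1 * d2 + C_inv[1, 1] * d2**2
    return np.exp(-0.5 * quad) / norm

# Integrate x2 out with a Riemann sum on a wide, fine grid.
x2 = np.linspace(mu[1] - 10 * sigma[1], mu[1] + 10 * sigma[1], 4001)
dx2 = x2[1] - x2[0]

x1_points = np.array([-0.5, 1.0, 2.2])  # points where the marginal is evaluated
marginal_numeric = np.array([np.sum(joint(x1, x2)) * dx2 for x1 in x1_points])
marginal_exact = (np.exp(-0.5 * ((x1_points - mu[0]) / sigma[0])**2)
                  / (sigma[0] * np.sqrt(2.0 * np.pi)))

print(marginal_numeric)
print(marginal_exact)
```

The two printed arrays should coincide to many decimal places, whatever value of ρ between −1 and 1 is chosen.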
Demonstration, second part. We now have to prove that the correlation coefficient between X_i and X_j is given by ρ_i,j. In order to do that, we’ll calculate the covariance between X_i and X_j with the formula
19) Cov[X_i, X_j] = E[X_i×X_j] – E[X_i]×E[X_j] = E[X_i×X_j] – μ_i×μ_j
For E[X_i×X_j] we have
Considering the substitution in Eq. 10 we have
To simplify the writing, let’s assume i=1 and j=2. For I_1 we have:
Now, considering again Prop. 4, we easily recognize that:
So, the integral I_1 becomes:
For I_2 we have:
So, I_2 is zero and the same applies to I_3, as the reader can easily discover by herself, using Eq. 20. Hence, we have found:
Now, just consider Eq. 7 (the second one) in Prop. 3, and you will recognize that we have found
which is exactly what we were trying to demonstrate. The reader has likely realized that we have just calculated another complex integral, the second one in the following paragraph. It can also be verified that the joint density is, in fact, a density: for that to be true, it must be
Now, if we use the substitutions in Eq. 10 and in Eq. 12 we obtain:
And our proof is now complete.
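The second part of the demonstration and the normalization can also be checked numerically for m = 2. The sketch below uses made-up parameters and a Riemann sum on a two-dimensional grid (my choice, not the method of the proof) to verify that the density integrates to 1 and that the covariance computed with Eq. 19 equals σ_1σ_2ρ_1,2.

```python
import numpy as np

# Hypothetical parameters of a bivariate normal distribution.
mu = np.array([1.0, -0.5])
sigma = np.array([0.8, 1.5])
rho = 0.6
cov12 = rho * sigma[0] * sigma[1]
C = np.array([[sigma[0]**2, cov12],
              [cov12, sigma[1]**2]])
C_inv = np.linalg.inv(C)

# Grid covering ten standard deviations on each side of the averages.
x1 = np.linspace(mu[0] - 10 * sigma[0], mu[0] + 10 * sigma[0], 1201)
x2 = np.linspace(mu[1] - 10 * sigma[1], mu[1] + 10 * sigma[1], 1201)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
dA = (x1[1] - x1[0]) * (x2[1] - x2[0])  # area of one grid cell

D1, D2 = X1 - mu[0], X2 - mu[1]
quad = C_inv[0, 0] * D1**2 + 2.0 * C_inv[0, 1] * D1 * D2 + C_inv[1, 1] * D2**2
density = np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(C)))

total = np.sum(density) * dA               # integral of the density
e_x1x2 = np.sum(X1 * X2 * density) * dA    # E[X_1 × X_2]
cov_numeric = e_x1x2 - mu[0] * mu[1]       # Eq. 19

print(total)
print(cov_numeric, cov12)
```

The first printed value should be very close to 1, and the two covariances should agree up to the accuracy of the numerical integration.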
Prop. 7 is nothing more than the calculation of three very complex integrals. I have collected these results in what follows. Consider that you can substitute the covariance matrix with any symmetrical positive-definite one, and these formulae still hold.