“Mathematics comprises all the sciences.”

Miguel Álvarez Osorio, 1775

For my brother who was

decades ahead of me

in terms of knowledge

albeit only two years older

Introduction

In a previous blog post, I discussed the joint density of a random vector of two normally distributed elements. I was able to prove the expression for the joint probability, not without fighting against some nasty integrals. In the end, I introduced the expression of the joint probability for a random vector of m normally distributed elements, and I left my four readers saying “I have no idea about how it could be proved“. It was June; I was in the North of Italy back then, hosted by friends but mainly alone with my books, in a pleasant room with pink walls and some dolls listening to my speculations without a blink; a student of engineering was sharing with me, via chat, her difficulties with the surprisingly interesting task of analyzing the data from some accelerometers placed in the mouths of fat people while they stood on an oscillating platform; my birthday was approaching, and I was going to come back to Rome after a short stop in Florence, where I became for the first time fully aware of how astonishingly beautiful a woman in a dress can be (and where I saw the statues that Michelangelo carved for the tombs of the Medici, the monument to Lorenzo in particular, which is sculpted into our culture more profoundly than we usually realize).

But on a day in late December, while I was planning my own cryopreservation (a thought I often indulge in when my health declines even further), I realized that the covariance matrix is a symmetric matrix, so it can be diagonalized, and this is the key to proving the expression of this density. As obvious as it is, I couldn’t think of it when I first encountered the multivariate normal distribution, and the reason for this failure lies in my continual setbacks, in the fact that for most of the last 20 years I have been unable not only to study but even to think and read. And this is also the reason why I write these proofs down in my blog: I fear that I will leave only silence after my existence, because I have not existed at all, due to my encephalopathy. I can’t make long-term plans, so as soon as I finish a small project, such as this proof, I need to share it, because it might be the last product of my intellect for a long time. So, what follows is mainly a proof of my own existence, more than a demonstration of the multivariate normal distribution.

Before introducing the math, two words about the importance of the multivariate normal distribution. Many biological parameters follow a normal distribution, so the normal density is the most important continuous distribution in biology (and in medicine). But what happens when we consider more than one parameter at a time? Suppose you have ten metabolites, each of which follows a normal distribution, and you want to calculate the probability that all of them are below their respective maximal values. Well, then you have to know the multivariate normal distribution! This is the reason why I believe that anyone who is interested in biology or medicine should, at least once in her lifetime, go through the following mathematical passages.
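Here is how that computation looks in practice (a minimal sketch in Python; the means, standard deviations, common correlation, and upper limits below are all invented for illustration):

```python
# A minimal sketch of the ten-metabolite example. All numbers below
# (means, standard deviations, the common correlation, and the upper
# limits) are invented for illustration.
import numpy as np
from scipy.stats import multivariate_normal

m = 10
mu = np.linspace(1.0, 2.0, m)      # hypothetical means of the metabolites
sigma = np.full(m, 0.3)            # hypothetical standard deviations
rho = 0.4                          # hypothetical common correlation

# Covariance matrix: c_ij = sigma_i * sigma_j * rho_ij, with rho_ii = 1
R = np.full((m, m), rho) + (1.0 - rho) * np.eye(m)
C = np.outer(sigma, sigma) * R

upper = mu + 2.0 * sigma           # each metabolite below its mean + 2 SD

# P(X_1 < u_1, ..., X_10 < u_10): the CDF of the multivariate normal
p = multivariate_normal(mean=mu, cov=C).cdf(upper)
print(p)
```

Unless the ten metabolites are uncorrelated, this probability is not the product of the ten marginal probabilities, which is exactly why the joint density matters.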

Can’t solve a problem? Change the variables!

In this section, I present a handful of properties that we need in order to carry out our demonstration. The first one derives directly from the theorem on the change of variables in multiple integrals. The second and the third are properties of symmetric matrices in general, and of the covariance matrix in particular. Then I collect a set of integrals that were introduced or calculated in the already mentioned blog post about the bivariate normal distribution. The last proposition is not so obvious; I won’t demonstrate it here, but those who are interested in its proof can contact me.

Figure 1. The domains of the bijective function Y = Φ(X).

PROPOSITION 1 (change of variables). Given the continuous random vector X = (X_1, X_2, ..., X_m) and the bijective function Y = Φ(X) (figure 1), where Y is a vector with m dimensions, then the joint density of Y can be expressed through the joint density of X:

1)

f_Y(y) = f_X(\Phi^{-1}(y)) \, |\det J_{\Phi^{-1}}(y)|

where J_{\Phi^{-1}} is the Jacobian matrix of the inverse function X = \Phi^{-1}(Y).

PROPOSITION 2 (symmetrical matrix). Given the symmetrical matrix C, we can always write:

2)

C = P \, \mathrm{diag}(\lambda_1, \lambda_2, ..., \lambda_m) \, P^T

where \lambda_1, \lambda_2, ..., \lambda_m are the eigenvalues of matrix C and the columns of P are the respective eigenvectors. It is also easy to see that for the inverse matrix of C we have:

3)

C^{-1} = P \, \mathrm{diag}(1/\lambda_1, 1/\lambda_2, ..., 1/\lambda_m) \, P^T

Moreover, the quadratic form associated with the inverse matrix is

4)

x^T C^{-1} x = \sum_{j=1}^{m} \frac{(x'_j)^2}{\lambda_j}

where

5)

x' = P^T x
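These identities are easy to check numerically (a sketch; the symmetric matrix below is arbitrary, and numpy.linalg.eigh returns exactly the eigenvalues and the orthogonal matrix P of Prop. 2):

```python
# A numerical check of Prop. 2 (a sketch; the symmetric matrix C below
# is arbitrary). numpy.linalg.eigh returns the eigenvalues lambda_j and
# an orthogonal matrix P whose columns are the corresponding eigenvectors.
import numpy as np

C = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])

lam, P = np.linalg.eigh(C)

# Eq. 2: C = P diag(lambda_1, ..., lambda_m) P^T
assert np.allclose(C, P @ np.diag(lam) @ P.T)

# Eq. 3: C^{-1} = P diag(1/lambda_1, ..., 1/lambda_m) P^T
assert np.allclose(np.linalg.inv(C), P @ np.diag(1.0 / lam) @ P.T)

# Eqs. 4-5: x^T C^{-1} x = sum_j (x'_j)^2 / lambda_j, with x' = P^T x
x = np.array([1.0, -2.0, 0.5])
xp = P.T @ x
assert np.allclose(x @ np.linalg.inv(C) @ x, np.sum(xp**2 / lam))
print("Prop. 2 verified")
```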

PROPOSITION 3 (covariance matrix). If C is the covariance matrix of the random vector X = (X_1, X_2, ..., X_m ), which means that

6)

c_{ij} = \mathrm{Cov}[X_i, X_j] = E[(X_i - \mu_i)(X_j - \mu_j)]

then, with the notation introduced in Prop. 2, we have

7)

\sigma_j = \sqrt{\sum_{k=1}^{m} \lambda_k \, p_{jk}^2}, \qquad \sigma_i \sigma_j \rho_{ij} = \sum_{k=1}^{m} \lambda_k \, p_{ik} \, p_{jk}

where \sigma_j is the standard deviation of X_j and \rho_{ij} is the correlation coefficient between X_i and X_j.
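Again, a quick numerical sketch (with invented standard deviations and correlation coefficients) of the two identities in Eq. 7:

```python
# A numerical sketch of Prop. 3 / Eq. 7, with invented standard
# deviations and correlation coefficients.
import numpy as np

sigma = np.array([1.0, 2.0, 0.5])
R = np.array([[ 1.0, 0.3, -0.2],
              [ 0.3, 1.0,  0.5],
              [-0.2, 0.5,  1.0]])     # correlation coefficients rho_ij
C = np.outer(sigma, sigma) * R        # Eq. 6: c_ij = sigma_i sigma_j rho_ij

lam, P = np.linalg.eigh(C)

# First formula of Eq. 7: sigma_j^2 = sum_k lambda_k p_jk^2
assert np.allclose(sigma**2, (P**2) @ lam)

# Second formula of Eq. 7: sigma_i sigma_j rho_ij = sum_k lambda_k p_ik p_jk
assert np.allclose(C, P @ np.diag(lam) @ P.T)
print("Eq. 7 verified")
```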

PROPOSITION 4 (some integrals). It is possible to calculate the integrals in the following table. Those who are interested in how to calculate them can contact me.

\int_{-\infty}^{+\infty} x^n e^{-\frac{x^2}{2\lambda}} \, dx = \begin{cases} \sqrt{2\pi\lambda}, & n = 0 \\ 0, & n = 1 \\ \lambda\sqrt{2\pi\lambda}, & n = 2 \end{cases} \qquad (\lambda > 0)
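The table can be checked numerically (a minimal sketch; λ = 1.7 is an arbitrary choice):

```python
# A numerical check of the table in Prop. 4 (lambda = 1.7 is arbitrary).
import numpy as np
from scipy.integrate import quad

lam = 1.7
expected = [np.sqrt(2 * np.pi * lam),            # n = 0
            0.0,                                 # n = 1
            lam * np.sqrt(2 * np.pi * lam)]      # n = 2

for n in range(3):
    value, _ = quad(lambda x: x**n * np.exp(-x**2 / (2 * lam)),
                    -np.inf, np.inf)
    print(n, value, expected[n])   # the last two columns agree
```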

PROPOSITION 5 (other integrals). It is possible to calculate the following two integrals, starting from the table above. Those who are interested in how to calculate them can contact me.

8)

[image: “exponential integral 3”; the equation is not recoverable]

PROPOSITION 6 (sum of normal random variables). Given the random vector X = (X_1, X_2, ..., X_m) whose components are independent and normally distributed, the density of the random variable Y = X_1 + X_2 + ... + X_m is a normal law whose average and standard deviation are given, respectively, by:

\mu_Y = \sum_{i=1}^{m} \mu_i, \qquad \sigma_Y = \sqrt{\sum_{i=1}^{m} \sigma_i^2}
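A Monte Carlo sketch of Prop. 6, with invented averages and standard deviations:

```python
# One million samples of Y = X_1 + X_2 + X_3, the X_i independent normals.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 1.5, 2.0])

y = rng.normal(mu, sigma, size=(1_000_000, 3)).sum(axis=1)

print(y.mean(), mu.sum())                     # both ~ -0.5
print(y.std(), np.sqrt(np.sum(sigma**2)))     # both ~ 2.55
```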

Multivariate normal distribution

PROPOSITION 7. The joint probability density in the case of a random vector whose m components follow a normal distribution is:

9)

f_X(x) = \frac{1}{\sqrt{(2\pi)^m \det C}} \, e^{-\frac{1}{2}(x - \mu)^T C^{-1} (x - \mu)}

where C is the covariance matrix of X and \mu = (\mu_1, \mu_2, ..., \mu_m) is the vector of the averages.
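Before the proof, a sanity check: Eq. 9 coded directly, compared with scipy's implementation of the same density (μ and C below are invented):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, C):
    """The m-variate normal density of Eq. 9."""
    m = len(mu)
    d = x - mu
    q = d @ np.linalg.inv(C) @ d          # (x - mu)^T C^{-1} (x - mu)
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi)**m * np.linalg.det(C))

mu = np.array([1.0, -1.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])
x = np.array([0.5, 0.0])

print(mvn_pdf(x, mu, C))
print(multivariate_normal(mean=mu, cov=C).pdf(x))   # same value
```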

Demonstration, first part. The aim of this part of the proof is to demonstrate that if we calculate the marginal distribution of X_j from the given joint distribution, we obtain a normal distribution whose average is \mu_j and whose standard deviation is \sigma_j. Moreover, we will prove that if we use this joint distribution to calculate the covariance between X_i and X_j, we obtain \sigma_i\sigma_j\rho_{ij}. We start by operating the following change of variables:

10)

X'_j = X_j - \mu_j, \quad j = 1, 2, ..., m

whose Jacobian is the identity matrix. So we obtain for the joint density in Eq. 9 the expression:

11)

f(x') = \frac{1}{\sqrt{(2\pi)^m \det C}} \, e^{-\frac{1}{2} x'^T C^{-1} x'}

We then consider the substitution

12)

x' = P \, x''

whose Jacobian matrix is P itself; since P is an orthogonal matrix (the matrix introduced in Prop. 2, whose columns are the eigenvectors of the covariance matrix), the absolute value of its determinant is again 1. Then, recalling Prop. 2, we have

x'^T C^{-1} x' = \sum_{j=1}^{m} \frac{(x''_j)^2}{\lambda_j}, \qquad \det C = \lambda_1 \lambda_2 \cdots \lambda_m

And, according to Prop. 1, we obtain for the joint distribution in Eq. 9 the expression:

13)

f(x'') = \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi\lambda_j}} \, e^{-\frac{(x''_j)^2}{2\lambda_j}}

So, the marginal distribution of the first random variable is

f_{X''_1}(x''_1) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x'') \, dx''_2 \, dx''_3 \cdots dx''_m

We recognize the integrals in Prop. 4, for n = 0. So we have for the marginal distribution:

14)

f_{X''_1}(x''_1) = \frac{1}{\sqrt{2\pi\lambda_1}} \, e^{-\frac{(x''_1)^2}{2\lambda_1}}

while the joint distribution becomes

15)

f(x'') = f_{X''_1}(x''_1) \prod_{j=2}^{m} \frac{1}{\sqrt{2\pi\lambda_j}} \, e^{-\frac{(x''_j)^2}{2\lambda_j}}
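A sampling sketch of the effect of the two substitutions (Eq. 10 and Eq. 12), with invented parameters: centering X and rotating by P^T yields components that are uncorrelated, with variances given by the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -1.0, 2.0])
C = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
lam, P = np.linalg.eigh(C)

X = rng.multivariate_normal(mu, C, size=1_000_000)
X2 = (X - mu) @ P              # each row is x'' = P^T (x - mu)

print(np.cov(X2.T).round(2))   # ~ diag(lambda_1, lambda_2, lambda_3)
print(lam.round(2))
```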

Let’s now consider another change of variable, the following one:

16)

(x''_1, ..., x''_j, ..., x''_m) \mapsto (x''_j, ..., x''_1, ..., x''_m)

that is, the exchange of the first coordinate with the j-th one, for a generic j,

whose Jacobian is given by:

|\det P_{1j}| = 1

where P_{1j} is the permutation matrix that operates the exchange.

Then, according to Prop. 1, we have

f_{X''_j}(x''_j) = \frac{1}{\sqrt{2\pi\lambda_j}} \, e^{-\frac{(x''_j)^2}{2\lambda_j}}, \qquad f(x'') = \prod_{j=1}^{m} f_{X''_j}(x''_j)

This proves that the variables X''_1, X''_2, ..., X''_m are independent. But they are also normally distributed random variables whose average is zero and whose standard deviation is

\sigma_{X''_i} = \sqrt{\lambda_i}

for i = 1, 2, ..., m. Since we have

\xi_j = X'_j = \sum_{i=1}^{m} p_{ji} X''_i

we can calculate the marginal distribution of ξ_j according to Prop. 6:

17)

f_{\xi_j}(\xi_j) = \frac{1}{\sqrt{2\pi \sum_{i=1}^{m} \lambda_i p_{ji}^2}} \, e^{-\frac{\xi_j^2}{2\sum_{i=1}^{m} \lambda_i p_{ji}^2}}

Remembering the very first substitution (Eq. 10) we then draw the following conclusion:

18)

f_{X_j}(x_j) = \frac{1}{\sqrt{2\pi \sum_{i=1}^{m} \lambda_i p_{ji}^2}} \, e^{-\frac{(x_j - \mu_j)^2}{2\sum_{i=1}^{m} \lambda_i p_{ji}^2}}

Now, if you remember Prop. 3 (the first formula of Eq. 7), you can easily conclude that the marginal density of X_j is, in fact, a normal distribution whose average is \mu_j and whose standard deviation is \sigma_j. This concludes the first part of the demonstration. It is worth noting that along the way we have calculated a very complex integral (the first one collected in the Integrals section below), and we can be proud of ourselves.
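A sampling sketch of this conclusion (Eq. 18), with invented parameters:

```python
# The marginal of X_j is normal with average mu_j and SD sigma_j.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])

X = rng.multivariate_normal(mu, C, size=1_000_000)

j = 0
print(X[:, j].mean(), mu[j])               # ~ mu_j
print(X[:, j].std(), np.sqrt(C[j, j]))     # ~ sigma_j
```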

Demonstration, second part. We now have to prove that the covariance between X_i and X_j is given by \sigma_i\sigma_j\rho_{ij}. In order to do that, we’ll calculate the covariance between X_i and X_j with the formula

19)   \mathrm{Cov}[X_i, X_j] = E[X_iX_j] - E[X_i]E[X_j] = E[X_iX_j] - \mu_i\mu_j

For E[X_iX_j] we have

E[X_iX_j] = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} x_i x_j f(x) \, dx_1 \cdots dx_m

Considering the substitution in Eq. 10 we have

E[X_iX_j] = \int_{\mathbb{R}^m} (x'_i + \mu_i)(x'_j + \mu_j) f(x') \, dx' = I_1 + I_2 + I_3 + \mu_i\mu_j

I_1 = \int_{\mathbb{R}^m} x'_i x'_j f(x') \, dx', \qquad I_2 = \mu_j \int_{\mathbb{R}^m} x'_i f(x') \, dx', \qquad I_3 = \mu_i \int_{\mathbb{R}^m} x'_j f(x') \, dx'

To simplify the writing, let’s assume i=1 and j=2. For I_1 we have:

I_1 = \int_{\mathbb{R}^m} \Big( \sum_{k=1}^{m} p_{1k} x''_k \Big) \Big( \sum_{l=1}^{m} p_{2l} x''_l \Big) f(x'') \, dx'' = \sum_{k=1}^{m} \sum_{l=1}^{m} p_{1k} \, p_{2l} \int_{\mathbb{R}^m} x''_k x''_l f(x'') \, dx''

Now, considering again Prop. 4, we easily recognize that:

20)

\int_{\mathbb{R}^m} x''_k x''_l f(x'') \, dx'' = \begin{cases} \lambda_k, & k = l \\ 0, & k \neq l \end{cases}

For I_1 we then have:

21)

I_1 = \sum_{k=1}^{m} \lambda_k \, p_{1k} \, p_{2k}

For I_2 we have:

I_2 = \mu_2 \int_{\mathbb{R}^m} x'_1 f(x') \, dx' = \mu_2 \sum_{k=1}^{m} p_{1k} \int_{\mathbb{R}^m} x''_k f(x'') \, dx'' = 0

So, I_2 is zero and the same applies to I_3, as the reader can easily discover by herself, using Eq. 20. Hence, we have found:

\mathrm{Cov}[X_1, X_2] = E[X_1X_2] - \mu_1\mu_2 = I_1 = \sum_{k=1}^{m} \lambda_k \, p_{1k} \, p_{2k}

Now, just consider Eq. 7 (the second one) in Prop. 3, and you will recognize that we have found

\mathrm{Cov}[X_1, X_2] = \sigma_1 \sigma_2 \rho_{12}

which is exactly what we were trying to demonstrate. The reader has likely realized that we have just calculated another complex integral, the second one collected in the Integrals section below. It can also be verified that the joint density is, in fact, a density: for that to be true, it must be

\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x) \, dx_1 \cdots dx_m = 1

Now, if we use the substitutions in Eq. 10 and in Eq. 12 we obtain:

\int_{\mathbb{R}^m} f(x) \, dx = \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi\lambda_j}} \int_{-\infty}^{+\infty} e^{-\frac{(x''_j)^2}{2\lambda_j}} \, dx''_j = \prod_{j=1}^{m} \frac{\sqrt{2\pi\lambda_j}}{\sqrt{2\pi\lambda_j}} = 1

And our proof is now complete.
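The result of this second part can also be checked by simulation (a sketch with invented parameters):

```python
# Monte Carlo check: E[X_i X_j] - mu_i mu_j ~ sigma_i sigma_j rho_ij.
import numpy as np

mu = np.array([1.0, -1.0, 0.5])
sigma = np.array([1.0, 2.0, 0.5])
R = np.array([[ 1.0, 0.3, -0.2],
              [ 0.3, 1.0,  0.5],
              [-0.2, 0.5,  1.0]])
C = np.outer(sigma, sigma) * R

rng = np.random.default_rng(3)
X = rng.multivariate_normal(mu, C, size=2_000_000)

i, j = 0, 1
print((X[:, i] * X[:, j]).mean() - mu[i] * mu[j])   # ~ 0.6
print(sigma[i] * sigma[j] * R[i, j])                # = 0.6
```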

Integrals

Prop. 7 is nothing more than the calculation of three very complex integrals. I have collected these results in what follows. Note that you can substitute the covariance matrix with any symmetric, positive-definite matrix, and these formulae still hold.

\int_{\mathbb{R}^{m-1}} f(x) \prod_{k \neq j} dx_k = \frac{1}{\sigma_j \sqrt{2\pi}} \, e^{-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}}

\int_{\mathbb{R}^m} x_i x_j f(x) \, dx = \sigma_i \sigma_j \rho_{ij} + \mu_i \mu_j

\int_{\mathbb{R}^m} f(x) \, dx = 1

where f(x) is the joint density in Eq. 9.
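These three results can be verified numerically in the bivariate case (a sketch; m = 2 and all parameters are invented, with f coded directly from Eq. 9):

```python
import numpy as np
from scipy import integrate

mu = np.array([1.0, -1.0])
sigma = np.array([1.0, 0.5])
rho = 0.4
C = np.outer(sigma, sigma) * np.array([[1.0, rho], [rho, 1.0]])
Cinv, detC = np.linalg.inv(C), np.linalg.det(C)

def f(x1, x2):
    d = np.array([x1, x2]) - mu
    return np.exp(-0.5 * d @ Cinv @ d) / (2 * np.pi * np.sqrt(detC))

# Third integral: f integrates to 1 (bounds +/- 8 are wide enough here)
print(integrate.dblquad(lambda y, x: f(x, y), -8, 8, -8, 8)[0])

# First integral: the marginal of X_1 at x_1 = 0.3 is the univariate normal
x1 = 0.3
print(integrate.quad(lambda y: f(x1, y), -8, 8)[0])
print(np.exp(-(x1 - mu[0])**2 / (2 * sigma[0]**2)) / (sigma[0] * np.sqrt(2 * np.pi)))

# Second integral: E[X_1 X_2] = sigma_1 sigma_2 rho_12 + mu_1 mu_2
print(integrate.dblquad(lambda y, x: x * y * f(x, y), -8, 8, -8, 8)[0])
print(sigma[0] * sigma[1] * rho + mu[0] * mu[1])
```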