Ratio between two Random Variables: the case of the Beta Distribution

Ratio between two Random Variables: the case of the Beta Distribution

We have here a paradox: what theory is it that is not correct unless it leaves open the possibility that it may be incorrect? Answer: the theory of probability.

P.W. Bridgman, The Nature of the Physical Theory

While reasoning on one of the biological problems I have been spending most of my time on, I encounterd a random variable that was the ratio between two random variables with a Beta distribution. Let’s say that X_1\,\sim\,B(\alpha_1,\beta_1) and X_2\,\sim\,B(\alpha_2,\beta_2). Then, if we define the random variable Y\,=\,\frac{X_1}{X_2}, what is its density? What about its cumulative distribution function? And what can we say about its moments? I answered all these questions in this paper. The density, in particualr, is given by:

On the exact same subject there is a 1999 paper I am currently comparing to mine (Pham-Gia 1999). While writing my own paper I did not check for other works on the same topic, but if I had had to bet I would have said that this calculation had been performed a century ago; so I was astonished when I discovered that this density was first published only 20 years ago! This is further proof of how recent the application of calculus to statistics is.

PS. I have just checked the density by Thu Pham-Gia: it is equivalent to the one I found, even though they appear different at a first glance. I have added a paragraph to my paper to show the equivalence between these two densities. It is also worth mentioning that in addition, I calculated the cumulative distribution function, the expectation, the second-order moment, and the variance, while Pham-Gia presented the density along with related subjects and applications, in his 1999 paper.

P-value threshold for genetic studies

P-value threshold for genetic studies

“This it seems to me is characteristic of most judgments involving physical processes — the law of the excluded middle is not a valid description of our actual physical experience — there has to be a third category of doubtful, in addition to positive or negative.”

P. W. Bridgman, The Nature of Physical Theory

1) The multiple testing issue

With the emergence of titanic biological datasets generated by the latest omics technologies, the problem of correction of the significance threshold (the conventional \alpha = 0.05) for multiple testing has required a careful revision and update. The simplest way to correct for multiple testing is the Bonferroni correction (a derivation is provided in par. 5): in this method, the threshold for statistical significance is given by

1)\;\;p\,<\,\frac{\alpha}{m}

where the denominator represents the number of independent variables. Now, in the case of genetic testing, how should we set the value of m? Let’s say that we are comparing two populations of individuals (one with a disease, the other made up by healthy subjects) and we have genotyped N variants in each one of them. Once we have associated a p value to every single variant, we must face the problem of correcting it for multiple testing by using Eq. 1. What should we do? Well, it is simple, we put m\,=\,N, right? Wrong!

2) Linkage disequilibrium

In general, variants belonging to the same chromosome are not inherited in an independent fashion: we know that whole chunks of chromosomes of about 50 kb pass from one generation to the other. These blocks are called haplotypes and there has been a considerable effort in classifying them for various human populations in the last 20 years. A map of human haplotypes is now available (HapMap) and it allows for the calculation of the real number of independent variants (often called tag SNPs) on the human Genome, and for other pivotal purposes I won’t write here about, like for instance imputation (a technique that dramatically reduces the number of variants we must genotype in order to scan the genome of a human being) (The International HapMap Consortium 2005), (The International HapMap Consortium, 2009). Visit the page of the International HapMap Consortium here. For us, what is important here is the fact that the knowledge of the haplotypes gives us a measure of the linkage disequilibrium between any possible pair of variants on the same chromosome. But what is linkage disequilibrium? Let’s say that we have two loci L_1 and L_2 on the same chromosome. L_1 has the two alleles A and a, while L_2 presents the two alleles B and b of frequencies f(A),\,f(a),\,f(B),\,f(b), respectively. Let’s define now the random variable X whose value is one if L_1 is occupied by A, zero otherwise; similarly, let’s define the random variable Y such that it is one if L_2 is occupied by B, zero otherwise. We can say that both these variables have a Bernoulli distribution with expectations given by E[X]\,=\,f(A) and E[Y]\,=\,f(B), respectively. If we now define f(A,\,B) the frequency of the haplotype AB, and if we similarly define f(A,\,b),\,f(a,\,b),\,f(a,B), we can then write the following relationships:

2)\;\;\begin{cases} f(Ab)\,=\,f(A)\,-\,f(aB)\\f(aB)\,=\,f(B)\,-\,f(AB)\\f(ab)\,=\,1\,-\,f(B)\,-\,f(A)\,+\,f(AB)\\f(AB)\,+\,f(Ab)\,+\,f(aB)\,+\,f(ab)\,=\,1\\f(A)\,+\,f(a)\,=\,f(B)\,+\,f(b)\,=\,1 \end{cases}\

where we have considered that f(A,\,B)\,=\,E[XY], f(A,\,b)\,=\,E[X(1-Y)]\,= E[X]\,-\,E[XY]\,.... That said, the covariance between X,\,Y is given by:

3)\;\; Cov[X,\,Y]\,=\,E[X\,Y]\,-\,E[X]E[Y]\,=

\,=\, f(AB)\,-\,f(A)f(B)\,=\,f(Ab)\,-\,f(A)f(b)\,=

=\,f(aB)\,-\,f(a)f(B)\,=\,f(ab)\,-\,f(a)f(b)

Or also (with some algebraic manipulation, by substituting the relationship of Eq. 2):

4)\;\;Cov[X,\,Y]\,=\,f(AB)f(ab)\,-\,f(Ab)f(aB)

That said, one definition of Linkage disequilibrium is just

5)\;\;d\,=\,Cov[X,\,Y]

Figure 1. Left: the values admitted for d are those that fall within the shell; in particular, for f(aB) = f(Ab) = 0 (coupling) we have that d assumes the values of the curve coloured blue, while for f(AB) = f(ab) = 0 (repulsion) then d describes the curve in orange. These two curves are plotted in Figure 2, too.

This definition presents a problem, namely the fact that when we have a complete dependence between L_1 and L_2, its value varies in ]0,0.25] (see Figure 1, Figure 2, and paragraph 6 for a detailed discussion) and this does not seem to have a particular biological meaning. It would be more interesting to have a definition such that Linkage disequilibrium assumes a unique value (for example 1) when we have complete dependence between the two loci. Note that we can accomplish this by considering the correlation coefficient between X\,=\,Y instead, which is given by:

6)\;\;\rho_{XY}\,=\,\frac{Cov[X,\,Y]}{\sqrt {Var(X)Var(Y)}}\,=\frac{d}{\sqrt {Var(X)Var(Y)}},

Thus, in the present article we consider the following definition for linkage disequilibrium:

7)\;\;linkage\,disequilibrium\,=\,\rho^2\,=\,\frac{d^2}{f(A)f(a)f(B)f(b)}

where we have considered that Var[X]\,=\,E[X](1-E[X])\,=\,f(A)f(a) and Var[Y]\,=\,E[Y](1-E[Y])\,=\,f(B)f(b). For further details on this topic, go to paragraph 6. See also (Reginald D Smith 2020).

Figure 2. For f(aB) = f(Ab) = 0 (coupling) we have that d assumes the values of the curve coloured blue, while for f(AB) = f(ab) = 0 (repulsion) then d describes the curve in orange.

3) The search for the number of independent loci

Consider now that the number of loci in linkage equilibrium one with the other is the value m in Eq. 1, the one we are interested in. From an operational point of view, how can we determine if two loci are in linkage equilibrium so that they can be considered independent one from the other? We can randomly build the genome of 2,000 human beings with the only restrain that haplotypes must be respected (this is where the HapMap comes into play), we then randomly assign them to two populations and we calculate for each locus the p value for the comparison of these two populations. Let’s say that locus L_i has a p value p_i. Since these two groups are identical by construction, it must be p\,>\,\frac{\alpha}{m} or, in other words:

8)\;\;m\,>\,\frac{\alpha}{p_i}

If we repeat this inequality for each locus, we have a lower bound for m. That number is the number of independent loci we were searching for (Pe’er, I. et al. 2008).

4) Experimental results

I discuss here the calculation of m performed in a recent study, the best I have been able to find on this topic (Fadista, J. et al. 2016). The strength of this study is that they calculated m for two kinds of sequencing techniques: whole-genome sequencing (WGS) and whole-exome sequencing (WES). Not only that, but they also provided the results for different European and non-European populations, for different definitions of linkage disequilibrium, and for different thresholds for minor allele frequency (MAF). These results are plotted in their paper and they are also available in xlsx format (here, even though I must say that they omitted part of raw data on different ethnicities). Nevertheless, I have elaborated them in my own way (the script that draws the diagrams is available below) by using a different arrangement of the data and by providing a fancy polynomial interpolation, by cubic splines, as to have smooth and elegant surfaces and curves. In Figure 3 I plotted m in function of the threshold for the minor allele frequency (MAF) and for the linkage disequilibrium, in the case of whole-genome sequencing (WGS). In that plot, it is assumed that MAF is the lower bound for the minor allele frequency while the axis ‘Linkage disequilibrium’ is the upper limit for the value \rho^2 in Eq. 7. Figure 4 is exactly the same diagram, but in this case, it has been built by considering data from whole-exome sequencing (WES). Some observations: the number of independent variants is bigger in WGS than it is in WES (which is due to the fact that in the former case we are considering more genetic material than in the latter one); when we reduce MAF, m increases (reducing MAF means considering more variants, so again, this leads to a high number of independent variants); m decreases with Linkage disequilibrium (the threshold for considering two variants as independent one from the other). This last point deserves a further explanation: in this context, when we say that LD is, let’s say, 0.8 it means that we consider two variants to be independent when their linkage disequilibrium (calculated as in Eq. 7) is below 0.8; as we reduce this value, we define a more stringent criterion for independence between pair of variants, thus the number m must decrease.

Figure 3. The number of independent variants in whole-genome sequencing, in function of the lower limit for minor allele frequency and of the upper limit for Linkage disequilibrium.
Figure 4. The number of independent variants in whole-exome sequencing, in function of the lower limit for minor allele frequency and of the upper limit for Linkage disequilibrium.
MAFmplog_{10}\frac{1}{p}
\geq\,0.05 782,3616.39\cdot10^{-8} 7.19
\geq\,0.01 1,580,6503.16\cdot10^{-8} 7.50
\geq\,0.005 2,093,8872.39\cdot10^{-8} 7.62
\geq\,0.001 4,126,1251.21\cdot10^{-8} 7.92
Table 1. This table gives the values m, p in function of the lower limit for minor allele frequency, in the case of whole-genome sequencing. It is assumed that \rho^2\,\leq\,0.8 .

If we consider the upper limit for \rho^2 to be 0.8, we obtain Table 1 and Table 2, where both m and corresponding Bonferroni corrections are collected, for various MAF thresholds.

MAFmplog_{10}\frac{1}{p}
\geq\,0.05 39,3831.27\cdot10^{-6} 5.90
\geq\,0.01 69,2657.22\cdot10^{-7} 6.14
\geq\,0.005 86,9275.75\cdot10^{-7} 6.24
\geq\,0.001 161,8333.09\cdot10^{-7} 6.51
Table 2. This table gives the values m, p in function of the lower limit for minor allele frequency, in the case of whole-exome sequencing. It is assumed that \rho^2\,\leq\,0.8 .

In Figures 5, 6 you can find these same data after interpolation, arranged as a curve where p is expressed in function of m and MAF.

Figure 5. The corrected threshold for statistical significance in function of the number of independent variants and of the lower limit for minor allele frequency, for WGS.
Figure 6. The corrected threshold for statistical significance in function of the number of independent variants and of the lower limit for minor allele frequency, for WES.

5) How to derive the Bonferroni correction

If we have a threshold \alpha' for statistical significance and we perform m tests, then the probability that one test is positive by chance is P\,<\,\alpha' and the probability that one test is negative by chance is 1\,-\,P\,>\,1\,-\,\alpha'. This also means that the probability that all the m tests are negative is (1\,-\,P)^m\,>\,(1\,-\,\alpha')^m. In conclusion, the probability that at least one test is positive pureley by chance is

9)\;\;p\,=\,1\,-\,(1\,-\,P)^m\,<\,1\,-\,(1\,-\,\alpha')^m

We want this probability to be lower than \alpha\,=\,0.05 , which leads to the following inequality:

10)\;\;\alpha'\,<\,1\,-\,\sqrt[m]{1\,-\,\alpha}

This value \alpha' is the corrected threshold for statistical significance, accorrding to the Bonferroni correction. But this is not the correction in Eq. 1, you might note. Yes, that is right. But let’s consider the well known expansion by series:

11)\;\;(1\,-\,\alpha)^{\frac{1}{m}}\,=\,\sum_{k\,=\,0}^{\infty}(-\,\alpha)^k\begin{pmatrix} \frac{1}{m}\\k \end{pmatrix}

If we take only the first two addends of the sum and if we substitute them in the expression already found for \alpha', we obtain Eq. 1. The error (its absolute value) that we make when we consider Eq. 1 instead of Eq. 8 is the one plotted in Figure 7, in function of m, for \alpha\,=\,0.05 .

Figure 7. The absolute value of 1\,-\,\sqrt[m]{1\,-\,\alpha}\,-\,\frac{\alpha}{m} in function of m for \alpha\,=\,0.05

6) Mathematical notes on linkage disequilibrium

By substituting the fourth relationship of Eq. 2 in Eq. 4, we obtain the surface

f(ab)\,-\,f(ab)f(Ab)\,-\,f(ab)f(aB)\,-

-\,f^2(ab)\,-\,f(Ab)f(aB)\,-\,d\,=\,0

which is a quadratic form in the space of the frequencies of the haplotypes ab, Ab, aB. More precisely, and even though this is of no importance for us, this is a hyperbolic cylinder (as the reader can easily verify by studying its associated matrix). If we now ask that f(Ab)\,\in\,[0,1] and f(AB)\,\in\,[0,1] for f(aB)\,\in\,[0,1], f(ab)\,\in\,[0,1], we find the relationships

\begin{cases} d\,\geq\,-f(aB)\,+\,f(aB)f(ab)\,+\,f(aB)^2\\d\,\leq\,f(ab)\,+\,f(aB)f(ab)\,+\,f(aB)^2\\d\,\geq\,-\,f(aB)\,-\,f(aB)f(ab)\,-\,f(ab)^2\\d\,\leq\,f(ab)\,-\,f(aB)f(ab)\,-\,f(ab)^2\\f(aB)\,\in\,[0,1],\,f(ab)\,\in\,[0,1]\end{cases}

Taken togheter, these inequations say that d can assume only the values included within the elegant surface in Figure 1. In particular, when f(Ab)\,=\,f(aB)\,=\,0 (a condition of perfect disequilibrium which goes under the name of coupling), then d assumes the values of the curve coloured blue, while when f(AB)\,=\,f(ab)\,=\,0 (repulsion) then d assumes the value on the curve coloured orange. These two courves are plotted in Figure 2 too. Note that f(Ab)\,=\,0\,\Rightarrow\,f(aB)\,=\,0; moreover, f(AB)\,=\,0\,\Rightarrow\,f(ab)\,=\,0 (this can be proved easily by using Eq. 2).

Note that the application of the method of the Lagrange multipliers for restrained optimization to Eq. 4, with the restrains defined by the fourth relationship of Eq. 2, would not lead to any conclusion in this case, because it is equal to the search for the extremes of a function whose gradient is never zero.

7) Supplementary Material

The script that plots Figures 1 and 2 can be downloaded here.

The script that plots Figures 3, 4, and 5 can be downloaded here.

The script that plots Figure 7 can be downloaded here.

Data from (Fadista, J. et al. 2016) can be downloaded here.


The equations of this blog post were written using \LaTeX (see this article)

Testing the prophecy of Babylon

Testing the prophecy of Babylon

Some days ago it came to my attention the fact that there is a religious group that considers the presence of propositions within the Old Testament that seem to describe events that have happened after the propositions were written as proof of the divine origin of such a text. Not only that: to my understanding, they seem to infer from those successful prophecies that also the other predictions present in the Bible must turn out to happen somewhere in the future.

Now, in these statements, there are several problems. One can note, for instance, that successful scientific theories like the Newtonian gravitation continuously predict with great accuracy the future, and yet this doesn’t seem to be generally considered proof that Isaac Newton had a divine nature. On the other hand, the fact that his gravitational law predicts the orbits of the Earth, of Mars, of the modules of the Apollo missions, etc., doesn’t prevent the same theory to fail when it comes to the orbit of Mercury. So, this single counter-example might be considered enough to prove that the first paragraph of this article contains false statements.

But then there is another problem. How do I evaluate whether a prophecy is true or false? Despite the obvious difficulties due to the fact that this kind of predictions are expressed in a human language (translated from one tongue into another) and not in mathematical language and as such ambiguous by its very nature, one can still try – I reasoned – to apply the standard method used in science. In other words, one can calculate the probability of the null hypothesis (that, in this case, is: the prophecy is true by chance) and see what kind of number he gets.

Well, I discovered right away that this calculation had in fact been attempted by Peter Stoner, a math teacher at Pasadena City College (R), and it can be found in his book, Science Speaks (freely available here). Let’s consider one of those prophecies, the one that seems to be particularly close to the heart of Marina, the Jehovah’s Witness who is currently persecuting me, the prophecy of Babylon:

And Babylon … shall never be inhabited, neither shall it be dwelt in from generation to generation: neither shall the Arabian pitch tent there; neither shall the shepherds make their fold there. But wild beasts of the desert shall lie there; and their houses shall be full of doleful creatures.

(Isa. 13:19-21 – written 712 B.C.)

And they shall not take of thee a stone for a corner, nor a stone for foundations; but thou shalt be desolate forever, saith the Lord … Neither doth any son of man pass thereby.

(Jer. 51:26,43 – written 600 B.C.)

Now, in his book, Peter Stoner translates this prophecy in the following seven events or propositions:

E_1: Babylon shall be destroyed. P(E_1)=1/10
E_2: It shall never be reinhabited (once destroyed). P(E_2)=1/100
E_3: The Arabs shall not pitch their tents there (once destroyed). P(E_3)=1/200
E_4: There shall be no sheepfolds there (once destroyed). P(E_4)=1/5
E_5: Wild beasts shall occupy the ruins (once destroyed). P(E_5)=1/5
E_6:  The stones shall not be taken away for other buildings (once destroyed). P(E_6)=1/100   
E_7: Men shall not pass by the ruins (once destroyed). P(E_7)=1/10

I have added the expression within brackets, to enhance clarity. The probabilities of the single events (on the right) are those proposed by Peter Stoner, and I am not going to discuss them. Now, if we indicate the whole prophecy with E, according to Stoner the probability of the null hypothesis is

P(E) = P(\bigcap_{i=1}^{7}E_i) = \prod_{i=1}^{7} P(E_i) = \frac{10^ { -9 } }{5}

This is a really small probability for the null hypothesis, so we must accept that a power beyond human comprehension has… But let’s read again the prophecy: it seems articulated but, in fact, it only says that the place where Babylon once raised will be basically abandoned by human beings. In other words, it seems reasonable to say that E_i\subset E_7 for i = 2, 3, 4, 6 . If that is the case, then we have

P(E) = P(\bigcap_{i=1}^{7}E_i) =  P( E_1 \bigcap( \bigcap_{i=2}^{7}E_i) ) =

= P( E_1 \bigcap( \bigcap_{i=2}^{6}(E_i \bigcap E_7) ) \bigcap (E_5 \bigcap E_7) ) =

= P( E_1 \bigcap  E_7  \bigcap ( E_5  \bigcap E_7 ) ) =

= P( E_1 \bigcap ( E_5 \bigcap E_7 ) ) = P( E_1)P ( E_5 \bigcap E_7 )

A further observation is that if the place is desolated with no human beings (E_7) then it is reasonable to assume that it becomes the reign of wild animals. In other words: P(E_5|E_7) >  P(E_5)  . Not only that, I would guess that it is safe to assume that P(E_5|E_7) is about one. Then we have found

P(E) =  P( E_1) P(E_5|E_7) P(E_7) =  P( E_1) P(E_7) =  \frac{1}{10^2}

In other words, the mistake that leads Stoner to a completely wrong value for P(E) is the fact that he considered the events as independent one from the other, while this is obviously not the case.

Now, is this the probability of the null hypothesis? Well, it depends, because this is a case in which we have a prediction that has been got from a very long book with thousands of propositions, some of which look very much like predictions. Now, of course, when one picks a prophecy among all these propositions, he might be unconsciously tempted to pick the one that looks more like a fulfilled prophecy. In other words, we have to check for multiple comparisons in this case. So, let us consider that we have a number of N propositions similar to the one about Babylon. The probability p(N) that at least one of these propositions is true purely by chance is

p(N) =  1 -  \frac{99^N}{100^N}

The function p(N) is plotted by the following code and as you can see, for N >4 we have that the null hypothesis becomes very likely. In other words, if we pick this prophecy among 5 other similar sentences, its resemblance to reality is just due to chance. In the figure below the red line indicates a probability of 0.05. A probability equal to or higher than 0.05 is associated with null hypotheses that must be considered true.

Figure. Probability that at least one among N prophecies is true purely by chance, for increasing values of N. The red line indicates a probability of 0.05 above which the null hypothesis must be considered true.

It should be added that the fact that the prophecy of Babylon has to be considered true is highly questionable. The reader can do his own research on that, for example starting from the notes collected in this text.

% file name = Babylon
% date of creation = 20/03/2021
% it calculates the probability that N prophecies are true by chance
clear all
% array for p(N)
N = 30;
for i=1:N
p(i) = 1-(99/100)^i;
p2(i) = 0.05;
endfor
% plotting
plot ([1:N],p(:),’-k’, “linewidth”, 2)
hold on
plot ([1:N],p2(:),’-r’, “linewidth”, 2)
xlabel(‘number of prophecies’);
ylabel(‘probability of the null hypothesis’);
grid minor
grid on

My first time with LaTeX

Each mathematical formula I have used in my blog till now was an image: I wrote them in a Word document using the equation editor by Microsoft (that I like very much), then I captured the screen and saved it as an image, and then I uploaded them in my blog.

In the past, I tried to find a way to write them as text, without succeeding. But it is in fact possible, by using the LaTeX syntax and a few HTML commands.

These are the first “true” mathematical formulae present in this blog. I will use this page to test the LaTeX syntax for mathematical expressions (a handbook is available here). In the table that follows, I present a series of relevant examples of LaTeX syntax. A useful resource for testing out LaTeX instructions is this website.

OutputString (to be included between ‘$latex’ and ‘$’)
\mp \mp&s=3
\cdot \cdot&s=3
\neq \neq&s=3
\leq \leq&s=3
\geq \geq&s=3
\sim \sim&s=3
\ll \ll&s=3
\in \in&s=3
\infty \infty&s=3
\Rightarrow \Rightarrow&s=3
\Leftarrow \Leftarrow&s=3
f_{X_1}f_{X_1}&s=2
\sqrt{b^2\,-\,4ac}\sqrt{b^2\,-\,4ac}&s=2
\frac{\partial \Psi(x,y)}{\partial x}\frac{\partial \Psi(x,y)}{\partial x}&s=3
\sum_{k\,=\,0}^{\infty}\sum_{k\,=\,0}^{\infty}&s=2
\int_{A_1}\frac{\partial \Psi(x,y)}{\partial x}dx\int_{A_1}\frac{\partial \Psi(x,y)}{\partial x}dx&s=2
\int_{A_1}\int_{A_2}\frac{\partial \Psi(x,y)}{\partial x}dxdy\int_{A_1}\int_{A_2}\frac{\partial \Psi(x,y)}{\partial x}dxdy&s=2
i\hbar\frac{\partial}{\partial t}\left|\Psi(t)\right>=H\left|\Psi(t)\right>i\hbar\frac{\partial}{\partial t}\left|\Psi(t)\right>=H\left|\Psi(t)\right>&s=2
\begin{cases}x = b\\ y = 1 + x\end{cases}\begin{cases}x = b\ \y = 1 + x\end{cases}&s=2
\begin{matrix} f(Ab)\,=\,f(A)\,-\,f(AB)\\f(aB)\,=\,f(B)\,-\,f(AB)\end{matrix}\begin{matrix} f(Ab)\,=\,f(A)\,-\,f(AB)\\f(aB)\,=\,f(B)\,-\,f(AB)\end{matrix}&s=2
\begin{pmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{pmatrix}\begin{pmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{pmatrix}&s=2
\begin{bmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{bmatrix}\\begin{bmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{bmatrix}&s=2
\begin{Bmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{Bmatrix}\begin{Bmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{Bmatrix}&s=2
\begin{vmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{vmatrix}\begin{vmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{vmatrix}&s=2
\begin{Vmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{Vmatrix}latex \begin{Vmatrix} f(Ab)\,&\,f(A)\\f(aB)\,&\,f(B)\end{Vmatrix}&s=2

A Mathematical model for the diffusion of Coronavirus 19 among the Italian population

A Mathematical model for the diffusion of Coronavirus 19 among the Italian population

Abstract

In this document, I propose a distribution for the number of infected subjects in Italy during the outbreak of Coronavirus 19. To do that I find a logistic curve for the number of deaths due to this virus, in Italy. I also use a density of probability for the fatality rate and one for the number of days from infection to death. These functions have been built using recently published statistical data on the Chinese outbreak.   

Go to the article

Maximum of a normal random vector

Maximum of a normal random vector

Ettore Majorana hadn’t got MATLAB

When Ettore Majorana first met Enrico Fermi, between the end of 1927 and the beginning of 1928, Fermi – who was already an acclaimed scientist in the field of nuclear physics – had just solved an ordinary differential equation of the second-order (whose solution is now commonly named the Thomas-Fermi function) – by numerical integration. It required a week of assiduous work for him to accomplish this task, with the aid of a hand calculator. Fermi showed the results (a table with several numbers) to Majorana, who was a 21 years old student of electrical engineering who had some vague idea of switching from the field of boring ways of providing electrical energy for boring human activities, to the quest for the intimate structure of matter, under the guide of Fermi, the brightest Italian scientific star of that period.

Majorana looked at the numerical table, as I wrote, and said nothing. After 24 hours he came back to Fermi’s lab and compared his own results with the table made by Fermi: he concluded that Fermi didn’t make any mistake, and he decided that it could be worth working with him, so he switched from engineering to physics (Segrè E. 1995, page 69-70).

Only recently it has been possible to clarify what kind of approach to the equation Majorana had in those hours. It is worth mentioning that he not only was able to prove that the equation admitted in fact a unique solution (this is not trivial, because the theorem of Cauchy can not be applied directly to that equation and Fermi did not cared to solve this problem); he also solved the equation in a semianalytic way (by series expansion), with a method that has the potential to be generalized to a whole family of differential equations and that has been published only 75 years later (Esposito S. 2002), after the discovery of the notes Majorana wrote in those hours. He used this solution, not a numerical one, to derive a table that could be used for a comparison with the solution by Fermi. This mathematical discovery has been possible only because the notes that Majorana wrote in those two days have been found and studied by Salvatore Esposito, with the help of other physicists.

I won’t mention here the merits that Majorana has in theoretical physics, mainly because I am very very far from understanding even a bit of his work. But as Erasmo Recami wrote in his biography of Majorana (R), a paper published by Majorana in 1932 about the relativistic theory of particles with arbitrary spin (Majorana E. 1932) contained a mathematical discovery that has been made independently in a series of papers by Russian mathematicians only in the years between 1948 and 1958, while the application to physics of that method – described by Majorana in 1932 – has been recognized only years later. The fame of Majorana has been constantly growing for the last decades.

The notes that Majorana took between 1927 and 1932 (in his early twenties) have been studied and published only in 2002 (Esposito S. et al. 2003). These are the notes in which the solution of the above-mentioned differential equation has been discovered, by the way. In these 500 pages, there are several brilliant calculations that span from engineering to statistics, from advanced mathematical methods for physics to, of course, theoretical physics. Some notes on classical mechanics are also included. In what follows I will go through what is probably the least difficult and important page among them, the one where Majorana presents an approximated expression for the maximum value of density of the largest of the components of a normal random vector. I have already written in this blog some notes about the multivariate normal distribution (R). But how can we find the maximum component of such a vector and how does it behave? Let’s assume that each component has a mean of zero and a standard deviation of one. Then we easily find  that the analytical expressions of the cumulative distribution function and of the density of the largest component (let’s say Z) of an m-dimensional random vector are

Eq 1.JPG

We can’t have an analytical expression for the integral, but it is relatively easy to use Simpson’s method (see the code at the end of this paragraph) to integrate these expressions and to plot their surfaces (figure 1).

surfaces.JPG
Figure 1. Density (left) and cumulative distribution function (right) for the maximum of the m components of a normal random vector. Numerical integration with MATLAB (by Paolo Maccallini).

Now, what about the maximum reached by the density of the largest among the m components? It is easy, again, using our code, to plot both the maximum and the time in which the maximum is reached, in function of m (figure 2, dotted lines). I have spent probably half an hour in writing the code that gives these results, but we usually forget how fortunate we are in having powerful computers on our desks. We forget that there was a time in which having an analytical solution was almost the only way to get mathematical work done. Now we will see how Majorana obtained the two functions in figure 2 (continuous line), in just a few passages (a few in his notes, much more in mine).

maximum.JPG
Figure 2.  The time in which Z reaches its largest value (left) and the largest value (right) in the function of the dimension of the normal random vector. The two curves in dotted lines are obtained through numerical integration (Simpson’s method) while the continuous lines are the function obtained by Majorana in an analytical way.
% file name = massimo_vettore_normale
% date of creation = 22/05/2019

clear all
delta = 0.01;
n(1) = 0.

for i=2:1:301;
  n(i) = delta + n(i-1);
  n_2(i) = - n(i);
end

for i=1:1:301
  f(i) = 0.39894228*( e^(  (-0.5)*( n(i)^2 )  ) );
end

for i=1:1:3
  sigma(1) = 0.;
  sigma(3) = sigma(1) + delta*( f(1) + ( 4*f(2) ) + f(3) )/3;
  sigma(2) = sigma(3)*0.5;
  for j=2:1:299
    sigma(j+2) = sigma(j) + delta*( f(j) + ( 4*f(j+1) ) + f(j+2) )/3;
  end  
end

for i=1:1:301
  F(i) =  0.5 + sigma(i);
  F_2(i) = 1-F(i);
end

for i=1:1:100;
  m(i) = i;
end

for i=1:1:301
  for j=1:1:100
    F_Z (i,j) = F(i)^j;
    F_Z_2 (i,j) = F_2(i)^j;
    f_Z (i,j) = 0.39894228*j*( F(i)^(j-1) )*( e^(  (-0.5)*( n(i)^2 )  ) );
    f_Z_2 (i,j) = 0.39894228*j*( F_2(i)^(j-1) )*( e^(  (-0.5)*( n(i)^2 )  ) );
  endfor
endfor

figure (1)
mesh(m(1:2:100),n(1:10:301),F_Z(1:10:301,1:2:100));
grid on
hold on
mesh(m(1:2:100),n_2(2:10:301),F_Z_2(2:10:301,1:2:100));
xlabel('m');
ylabel('t'); 
legend('F',"location","NORTHEAST");

figure (2)
mesh(m(1:2:100),n(1:10:301),f_Z(1:10:301,1:2:100));
grid on
hold on
mesh(m(1:2:100),n_2(2:10:301),f_Z_2(2:10:301,1:2:100));
xlabel('m');
ylabel('t'); 
legend('f',"location","NORTHEAST");

Asymptotic series

I have always been fascinated by integrals since I encountered them a lifetime ago. I can still remember the first time I learned the rule of integration by parts. I was caring for my mother who was dying. That night I was in the hospital with her, but she couldn’t feel my presence, she had a tumour in her brain and she was deteriorating. And yet I was not alone, because I had my book of mathematics and several problems to solve. But when my mind was hit by the disease for the first time, about a year later, and I lost the ability to solve problems, then real loneliness knocked at my door.

Now, why am I talking about the integration by parts? Well, I have discovered a few days ago, while studying Majorana’s notes, that integration by parts – well known by students to be a path towards recursive integrations that usually leads to nowhere – is, in fact, a method that can be useful for developing series that approximate a function for large values of x (remember that Taylor’s polynomials can approximate a function only for values of x that are close to a finite value x_0, so we can’t use them when x goes to ∞). Majorana used one such a series for the error function. He developed a general method, which I tried to understand for some time, without being able to actually get what he was talking about. His reasoning remained in the back of my mind for days, while I moved from Rome to Turin, where I delivered a speech about a paper on the measure of electric impedance in the blood of ME/CFS patients; and when I cried, some minutes later, looking at my drawings put on the screen of a cinema, Majorana was with me, with his silence trapped behind dark eyes. A couple of days later, I moved to a conference in London, searching for a cure that could perhaps allow my brain to be normal again and I talked with a famous scientist who worked on the human genome project. Majorana was there too, in that beautiful room (just a few metres from Parliament Square), sitting next to me. I could feel his disappointment, I knew that he would have found a cure, had he had the chance to examine that problem. Because as Fermi once said to Bruno Pontecorvo, “If a problem has been proposed, no one in the world can resolve it better than Majorana” (Esposito S. et al. 2003). Back in Rome, I gave up with the general method by Majorana and I found the way to calculate the series from another book. The first tip is to write the error function as follows:

Eq 2.JPG

Now by integrating by parts, one gets

Eq 3.JPG

But we can integrate by parts one other time, and we get

Eq 4.JPG

And we can go on and on with integration by parts. This algorithm leads to the series

Eq 5.JPG

whose main property is that the last addend is always smaller (in absolute value) than the previous one. And even though this series does not converge (it can be easily seen considering that the absolute value of its generic addend does not go to zero for k that goes to ∞, so the Cauchy’s criteria for convergence is not satisfied) it gives a good approximation for the error function. From this series, it is easy to calculate a series for the Gaussian function (which is what we are interested in):

Eq 6.JPG

A clever way to solve a transcendental equation if you don’t want to disturb Newton

Taking only the first two terms of the series, we have for the cumulative distribution function of Z the expression:

Eq 7.JPG

The further approximation on the right is interesting, I think that it comes from a well-known limit:

Eq 8.JPG

Now we can easily calculate the density of Z by deriving the cumulative distribution function:

Eq 9.JPG

With a further obvious approximation, we get:

Eq 10.JPG

In order to find the value of x in which this density reaches its largest value, we have to search for the value of x in which its derivative is zero. So we have to solve the following equation:

Eq 11.JPG

Which means that we have to solve the transcendental equation:

Eq 12.JPG

Majorana truncated the second member of the equation on the right and proposed as a solution the following one:

Eq 13.JPG

Then he substituted again this solution in the equation, in order to find ε:

Eq 14.JPG

With some further approximations, we have

Eq 15.JPG

So Majorana’s expression for the value of x in which the density of Z reaches its maximum value is

Eq 16.JPG

I have tried to solve the transcendental equation with Newton’s method (see the code below) and I found that Majorana’s solution is a very good one (as you can see from figure 3). Now, If we compare the approximation by Majorana with what I obtained using numerical integration at the beginning (figure 2) we see that Majorana found a very good solution, particularly for the value of x_M. Note: the transcendental equation that has been solved here seems the one whose solution is the Lambert W function, but it is not the same!

trascendental.JPG
Figure 3. The value of x in which the density of Z reaches its largest value. Majorana’s solution (continuous line) and a numerical solution obtained by means of Newton’s method (dotted line).
Vol II.JPG
Figure 4. From the original manuscript by Majorana, which can be found here (page 70). These are some of the passages that I have discussed here.
% file name = tangenti
% date of creation = 08/06/2019

clear all

x(1) = 1;            %the initial guess
for i=1:1:100
  m(i) = i;
end

for i=1:1:100
  for j = 2:1:1000
   f(j-1) = exp( 0.5*( x(j-1)^2 ) ) - ( m(i)/( x(j-1)*sqrt(2*pi) ) );
   f_p(j-1) = x(j-1)*exp( 0.5*( x(j-1)^2 ) ) + ( m(i)/( (x(j-1)^2)*sqrt(2*pi) ) );
   x(j) = x(j-1) - ( f(j-1)/f_p(j-1) );
   if ( abs(x(j)) < 0.001 )
    break;
   endif
   max_t (i) = x(j);
  endfor
endfor

% the aproximations by Majorana

for j=1:1:100
  max_t_M (j) = sqrt(log(j^2)) - ( log(sqrt(2*pi*log(j^2)))/sqrt(log(j^2)) );
endfor

% it plots the diagrams

plot(m(1:1:100),max_t (1:1:100),'.k','Linewidth', 1)
xlabel('m')
ylabel('time for maximum value')
grid on
hold on
plot(m(1:1:100),max_t_M (1:1:100),'-k','Linewidth', 1)

legend('numerical integration',"Majorana's approximation", "location", 'southeast')

Epilogue

From 1934 to 1938 Majorana continued his studies in a variety of different fields (from game theory to biology, from economy to quantistic electrodynamics), but he never published again (R), with the exception for a work on the symmetric theory of electrons and anti-electrons (Majorana E. 1937). But it has been concluded by biographers that the discoveries behind that work were made by Majorana about five years earlier and yet never shared with the scientific community until 1937 (Esposito S. et al. 2003). And in a spring day of the year 1938, while Mussolini was trying his best to impress the world with his facial expressions, Ettore became a subatomic particle: his coordinates in space and their derivatives with respect to time became indeterminate. Whether he had lived in a monastery in the south of Italy or he had helped the state of Uruguay in building its first nuclear reactor; whether he had seen the boundless landscapes of Argentina or the frozen depth of the abyss, I hope that he found, at last, what he was so desperately searching for.

He had given his contribution to humanity, so whatever his choice has been, his soul was already safe. And as I try to save my own soul, going back and forth from mathematics to biology, in order to find a cure, I can feel his presence. The eloquence of his silence trapped behind dark eyes can always be clearly heard if we put aside the noise of the outside world. And it tells us that Nature has a beautiful but elusive mathematical structure which can nevertheless be understood if we try very hard.

In the meanwhile, I write these short stories, like a mathematical proof of my own existence, in case I don’t have further chances to use my brain.

Until time catches me.

Ettore Majorana

On the module of random vectors

On the module of random vectors

“I have hopes of being able to achieve

something of value through my current studies or

with any new ideas that come in the future.”

J. F. Nash

The bridge over the Arno

In 1999 I was wandering in Pisa with a booklet in the pocket of a worn coat, too short for my frame. That coat was dark blue on the outside, green and red inside, with one mended sleeve and an austere cowl: I was much like a young monk, with his holy book (figure 1). I can remember neither its title nor the author, though. It was an introduction to statistical thermodynamics, with beautiful figures, a coloured cover, and less than 100 pages. It contained the work by Maxwell on the kinetic theory of ideal gasses, along with other material. I borrowed it from the University Library because I was fascinated by the way in which Maxwell was able to describe the properties of gasses with just a few hypotheses and some relatively easy mathematical passages. I felt that there was an enormous attraction in these methods, I realized with pleasure that math could give the power to completely understand and hold in hand physical systems and even, I started speculating, biological ones.

Analisi, coraggio, umiltà, ottimismo.jpg
Figure 1. A small self-portrait I drew in 1999 in an empty page of my pocket-sized French dictionary.

My second favourite composer back then was Gustav Mahler (the favourite one being Basil Poledouris): he represented my own way to classical music and I chose him because he wasn’t among the musicians my father and my brother shared a love for. I felt, during my teens, that I had to find my private space, and I met it one day on a used book stand: a cassette tape of Das Lied von The Erde, with a few sheets containing the translation to Italian of the songs. Mahler was born in 1860, a few weeks after Maxwell published his pivotal work about ideal gasses in The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science (R) (figure 2). But in 1999 I was repeatedly listening to a CD with a collection of songs sung by Edith Piaf and Charles Trenet, because I was studying French, and I was having a hard time with pronunciation. So, imagine a secular monk in his prime who listens to old French songs while keeping one hand on a book of statistical thermodynamics hidden in his pocket, wherever he goes, wandering in the streets of Pisa, a city which gave birth to Galileo Galilei. This seems a beautiful story, much like a dream, right? Wrong.

cover
Figure 2. Original cover of the journal in which Maxwell published his work on the kinetic theory of ideal gasses.

I had already started my struggle against the mysterious disease that would have completely erased my life in the following years. In the beginning, it had a relapsing-remitting course, so I could sometimes host the hope that I was recovering, only to find myself caught by the evil curse again. At the end of the year 1999, I was losing my mind, I knew that and I was also aware that my holy book couldn’t save me. I clearly remember one evening, I was walking on Ponte di Mezzo, a simple and elegant bridge over the Arno, and I felt that I couldn’t feel sorrow for the loss of my mind: I realized that not only the functions of my brain assigned to rational thinking were gone, but my feelings couldn’t function properly either. In fact, I noted without a true manifestation of desperation that I had lost my emotions. One day, after spending in vain about eleven hours on a single page of a textbook of invertebrate palaeontology, I accepted that I couldn’t read anymore, at least for the moment.

Had I known for sure that I wouldn’t have recovered in the following twenty years, I would have quite certainly taken my own life, jumping from a building; a fate that I have been thinking about almost every day ever since. I considered this possibility during the endless sequence of days in which there has been nothing other than the absence of my thoughts.

The distribution of velocities of an ideal gas and the one hundred years gap

In the already mentioned paper by Maxwell, he derived the probability density of the speed of the molecules of a gas, granted that the three components of the vector of speed are independent random variables (hyp. 1) and that they share the same density (hyp. 2), let’s say f. Moreover, the density of the speed has to be a function only of its module (hyp. 3). These three hypotheses together say that there is a function Φ such that

Cattura.PNG

This is a functional equation (i.e. an equation in which the unknown is a function) whose solution is not detailed in Maxwell’s work. But it can be easily solved moving to polar coordinates (see figure 3) and deriving with respect to θ both members (the second one gives naught since it depends only to the distance from the origin).

space of velocities.png
Figure 3. The space of velocities and its polar coordinates.

Another way to solve the functional equation is to use the method of Lagrange’s multipliers, searching for the extremes of the density of the velocity, when its module is fixed. In either case, we obtain the differential equation:

differential equation.png

which leads to the density for each component of the speed:

density.PNG

where σ can’t be determined using only the three hypotheses mentioned above. Considering then the well-known law of ideal gasses (pV=nRT) and an expression for p derived from the hypothesis that the collisions between the molecules of gas and the container are completely elastic, Maxwell was able to conclude that:

varianza.png

where m is the mass of the molecule of gas, T is the absolute temperature and K_B is the Boltzmann’s constant. It was 1860, Mahler’s mother was going to deliver in Kaliště, Charles Darwin had just released his masterpiece “On the origin of species”, forced to publish much earlier than what he had planned because of the letter he had received from Wallace, in which he described about the same theory Darwin had been working on for the previous 20 years. In the same point in time, Italy was completing its bloody process of unification, with the Mille expedition, led by Giuseppe Garibaldi.

But the functional equation I have mentioned at the beginning of this paragraph has brought with it a mystery for years, until 1976, when an employee at General Motors Corporation published a short note in the American Journal of Physics [R] in which he showed how Maxwell’s functional equation is, in fact, an example of the well known Cauchy’s functional equation:

Cauchy.png

In order to prove that, you just have to consider the following definition:

Nash.png

given that

Nash2.png

The name of the mathematician who made this observation is David H. Nash, and he has the merit of finding something new in one of the most known equation of physics, an equation that is mentioned in every book of thermodynamics, an equation that has been considered by millions of students in more than a century. It was 1976, my mother was pregnant with my brother; Alma, Gustav Mahler’s wife, had died about ten years before.

sphere.PNG
Figure 4. A sphere with a radius of z. The integrals of the density of within this sphere has the obvious meaning of the repartition function of Z.

Module of random vectors

Once Maxwell found the density of probability for each component of the speed of the molecules of an ideal gas, he searched for the density of the module of the speed. There is a relatively simple way of doing that. With the following notation

notazione.png

we have that the repartition function of Z is given by the integrals of the density of X within the sphere in figure 4. We have:

repartition function.PNG

The second expression is the same as above but in polar coordinates. Then we can obtain the density of Z by derivation of the repartition function. And this method can be extended to an m-dimensional space. This was the method used by Maxwell in his paper. And yet, there is another way to obtain the expression of the module of a random vector: I have explored it in the last months, during the rare hours in which I could function. By the way, only in the Summer of 2007 I was able to study the kinetic theory by Maxwell, eight years after I borrowed the holy book. Such a waste.

The hard way towards the density of the module of a random vector

When a student starts studying statistics, she encounters a list of densities: the normal distribution, the gamma distribution, the exponential distribution etc. Then there are several derived distributions that arise when you operate sums, roots extractions etc. on random variables. In particular, if f_X is the densty of X and Y = X², then we have

Quadrato.PNG

On the other hand, if Y = √X we have

Radice.PNG

Another important result that we have to consider is that given

linera combination.PNG

then

linear combination 2.PNG

By using these results I have been able to find that the expression of the density for  the module of an m-dimensional random vector is:

module.PNG

In particular, for m = 3 we have

module 2.PNG

Bessel surface.PNG
Figure 5. A plot of the modified Bessel function.

The case of normal random vectors: the modified Bessel function

In particular, if the random vector has dimension 3 and its components are normal random variables with the same expected value and variance, we have that the density of its module is given by

modulo normale.PNG

where I_0 is the modified Bessel function, which is one solution of the differential equation:

Bessel equation.PNG

whose name is modified Bessel equation. The integral expression of the modified Bessel function is:

Bessel function.PNG

I have coded a script in Matlab which integrates numerically this function (available here for download) which plots the surface in figure 5 and also gives the following table of values for this function.

Bessel table.PNG

The following is the flowchart of the script I coded.

flux chart.png

The case of normal random vectors with a naught expected value: the upper incomplete gamma function

If we consider random variables with an average that is zero (this is the case with the components of speed in ideal gasses), then the density is given by

Chi 3 scaled.PNG

which is a Chi distribution with 3 degrees of freedom, scaled with a scale parameter given by s = 1/σ. In the expression of the repartition function, ϒ is the lower incomplete gamma function, which is defined as follows:

incomplete gamma.PNG

I have written a code for its numerical integration (available here for download), the output of which is in figure 6.

lower incomlete.PNG
Figure 6. Lower incomplete gamma function.

maxwell surface.PNG
Figure 7. Maxwell-Boltzmann distribution for N2 in function of temperature and speed.

maxwell surface 2.PNG
Figure 8. Maxwell-Boltzmann distribution for Helium, Neon, Argon, at room temperature.

Conclusion, twenty years later

The density of the module of the velocity of the molecules of an ideal gas is, in fact, a scaled Chi distribution with 3 degrees of freedom, and it is given by

maxwell.PNG

It can be numerically integrated with the following script I made for Octave/Matlab, which gives the plot in figure 7. Another similar script gives the plot in figure 8. These plots represent the Maxwell-Boltzmann distribution, the centre of the holy book that an unfortunate boy was carrying in his pocket, all alone, some twenty years ago. He could have easily died by his own hand in one of the several thousand days of mental and physical disability that he had to face alone. Instead, he has survived. Had it been only for finding the Maxwell-Boltzmann distribution following another path, it would have been worth it. But he has found much more, including a bright girl, the magnificent next stage of evolution of human beings.

% file name = legge_Maxwell-Boltzmann_2

% date of creation = 22/02/2019

% it plots the density and the distribution function for the

% Maxwell-Boltzmann distribution considered as a function of temperature

% and speed

clear all

% we define some parameters

K_B = 1.381*10^(-23)                                                 % Boltzmann's constant

m = 4.65*10^(-26)                                                     % mass of the molecule N_2

% we define the array of temperature from 0° C to 1250° C

T (1) = 273.15

for i = 2:1:1250

T (i) = T (i-1) + 1.;

endfor

% it defines f_gamma in 3/2

f_gamma = sqrt(pi)/2.

% delta of integration

delta = 1.0

% it defines the array for the abscissa

z (1) = 0.;

for i = 2:1:2500

z (i) = z(i-1)+delta;

endfor

% it defines the density

for j = 1:1:1250

% it defines a constant

c = ( m/(K_B*T(j)) );

for i = 1:1:2500

f (j,i) = ( c^1.5 )*sqrt(2./pi)*( z(i)^2. )*( e^( -0.5*c*z(i)^2. ) );

endfor

% it calculates the ripartition function for He

F (j,1) = 0.;

F (j,3) = F (j,1) + delta*( f(j,1) + ( 4*f(j,2) ) + f(j,3) )/3;

F (j,2) = F (j,3)*0.5;

for k=2:1:2500-2

F (j,k+2) = F (j,k)+delta*( f(j,k)+( 4*f(j,k+1) )+f(j,k+2) )/3;

end

endfor

% It plots f and F

figure (1)

mesh(z(1:100:2500), T(1:100:1250), f(1:100:1250,1:100:2500));

legend('Density',"location","NORTHEAST");

xlabel('speed (m/s)');

ylabel('temperature (Kelvin)');

grid on

figure (2)

mesh(z(1:100:2500), T(1:100:1250), F(1:100:1250,1:100:2500));

legend('Probability',"location","NORTHEAST");

xlabel('speed (m/s)');

ylabel('temperature (Kelvin)');

grid on

Un modello matematico per la ME/CFS

Un modello matematico per la ME/CFS

La versione in inglese di questo articolo è disponibile qui.

Introduzione

Molti dei miei lettori sono probabilmente a conoscenza dei tentativi attualmente fatti per simulare matematicamente il metabolismo energetico dei pazienti ME/CFS, integrando i dati metabolici con i dati genetici. In particolare, il dr. Robert Phair ha sviluppato un modello matematico delle principali vie metaboliche coinvolte nella conversione dell’energia, dall’energia immagazzinata nei legami chimici di grandi molecole come glucosio, acidi grassi e amminoacidi, all’energia immagazzinata nell’adenosina trifosfato (ATP), pronta per l’uso. Phair, che è un ingegnere, ha determinato le equazioni differenziali che regolano questa enorme quantità di reazioni chimiche e le ha adattate al profilo genetico trovato nei pazienti ME/CFS. Ma già alcuni anni fa due fisici pubblicarono un interessante modello matematico del metabolismo energetico durante e dopo l’esercizio, nei pazienti ME/CFS (Lengert N. et Drossel B. 2015). In quanto segue descriverò questo modello e le sue previsioni e vedremo da vicino queste equazioni differenziali.

Le vie metaboliche che sono state analizzate

Il modello di Lengert e Drossel estende due sistemi di equazioni differenziali precedentemente pubblicati che descrivono il comportamento della glicolisi, del ciclo di Krebs (enormemente semplificato come una singola reazione!), della catena di trasporto degli elettroni mitocondriale (descritta in dettaglio), del sistema della creatina chinasi e della conversione di adenosina difosfato (ADP) in ATP, nei muscoli scheletrici (Korzeniewski B. et Zoladz JA. 2001), (Korzeniewski B. et Liguzinski P. 2004). Gli autori hanno aggiunto equazioni per l’accumulo di lattato e il suo efflusso fuori dalla cellula, per la sintesi de novo di inosina monofosfato (IMP) durante il recupero, per la degradazione dell’adenosina monofosfato (AMP) in IMP, per la degradazione di IMP in inosina e ipoxantina. Tutte le vie coinvolte sono raccolte nella figura 1. Queste reazioni sono descritte da 15 equazioni differenziali e la soluzione è un insieme di 15 funzioni del tempo che rappresentano la concentrazione dei principali metaboliti coinvolti (come il lattato, il piruvato, l’ATP, ecc.). Diamo ora uno sguardo più da vicino a una di queste equazioni e alla struttura generale dell’intero sistema di equazioni.

1-s2.0-S0301462215000630-fx1.jpg
Figura 1. Questa è una rappresentazione schematica dei percorsi metabolici descritti dal modello matematico sviluppato da Lengert e Drossel. In dettaglio: sintesi citosolica e degradazione di ADP, AMP e IMP (a sinistra), via della protein chinasi e glicolisi (centro), catena di trasporto degli elettroni e ciclo TCA (a destra). Da Lengert N. et Drossel B. 2015.

lactate dehydrogenase.PNG
Figure 2. La lattato deidrogenasi è l’enzima coinvolto nella catalisi della conversione del lattato in piruvato. Questa reazione procede in entrambe le direzioni.

Equazioni differenziali per reazioni chimiche

Consideriamo l’equazione utilizzata dagli autori per la reazione catalizzata dalla lattato deidrogenasi (la trasformazione del piruvato in lattato, figura 2) dove si è anche tenuto conto dell’efflusso di lattato dal citosol. L’equazione differenziale è la seguente:

equation.PNG

dove i tre parametri sono determinati sperimentalmente e i loro valori sono

equations.PNG

Il primo descrive l’attività dell’enzima lattato deidrogenasi: più questo parametro è elevato, più l’enzima è attivo. Il secondo descrive la reazione inversa (dal lattato al piruvato). Il terzo è una misura di quanto lattato la cellula è in grado di trasportare al di fuori della sua membrana. Forse il lettore si è reso conto che l’equazione del lattato è una equazione differenziale ordinaria del primo ordine. Si dice “primo ordine” perché nell’equazione compare solo la derivata prima della funzione che dobbiamo determinare (lattato, in questo caso); “ordinario” si riferisce al fatto che il lattato è funzione di una sola variabile (il tempo, in questo caso). Si vede immediatamente che un’equazione come questa può essere scritta come segue:

equation bis.PNG

Supponiamo ora di avere altre due equazioni differenziali di questo tipo, una per il piruvato e una per i protoni (le altre due funzioni del tempo che sono presenti nell’equazione):

equations.PNG

Allora avremmo un sistema di tre equazioni differenziali ordinarie come questo:System.PNG

I valori iniziali delle funzioni che dobbiamo determinare sono raccolti nell’ultima riga: questi sono i valori che le funzioni incognite assumono all’inizio della simulazione (t = 0). In questo caso, questi valori sono le concentrazioni di lattato, piruvato e protoni nel citosol, a riposo. Le tre funzioni del tempo sono chiamate la soluzione del sistema. Questo tipo di sistema di equazioni è un esempio di problema di Cauchy, e sappiamo dalla teoria matematica che non solo ha una soluzione, ma che questa soluzione è unica. Inoltre, mentre questa soluzione  può non essere sempre facilmente trovata con metodi rigorosi, è abbastanza facile risolvere il problema con metodi approssimati, come il  metodo di Runge-Kutta o il metodo di Heun. Detto questo, il sistema di equazioni differenziali ordinarie proposto da Lengert e Drossel per il metabolismo energetico è proprio come quello qui sopra, con l’eccezione che comprende 15 equazioni anziché tre. Quindi, la principale difficoltà in questo tipo di simulazione non è l’aspetto computazionale, ma la determinazione dei parametri (come quelli enzimatici) e dei valori iniziali, che devono essere raccolti dalla letteratura medica o devono essere determinati sperimentalmente, se non sono già disponibili. L’altro problema è come progettare le equazioni: esistono spesso diversi modi per costruire un modello matematico di una reazione chimica o di qualsiasi altro processo biologico.

Il modello matematico della ME/CFS

Come adattiamo ai pazienti ME/CFS un modello del metabolismo energetico che è stato impostato con parametri presi da esperimenti condotti su soggetti sani? Questa è un’ottima domanda, e abbiamo visto che Robert Phair ha dovuto usare i dati genetici dei pazienti ME/CFS relativi agli enzimi chiave del metabolismo energetico, al fine di impostare il suo modello. Ma questi dati non erano disponibili quando Lengert e Drossel hanno progettato le loro equazioni. E allora? I due fisici hanno cercato studi sulla fosforilazione ossidativa nei pazienti ME/CFS e hanno scoperto che qusto processo cellulare era stato misurato con diverse impostazioni sperimentali e da diversi gruppi e che il denominatore comune di tuti gli studi era una riduzione di funzione che andava da circa il 35% (Myhill S et al. 2009), (Booth, N et al 2012), (Argov Z. et al. 1997), (Lane RJ. et al. 1998) a circa il 20% (McCully KK. et al. 1996), (McCully KK. et al. 1999). Quindi l’idea degli autori è stata di moltiplicare i parametro enzimatici di ciascuna reazione appartenente alla fosforilazione ossidativa per un numero compreso tra 0,6 (grave ME / CFS) a 1,0 (persona sana). In particolare, i due fisici hanno scelto un valore di 0,7 per la ME/CFS, nei loro esperimenti in silico (cioè esperimenti virtuali condotti nel processore di un computer).

Previsioni del modello matematico

Il modello matematico è stato utilizzato per eseguire prove di esercizio in silico con varie lunghezze e intensità. Quello che Lengert e Drossel hanno trovato è stato che il tempo di recupero nel paziente ME/CFS medio era sempre maggiore se confrontato con quelli di una persona sana. Il tempo di recupero è definito come il tempo necessario affinché una cellula ripristini il suo contenuto di ATP (97% del livello in stato di riposo) dopo lo sforzo. Nella figura 3 si vedono i risultati della simulazione per un esercizio molto breve (30 secondi) e molto intenso. Come potete vedere, nel caso di una cellula sana (a sinistra) il tempo di recupero è di circa 600 minuti (10 ore) mentre una cellula di una persona con ME/CFS (a destra) richiede più di 1500 minuti ( 25 ore) per recuperare.

half minute 1.png
Figura 3. Risultati della simulazione per un esercizio con una durata di 30 secondi e un’intensità elevata (consumo iniziale di ATP 300 volte il valore di riposo). A sinistra, il caso di una cellula muscolare scheletrica sana, a destra il caso di una cellula di una persona con ME/CFS le cui reazioni mitocondriali hanno una velocità ridotta al 70% della velocità del controllo sano. I grafici li ho ottenuti utilizzando la versione online del software, disponibile qui.

Un altro risultato interessante della simulazione è un aumento di AMP nei pazienti rispetto al controllo (figura 3, linea arancione). Ciò è dovuto all’uso compensativo delle due vie metaboliche in figura 4: la reazione catalizzata dall’adenilato chinasi, in cui due molecole di ADP sono utilizzate per produrre una molecola di ATP e una molecola di AMP; e la reazione catalizzata dalla deaminasi AMP, che degrada AMP in IMP (che viene quindi convertito in inosina e ipoxantina). Queste due reazioni sono utilizzate dai pazienti ME/CFS più che dal controllo sano, al fine di aumentare la produzione di ATP al di fuori dei mitocondri.

adenylate-kinase-and-amp-deaminase1.png
Figura 4. La via metabolica a sinistra è utilizzata dai pazienti ME/CFS più che nel controllo per aumentare la produzione di ATP al di fuori dei mitocondri, secondo questo modello matematico. Il percorso sulla destra degrada l’AMP in IMP.

Se diamo un’occhiata più da vicino alle concentrazioni di AMP e IMP nelle 4 ore successive allo sforzo (figura 5), vediamo effettivamente una maggiore produzione di IMP (linea verde) e AMP (linea arancione) nei muscoli scheletrici dei pazienti (destra) rispetto ai controlli (sinistra).

half minute 3.png
Figura 5. Lo stesso della figura 3, ma ingrandito per dare uno sguardo più da vicino alle concentrazioni durante le 4 ore successive allo sforzo. La cellula sana è a sinistra, mentre le cellule di una persona con ME/CFS sono sulla destra.

Un’ulteriore via di compensazione utilizzata dai pazienti (secondo questo modello) è la produzione di ATP da ADP da parte dell’enzima creatina chinasi (figura 6). Questo è un altro modo che abbiamo per produrre ATP nel citosol senza l’aiuto dei mitocondri. In questo modello di ME/CFS, vi è un aumento nell’uso di questo percorso, che porta a una diminuzione della concentrazione cellulare di fosfocreatina e un aumento della concentrazione cellulare di creatina (figura 7).

creatine kinase
Figura 6. La reazione catalizzata dalla creatina chinasi: una molecola di ADP viene convertita in ATP grazie al gruppo fosfato trasportato dalla fosfocreatina.

half minute 4.png
Figura 7. La concentrazione di fosfocreatina nel citosol delle cellule muscolari scheletriche è inferiore nella ME/CFS (a destra) rispetto al controllo (a sinistra) durante e dopo l’esercizio. Ciò è dovuto al maggiore uso di questa molecola per produrre ATP in modo anaerobico nel metabolismo ME/CFS rispetto al controllo. I parametri per questa simulazione sono gli stessi descritti nella figura 3.

Multivariate normal distribution, a proof of existence

Multivariate normal distribution, a proof of existence

“Las matemàticas comprenden todas la ciencias”

Miguel Alvàrez Osorio, 1775

For my brother who was

decades ahead of me

in terms of knowledge

albeit being only two years older

Introduction

In a previous blog post, I have discussed the joint density of a random vector of two elements that are normally distributed. I was able to prove the expression for the joint probability, not without fighting against some nasty integrals. In the end, I introduced the expression of the joint probability for a random vector of m normally distributed elements and I left my four readers saying “I have no idea about how it could be proved“. We were in June, I was in the North of Italy back then, hosted by friends but mainly alone with my books in a pleasant room with pink walls and some dolls listening to my speculations without a blink; a student of engineering was sharing with me, via chat, her difficulties with the surprisingly interesting task of analyzing the data from some accelerometers put in the mouth of fat people while they stand on an oscillating platform; my birthday was approaching and I was going to come back in Rome after a short stop in Florence, where I was for the first time fully aware of how astonishingly beautiful a woman in a dress can be (and where I saw the statues that Michelangelo crafted for the Tomb of the Medicis, the monument to Lorenzo in particular, which is sculpted in our culture, more profoundly than we usually realize).

But on a day in late December, while I was planning my own cryopreservation (a thought I often indulge in when my health declines even further), I realized that the covariance matrix is a symmetrical one so it can be diagonalized, and this is the main clue in order to prove the expression of this density. As obvious as it is, I couldn’t think of that when I first encountered the multivariate normal distribution, and the reason for this fault is my continuous setbacks, the fact that for most of the last 20 years I have not only been unable to study but even to think and read. And this is also the reason why I write down these proofs in my blog: I fear that I will leave only silence after my existence, because I have not existed at all, due to my encephalopathy. I can’t do long term plans, so as soon as I finish a small project, such as this proof, I need to share it because it might be the last product of my intellect for a long time. So, what follows is mainly a proof of my own existence, more than it is a demonstration of the multivariate normal distribution.

Before introducing the math, two words about the importance of the multivariate normal distribution. Many biological parameters have a normal distribution, so the normal density is the most important continuous distribution in biology (and in medicine). But what happens when we are considering more than one parameter at the time? Suppose to have ten metabolites that follow a normal distribution each, and that you want to calculate the probability that they are all below ten respective maximal values. Well, you have to know about the multivariate normal distribution! This is the reason why I believe that anyone who is interested in biology or medicine should, at least once in her lifetime, go through the following mathematical passages.

Can’t solve a problem? Change the variables!

In this paragraph, I present a bunch of properties that we need in order to carry out our demonstration. The first one derives directly from the theorem of change of variables in multiple integrals. The second and the third ones are a set of properties of symmetrical matrices in general, and of the covariance matrix in particular. Then, I collect a set of integrals that have been introduced or calculated in the already cited blog post about the bivariate normal distribution. The last proposition is not so obvious, but I won’t demonstrate it here, and those who are interested in its proof, can contact me.

Prop 1_2.PNG
Figure1. The domains of the bijective function Y = Φ(X).

PROPOSITION 1 (change of variables). Given the continuous random vector X = (X_1, X_2, ..., X_m) and the bijective function Y = Φ(X) (figure 1), where Y is a vector with m dimensions, then the joint density of Y can be expressed through the joint density of X:

1)

Prop 1_1.PNG

PROPOSITION 2 (symmetrical matrix). Given the symmetrical matrix C, we can always write:

2)

Prop 1_3.PNG

where \lambda_1, \lambda_2, ..., \lambda_m are the eigenvalues of matrix C and the columns of P are the respective eigenvectors. It is also easy to see that for the inverse matrix of C we have:

3)

Prop 1_4.PNG

Moreover, the quadratic form associated with the inverse matrix is

4)

Prop 2_3.PNG

where

5)

Prop 2_4.PNG

PROPOSITION 3 (covariance matrix). If C is the covariance matrix of the random vector X = (X_1, X_2, ..., X_m ), which means that

6)

Prop 3_1.PNG

then, with the positions made in Prop. 2, we have

7)

Prop 3_2.PNG

where \sigma_j is the standard deviation of X_j and \rho_{ij} is the correlation coefficient between X_i and X_j.

PROPOSITION 4 (some integrals). It is possible to calculate the integrals in the following table. Those who are interested in how to calculate the table can contact me.

exponential integral 4

PROPOSITION 5 (other integrals). It is possible to calculate the two following integrals from the table. Those who are interested in how to calculate them can contact me.

8)

exponential integral 3

PROPOSITION 6 (sum of normal random variables). Given the random vector X = (X_1, X_2, ..., X_m) whose components are normally distributed, then the density of the random variable Y = X_1 + X_2 + ... + X_m is a normal law whose average and standard deviations are respectively given by:

extra.PNG

Multivariate normal distribution

PROPOSITION 7. The joint probability density in the case of a random vector whose m components follow a normal distribution is:

9)Prop 6_1.PNG

Demonstration, first part. The aim of this proof will be to demonstrate that if we calculate the marginal distribution of X_i from the given joint distribution, we obtain a normal distribution with an average given by \mu_i. Moreover, we will prove that if we use this joint distribution to calculate the covariance between X_i and X_j, we obtain \sigma_i\sigma_j\rho_{ij}. We start operating the following change of variables:

10)  Prop 6_2.PNG

whose Jacobian is the identity matrix. So we obtain for the joint density in Eq. 9 the expression:

11)

Prop 6_5.PNG

We then consider the substitution

12)

Prop 6_3.PNG

whose Jacobian is the determinant of P which is again the identity matrix, since P is an orthogonal matrix (P is the matrix introduced in Prop. 2, whose columns are eigenvectors of the covariance matrix). Then we have

Prop 6_6.PNG

And, according to Prop. 1, we obtain for the joint distribution in Eq. 9 the expression:

13)

prop 6_4

So, the marginal distribution of the first random variable is

prop 6_7

We recognize the integrals in Prop. 4, for n = 0. So we have for the marginal distribution:

14)

Prop 6_9.PNG

while the joint distribution becomes

15)

prop 6_8

Let’s now consider another change of variable, the following one:

16)Prop 6_10.PNG

whose Jacobian is given by:

Prop 6_11.PNG

Then, according to Prop. 1, we have

Prop 6_12.PNG

This proves that the variables {X''}_1, {X''}_2, ... , {X''}_m are independent. But they are also normally distributed random variables whose average is zero and whose standard deviation is

Prop 6_13.PNG

for i that goes from 1 to m. Since we have

Prop 6_14.PNG

we can calculate the marginal distribution of ξ_j according to Prop. 6:

17)

Prop 6_15.PNG

Remembering the very first substitution (Eq. 10) we then draw the following conclusion:

18)Prop 6_16.PNG

Now, if you remember Prop. 3, you can easily conclude that the marginal density of X_j is, in fact, a normal distribution with average given by \mu_j and standard deviation given by \sigma_j. This concludes the first part of the demonstration. It is worth noting that we have calculated, in the previous lines, a very complex integral (the first collected in the following paragraph), and we can be proud of ourselves.

Demonstration, second part. We have now to prove that the covariance coefficient of between X_i and X_j, is given by \rho_{ij}. In order to do that, we’ll calculate the covariance between X_i and X_j with the formula

19)   Cov[X_i,X_j] = E[X_i,X_j]E[X_i]E[X_j] = E[X_iX_j]\mu_i,\mu_j

For E[X_iX_j] we have

Prop 6_18.PNG

Considering the substitution in Eq. 10 we have

prop 6_19
prop 6_20

To simplify the writing, let’s assume i=1 and j=2. For I_1 we have:

prop-6_21.png

Now, considering again Prop. 4, we easily recognize that:

20)

Prop 6_22.PNG

For I_1 we then have:

21)

Prop 6_23.PNG

For I_2 we have:

Prop 6_24.PNG

So, I_2 is zero and the same applies to I_3, as the reader can easily discover by herself, using Eq. 20. Hence, we have found:

Prop 6_25.PNG

Now, just consider Eq. 7 (the second one) in Prop. 3, and you will recognize that we have found

Prop 6_26.PNG

which is exactly what we were trying to demonstrate. The reader has likely realized that we have just calculated another complex integral, the second one in the following paragraph. It can be also verified that the joint density is, in fact, a density: in order for that to be true it must be

prop 6_29

Now, if we use the substitutions in Eq. 10 and in Eq. 12 we obtain:

Prop 6_30.PNG

And our proof is now complete.

Integrals

Prop. 7 is nothing more than the calculation of three very complex integrals. I have collected these results in what follows. Consider that you can substitute the covariance matrix with any symmetrical one, and these formulae still hold.

prop 6_17
Prop 6_27.PNG
Prop 6_28.PNG

Sum of independent exponential random variables with the same parameter

Sum of independent exponential random variables with the same parameter

Introduction

Let X_1, X_2, …, X_m be independent random variables with an exponential distribution. We have already found the expression of the distribution of the random variable Y = X_1 + X_2 + …+ X_m when X_1>0, X_2>0, …, X_m>0 have pairwise distinct parameters (here). The reader will easily recognize that the formula we found in that case has no meaning when the parameters are all equal to λ. I will derive here the law of Y in this circumstance. This can be done with a demonstration by induction, with no particular effort, but I will follow a longer proof. Why? Perhaps because I like pain? Yes, this might be true; but the main reason is that, to me,  this longer demonstration is quite interesting and it gives us the chance to introduce the following proposition. In this blog post, we will use some of the results from the previous one on the same topic and we will follow the same enumeration for propositions.

PROPOSITION 8 (sum of m independent random variables). Let X_1, X_2, …, X_m be independent random variables. The law of Y = X_1 + X_2 + …+ X_m is given by:

Proof. We know that the thesis is true for m=2 (PROP. 1). Consider now that:

 But we know that X_1, X_2, …, X_m are independent. So we have:

We now operate the substitution

And we obtain:

But this has to be true for any possible interval [a, b], which means that:

This is one of the equations in the thesis. The other ones can be derived in the same way ♦

Guessing the solution

I will solve the problem for m = 2, 3, 4 in order to have an idea of what the general formula might look like. 

PROPOSITION 9 (m=2). Let X_1, X_2 be independent exponential random variables with the same parameter λ. The law of Y_1 = X_1 + X_2 is given by:

for y > 0. It is zero otherwise.

Proof. We just have to substitute f_{X_1}, f_{X_2} in Prop. 1. We obtain:

And we find the thesis by solving the integral ♦

PROPOSITION 10 (m=3). Let X_1, X_2, X_3 be independent exponential random variables with the same parameter λ. The law of Y = X_1 + X_2 + X_3 is given by:

for y>0. It is zero otherwise.

Proof. If we define Y_1 = X_1 + X_2 and Y_2= X_3, then we can say – thanks to Prop. 2 – that Y_1 and Y_2 are independent. This means that – according to Prop. 1 – we have

And the proof is concluded ♦

PROPOSITION 11 (m=4). Let X_1, X_2, X_3, X_4 be independent exponential random variables with the same parameter λ. The law of Y = X_1 + X_2 + X_3 + X_4 is given by:

Proof. We can demonstrate the thesis in the same way we did in Prop. 10. We can draw the same conclusion by directly applying Prop. 8: 

Figure 1. This tetrahedron is defined by the conditions y - x_2 - x_3 - x_4 >0 and x_i >0 for i = 2, 3, 4. It is the domain of integration for the integral in the proof of proposition 11.

The domain of integration is the tetrahedron in Figure 1. The generic point P within the tetrahedron has to belong to the segment AC that, in turns, belongs to the triangle shaded grey. This means that the domain of integration can be written [0, y]×[0, yx_2]×[0, yx_2x_3]. Thus, we calculate:

and the proof is concluded ♦

Proof

The reader has now likely guessed what the density of Y looks like when m is the generic integer number. But before we can rigorously demonstrate that formula, we need to calculate an integral.

PROPOSITION 12 (lemma). It is true that:

Proof. We have already found that the thesis is true when m = 4. Let us now assume that it is true for m – 1; if we write the thesis for the m – 2 variables x_3, x_4, … , x_m we have:

Now we put η = yx_2:

By integrating both members in [0, y] we obtain:

and the proof is concluded ♦

PROPOSITION 13. Let X_1, X_2, …, X_m be independent random variables. The law of Y = X_1 + X_2 + …+ X_m is given by:

Proof. By directly applying Prop. 8 we have:

But:

Thus, we can write:

The domain of integration can be written in a more proper way (as we did in Prop. 11) as follows:

But this is the integral calculated in Prop. 12, and the proof is concluded ♦

A numerical application

The reader might have recognized that the density of Y in Prop. 13 is, in fact, a Gamma distribution: it is the distribution Γ(m, λ), also known as Erlang’s distribution. There might even be a reader who perhaps remembers that I have discussed that distribution in a post of mine (here) and that I have coded a program that plots both the density and the distribution of such a law. In Figure 2 we see the density of Y (left) and the distribution function (right) for λ = 1.5 and for m that goes from 1 to 3 (note that m is indicated with α in that picture).

Figure 2. Density (left) and distribution function (right) of Y for λ = 1.5 and for m that goes from 1 to 3 (note that m is indicated with α in this picture).

Sum of independent exponential random variables

Sum of independent exponential random variables

For Chiara,

who once encouraged me

to boldly keep trying

Introduction

For the last four months, I have experienced the worst level of my illness: I have been completely unable to think for most of the time. So I could do nothing but hanging in there, waiting for a miracle, passing from one medication to the other, well aware that this state could have lasted for years, with no reasonable hope of receiving help from anyone. This has been the quality of my life for most of the last two decades. Then, some days ago, the miracle happened again and I found myself thinking about a theorem I was working on in July. And once more, with a great effort, my mind, which is not so young anymore, started her slow process of recovery. I concluded this proof last night. This is only a poor thing but since it is not present in my books of statistics, I have decided to write it down in my blog, for those who might be interested.

I can now come back to my awkward studies, which span from statistics to computational immunology, from analysis of genetic data to mathematical modelling of bacterial growth. Desperately searching for a cure.

The problem

Let X_1, X_2, ..., X_m be independent random variables with an exponential distribution with pairwise distinct parameters \lambda_1, \lambda_2, ..., \lambda_m , respectively. Our problem is: what is the expression of the distribution of the random variable Y = X_1 + X_2 + ... + X_m? I faced the problem for m = 2, 3, 4. Then, when I was quite sure of the expression of the general formula of f_Y (the distribution of Y) I made my attempt to prove it inductively. But before starting, we need to mention two preliminary results that I won’t demonstrate since you can find these proofs in any book of statistics.

PROPOSITION 1. Let X_1, X_2 be independent random variables. The distribution of  Y = X_1 + X_2 is given by:

where f_X is the distribution of the random vector [X_1, X_2 ].

PROPOSITION 2. Let X_1, X_2, X_3 be independent random variables. The two random variables Y _1 = X_1 + X_2 + ...+ X_n and Y _2 = X_1 + X_2 + ...+ X_m (with n<m) are independent.

DEFINITION 1. For those who might be wondering how the exponential distribution of a random variable X_i with a parameter \lambda_i looks like, I remind that it is given by:

Guessing the solution

As mentioned, I solved the problem for m = 2, 3, 4 in order to understand what the general formula for f_Y might have looked like.

PROPOSITION 3 (m = 2). Let X_1, X_2 be independent exponential random variables with distinct parameters \lambda_1, \lambda_2, respectively. The law of Y _1 = X_1 + X_2 is given by:

Proof. We just have to substitute f_{X_1}, f_{X_2} in Prop. 1. We obtain:

And the demonstration is complete ♦

PROPOSITION 4 (m = 3). Let X_1, X_2, X_3 be independent exponential random variables with pairwise distinct parameters \lambda_1, \lambda_2, \lambda_3, respectively. The law of Y _1 = X_1 + X_2 + X_3 is given by:

Proof. If we define Y _1 = X_1 + X_2 and Y _2 = X_3, then we can say – thanks to Prop. 2 – that Y _1 and Y _2 are independent. This means that – according to Prop. 1 – we have

The reader will now recognize that we know the expression of  f_{Y_1} because of Prop. 3. So, we have: 

For the first integral we find:

For the second one we have:

Hence, we find:

And the thesis is proved

PROPOSITION 5 (m = 4). Let X_1, X_2, X_3, X_4 be independent exponential random variables with pairwise distinct parameters \lambda_1, \lambda_2, \lambda_3, \lambda_4, respectively. The law of Y = X_1 + X_2 + X_3 + X_4 is given by:

for y>0, while it is zero otherwise.

Proof. Let’s consider the two random variables Y_1 = X_1 +  X_2, Y_2 = X_3 + X_4. Prop. 2 tells us that Y_1, Y_2 are independent. This means that – according to Prop. 1 – we can write:

The reader has likely already realized that we have the expressions of f_{Y_2} and f_{Y_2}, thanks to Prop. 3. So we have:

For the four integrals we can easily calculate what follows:

Adding these four integrals together we obtain:

And this proves the thesis

We are now quite confident in saying that the expression of f_Y for the generic value of m is given by:

for y>0, while being zero otherwise. But we aim at a rigorous proof of this expression.

Proof

In order to carry out our final demonstration, we need to prove a property that is linked to the matrix named after Vandermonde, that the reader who has followed me till this point will likely remember from his studies of linear algebra. The determinant of the Vandermonde matrix is given by:

PROPOSITION 6 (lemma). The following relationship is true:

Proof. In the following lines, we calculate the determinant of the matrix below, with respect to the second line. In the end, we will use the expression of the determinant of the Vandermonde matrix, mentioned above:

But this determinant has to be zero since the matrix has two identical lines, which proves the thesis

PROPOSITION 7. Let X_1, X_2, ... , X_m be independent exponential random variables with pairwise distinct parameters \lambda_1, \lambda_2, ... , \lambda_m, respectively. The law of Y = X_1 + X_2 + ... + X_m is given by:

for y > 0, while being zero otherwise.

Proof. We already know that the thesis is true for m = 2, 3, 4. We now admit that it is true for m-1 and we demonstrate that this implies that the thesis is true for (proof by induction). Let’s define the random variables Y_1 = X_1 + X_2 + ... + {X_{m-1}} and Y_2 = X_m. These two random variables are independent (Prop. 2) so – according to Prop. 1 – we have:

Now, f_{Y_1} is the thesis for m-1 while f_{Y_2} is the exponential distribution with parameter \lambda_m. So we have:

For the sum we have:

The sum within brackets can be written as follows:

So far, we have found the following relationship:

or, equivalently:

prop 7

In order for the thesis to be true, we just need to prove that

prop 7_1

which is equivalent to the following:

prop 7_3

Searching for a common denominator allows us to rewrite the sum above as follows:

prop 7_4

So, we just need to prove that:

It is possible to show, with some manipulations, that this is equivalent to the thesis of Prop. 6 and this proof is concluded ♦

References. A paper on this same topic has been written by Markus Bibinger and it is available here.

Integrals for Statistics

In the following four tables, I present a collection of integrals and integral functions that are of common use in Statistics. The values for the Beta function have been calculated by a code of mine that uses – as a recursive function – another code that I wrote for the calculation of the Gamma Function (see the related blog post). Below these tables, you find a representation of the Beta function, plotted by my code. I have demonstrated each one of these results and these calculations will be included in an upcoming book.

Immagine.pngImmagine 2Immagine 3