Many of my readers are probably aware of the attempts that are currently being made to mathematically simulate the energy metabolism of ME/CFS patients, integrating metabolic data with genetic data. In particular, dr. Robert Phair has developed a mathematical model of the main metabolic pathways involved in energy conversion, from energy stored in the chemical bonds of big molecules like glucose, fatty acids, and amino acids, to energy stored in adenosine triphosphate (ATP), ready to use. Phair, who is an engineer, determined the differential equations that rule this huge amount of chemical reactions and adapted them to the genetic profile found in ME/CFS patients. Genetic data are the result of the analysis of the exomes of more than 40 patients (I am among them). He found, in particular, that the enzyme ATP synthase (one step of the mitochondrial electron transport chain, called complex V) presents a damaging variant that is found in less than 30% of healthy controls while being detected in more than 60% of ME/CFS patients. He found other enzymes (I don’t know their exact number) of energy metabolism meeting these same criteria. Setting a reduced activity for these enzymes in his mathematical model, he found that, after particularly stressing events (such as infections), ME/CFS patients metabolism can fall into a low-energy state without being able to escape from it (this theory has been called the “metabolic trap hypothesis”) (R, R, R). Phair’s hypothesis is currently being experimentally tested and has not been published yet, but some years ago two physicists published an interesting mathematical model of energy metabolism during and after exercise, in ME/CFS patients compared to healthy controls (Lengert N. et Drossel B. 2015). In what follows I will describe this model and its predictions and we will also see how these differential equations look like.
Metabolic pathways that have been analyzed
The model by Lengert and Drossel extends two previously published systems of differential equations that describe the behaviour of glycolysis, Krebs cycle (hugely simplified as a single reaction!), mitochondrial electron transport chain (described in detail), creatine kinase system, and of conversion of adenosine diphosphate (ADP) in ATP, in skeletal muscles (Korzeniewski B. et Zoladz JA. 2001), (Korzeniewski B. et Liguzinski P. 2004). They added equations for lactate accumulation and efflux out of the cell, for de novo synthesis of inosine monophosphate (IMP) during recovery, for degradation of adenosine monophosphate (AMP) into IMP, for degradation of IMP to inosine and hypoxanthine. All the pathways involved are collected in figure 1. These reactions are described by 15 differential equations and the solution is a set of 15 functions of time that represent the concentration of the main metabolites involved (such as lactate, pyruvate, ATP etc). Let’s now take a closer look at one of these equations and at the general structure of the whole system of equations.
Figure 1. This is a schematic representation of the metabolic pathways described by the mathematical model developed by Lengert and Drossel. In detail: cytosolic synthesis and degradation of ADP, AMP and IMP (left), protein kinase pathway and glycolysis (centre), electron transport chain and TCA cycle (right). From Lengert N. et Drossel B. 2015.
Figure 2. Lactate dehydrogenase is the enzyme involved in the catalysis of the conversion of lactate to pyruvate. This reaction goes in both directions.
Differential equations for chemical reactions
Let’s consider the equation used by the Authors for the reaction catalyzed by lactate dehydrogenase (the transformation of pyruvate into lactate, figure 2) where they also took into account the efflux of lactate from the cytosol. The differential equation is the following one:
where the three parameters are experimentally determined and their values are
The first one describes the activity of the enzyme lactate dehydrogenase: the higher this parameter is, the more active the enzyme is. The second one describes the inverse reaction (from lactate to pyruvate). The third one is a measure of how much lactate the cell is able to transport outside its membrane. You have probably recognized that the equation of lactate is a first-order ordinary differential one. We say “first-order” because in the equation there is only the first-order derivative of the function that we have to determine (lactate, in this case); “ordinary” refers to the fact that lactate is a function only of one variable (time, in this case). You can easily realize that an equation like that can be written as follows:
Suppose now that we had other two differential equations of this type, one for pyruvate and one for protons (the other two functions of time that are present in the equation):
We would have a system of three ordinary differential equations like this one
The initial values of the functions that we have to determine are collected in the last row: these are the values that they have at the beginning of the simulation (t=0). In this case, these values are the concentrations of lactate, pyruvate and protons in the cytosol, at rest. The three functions of time are called the solution to the system. This kind of system of equations is an example of a Cauchy’s problem, and we know from mathematical theory that not only it has a solution, but that this solution is unique. Moreover, whereas this solution can’t be always easily found with rigorous methods, it is quite easy to solve the problem with computational methods, like the Runge-Kutta method or the Heun’s method. All that said, the system of ordinary differential equations proposed by Lengert and Drossel for energy metabolism is just like this one, with the exception that it comprises 15 equations instead of three. So, the main difficulty in this kind of simulation is not the computational aspect but the determination of the parameters (like the enzymatic ones) and of the initial values, that have to be collected from the medical literature or have to be determined experimentally, if not already available. The other problem is how to design the equations: there are several ways to build a model for a chemical reaction or for any other biological process.
The mathematical model of ME/CFS
How do we adapt to ME/CFS patients a model of energy metabolism that has been set with parameters taken from experiments performed on healthy subjects? This is a very good question, and we have seen that Robert Phair had to use genetic data from ME/CFS patients on key enzymes of energy metabolism, in order to set his model. But this data was not available when Lengert and Drossel designed their equations. So what? They looked for studies about the capacity of oxidative phosphorylation in ME/CFS patients in comparison with healthy subjects, and they found that it had been measured with different experimental settings by various groups and that the common denominator was a reduction ranging from about 35% (Myhill S et al. 2009), (Booth, N et al 2012), (Argov Z. et al. 1997), (Lane RJ. et al. 1998) to about 20% (McCully KK. et al. 1996), (McCully KK. et al. 1999). So the idea of the Authors was to multiply the enzymatic parameter of each reaction belonging to the oxidative phosphorylation by a number ranging from 0.6 (severe ME/CFS) to 1.0 (healthy person). In particular, they used a value of 0.7 for ME/CFS, in their in silico experiments.
Predictions of the mathematical model
The mathematical model was used to perform in silico exercise tests with various length and intensities. What they found was that the time of recovery in the average ME/CFS patient was always greater if compared to a healthy person. The time of recovery is defined as the time that a cell needs to replenish its content of ATP (97% of the level in resting state) after exertion. In Figure 3 you see the results of the simulation for a very short (30 seconds) and very intense exercise. As you can see, in the case of a healthy cell (on the left) the recovery time is of about 600 minutes (10 hours) whereas a cell from a person with ME/CFS (on the right) requires more than 1500 minutes (25 hours) to recover.
Figure 3. Results of the simulation for an exercise with a duration of 30 seconds and a high intensity (initial ATP consumption 300 times the resting value). On the left, the case of a healthy skeletal muscle cell, on the right the case of a cell from a person with ME/CFS whose mitochondrial reactions have a velocity reduced to 70% of the velocity of healthy control. The plot has been obtained by using the online version of the software, available here.
Another interesting result of the simulation is an increase in AMP in patients vs control (figure 3, orange line). This is due to the compensatory use of the two metabolic pathways in figure 4: the reaction catalyzed by adenylate kinase, in which two molecules of ADP are used to produce a molecule of ATP and a molecule of AMP; and the reaction catalyzed by AMP deaminase, that degrades AMP to IMP (that is then converted to inosine and hypoxanthine). These two reactions are used by ME/CFS patients more than in healthy control, in order to increase the production of ATP outside mitochondria.
Figure 4. The metabolic pathway on the left is used by ME/CFS patients more than in control to increase the production of ATP outside mitochondria, according to the present model. The pathway on the right then degrades AMP to IMP.
If we give a closer look at the concentrations of AMP and IMP in the 4 hours following the exertion (figure 5), we actually see an increased production of IMP (green line) and AMP (orange line) in skeletal muscles of ME/CFS patients (on the right) vs controls (left).
Figure 5. The same as figure 3, but magnified in order to have a closer look at concentrations during the 4 hours following the exertion. The healthy cell is on the left, whereas the cells from a person with ME/CFS is on the right.
A further compensatory pathway used by patients (according to this model) is the production of ATP from ADP by the enzyme creatine kinase (figure 6). This is another way that we have on our cells to produce ATP in the cytosol without the help of mitochondria. In this model of ME/CFS, there is an increase in the use of this pathway, which leads to a decrease in the cellular concentration of phosphocreatine and an increase in the cellular concentration of creatine (figure 7).
Figure 6. The reaction catalyzed by creatine kinase: a molecule of ADP has converted to ATP thanks to the phosphate group carried by phosphocreatine.Figure 7. The concentration of phosphocreatine in the cytosol of skeletal muscle cells is lower in ME/CFS (right) versus control (left) during and after exercise. This is due to the greater use of this molecule to produce ATP anaerobically in ME/CFS metabolism vs control. Parameters for this simulation are the same as described in figure 3.
Comparison with available metabolic data
I am curious to see if data from the various metabolomic studies done after the publication of the model by Lengert and Drossel are in agreement with it. I will discuss this topic in another article because I still have to study this aspect. I would just point out that if we assumed true the high rate of IMP degradation proposed in this model, we would probably find a high level of hypoxanthine in the blood of patients, compared to controls, whereas this metabolite is decreased in patients, according to one study (Armstrong CW et al. 2015).
Comparison with the model by Phair
The metabolic model by Robert Phair will probably give a more accurate simulation of the energy metabolism of patients if compared with the system of ordinary differential equations that we have discussed in this article. And there are two reasons for that. The first one is that Phair has included equations also for fatty acid beta-oxidation, pentose phosphate pathway, and NAD synthesis from vitamin niacin. The other one is that, whereas the two German physicists reduced the velocities of all the enzymatic reactions that happen in mitochondria, Phair has genetic data for every enzyme involved in these reactions for a group of ME/CFS patients and thus he can determine and set the actual degree of activity for each enzyme. But there is a further level required in order to bring the mathematical simulation closer to the reality: gene expression. We know, for instance, that in ME/CFS patients there is a higher than normal expression of aconitase (an enzyme belonging to the TCA cycle) and of ATP synthase (Ciregia F et al 2016) and this should be taken into account in a simulation of ME/CFS patients energy metabolism. Note that ATP synthase is exactly the enzyme that Phair has found to be genetically damaged in patients, and this makes perfect sense: if an enzyme has reduced activity, its reaction can be speeded up by expressing more copies of the enzyme itself.
One could expect that, in a near future, genetic data and gene expression data from each of us will be used to set mathematical models for metabolic pathways, in order to build a personalized model of metabolism that might be used to define, study and correct human diseases in a personalized fashion. But we would need a writer of sci-fiction in order to tell this chapter of the future of medicine.
In a previous blog post, I have discussed the case of normally distributed random variables. But what happens if we have a vector whose components are random variables with this kind of distribution? In particular, what is the expression of the joint probability density? I don’t know how to derive this formula from the marginal distributions, but I have been able to demonstrate, after several hours of work, the following proposition.
Proposition 1. The joint probability density in the case of a random vector whose two components follow a normal distribution is:
where σ_1 and σ_2 are the standard deviations, μ_1 and μ_2 are the expectations of the two random variables and ρ is the correlation coefficient between the two variables.
Proof. The marginal distribution of x_1 is given by
By operating the substitutions
we obtain
This demonstrates that x_1 is in fact a normally distributed random variable and that its expectation and standard deviation are σ_1 and μ_1, respectively. The same applies to x_2. In order to demonstrate that ρ is the correlation coefficient between x_1 and x_2 we have to calculate the covariance of f_X, which is given by E[X_1,X_2]-E[X_1]E[X_2]. The first addend is given by:
Let’s consider the inner integral and the following substitutions
We have:
The two integrals inside the root square are not the easiest ones, and I will not discuss them here. Just consider that they can be solved using the well known results:
We have
If we put these two integrals inside I, we obtain
The two inner integrals are two difficult ones, again. The first one can be solved using the same exponential integral used above:
For the other inner integral we have
The first and the second integrals can be solved using the table previously reported, while the third one is zero. At the end of all this work, we have:
And this proves that ρ is the correlation coefficient between x_1 and x_2. This is a complete demonstration of the expression of the joint probability density of a bivariate normal distribution ■
This formula can be extended to the case of m components.
Proposition 2. The joint probability density in the case of a random vector whose m components follow a normal distribution is:
where C is the covariance matrix, given by
Proof. This formula is easy to prove in the case of m=2, using proposition 1. I have no idea about how it could be proved when m>2 ■
I have written a code in Fortran (download) that asks you for expectations, standard deviations and correlation coefficient and plots the joint density of the bivariate normal distribution. I have written a code in Octave that does the same job (download). In the picture below you find the joint density for two standard normally distributed random variables with ρ=0 (left), ρ=0.7 (right, upper half) and ρ=-0.7 (right, lower half). These pictures have been plotted by the code in Fortran. The other image is a comparison between the two programs, for another bivariate normal distribution.
In what follows I will present a set of codes that I have written in Octave and in Fortran for the study of random variables that follow a gamma distribution. In another post, we will discuss the beta distribution.
2_The gamma distribution
A random variable is said to have a gamma distribution with parameters (α, λ) if its density is given by
with α, λ>0 and where Γ(α) is the gamma function, which is quite a pain in the lower back, as you will see. This random variable can be used to describe the probability that the duration of life of an entity (whether it is a mechanical component or a bacterium) depends on time. The gamma distribution allows to describe the following situations:
for α>1 the probability of failure of a mechanical component (or death of an organism) increases as the time passes by (increasing hazard rate);
for α=1 the probability of failure is constant (constant hazard rate);
for α<1 the failure is less likely as time passes by (decreasing hazard rate).
The hazard rate is the limit below:
where F_X(t) is the distribution function of the gamma distribution. I have not been able to find an analytic expression for the hazard rate of the gamma distribution, but I have been able to obtain it with the computer-aided calculation (see below). But in order to calculate F_X(t), we need to calculate the gamma function…
3_The gamma function
Γ(α) (the gamma function) is an integral function that cannot be calculated analytically:
It has to be calculated with numerical integration, but a problem is that Γ(α) is an indefinite integral. One way to face this problem is to stop the numerical integration when the addend is below a certain cut off (let’s say 0.0001). It is worth noting that an important feature of this function is that if you know its value in an interval such as [x, x+1], then you can derive Γ(α) for any other possible value of α. This is due to the recursive formula: Γ(α+1) = αΓ(α). That said, I have calculated Γ(α) in [1, 2] and then I have used these values to define Γ(α) in [0, 4] with the .exe file (written in Fortran) available here. Download it and put it in a folder. When you run it, it gives a .txt file with the values of Γ(α) in [0, 4] and two .mat files that can be used to build the plot of Γ(α) (see figure 1). Values of Γ(α) in [0, 2] are tabulated below, while those of you who like the very long list of numbers can find a huge table with values of the gamma function in [0, 4] at the end of this article.
Table 1. Values of the gamma function in [0.1, 1.99], with a step of 0.01.Figure 1. The gamma function between 0 and 4, as calculated by my code in Fortran. Plotted by Octave.
You can find below the script of main_f_gamma.exe and its module.
!Data: 1/03/2018 PROGRAM main_f_gamma
USE DISLIN !libreria grafica USE mod_f_gamma !modulo con procedure
IMPLICIT NONE
REAL(KIND=num), DIMENSION (1:400):: f_gamma !valori della funzione gamma fra 1 e 4 REAL(KIND=num), DIMENSION (1:400):: alpha !valori della ascissa fra 1 e 4 CHARACTER(len=10):: chiusura !serve a chiudere il programma INTEGER:: i,j !indici di ciclo
CALL sub_f_gamma_1_2 (f_gamma,alpha) !calcola la funzione gamma fra 1 e 2
!calcolo della funzione gamma e della ascissa fra 2 e 4
ciclo: DO i=201,400,1 alpha(i) = 1. + alpha(i-100) f_gamma(i) = alpha(i-100)*f_gamma(i-100) END DO ciclo
!calcolo della funzione gamma e della ascissa fra 0 e 1
ciclo: DO i=2,100,1 alpha(i) = alpha(100+i)-1. f_gamma(i) = f_gamma(100+i)/alpha(i) END DO ciclo
!scrivo la funzione gamma tra 0 e 4 su un file di testo
WRITE(*,*) “To close this program press any key, then press RETURN.” READ(*,*) chiusura
END PROGRAM main_f_gamma
!Data: 1/03/2018
MODULE mod_f_gamma
!sezione dichiarativa
IMPLICIT NONE
!costanti
REAL,PARAMETER:: e = 2.71828182 !numero di Nepero REAL,PARAMETER:: d_alpha = 0.01 !incremento delle ascisse della funzione gamma INTEGER,PARAMETER:: num = SELECTED_REAL_KIND (p=13) !aumento il numero di cifre significative
CONTAINS
!————————————————————————————-
SUBROUTINE sub_f_gamma_1_2 (f_gamma, alpha)
!Esegue l’integrale necessario al calcolo della funzione gamma !per valori compresi fra 1 e 2 e registra questi valori in un array. !L’integrazione viene eseguita con il metodo di Cavalieri-Simpson
IMPLICIT NONE
!dichiaro gli argomenti fittizi
REAL(KIND=num),INTENT(OUT),DIMENSION(1:400):: f_gamma !valori della funz. gamma in [1,2] REAL(KIND=num),INTENT(OUT),DIMENSION(1:400):: alpha !contiene l’ascissa della funzione gamma
!dichiaro le variabili locali
REAL(KIND=num),DIMENSION(1:1000000):: funzione !è la funzione integranda REAL(KIND=num),DIMENSION(1:1000000):: primitiva !è l’integrale illimitato REAL(KIND=num),PARAMETER:: d_x = 0.001 !intervallo di integrazione INTEGER:: i,j !indice del ciclo INTEGER:: index = 1 !contiene l’estremo superiore della integrazione numerica
!sezione esecutiva
!assegno i valori di funzione e ascissa, eseguo l’integrazione numerica, assegno i valori di f_gamma_1_2
alpha(101) = 1. !assegna il primo valore di alpha alpha(102) = alpha(101)+d_alpha !assegna il primo valore di alpha f_gamma(101) = 1. !assegna il primo valore di gamma
ciclo_esterno: DO j=102,200,1 !assegna i valori a f_gamma_1_2
!assegna i valori a funzione
ciclo_funzione: DO i=1,1000000,1
funzione (i) = ( (d_x*REAL(i))**(alpha(j)-1.) )*( e**((-1)*d_x*REAL(i)) ) index = i IF (funzione(i)<=0.0001) EXIT ciclo_funzione
f_gamma(j) = primitiva(index) !assegna il valore j-mo a f_gamma_1_2
alpha (j+1) = alpha (j) + d_alpha
END DO ciclo_esterno
WRITE(*,*) alpha WRITE(*,*) f_gamma
RETURN
END SUBROUTINE sub_f_gamma_1_2
!————————————————————————————-
END MODULE mod_f_gamma
4_The distribution function
The distribution function of the gamma density cannot be calculated analytically so, again, I had to make good use of my poor coding skills and I wrote a code in Octave (here, it uses the two .mat files generated by the code in Fortran mentioned above!) and one in Fortran (available here). In figure 2 (code in Octave) you can find the density and the distribution function for different values of α.
Figure 2. Densities (left) and distribution functions (right) for α = 1.5, 2.0, 3.5 and λ = 1.5.
The code in Fortran, on the other hand, asks the user for the values of α and λ and gives a plot for the density and one for the distribution function. Moreover, it gives a .txt file with the values of the distribution function. Note that you can only place values for α that are below 25, otherwise you generate an overflow condition. I am not sure why you have overflow condition with a value of α that is below 34, but this is another story. Nevertheless, for those who are interested, Γ(n) = (n-1)! and the biggest factorial that can be handled by a computer in single precision mode (where 8 bits are used for the exponent) is 33!. So, we have a little mystery here… The reason why I have stuck with single precision is that DISLIN – the graphics library – seems to be unable to handle double precision numbers. If you are interested in the gamma function, the density and the distribution function for α = 20 and λ=2, then the code in Fortran gives for Γ(20) a value of 1.216451E+17 and the two plots in figure 3.
Figure 3. Density and distribution function for α = 20 and λ=2.
In what follows you find the scripts of the Fortran .exe mentioned in this paragraph.
PROGRAM main_legge_gamma USE DISLIN !libreria grafica USE mod_legge_gamma !modulo con procedure
IMPLICIT NONE
REAL(KIND=num):: alph, lamb !valore di alpha e di lambda REAL(KIND=num):: f_g !il valore della funzione gamma REAL(KIND=num):: estremo !estremo superiore di x REAL(KIND=num):: delta !intervallo di integrazione numerica REAL(KIND=num), DIMENSION (1:500000):: f_gamma !valori della funzione gamma fra 0 e 5000 REAL(KIND=num), DIMENSION (1:500000):: alpha !valori della ascissa fra 0 e 5000 REAL(KIND=num), DIMENSION (1:1000):: d_g !contiene la densita’ REAL(KIND=num), DIMENSION (1:1000):: f_r !funzione di ripartizione REAL(KIND=num), DIMENSION (1:1000):: x !ascissa CHARACTER(len=10):: chiusura !serve a chiudere il programma INTEGER:: i,j,k !indici di ciclo INTEGER:: ind !il valore di alpha in cui calcolo la funz. gamma
!calcolo della funzione gamma fra 1 e 2
CALL sub_f_gamma_1_2 (f_gamma,alpha) !calcola la funzione gamma fra 1 e 2
WRITE(*,*) “This program calculates the gamma distribution, for alpha and lambda specified” WRITE(*,*) “by the user. It plots the diagram of the density and distribution function, and” WRITE(*,*) “writes a file .txt with some values of the distribution function.” WRITE(*,*) ” “ WRITE(*,*) “Please, type the value of alpha (a positive number between 0.01 and 25.0).” WRITE(*,*) “Use dot as decimal separator.” READ(*,*) alph WRITE(*,*) ” “ WRITE(*,*) “Please, type the value of lambda (a positive number).” WRITE(*,*) “Use dot as decimal separator.” READ(*,*) lamb
!calcolo della funzione gamma e della ascissa fra 2 e 25
ciclo_2_5000: DO i=201,2500,1 alpha(i) = 1. + alpha(i-100) f_gamma(i) = alpha(i-100)*f_gamma(i-100) END DO ciclo_2_5000
!calcolo della funzione gamma e della ascissa fra 0 e 1
ciclo_0_1: DO i=2,100,1 alpha(i) = alpha(100+i)-1. f_gamma(i) = f_gamma(100+i)/alpha(i) END DO ciclo_0_1
!cerco il valore di alpha specificato dall’utente
ciclo_alph: DO i=1,2500,1 ind = i IF (ABS(alpha(i)-alph)<=0.01) EXIT ciclo_alph END DO ciclo_alph
!assegno il valore alla funzione gamma
f_g = f_gamma(ind)
WRITE(*,*) “The value of the gamma function in”, alph, “is”, f_g
!calcolo la densita’ gamma
IF (alph <= 1) THEN estremo = 2. delta = estremo/1000. ciclo_densita_1: DO k=1,1000,1 x(k)= delta*k d_g(k) = ( ( lamb**alph )/( f_g ) )*( x(k)**( alph – 1. ) )*( e**( (-1)*lamb*x(k) ) ) END DO ciclo_densita_1 ELSE estremo = ( (alph-1.)/lamb )*3. delta = estremo/1000. ciclo_densita_1: DO k=1,1000,1 x(k)= delta*(k-1.) d_g(k) = ( ( lamb**alph )/( f_g ) )*( x(k)**( alph – 1. ) )*( e**( (-1)*lamb*x(k) ) ) END DO ciclo_densita_1 END IF
WRITE(20,*) “The gamma distribution for alpha =”, alph, “and lambda =”, lamb WRITE(20,*) ” “ WRITE(20,*) “The value of the gamma function in”, alph, “is”, f_g WRITE(20,*) ” “ WRITE(20,*) “distribution function” WRITE(20,*) ” “ WRITE(20,*) ” x “,”|”,”distribution function” WRITE(20,*) “_________________________________”
ciclo_write: DO j=1,1000,1 WRITE(20,100) x(j),”|”,f_r(j) END DO ciclo_write 100 FORMAT (1X,1F8.4,A,1F8.4)
!disegno il grafico della densita’
CALL diagramma (x, d_g, ‘Density ‘)
!disegna il grafico della funzione di ripartizione
CALL diagramma (x, f_r, ‘distribution function’)
WRITE(*,*) “To close this program press any key, then press RETURN.” READ(*,*) chiusura
END PROGRAM main_legge_gamma
!Data: 1/03/2018
MODULE mod_legge_gamma
!sezione dichiarativa
IMPLICIT NONE
!costanti
REAL,PARAMETER:: e = 2.71828182 !numero di Nepero REAL,PARAMETER:: d_alpha = 0.01 !incremento delle ascisse della funzione gamma INTEGER,PARAMETER:: num = 1 !singola precisione
CONTAINS
!————————————————————————————-
SUBROUTINE sub_f_gamma_1_2 (f_gamma, alpha)
!Esegue l’integrale necessario al calcolo della funzione gamma !per valori compresi fra 1 e 2 e registra questi valori in un array. !L’integrazione viene eseguita con il metodo di Cavalieri-Simpson
IMPLICIT NONE
!dichiaro gli argomenti fittizi
REAL(KIND=num),INTENT(OUT),DIMENSION(:):: f_gamma !valori della funz. gamma in [1,2] REAL(KIND=num),INTENT(OUT),DIMENSION(:):: alpha !contiene l’ascissa della funzione gamma
!dichiaro le variabili locali
REAL(KIND=num),DIMENSION(1:1000000):: funzione !è la funzione integranda REAL(KIND=num),DIMENSION(1:1000000):: primitiva !è l’integrale illimitato REAL(KIND=num),PARAMETER:: d_x = 0.001 !intervallo di integrazione INTEGER:: i,j !indice del ciclo INTEGER:: index = 1 !contiene l’estremo superiore della integrazione numerica
!sezione esecutiva
!assegno i valori di funzione e ascissa, eseguo l’integrazione numerica, assegno i valori di f_gamma_1_2
alpha(101) = 1. !assegna il primo valore di alpha alpha(102) = alpha(101)+d_alpha !assegna il primo valore di alpha f_gamma(101) = 1. !assegna il primo valore di gamma
ciclo_esterno: DO j=102,200,1 !assegna i valori a f_gamma_1_2
!assegna i valori a funzione
ciclo_funzione: DO i=1,1000000,1
funzione (i) = ( (d_x*REAL(i))**(alpha(j)-1.) )*( e**((-1)*d_x*REAL(i)) ) index = i IF (funzione(i)<=0.0001) EXIT ciclo_funzione
f_gamma(j) = primitiva(index) !assegna il valore j-mo a f_gamma_1_2
alpha (j+1) = alpha (j) + d_alpha
END DO ciclo_esterno
RETURN
END SUBROUTINE sub_f_gamma_1_2
!————————————————————————————-
SUBROUTINE diagramma (x, funzione,label_y)
!disegno il grafico di funzione
!dichiaro gli argomenti fittizi
REAL(KIND=num),INTENT(IN),DIMENSION(1:1000):: x !ascissa REAL(KIND=num),INTENT(IN),DIMENSION(1:1000):: funzione !funzione CHARACTER (len=20), INTENT(IN):: label_y !nome delle ordinate
!dichiaro le variabili locali
REAL(KIND=num),DIMENSION(1:1000):: max_x !estremo ascissa REAL(KIND=num),DIMENSION(1:1000):: max_funz !estremo funzione
CALL METAFL (‘PDF’) !scelgo il formato dell’output CALL SETPAG (‘DA4P’) !formato della pagina CALL SCRMOD (‘revers’) !scritta nera su fondo bianco CALL DISINI !inizializza DISLIN impostando parametri di default
CALL PAGERA !chiedo una linea di contorno per il piano xy CALL DUPLX !chiedo un font a doppio spessore
CALL AXSPOS (300,1400) !coordinate dell’angolo in basso a sinistra CALL AXSLEN (1500,1000) !la lunghezza, in pixel, dei due assi
CALL LABDIG (2,’X’) !numero di cifre decimali negli assi CALL LABDIG (3,’Y’) !numero di cifre decimali negli assi
CALL NAME (‘X’,’x’) !assegno un nome all’asse delle x CALL NAME (label_y,’y’) !assegno un nome all’asse delle y
CALL GRAF (0.,max_x,0.,max_x/5.,0.,max_funz+max_funz/5.,0.,max_funz/5.) !ambiti degli assi CALL TEXMOD (‘ON’) !modalita’ TeX CALL GRID (2,2) !impone una griglia CALL CURVE (x,funzione,1000)
CALL DISFIN !termina DISLIN
END SUBROUTINE diagramma
!————————————————————————————-
END MODULE mod_legge_gamma
5_The hazard rate
As mentioned in paragraph 2, the hazard rate of the gamma distribution cannot be easily studied. I forgot to mention that it is relatively easy to prove that it is constant for α=1. But what happens for α>1 and for α<1? In order to answer, I have written a code (download) that plots it and it seems to be possible to say that the hazard rate increases for α>1 and decreases for α<1. This means that the gamma distribution is suitable for modeling each of the three kinds of patterns for the probability of failure as the time passes by. In figure 4 you find an example of the hazard rate for α<1 (left) and an example for α>1 (right).
Figure 4. Hazard rate for α = 0.8, λ = 1.5 (left) and α = 3.0, λ=1.5 (right).
This is the script of the Fortran exe program mentioned in this paragraph:
!Data: 7/03/2018
PROGRAM main_legge_gamma_avaria
USE DISLIN !libreria grafica USE mod_legge_gamma_avaria !modulo con procedure
IMPLICIT NONE
REAL(KIND=num):: alph, lamb !valore di alpha e di lambda REAL(KIND=num):: f_g !il valore della funzione gamma REAL(KIND=num):: estremo !estremo superiore di x REAL(KIND=num):: delta !intervallo di integrazione numerica REAL(KIND=num), DIMENSION (1:500000):: f_gamma !valori della funzione gamma fra 0 e 5000 REAL(KIND=num), DIMENSION (1:500000):: alpha !valori della ascissa fra 0 e 5000 REAL(KIND=num), DIMENSION (1:1000):: d_g !contiene la densita’ REAL(KIND=num), DIMENSION (1:1000):: f_r !funzione di ripartizione REAL(KIND=num), DIMENSION (1:1000):: x !ascissa REAL(KIND=num), DIMENSION (1:1000):: h !tasso di avaria CHARACTER(len=10):: chiusura !serve a chiudere il programma INTEGER:: i,j,k !indici di ciclo INTEGER:: ind !il valore di alpha in cui calcolo la funz. gamma
!calcolo della funzione gamma fra 1 e 2
CALL sub_f_gamma_1_2 (f_gamma,alpha) !calcola la funzione gamma fra 1 e 2
WRITE(*,*) “This program calculates the gamma distribution, for alpha and lambda specified” WRITE(*,*) “by the user. It plots the diagram of density, distribution function, failure rate” WRITE(*,*) “and writes a file .txt with some values of the distribution function.” WRITE(*,*) ” “ WRITE(*,*) “Please, type the value of alpha (a positive number between 0.01 and 25.0).” WRITE(*,*) “Use dot as decimal separator.” READ(*,*) alph WRITE(*,*) ” “ WRITE(*,*) “Please, type the value of lambda (a positive number).” WRITE(*,*) “Use dot as decimal separator.” READ(*,*) lamb
!calcolo della funzione gamma e della ascissa fra 2 e 25
ciclo_2_5000: DO i=201,2500,1 alpha(i) = 1. + alpha(i-100) f_gamma(i) = alpha(i-100)*f_gamma(i-100) END DO ciclo_2_5000
!calcolo della funzione gamma e della ascissa fra 0 e 1
ciclo_0_1: DO i=2,100,1 alpha(i) = alpha(100+i)-1. f_gamma(i) = f_gamma(100+i)/alpha(i) END DO ciclo_0_1
!cerco il valore di alpha specificato dall’utente
ciclo_alph: DO i=1,2500,1 ind = i IF (ABS(alpha(i)-alph)<=0.01) EXIT ciclo_alph END DO ciclo_alph
!assegno il valore alla funzione gamma
f_g = f_gamma(ind)
WRITE(*,*) “The value of the gamma function in”, alph, “is:”, f_g
!calcolo la densita’ gamma
IF (alph <= 1) THEN estremo = 2. delta = estremo/1000. ciclo_densita_1: DO k=1,1000,1 x(k)= delta*k d_g(k) = ( ( lamb**alph )/( f_g ) )*( x(k)**( alph – 1. ) )*( e**( (-1)*lamb*x(k) ) ) END DO ciclo_densita_1 ELSE estremo = ( (alph-1.)/lamb )*3. delta = estremo/1000. ciclo_densita_1: DO k=1,1000,1 x(k)= delta*(k-1.) d_g(k) = ( ( lamb**alph )/( f_g ) )*( x(k)**( alph – 1. ) )*( e**( (-1)*lamb*x(k) ) ) END DO ciclo_densita_1 END IF
WRITE(20,*) “The gamma distribution for alpha =”, alph, “and lambda =”, lamb WRITE(20,*) ” “ WRITE(20,*) “The value of the gamma function in”, alph, “is”, f_g WRITE(20,*) ” ” WRITE(20,*) “distribution function” WRITE(20,*) ” “ WRITE(20,*) ” x “,”|”,”distribution function” WRITE(20,*) “__________________________________”
ciclo_write: DO j=1,1000,1 WRITE(20,100) x(j),”|”,f_r(j) END DO ciclo_write 100 FORMAT (1X,1F8.4,A,1F8.4)
!calcolo il tasso di avaria
ciclo_avaria: DO k=1,1000,1 h(k) = d_g(k)/( 1. – f_r(k) ) END DO ciclo_avaria
!disegno il grafico della densita’
CALL diagramma (x, d_g, ‘Density ‘)
!disegna il grafico della funzione di ripartizione
CALL diagramma (x, f_r, ‘distribution function’)
!disegno il grafico del tasso di avaria
CALL diagramma (x, h, ‘Failure rate ‘)
WRITE(*,*) “To close this program press any key, then press RETURN.” READ(*,*) chiusura
END PROGRAM main_legge_gamma_avaria
!Data: 1/03/2018
MODULE mod_legge_gamma_avaria
!sezione dichiarativa
IMPLICIT NONE
!costanti
REAL,PARAMETER:: e = 2.71828182 !numero di Nepero REAL,PARAMETER:: d_alpha = 0.01 !incremento delle ascisse della funzione gamma INTEGER,PARAMETER:: num = 1 !aumento il numero di cifre significative
CONTAINS
!————————————————————————————-
SUBROUTINE sub_f_gamma_1_2 (f_gamma, alpha)
!Esegue l’integrale necessario al calcolo della funzione gamma !per valori compresi fra 1 e 2 e registra questi valori in un array. !L’integrazione viene eseguita con il metodo di Cavalieri-Simpson
IMPLICIT NONE
!dichiaro gli argomenti fittizi
REAL(KIND=num),INTENT(OUT),DIMENSION(:):: f_gamma !valori della funz. gamma in [1,2] REAL(KIND=num),INTENT(OUT),DIMENSION(:):: alpha !contiene l’ascissa della funzione gamma
!dichiaro le variabili locali
REAL(KIND=num),DIMENSION(1:1000000):: funzione !è la funzione integranda REAL(KIND=num),DIMENSION(1:1000000):: primitiva !è l’integrale illimitato REAL(KIND=num),PARAMETER:: d_x = 0.001 !intervallo di integrazione INTEGER:: i,j !indice del ciclo INTEGER:: index = 1 !contiene l’estremo superiore della integrazione numerica
!sezione esecutiva
!assegno i valori di funzione e ascissa, eseguo l’integrazione numerica, assegno i valori di f_gamma_1_2
alpha(101) = 1. !assegna il primo valore di alpha alpha(102) = alpha(101)+d_alpha !assegna il primo valore di alpha f_gamma(101) = 1. !assegna il primo valore di gamma
ciclo_esterno: DO j=102,200,1 !assegna i valori a f_gamma_1_2
!assegna i valori a funzione
ciclo_funzione: DO i=1,1000000,1
funzione (i) = ( (d_x*REAL(i))**(alpha(j)-1.) )*( e**((-1)*d_x*REAL(i)) ) index = i IF (funzione(i)<=0.0001) EXIT ciclo_funzione
f_gamma(j) = primitiva(index) !assegna il valore j-mo a f_gamma_1_2
alpha (j+1) = alpha (j) + d_alpha
END DO ciclo_esterno
RETURN
END SUBROUTINE sub_f_gamma_1_2
!————————————————————————————-
SUBROUTINE diagramma (x, funzione,label_y)
!disegno il grafico di funzione
!dichiaro gli argomenti fittizi
REAL(KIND=num),INTENT(IN),DIMENSION(1:1000):: x !ascissa REAL(KIND=num),INTENT(IN),DIMENSION(1:1000):: funzione !funzione CHARACTER (len=20), INTENT(IN):: label_y !nome delle ordinate
!dichiaro le variabili locali
REAL(KIND=num),DIMENSION(1:1000):: max_x !estremo ascissa REAL(KIND=num),DIMENSION(1:1000):: max_funz !estremo funzione
CALL METAFL (‘PDF’) !scelgo il formato dell’output CALL SETPAG (‘DA4P’) !formato della pagina CALL SCRMOD (‘revers’) !scritta nera su fondo bianco
CALL DISINI !inizializza DISLIN impostando parametri di default
CALL PAGERA !chiedo una linea di contorno per il piano xy CALL DUPLX !chiedo un font a doppio spessore
CALL AXSPOS (300,1400) !coordinate dell’angolo in basso a sinistra CALL AXSLEN (1500,1000) !la lunghezza, in pixel, dei due assi
CALL LABDIG (2,’X’) !numero di cifre decimali negli assi CALL LABDIG (3,’Y’) !numero di cifre decimali negli assi
CALL NAME (‘X’,’x’) !assegno un nome all’asse delle x CALL NAME (label_y,’y’) !assegno un nome all’asse delle y
CALL GRAF (0.,max_x,0.,max_x/5.,0.,max_funz+max_funz/5.,0.,max_funz/5.) !ambiti degli assi CALL TEXMOD (‘ON’) !modalita’ TeX per le formule CALL GRID (2,2) !impone una griglia CALL CURVE (x,funzione,1000)
CALL DISFIN !termina DISLIN
END SUBROUTINE diagramma
!————————————————————————————-
END MODULE mod_legge_gamma_avaria
A table for the gamma function in [0.01, 3.99] is reported below. In each column, the value of α is indicated on the left while the corresponding value for Γ(α) is indicated on the right.
I am currently trying to learn as much as I can about basic statistics; and one of the reasons for that is the need to gain access to the tools useful to better understand statistical analyses of medical data that I find in all the research papers on ME/CFS and related topics, which I read almost on a daily basis. Hopefully, I should also be able, at the end of this path, to perform my own statistical analyses of biological data.
When I study a subject where it is required the use of computer-aided calculation, I usually code my own software, even if there are programs ready to use for that purpose. The main reason for that is that in order to write a program that really works, you have to reach a complete understanding of the argument. So, it is a test in some way: if your program works properly, then you have learned something new. The other reason is that I like this activity, even though I can’t perform it for most of the time, because of the cruel encephalopathy due to my mysterious disease.
To give you an example of this attitude that I have, when I decided to study molecular mimicry between B-cell epitopes from a mathematical point of view, I found several programs for peptides alignments; nevertheless I decided to write my own software, and I spent several months of pure joy (but also of profound frustration because I was often too sick to write even a single line of code), learning about substitution matrices, dynamic programming and performing my own in silico experiments of cross-reactivity between enolases (a glycolitic enzyme evolutionary conserved across species) from different organisms. This work was then used in a paper recently published in the scientific journal Medical Hypotheses (Maccallini P et al. 2018). In figure 1 the flow diagram that I drew, while here and here you can download the main program and the subroutine, respectively, that I coded in Octave in order to perform comparisons between proteins. But this is another story… So, let’s come back to statistics.
Figure1. The flow diagram of my own software for the search of cross-reactive B cell epitopes in two proteins.
Many measurements of physical or biological phenomena obey to what is known as normal distribution: indicated with x the value of a specific measure, we have that values near the mean (also known as expectation and indicated with the letter μ) are more frequently assumed by x. In particular, if we indicate f(t) a function (named density) such that f(t)dt represents the probability that x takes a value between t and t+dt, then f(t) is the well known bell-shaped curve in figure 2 (on the left). As you can see, the higher the standard deviation σ is, the lower the bell: this means that standard deviation represents a measure of how much the values taken by x are spread on the x-axis.
Figure 2. Density (left) and distribution function (right) for some values of expectation and standard deviation for a normally distributed random variable.
Figure 2 has been obtained with a code that I have written in Octave, which can be downloaded here. So, what is the probability that x takes a value below a? This probability is given by the integral of f(t) between -∞ and a. This function is named distribution function, Φ(x). In figure 2, on the right, Φ(x) is plotted for different values of expectation and standard deviation. I have calculated these integrals using the numerical method known as Simpson’s rule. For μ=0 and σ=1 we have a particular type of normal distribution, usually called standard normal distribution. Some values of its distribution function are collected in the table in figure 3, that I have derived with this code in Octave. A table like this is available in any book of statistics, but I found useful to calculate my own table, for the reasons we have already mentioned.
Figure 3. The distribution function for a standard normal distribution.
Now, what about the distribution function for a normal non-standard distribution? Of course, you can calculate Φ(x) for any given values of μ and σ using available tools (like, for instance, function normcdf(x, m, s) of Octave). But, as always, I have coded my own software (in Fortran in this case). You can download the executable file here. Put it in a folder, then run it: it will ask you to type the values of μ, σ and it will give as output two diagrams (one for f, the other for Φ) and a .txt file which contains several values of Φ that can be used for applications. For μ=2, σ=0.5 we obtain the diagrams in figure 4, and a table whose first lines are in figure 5.
Figure 4. Density (left) and distribution function (right) for μ=2, σ=0.5 for a random variable with normal distribution.
Cercare similitudini fra sequenze di peptidi è estremamente utile per stabilire rapporti evolutivi fra proteine (e quindi fra gli esseri viventi che le sintetizzano), per progettare vaccini, per studiare fenomeni di autoimmunità e altro. In questo post presento un software che si occupa proprio del confronto fra due sequenze proteiche. La scrittura di questo programma mi ha tenuto compagnia durante molti mesi, segnandomi la rotta fra le costanti ricadute e riacutizzazioni della mia malattia, malattia che fin’ora non mi ha ancora abbandonato per un solo giorno. Mentre scrivevo questo e altri codici, che ne sono lo sviluppo, inventavo anche un modo per aspettare il domani.
Figura 1. Una sequenza di tre amminoacidi, con in evidenza il legame peptidico e l’angolo psi del legame fra il carbonio C-alpha e il C-beta di un amminoacido. Disegno di Paolo Maccallini.
Needleman e Wunsch
L’algoritmo di Needleman-Wunsch descrive un procedimento automatico che consente di calcolare il migliore allineamento possibile fra due sequenze di amminoacidi (Needleman SB, Wunsch CD, 1969). Questo metodo permette di svolgere in modo relativamente veloce e ingegnoso il confronto fra tutti gli allineamenti fra le due sequenze, considerando ogni possibile numero di lacune, in ogni possibile posizione. Il suo scopo è quello di scegliere fra questi allineamenti il migliore, ovvero quello che garantisce il ‘punteggio’ più alto, essendo tale punteggio calcolato utilizzando delle matrici quadrate di dimensione 20, dette ‘matrici di sostituzione’. Questo compito è non banale e comporta l’esame di un numero di allineamenti pari a
dove k e m sono le lunghezze dei due peptidi. Si tratta di numeri molto elevati, infatti posto ad esempio k=4 e m=6, si ottengono circa 215 allineamenti diversi. In genere però si ha a che fare con peptidi di centinaia di amminoacidi, il che comporta milioni di possibili allineamenti tra cui cercare il migliore. Ebbene, l’algoritmo di Needleman e Wunsch permette di effettuare questa analisi senza dover considerare direttamente ogni allineamento possibile. In figura 2 e 3 si ha una descrizione grafica di questo algoritmo.
Figura 2. L’algoritmo di Needleman-Wunsch, con l’indicazione delle variabili che ho usato nel mio codice per implementarlo. Disegno di Paolo Maccallini.
Figura 3. La matrice TBM (trace back matrix) per uno specifico allineamento. Disegno di Paolo Maccallini.
Il mio programma
Il mio software per l’allineamento globale fra due proteine si chiama NeW_6 ed è scritto in Octave. Il programma presenta la stessa funzionalità di due analoghi prodotti di largo impiego, che sono LALIGN dello Swiss Institute of Bioniformatics e EMBOSS Needle dell’Europen Bioinformatic Institute. In particolare il mio programma ha le seguenti caratteristiche:
permette all’utente di scegliere fra un set di comuni matrici di sostituzione;
ha un modello di lacuna del tipo a+b(x), dove a è la penalità della lacuna iniziale, b quella delle lacune di estensione e x è il numero di lacune;
prevede lacune alla fine delle sequenze.
Lo sviluppo del programma, nonché il codice, si trovano in questo PDF, dove è possibile seguire vari esempi applicativi e varie versioni del programma stesso, nonché dei test in cui ne confronto l’output con i programmi attualmente in uso.
Applicazioni
Perché scrivere un programma che esiste già? Mi è servito per penetrare i metodi di allineamento fra due sequenze di amminoacidi, ma soprattutto mi ha permesso di costruire programmi più complessi (non inclusi nel PDF di cui sopra) che sto attualmente utilizzando per risolvere problemi di immunologia.