# P-value threshold for genetic studies

“This it seems to me is characteristic of most judgments involving physical processes — the law of the excluded middle is not a valid description of our actual physical experience — there has to be a third category of doubtful, in addition to positive or negative.”

P. W. Bridgman, The Nature of Physical Theory

1) The multiple testing issue

With the emergence of the titanic biological datasets generated by the latest omics technologies, the correction of the conventional significance threshold ($\alpha = 0.05$) for multiple testing has required careful revision and updating. The simplest way to correct for multiple testing is the Bonferroni correction (a derivation is provided in par. 5): in this method, the threshold for statistical significance is given by

$1)\;\;p\,<\,\frac{\alpha}{m}$

where the denominator $m$ represents the number of independent tests. Now, in the case of genetic testing, how should we set the value of $m$? Let’s say that we are comparing two populations of individuals (one with a disease, the other made up of healthy subjects) and that we have genotyped N variants in each of them. Once we have associated a p value with every single variant, we must face the problem of correcting it for multiple testing by using Eq. 1. What should we do? Well, it is simple: we put $m\,=\,N$, right? Wrong!
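For reference, Eq. 1 itself is trivial to apply once $m$ is known; here is a quick sketch in Python (the value $m = 10^6$ is just an illustrative choice, close to the conventional number of independent common variants behind the usual genome-wide threshold):

```python
# Eq. 1: the Bonferroni-corrected significance threshold is alpha / m.
alpha = 0.05

def bonferroni_threshold(alpha, m):
    """Per-test threshold keeping the family-wise error rate below alpha."""
    return alpha / m

# With m = 1e6 independent tests we get 5e-8, the conventional
# genome-wide significance threshold.
threshold = bonferroni_threshold(alpha, 1_000_000)
```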

2) Linkage disequilibrium

In general, variants belonging to the same chromosome are not inherited in an independent fashion: we know that whole chunks of chromosome, about 50 kb long, pass from one generation to the next. These blocks are called haplotypes, and there has been a considerable effort to classify them for various human populations over the last 20 years. A map of human haplotypes is now available (HapMap) and it allows for the calculation of the real number of independent variants (often called tag SNPs) on the human genome, and for other pivotal purposes I won’t write about here, like for instance imputation (a technique that dramatically reduces the number of variants we must genotype in order to scan the genome of a human being) (The International HapMap Consortium 2005), (The International HapMap Consortium, 2009). Visit the page of the International HapMap Consortium here.

For us, what is important here is the fact that the knowledge of the haplotypes gives us a measure of the linkage disequilibrium between any possible pair of variants on the same chromosome. But what is linkage disequilibrium? Let’s say that we have two loci $L_1$ and $L_2$ on the same chromosome. $L_1$ has the two alleles $A$ and $a$, while $L_2$ presents the two alleles $B$ and $b$, with frequencies $f(A),\,f(a),\,f(B),\,f(b)$, respectively. Let’s now define the random variable $X$ whose value is one if $L_1$ is occupied by $A$, zero otherwise; similarly, let’s define the random variable $Y$ such that it is one if $L_2$ is occupied by $B$, zero otherwise. Both these variables have a Bernoulli distribution, with expectations $E[X]\,=\,f(A)$ and $E[Y]\,=\,f(B)$, respectively. If we now call $f(AB)$ the frequency of the haplotype $AB$, and if we similarly define $f(Ab),\,f(aB),\,f(ab)$, we can write the following relationships:

$2)\;\;\begin{cases} f(Ab)\,=\,f(A)\,-\,f(AB)\\f(aB)\,=\,f(B)\,-\,f(AB)\\f(ab)\,=\,1\,-\,f(A)\,-\,f(B)\,+\,f(AB)\\f(AB)\,+\,f(Ab)\,+\,f(aB)\,+\,f(ab)\,=\,1\\f(A)\,+\,f(a)\,=\,f(B)\,+\,f(b)\,=\,1 \end{cases}$

where we have considered that $f(AB)\,=\,E[XY]$ and $f(Ab)\,=\,E[X(1-Y)]\,=\,E[X]\,-\,E[XY]$, and so on. That said, the covariance between $X$ and $Y$ is given by:

$3)\;\; Cov[X,\,Y]\,=\,E[X\,Y]\,-\,E[X]E[Y]\,=$

$\,=\, f(AB)\,-\,f(A)f(B)\,=\,f(ab)\,-\,f(a)f(b)\,=$

$=\,f(A)f(b)\,-\,f(Ab)\,=\,f(a)f(B)\,-\,f(aB)$

Or also (with some algebraic manipulation, by substituting the relationship of Eq. 2):

$4)\;\;Cov[X,\,Y]\,=\,f(AB)f(ab)\,-\,f(Ab)f(aB)$

That said, one definition of linkage disequilibrium is just

$5)\;\;d\,=\,Cov[X,\,Y]$

This definition presents a problem: when we have complete dependence between $L_1$ and $L_2$, the absolute value of $d$ varies in $]0,0.25]$ (see Figure 1, Figure 2, and paragraph 6 for a detailed discussion), and this does not seem to have a particular biological meaning. It would be more interesting to have a definition such that linkage disequilibrium assumes a unique value (for example 1) when we have complete dependence between the two loci. We can accomplish this by considering the correlation coefficient between $X$ and $Y$ instead, which is given by:

$6)\;\;\rho_{XY}\,=\,\frac{Cov[X,\,Y]}{\sqrt {Var(X)Var(Y)}}\,=\frac{d}{\sqrt {Var(X)Var(Y)}},$

Thus, in the present article we consider the following definition for linkage disequilibrium:

$7)\;\;linkage\,disequilibrium\,=\,\rho^2\,=\,\frac{d^2}{f(A)f(a)f(B)f(b)}$

where we have considered that $Var[X]\,=\,E[X](1-E[X])\,=\,f(A)f(a)$ and $Var[Y]\,=\,E[Y](1-E[Y])\,=\,f(B)f(b)$. For further details on this topic, go to paragraph 6. See also (Reginald D Smith 2020).
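These formulas are easy to put into code; here is a minimal sketch (in Python, for convenience) that computes $d$ and $\rho^2$ from the four haplotype frequencies:

```python
# Computing d (Eq. 5) and rho^2 (Eq. 7) from the four haplotype
# frequencies of two biallelic loci.

def linkage_disequilibrium(f_AB, f_Ab, f_aB, f_ab):
    """Return (d, rho2); the four haplotype frequencies must sum to 1."""
    f_A = f_AB + f_Ab            # allele frequencies, from Eq. 2
    f_B = f_AB + f_aB
    f_a, f_b = 1.0 - f_A, 1.0 - f_B
    d = f_AB * f_ab - f_Ab * f_aB            # Eq. 4
    rho2 = d ** 2 / (f_A * f_a * f_B * f_b)  # Eq. 7
    return d, rho2

# Complete coupling (f(Ab) = f(aB) = 0): rho^2 = 1, as required.
d, rho2 = linkage_disequilibrium(0.3, 0.0, 0.0, 0.7)
```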

3) The search for the number of independent loci

Consider now that the number of loci in linkage equilibrium with one another is exactly the value $m$ in Eq. 1, the one we are interested in. From an operational point of view, how can we determine whether two loci are in linkage equilibrium, so that they can be considered independent of each other? We can randomly build the genome of 2,000 human beings with the only constraint that haplotypes must be respected (this is where the HapMap comes into play); we then randomly assign them to two populations and we calculate, for each locus, the p value for the comparison of these two populations. Let’s say that locus $L_i$ has a p value $p_i$. Since these two groups are identical by construction, none of these tests should be significant after correction: it must be $p_i\,>\,\frac{\alpha}{m}$ or, in other words:

$8)\;\;m\,>\,\frac{\alpha}{p_i}$

If we repeat this inequality for each locus, we obtain a lower bound for $m$, given by the smallest $p_i$. That number is the number of independent loci we were searching for (Pe’er, I. et al. 2008).
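The bound in Eq. 8 can be sketched as follows (a toy illustration with made-up null p values, not the actual HapMap-based simulation):

```python
# Eq. 8: each null p value p_i gives the bound m > alpha / p_i, so the
# number of independent loci is bounded below by alpha / min_i(p_i).
alpha = 0.05

def independent_loci_bound(p_values, alpha=0.05):
    """Lower bound for the number of independent loci (Eq. 8)."""
    return alpha / min(p_values)

# Toy null p values; in the real procedure they come from comparing two
# randomly assigned, identical-by-construction populations.
p_null = [0.4, 0.02, 1e-6, 0.77]
m_bound = independent_loci_bound(p_null)
```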

4) Experimental results

I discuss here the calculation of m performed in a recent study, the best I have been able to find on this topic (Fadista, J. et al. 2016). The strength of this study is that the authors calculated m for two kinds of sequencing techniques: whole-genome sequencing (WGS) and whole-exome sequencing (WES). Not only that: they also provided the results for different European and non-European populations, for different definitions of linkage disequilibrium, and for different thresholds for minor allele frequency (MAF). These results are plotted in their paper and they are also available in xlsx format (here, even though I must say that they omitted part of the raw data on the different ethnicities). Nevertheless, I have elaborated them in my own way (the script that draws the diagrams is available below), by using a different arrangement of the data and by providing a polynomial interpolation, by cubic splines, so as to have smooth and elegant surfaces and curves.

In Figure 3 I plotted m as a function of the threshold for the minor allele frequency (MAF) and of the linkage disequilibrium, in the case of whole-genome sequencing (WGS). In that plot, MAF is the lower bound for the minor allele frequency, while the axis ‘Linkage disequilibrium’ is the upper limit for the value $\rho^2$ in Eq. 7. Figure 4 is exactly the same diagram, but built from whole-exome sequencing (WES) data. Some observations: the number of independent variants is bigger in WGS than in WES (in the former case we are considering more genetic material than in the latter); when we reduce MAF, m increases (reducing MAF means considering more variants, so again, this leads to a higher number of independent variants); m decreases with linkage disequilibrium (the threshold for considering two variants as independent of each other).

This last point deserves a further explanation: in this context, when we say that LD is, say, 0.8, it means that we consider two variants to be independent when their linkage disequilibrium (calculated as in Eq. 7) is below 0.8; as we reduce this value, we define a more stringent criterion for independence between pairs of variants, and thus the number m must decrease.
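The interpolation itself is straightforward; here is a minimal sketch of cubic-spline smoothing (with made-up data points, and SciPy in place of the Octave script linked below):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Made-up (MAF threshold, m) pairs standing in for one curve of the
# Fadista et al. data; the spline passes exactly through the points and
# is smooth in between.
maf = np.array([0.01, 0.05, 0.10, 0.20, 0.30])
m = np.array([9.0e6, 2.0e6, 1.2e6, 8.0e5, 6.0e5])

spline = CubicSpline(maf, m)
maf_fine = np.linspace(maf.min(), maf.max(), 200)
m_smooth = spline(maf_fine)   # values for a smooth plotted curve
```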

If we consider the upper limit for $\rho^2$ to be 0.8, we obtain Table 1 and Table 2, where both m and the corresponding Bonferroni corrections are collected, for various MAF thresholds.

In Figures 5 and 6 you can find these same data after interpolation, arranged as curves where p is expressed as a function of m and MAF.

5) How to derive the Bonferroni correction

If we adopt a threshold $\alpha'$ for statistical significance and we perform m tests, then the probability that one test is positive by chance is $P\,<\,\alpha'$ and the probability that one test is negative by chance is $1\,-\,P\,>\,1\,-\,\alpha'$. This also means that, if the tests are independent, the probability that all the m tests are negative is $(1\,-\,P)^m\,>\,(1\,-\,\alpha')^m$. In conclusion, the probability that at least one test is positive purely by chance is

$9)\;\;p\,=\,1\,-\,(1\,-\,P)^m\,<\,1\,-\,(1\,-\,\alpha')^m$

We want this probability to be lower than $\alpha\,=\,0.05$, which leads to the following inequality:

$10)\;\;\alpha'\,<\,1\,-\,\sqrt[m]{1\,-\,\alpha}$

This value $\alpha'$ is the corrected threshold for statistical significance, according to the Bonferroni correction. But this is not the correction in Eq. 1, you might note. Yes, that is right. But let’s consider the well-known series expansion:

$11)\;\;(1\,-\,\alpha)^{\frac{1}{m}}\,=\,\sum_{k\,=\,0}^{\infty}\binom{\frac{1}{m}}{k}(-\,\alpha)^k$

If we take only the first two addends of the sum and substitute them into the expression already found for $\alpha'$, we obtain Eq. 1. The absolute value of the error that we make when we use Eq. 1 instead of Eq. 10 is plotted in Figure 7, as a function of m, for $\alpha\,=\,0.05$.
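Both thresholds, and the error just mentioned, are easy to compute; a quick sketch (not the script that generated Figure 7):

```python
# Eq. 10 (the exact corrected threshold, also known as the Sidak
# correction) versus its first-order approximation, Eq. 1.

def exact_threshold(alpha, m):
    """alpha' = 1 - (1 - alpha)^(1/m), Eq. 10."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

def bonferroni(alpha, m):
    """alpha' = alpha / m, Eq. 1."""
    return alpha / m

# The absolute error between the two shrinks quickly as m grows.
alpha = 0.05
errors = {m: abs(bonferroni(alpha, m) - exact_threshold(alpha, m))
          for m in (10, 1_000, 1_000_000)}
```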

6) Mathematical notes on linkage disequilibrium

By substituting the fourth relationship of Eq. 2 into Eq. 4, we obtain the surface

$f(ab)\,-\,f(ab)f(Ab)\,-\,f(ab)f(aB)\,-\,f^2(ab)\,-\,f(Ab)f(aB)\,-\,d\,=\,0$

which is a quadric in the space of the frequencies of the haplotypes ab, Ab, aB. More precisely, and even though this is of no importance for us, it is a hyperbolic cylinder (as the reader can easily verify by studying its associated matrix). If we now require that $f(Ab)\,\in\,[0,1]$ and $f(AB)\,\in\,[0,1]$ for $f(aB)\,\in\,[0,1]$, $f(ab)\,\in\,[0,1]$, we find the relationships

$\begin{cases} d\,\geq\,-f(aB)\,+\,f(aB)f(ab)\,+\,f(aB)^2\\d\,\leq\,f(ab)\,+\,f(aB)f(ab)\,+\,f(aB)^2\\d\,\geq\,-\,f(aB)\,-\,f(aB)f(ab)\,-\,f(ab)^2\\d\,\leq\,f(ab)\,-\,f(aB)f(ab)\,-\,f(ab)^2\\f(aB)\,\in\,[0,1],\,f(ab)\,\in\,[0,1]\end{cases}$

Taken together, these inequalities say that d can assume only the values included within the elegant surface in Figure 1. In particular, when $f(Ab)\,=\,f(aB)\,=\,0$ (a condition of perfect disequilibrium which goes under the name of coupling), d assumes the values of the curve coloured blue, while when $f(AB)\,=\,f(ab)\,=\,0$ (repulsion) d assumes the values on the curve coloured orange. These two curves are plotted in Figure 2 too. Note that $f(Ab)\,=\,0\,\Rightarrow\,f(aB)\,=\,0$; moreover, $f(AB)\,=\,0\,\Rightarrow\,f(ab)\,=\,0$ (this can be proved easily by using Eq. 2).

Note that the application of the method of Lagrange multipliers for constrained optimization to Eq. 4, with the constraint defined by the fourth relationship of Eq. 2, would not lead to any conclusion in this case, because it amounts to searching for the extrema of a function whose gradient is never zero.
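As a numerical sanity check of these bounds, here is a sketch with random haplotype frequencies (it tests the first and fourth inequalities, which bound d from below and above):

```python
import random

def d_within_bounds(trials=10_000, seed=0):
    """Check numerically that d (Eq. 4) respects the bounds above."""
    rng = random.Random(seed)
    for _ in range(trials):
        # random haplotype frequencies summing to 1
        cuts = sorted(rng.random() for _ in range(3))
        f_AB, f_Ab = cuts[0], cuts[1] - cuts[0]
        f_aB, f_ab = cuts[2] - cuts[1], 1.0 - cuts[2]
        d = f_AB * f_ab - f_Ab * f_aB
        lower = -f_aB + f_aB * f_ab + f_aB ** 2   # first inequality
        upper = f_ab - f_aB * f_ab - f_ab ** 2    # fourth inequality
        if not (lower - 1e-12 <= d <= upper + 1e-12):
            return False
    return True
```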

7) Supplementary Material

The script that plots Figures 1 and 2 can be downloaded here.

The script that plots Figures 3, 4, and 5 can be downloaded here.

The equations of this blog post were written using $\LaTeX$ (see this article)

# The lost illustration

Philosophy is written in a great book that is continually open before our eyes (that is the Universe), but it cannot be understood unless one first learns to decipher the language and the characters in which it is written. It is written in a mathematical language, and the characters are triangles, circles, and other geometric figures, without which it is impossible for us to understand a word of it; without them it is a vain wandering through a dark labyrinth.

Galileo Galilei, The Assayer

The geometry of the backbone (or main chain) of a peptide is completely described by a sequence of dihedral angles (also known as torsion angles): the angle Φ is the angle around the chemical bond between the alpha carbon of one amino acid and the nitrogen of the same amino acid; the angle Ψ is the angle around the axis of the bond between the alpha carbon and the other carbon (I call it beta carbon, but this is a personal notation) of the same amino acid. This definition is incomplete, of course, because, as we all know, a dihedral angle is defined not only by an axis but also by two planes that intersect in that axis. Not only that: these two angles have a sign, so we must also specify a positive direction of rotation. This can be done in several ways. The problem is that the most efficient way would be to just draw the planes the torsion angles are defined by. Now, through the years I have discovered that this drawing is usually lacking in books, leading to some confusion, particularly in those who study biochemistry without a particular interest in a quantitative description of molecular structures. In Figure 1 you find the graphical description of Φ and Ψ in a well-known book of biochemistry. Do you see the planes the angles are defined by? In Figure 2, two further examples of illustrations that are not suited to completely describing the two dihedral angles, from two other well-known books: one about bioinformatics, the other a booklet about protein structures. The second one is the best, but you still have to mentally merge two drawings (FIG. 5 and FIG. 6) to get the full picture. I hope I won’t be sued…
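For readers who prefer coordinates to drawings, a signed dihedral angle can also be computed directly from four atomic positions. This is a sketch (not from the original post) using NumPy and the standard construction for the torsion angle; the function name and the example points are mine:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (in degrees) around the p1-p2 axis: the
    angle between the half-plane containing p0 and the half-plane
    containing p3."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # components of b0 and b2 perpendicular to the rotation axis b1
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# Four points in a plane, with p0 and p3 on opposite sides of the axis
# (trans configuration): the dihedral is 180 degrees.
p = [np.array(q, dtype=float)
     for q in ([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, -1, 0])]
angle = dihedral(*p)
```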

Now, it is possible that I have just been unlucky with the books of my personal library. Or it may be that an illustration with the three planes we need to correctly define Φ and Ψ (we just need to add a plane to the amide planes) would not be clearly readable. Years ago I tried to draw the planes for Ψ (the illustration is in this blog post) and I have now completed it with the other torsion angle, by drawing a tripeptide (Figure 3). It is just a handmade drawing, but I think it serves the purpose. This is what I call the lost illustration.

# A second glance at the UK Biobank Database


Note. This blog post should be read on a laptop. With smaller screens, some of the tables cannot be correctly formatted.

Introduction: what has been found

The UK Biobank is a large-scale biomedical database that includes genetic data on 500,000 UK citizens (Sudlow, C et al. 2015). Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is among the phenotypes reported by participants. The total number of ME/CFS patients included is 1208 females and 451 males, with control groups of more than 150 k individuals each. For each individual, 805,426 variants (both SNPs and short InDels) were genotyped; from them, up to 30 million variants can be imputed using haplotype maps. This is the biggest GWAS study on ME/CFS patients made so far.

Many research groups have analyzed this huge quantity of data, and the conclusion, according to (Dibble J. et al. 2020), is that the only variants to be significantly different between patients and controls are included in gene SLC25A15, which encodes mitochondrial ornithine transporter I; in particular variant rs7337312 (position 13:41353297:G:A on reference genome GRCh37, forward strand). But this difference is statistically significant only when we consider female patients vs female controls (Table 1). For a detailed discussion of this result, the best reference is the already mentioned (Dibble J. et al. 2020). I wrote an analysis myself on this same result and on its possible implications, available here.

My focus, while writing that document, was also understanding the architecture of the metadata generated from the UK Biobank by Neale’s Lab (R), one of the research groups that performed the statistical analyses. It was not obvious, at the beginning, how I could derive a table like Table 1 for each genotyped variant: in fact, it is the result of the elaboration of five files, four of them of considerable size. Once I realized how to read and manipulate these files, I derived the data I was interested in manually, at first; then I decided to write a code that could do the work in an automated fashion (see below).

What has not been found

Now, even if we are certainly disappointed by the fact that the positive result of these analyses does not add much to what we already knew about ME/CFS, it is important to stress that negative results are also relevant. Why? Because they allow us to test, and possibly rule out, the huge number of genetic associations made through the years between fatigue in general, and CFS in particular, and genetic variants (see for instance the encyclopedic review Whang T. et al. 2017). As an example, I will focus on the variants of the TNF alpha gene collected in Table 2. In 2006, an Italian group found that the minor allele of variant rs1799724 (TNF-857) was significantly more frequent in 54 ME/CFS patients than in 224 controls (Carlo-Stella N. et al. 2006). In 2020, a group from Germany reported a statistically significant difference for the same variant, but only when ME/CFS patients without an infection-triggered onset were considered (Steiner S. et al. 2020). Two other variants of gene TNF alpha were associated with morning fatigue among oncology outpatients with breast, prostate, lung, or brain cancer (Dhruva A. et al. 2015). Are these variants different between the 1659 ME/CFS patients and the about 300 k controls in the UK Biobank database?

To answer this question I have selected a suitable region of chromosome 6 (ch6:31,539,768 – ch6:31,546,495) that includes not only all the variants in Table 2, but also the whole TNF gene. I then analyzed this region with a code I wrote in GNU Octave, version 6.3.0. This code generates three .csv files (one for each of the following groups: females only, males only, both sexes) with information on each of the variants genotyped within the region mentioned (see Supplementary Material). An extract from these files, with only the variants collected in Table 2, can be seen in Table 3, Table 4, and Table 5. None of these variants reaches the threshold for statistical significance, which for this kind of study is considered to be p<5×10^(-8) for variants with a minor allele frequency above 5% (even lower for less common variants) (Fadista J et al. 2016).

Another way to conveniently visualize the differences between patients and controls (if any) is to plot the p values as a function of the position of the variants they refer to. We actually plot the log10 of 1/p or, in other words, -log10(p). This kind of diagram is known as a Manhattan plot, and in Figure 1 you see the Manhattan plot for region ch6:31,539,768 – ch6:31,546,495, in the case of females + males. As you can see, none of the variants has a p value below 0.01. In Figure 2 you find the same plot, this time for females only (left) and males only (right).
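A Manhattan plot takes only a few lines once positions and p values are extracted; here is a minimal sketch (with made-up data and matplotlib, not the Octave script used for the actual figures):

```python
import math
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Made-up (position, p value) pairs standing in for the variants of
# region ch6:31,539,768 - ch6:31,546,495.
positions = [31_540_000, 31_542_482, 31_543_031, 31_545_000]
p_values = [0.8, 0.03, 0.2, 0.5]

# the y axis of a Manhattan plot is -log10(p)
neglog_p = [-math.log10(p) for p in p_values]
plt.scatter(positions, neglog_p)
plt.xlabel("position on chromosome 6")
plt.ylabel("-log10(p)")
plt.savefig("manhattan.png")
```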

Conclusions

In conclusion, these codes of mine provide as output the frequency of both the minor and the major allele among 1659 ME/CFS patients, stratified by sex, for about 15 million variants (including SNPs and InDels). They also provide the p value (beta test) for the comparison of these patients with about 300 k controls, along with general information on the variants (such as position, minor allele frequency among 500 k UK citizens, predicted consequences, etc). These codes do so by analyzing the metadata generated by Neale’s Lab, a research group that worked on the UK Biobank dataset. In this blog post, I have shown how to use these data to test previously published genetic associations between ME/CFS and gene TNF alpha.

Supplementary material

The files that I used as input for my codes are the following ones (click on them to download):

20002_1482.gwas.imputed_v3.both_sexes_TNF

20002_1482.gwas.imputed_v3.females_TNF

20002_1482.gwas.imputed_v3.males_TNF

The output generated by TNF.m (see below) consists of the following files (click on them to download), which were used to build Tables 3 to 5:

TNF_total_both_sexes

TNF_total_females

TNF_total_males

Details on these files can be found in this PDF. The code that generates the plots is pretty simple and can be downloaded here. The code that generates the three output files from the four input ones is the one that follows.

% file name = TNF
% it generates the file TNF_tot by using the package IO
% see https://octave.sourceforge.io/io/ for a description
clear all
close all
%-------------------------------------------------------------------------------
% Data source
%-------------------------------------------------------------------------------
% The collection of the analysis generated by Neale’s Lab with the UK Biobank
% data is available here:
% At lines 9 and 10 you can download a collection of the phenotypes and of some
% of the attributes of the relative sample, for females and males respectively
% (phenotypes.female.tsv, phenotypes.male.tsv). We are interested in phenotype 20002_1482
% (Non-cancer illness code, self-reported: chronic fatigue syndrome). In line 11
% you find a file that contains the details on each variant (variants.tsv). You
% must unzip the files before opening them. I have used File viewer plus for reading
% this file (Excel doesn’t seem to open it).
%-------------------------------------------------------------------------------
% Length and position of TNF
%-------------------------------------------------------------------------------
% The position of TNF goes from ch6:31,543,342 to ch6:31,546,113 on the reference
% genome GRCh37 (alias h19), according to NCBI (https://www.ncbi.nlm.nih.gov/gene/7124).
% The two variants we are interested in have positions ch6:31,542,482 (rs1799724, TNF-857)
% and ch6:31,543,031 (rs1800629), on GRCh37. So we must include them, along with
% the whole TNF gene. Since rs1799724 falls also in gene LTA, I include this gene
% as well (LTA goes from ch6:31,539,876 to ch6:31,542,101). Then a suitable interval
% in this dataset is ch6:31,539,768 to ch6:31,546,495.
%-------------------------------------------------------------------------------
% numbers of cases and controls
%-------------------------------------------------------------------------------
n_cases_f = 1208
n_controls_f = 192945
n_cases_m = 451
n_controls_m = 166537
n_cases_b = n_cases_f + n_cases_m
n_controls_b = n_controls_f + n_controls_m
%-------------------------------------------------------------------------------
% name of the gene
%-------------------------------------------------------------------------------
gene = 'TNF';
%-------------------------------------------------------------------------------
% number of variants = upper limit for range - 1
%-------------------------------------------------------------------------------
file_name = strcat('variants_',gene,'.xlsx')
[filetype, sh_names, fformat, nmranges] = xlsfinfo (file_name);
strn = char(sh_names(1,2));
strn = substr(strn,5) % upper limit for range as a string
n = str2num(strn) % upper limit for range as a number
%-------------------------------------------------------------------------------
%-------------------------------------------------------------------------------
file_name = strcat('variants_',gene,'.xlsx')
sheet = 'Sheet1'
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant = text;
% reading and storing the chromosome
range = strcat('B2:B',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
chr = num;
% reading and storing the position on chromosome
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pos = int32(num);
% reading and storing the ref allele
range = strcat('D2:D',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ref = text;
% reading and storing the alt allele
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
alt = text;
% reading and storing the rsid
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
rsid = raw;
% reading and storing the consequence
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
consequence = text;
% reading and storing the consequence category
range = strcat('I2:I',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
consequence_category = text;
% reading and storing the Alternate allele count (calculated using hardcall genotypes)
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC = int32(num);
% reading and storing the Alternate allele frequency (calculated using hardcall genotypes)
range = strcat('M2:M',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AF = double(num);
% reading and storing the Minor allele (equal to ref allele when AF > 0.5, otherwise equal to alt allele)
range = strcat('N2:N',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_allele = text;
% reading and storing the Minor allele frequency (calculated using hardcall genotypes)
range = strcat('O2:O',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF = double(num);
% reading and storing the Hardy-Weinberg p-value
range = strcat('P2:P',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
p_hwe = double(num);
% reading and storing the Number of samples with defined genotype at this variant
range = strcat('Q2:Q',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_called = int32(num);
% reading and storing the Number of samples without a defined genotype at this variant
range = strcat('R2:R',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_not_called = int32(num);
% testing by writing some of the data
i = 2
variant (i)
chr (i)
pos (i)
ref (i)
alt (i)
rsid (i)
consequence (i)
consequence_category (i)
AC (i)
AF (i)
minor_allele (i)
minor_AF (i)
p_hwe (i)
n_called (i)
n_not_called(i)
%-------------------------------------------------------------------------------
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.females_',gene,'.xlsx')
sheet = 'Sheet1'
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_f = text;
% reading and storing the Minor allele frequency in n_complete_samples_f
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_f = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_f = raw;
% reading and storing the number of individuals genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_f = double(num);
% reading and storing the Alternate allele count within n_complete_samples_f
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_f = double(num);
% reading and storing the number of alternative alleles in n_cases_f
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_f = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_f = double(num);
% testing by writing some of the data
i = 2
variant_f(i)
minor_AF_f(i)
low_confidence_variant_f(i)
n_complete_samples_f(i)
AC_f(i)
ytx_f(i)
pval_f(i)
%-------------------------------------------------------------------------------
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.males_',gene,'.xlsx')
sheet = 'Sheet1'
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_m = text;
% reading and storing the Minor allele frequency in n_complete_samples_m
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_m = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_m = raw;
% reading and storing the number of individuals genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_m = double(num);
% reading and storing the Alternate allele count within n_complete_samples_m
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_m = double(num);
% reading and storing the number of alternative alleles in n_cases_m
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_m = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_m = double(num);
% testing by writing some of the data
i = 2
variant_m(i)
minor_AF_m(i)
low_confidence_variant_m(i)
n_complete_samples_m(i)
AC_m(i)
ytx_m(i)
pval_m(i)
%-------------------------------------------------------------------------------
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.both_sexes_',gene,'.xlsx')
sheet = 'Sheet1'
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_b = text;
% reading and storing the Minor allele frequency in n_complete_samples_b
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_b = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_b = raw;
% reading and storing the number of individuals genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_b = double(num);
% reading and storing the Alternate allele count within n_complete_samples_b
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_b = double(num);
% reading and storing the number of alternative alleles in n_cases_b
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_b = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_b = double(num);
% testing by writing some of the data
i = 2
variant_b(i)
minor_AF_b(i)
low_confidence_variant_b(i)
n_complete_samples_b(i)
AC_b(i)
ytx_b(i)
pval_b(i)
%-------------------------------------------------------------------------------
% checking for wrong alignments between these 4 files
%-------------------------------------------------------------------------------
error = 0.;
for i=1:n-1
a = variant(i);
b = variant_f(i);
c = variant_m(i);
d = variant_b(i);
% function strcmp (string_1, string_2) returns 1 if the two strings are identical
% it returns 0 if the two strings have at least a different character in the same position
test = strcmp(a,b);
if ( (test==0) );
error = error +1;
endif
test = strcmp(a,c);
if ( (test==0) );
error = error +1;
endif
test = strcmp(a,d);
if ( (test==0) );
error = error +1;
endif
endfor
error
%-------------------------------------------------------------------------------
% defining major and minor alleles among all the variants
%-------------------------------------------------------------------------------
for i=1:n-1
frequency = AF(i); %frequency of the alternative allele
if ( frequency <= 0.5 )
minor_allele2(i)=alt(i);
major_allele(i)=ref(i);
else
minor_allele2(i)=ref(i);
major_allele(i)=alt(i);
endif
endfor
%-------------------------------------------------------------------------------
% defining alternate allele frequencies in case groups
%-------------------------------------------------------------------------------
for i=1:n-1
alt_AF_f(i) = ytx_f(i)/(2*n_cases_f);
alt_AF_m(i) = ytx_m(i)/(2*n_cases_m);
alt_AF_b(i) = ytx_b(i)/(2*n_cases_b);
endfor
%-------------------------------------------------------------------------------
% defining alternate allele frequencies in control groups
%-------------------------------------------------------------------------------
for i=1:n-1
alt_AF_f_controls(i) = ( AC_f(i)- ytx_f(i) )/(2*n_controls_f);
alt_AF_m_controls(i) = ( AC_m(i)- ytx_m(i) )/(2*n_controls_m);
alt_AF_b_controls(i) = ( AC_b(i)- ytx_b(i) )/(2*n_controls_b);
endfor
%-------------------------------------------------------------------------------
% defining reference allele frequencies in case groups
%-------------------------------------------------------------------------------
for i=1:n-1
ref_AF_f(i) = 1-alt_AF_f(i);
ref_AF_m(i) = 1-alt_AF_m(i);
ref_AF_b(i) = 1-alt_AF_b(i);
endfor
%-------------------------------------------------------------------------------
% defining reference allele frequencies in control groups
%-------------------------------------------------------------------------------
for i=1:n-1
ref_AF_f_controls(i) = 1-alt_AF_f_controls(i);
ref_AF_m_controls(i) = 1-alt_AF_m_controls(i);
ref_AF_b_controls(i) = 1-alt_AF_b_controls(i);
endfor
%-------------------------------------------------------------------------------
% checking for errors by comparing minor_allele with minor_allele2
%-------------------------------------------------------------------------------
error2 = 0.;
for i=1:n-1
a = minor_allele(i);
b = minor_allele2(i);
% function strcmp (string_1, string_2) returns 1 if the two strings are identical
% it returns 0 if the two strings differ in at least one character position
test = strcmp(a,b);
if ( (test==0) );
error2 = error2 +1;
endif
endfor
error2
%-------------------------------------------------------------------------------
% writing of GENE_total_females.csv
%-------------------------------------------------------------------------------
columns = cell(n,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_f{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_f(i);
columns{i+1,15} = alt_AF_f(i);
columns{i+1,16} = alt_AF_f_controls(i);
columns{i+1,17} = ref_AF_f(i);
columns{i+1,18} = ref_AF_f_controls(i);
columns{i+1,19} = n_cases_f;
columns{i+1,20} = n_controls_f;
endfor
file_name = strcat(gene,'_total_females.csv')
cell2csv (file_name, columns," ")
%-------------------------------------------------------------------------------
% writing of GENE_total_males.csv
%-------------------------------------------------------------------------------
columns = cell(n,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_m{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_m(i);
columns{i+1,15} = alt_AF_m(i);
columns{i+1,16} = alt_AF_m_controls(i);
columns{i+1,17} = ref_AF_m(i);
columns{i+1,18} = ref_AF_m_controls(i);
columns{i+1,19} = n_cases_m;
columns{i+1,20} = n_controls_m;
endfor
file_name = strcat(gene,'_total_males.csv')
cell2csv (file_name, columns," ")
%-------------------------------------------------------------------------------
% writing of GENE_total_both_sexes.csv
%-------------------------------------------------------------------------------
columns = cell(n,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_b{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_b(i);
columns{i+1,15} = alt_AF_b(i);
columns{i+1,16} = alt_AF_b_controls(i);
columns{i+1,17} = ref_AF_b(i);
columns{i+1,18} = ref_AF_b_controls(i);
columns{i+1,19} = n_cases_b;
columns{i+1,20} = n_controls_b;
endfor
file_name = strcat(gene,'_total_both_sexes.csv')
cell2csv (file_name, columns," ")

# Testing the prophecy of Babylon

Some days ago it came to my attention that a religious group considers the presence, within the Old Testament, of propositions that seem to describe events that happened after those propositions were written as proof of the divine origin of the text. Not only that: to my understanding, they infer from those successful prophecies that the other predictions present in the Bible must also come true somewhere in the future.

Now, these statements raise several problems. One can note, for instance, that successful scientific theories like Newtonian gravitation continuously predict the future with great accuracy, and yet this is not generally considered proof that Isaac Newton had a divine nature. On the other hand, the fact that his gravitational law predicts the orbits of the Earth, of Mars, of the modules of the Apollo missions, etc., does not prevent the same theory from failing when it comes to the orbit of Mercury. So, this single counter-example might be considered enough to prove that the first paragraph of this article contains false statements.

But then there is another problem. How do I evaluate whether a prophecy is true or false? Despite the obvious difficulties due to the fact that predictions of this kind are expressed in a human language (translated from one tongue into another) rather than in mathematical language, and are therefore ambiguous by their very nature, one can still try, I reasoned, to apply the standard method used in science. In other words, one can calculate the probability of the null hypothesis (which, in this case, is: the prophecy is true by chance) and see what kind of number comes out.

Well, I discovered right away that this calculation had in fact been attempted by Peter Stoner, a math teacher at Pasadena City College (R), and it can be found in his book, Science Speaks (freely available here). Let’s consider one of those prophecies, the one that seems to be particularly close to the heart of Marina, the Jehovah’s Witness who is currently persecuting me, the prophecy of Babylon:

And Babylon … shall never be inhabited, neither shall it be dwelt in from generation to generation: neither shall the Arabian pitch tent there; neither shall the shepherds make their fold there. But wild beasts of the desert shall lie there; and their houses shall be full of doleful creatures.

(Isa. 13:19-21 – written 712 B.C.)

And they shall not take of thee a stone for a corner, nor a stone for foundations; but thou shalt be desolate forever, saith the Lord … Neither doth any son of man pass thereby.

(Jer. 51:26,43 – written 600 B.C.)

Now, in his book, Peter Stoner translates this prophecy into the following seven events or propositions:

$E_1$: Babylon shall be destroyed. $P(E_1)=1/10$
$E_2$: It shall never be reinhabited (once destroyed). $P(E_2)=1/100$
$E_3$: The Arabs shall not pitch their tents there (once destroyed). $P(E_3)=1/200$
$E_4$: There shall be no sheepfolds there (once destroyed). $P(E_4)=1/5$
$E_5$: Wild beasts shall occupy the ruins (once destroyed). $P(E_5)=1/5$
$E_6$:  The stones shall not be taken away for other buildings (once destroyed). $P(E_6)=1/100$
$E_7$: Men shall not pass by the ruins (once destroyed). $P(E_7)=1/10$

I have added the expression within brackets, to enhance clarity. The probabilities of the single events (on the right) are those proposed by Peter Stoner, and I am not going to discuss them. Now, if we indicate the whole prophecy with E, according to Stoner the probability of the null hypothesis is

$P(E) = P(\bigcap_{i=1}^{7}E_i) = \prod_{i=1}^{7} P(E_i) = \frac{10^ { -9 } }{5}$
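Stoner's figure is easy to reproduce. The following Python sketch (mine, not Stoner's) simply multiplies the seven probabilities under his independence assumption:

```python
from functools import reduce

# Stoner's probabilities for the seven events E_1 ... E_7
probs = [1/10, 1/100, 1/200, 1/5, 1/5, 1/100, 1/10]

# under the independence assumption, P(E) is the product of the P(E_i)
p_e = reduce(lambda a, b: a * b, probs, 1.0)
print(p_e)  # ≈ 2e-10, i.e. 10^-9 / 5
```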

This is a really small probability for the null hypothesis, so we must accept that a power beyond human comprehension has… But let's read the prophecy again: it seems articulated but, in fact, it only says that the place where Babylon once rose will be essentially abandoned by human beings. In other words, it seems reasonable to say that $E_7\subset E_i$ for $i = 2, 3, 4, 6$: if no man passes by the ruins, then nobody reinhabits them, pitches tents there, keeps sheepfolds there, or carries the stones away. If that is the case, then we have

$P(E) = P(\bigcap_{i=1}^{7}E_i) = P( E_1 \bigcap( \bigcap_{i=2}^{7}E_i) ) =$

$= P( E_1 \bigcap( \bigcap_{i\in\{2,3,4,6\}}(E_i \bigcap E_7) ) \bigcap (E_5 \bigcap E_7) ) =$

$= P( E_1 \bigcap E_7 \bigcap ( E_5 \bigcap E_7 ) ) =$

$= P( E_1 \bigcap ( E_5 \bigcap E_7 ) ) = P( E_1)\,P( E_5 \bigcap E_7 )$

where we have used $E_i \bigcap E_7 = E_7$ for $i = 2, 3, 4, 6$, and, in the last step, the independence of $E_1$ from the other events.

A further observation: if the place is desolate, with no human beings ($E_7$), then it is reasonable to assume that it becomes the realm of wild animals. In other words: $P(E_5|E_7) > P(E_5)$. Not only that: I would guess it is safe to assume that $P(E_5|E_7)$ is about one. We then find

$P(E) = P( E_1) P(E_5|E_7) P(E_7) = P( E_1) P(E_7) = \frac{1}{10^2}$

In other words, the mistake that leads Stoner to a completely wrong value for $P(E)$ is that he treated the events as independent of one another, while this is obviously not the case.

Now, is this the probability of the null hypothesis? Well, it depends, because in this case the prediction has been drawn from a very long book with thousands of propositions, some of which look very much like predictions. When one picks a prophecy among all these propositions, one might be unconsciously tempted to pick the one that most resembles a fulfilled prophecy. In other words, we have to correct for multiple comparisons. So, let us suppose that we have N propositions similar to the one about Babylon. The probability $p(N)$ that at least one of these propositions is true purely by chance is

$p(N) = 1 - \frac{99^N}{100^N}$

The function $p(N)$ is plotted by the following code and, as you can see, for $N \geq 6$ the null hypothesis can no longer be rejected at the conventional 0.05 level (since $p(5) \approx 0.049$ while $p(6) \approx 0.059$). In other words, if we picked this prophecy from among just six similar sentences, its resemblance to reality could easily be due to chance. In the figure below, the red line indicates a probability of 0.05; a probability equal to or higher than that threshold means the null hypothesis cannot be rejected.
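As a quick numerical check (a Python sketch of the same function computed by the Octave script below), we can locate the smallest N at which p(N) reaches the 0.05 threshold:

```python
# p(N): probability that at least one of N Babylon-like propositions,
# each true by chance with probability 1/100, is true purely by chance
def p(n):
    return 1 - 0.99 ** n

# smallest N for which the null hypothesis can no longer be rejected at 0.05
n = 1
while p(n) < 0.05:
    n += 1
print(n)  # n = 6, with p(6) ≈ 0.0585
```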

It should be added that the claim that the prophecy of Babylon was in fact fulfilled is itself highly questionable. The reader can do their own research on that, for example starting from the notes collected in this text.

% file name = Babylon
% it calculates the probability that N prophecies are true by chance
clear all
% array for p(N)
N = 30;
for i=1:N
p(i) = 1-(99/100)^i;
p2(i) = 0.05;
endfor
% plotting
plot ([1:N],p(:),'-k', 'linewidth', 2)
hold on
plot ([1:N],p2(:),'-r', 'linewidth', 2)
xlabel('number of prophecies');
ylabel('probability of the null hypothesis');
grid minor
grid on

A more detailed discussion of these same topics is now available here.

Introduction

Let’s consider the relative humidity we have right now in Venice, Italy: 97%, with a temperature of 8°C and a pressure of 755 mmHg. Pretty humid, right? What about a warm place, say Buenos Aires, in Argentina? Right now they have a relative humidity of 55%, at a temperature of 23°C and a pressure of 760 mmHg. Now, which of these two places is more humid? In other words: in which of them does the same volume of air contain the larger amount of water? Are you sure you know the answer?

Some definitions and a formula

Relative humidity is defined as follows

$RH\;=\;100\cdot\frac{e'}{e'_w}$

where ${e'}$ is the partial pressure of water vapour in air and $e'_w$ is the saturation vapour pressure with respect to water (the vapour pressure in air in equilibrium with the surface of water). We then have

$e'_w(p,t)\;=\;(1.0016\;hPa+3.15\cdot 10^{-6} p-\frac{0.074}{p}\;hPa^{2}) 6.112\cdot e^{\frac{17.62t}{243.12+t}}$

Here and in what follows we express pressure in hPa, with $hPa\;=\;10^2 Pa$, while $t$ is temperature in °C and $T$ is temperature in $K$, with $T\;=\;273.15+t$. Absolute humidity is simply given by

${\rho}_v\;=\;\frac{m_v}{V}$

where $m_v$ is the mass of vapour and $V$ is the total volume occupied by the mixture (Ref, chapter 4).

Physics of gases

From the ideal gas law and considering that, according to Dalton’s law, in a mixture the partial pressure of each gas is the pressure it would have if it occupied the whole volume (Ref), we can also write

$e'\;=\;m_v\cdot R_v\frac{273.15 + t}{V}$

with $R_v\;=\;4.614\cdot\frac{hPa\cdot m^3}{kg\cdot K}$. Then, by substituting $e'$ and $m_v$ in ${\rho}_v$, we have

${\rho}_v\;=\;\frac{RH}{100\cdot R_v\cdot (273.15+t)}(1.0016\;hPa+3.15\cdot 10^{-6} p-\frac{0.074}{p} \;hPa^{2}) 6.112\cdot e^{\frac{17.62t}{243.12+t}}$

At the end of this post you find a simple code in Octave that calculates this formula and the plot.

If we now apply the equation for ${\rho}_v$ to the environmental parameters of Venice, we find that one $m^3$ of air contains about 8 grams of water; if we repeat the calculation for Buenos Aires, we find that the same volume of air contains about 11 grams of water. In other words, today Venice is less humid than Buenos Aires, if we refer to the water content of the air.
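The two numbers above can be double-checked with a few lines of Python (a sketch of the same formula; the Octave script at the end of the post performs the equivalent calculation):

```python
import math

def absolute_humidity(rh, t, p_mmhg):
    """Absolute humidity (g/m^3) from relative humidity RH (%),
    temperature t (deg C) and pressure (mmHg), using the
    saturation vapour pressure formula given above."""
    p = p_mmhg * 133.322 / 100.0              # mmHg -> hPa
    rv = 4.614                                # hPa*m^3/(kg*K)
    enh = 1.0016 + 3.15e-6 * p - 0.074 / p    # pressure enhancement factor
    e_sat = enh * 6.112 * math.exp(17.62 * t / (243.12 + t))   # hPa
    rho = (rh / 100.0) * e_sat / (rv * (273.15 + t))           # kg/m^3
    return 1000.0 * rho

print(absolute_humidity(97, 8, 755))   # ~8 g/m^3  (Venice)
print(absolute_humidity(55, 23, 760))  # ~11 g/m^3 (Buenos Aires)
```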

In the diagram below you can see the plot of absolute humidity as a function of temperature, for three values of relative humidity. So, for instance, 20 $\frac{g}{m^3}$ of water corresponds to an RH of 90% at a temperature of 24°C and to an RH of only 50% at a temperature of 35°C. The reason for this, beyond the mathematical formulae, is intuitive: as the temperature of the air decreases, its capacity to retain water as vapour decreases accordingly.

Conclusion

Relative humidity is relevant for the subjective perception of heat (the higher the relative humidity, the harder it is for sweat to evaporate, and thus the greater the perceived heat, since we use sweating to cool down). But if we are interested in the interaction between the air and our lungs, for instance, relative humidity might be completely misleading, and absolute humidity is likely the parameter to consider (R).

The script

% file name = absolute_humidity
% it calculates the absolute humidity given the relative humidity and the temperature
clear all
% constant for water vapour (J/kg*K)
Rv = 461.4;
% relative humidity (%), temperature (°C), pressure (mmHg)
RH = 55
t = 23
pHg = 760
% conversion to Pa (1 mmHg = 133.322365 Pa)
p = pHg*133.322
% conversion to hPa=100Pa
p = p/100;
% Rv expressed in hPa*m^3/(kg*K)
Rv = 4.614;
% absolute humidity (kg/m^3)
AH = (RH/(100*Rv*(273.15+t)))*(1.0016+3.15*10^(-6)*p-0.074/p)*6.112*e^(17.62*t/(243.12+t))
% we now fix the RH and plot AH for various values of t
RH = 50
pHg = 760
p = pHg*133.322;
p = p/100;
for i=1:30
t2(i)=i+5;
AH2(i) = 1000*(RH/(100*Rv*(273.15+t2(i))))*(1.0016+3.15*10^(-6)*p-0.074/p)*6.112*e^(17.62*t2(i)/(243.12+t2(i)));
endfor
plot (t2, AH2,'-k','LineWidth',3)
grid on
grid minor
hold on
RH = 70
for i=1:30
t2(i)=i+5;
AH2(i) = 1000*(RH/(100*Rv*(273.15+t2(i))))*(1.0016+3.15*10^(-6)*p-0.074/p)*6.112*e^(17.62*t2(i)/(243.12+t2(i)));
endfor
plot (t2, AH2,'-r','LineWidth',3)
hold on
RH = 90
for i=1:30
t2(i)=i+5;
AH2(i) = 1000*(RH/(100*Rv*(273.15+t2(i))))*(1.0016+3.15*10^(-6)*p-0.074/p)*6.112*e^(17.62*t2(i)/(243.12+t2(i)));
endfor
plot (t2, AH2,'-b','LineWidth',3)
xlabel ('temperature (°C)')
ylabel ('absolute humidity (g/{m^3})')
legend ('AH for RH = 50%','AH for RH = 70%','AH for RH = 90%', 'location', 'NORTHWEST' )

The equations of this blog post were written using $\LaTeX$.


Introduction

Some days ago, David Systrom offered an overview of his work on cardiopulmonary testing in ME/CFS during a virtual meeting hosted by the Massachusetts ME/CFS & FM Association and the Open Medicine Foundation. In this blog post, I present an introduction to the experimental setting used for Systrom’s work (paragraph 1), a brief presentation of his previous findings (paragraph 2), and an explanation of his more recent discoveries in his cohort of patients (paragraph 3). In paragraph 4 you’ll find a note on how to support his research.

1. Invasive Cardiopulmonary Exercise Testing

It is a test that allows for the determination of pulmonary, cardiac, and metabolic parameters in response to physical exertion of increasing workload. It is, mutatis mutandis, the human equivalent of an engine test stand. A stationary bike with a mechanical resistance that increases by 10 to 50 Watts per minute is usually employed for assessing the patient in an upright position, but a recumbent bike can also be used in some instances. Distinguishing between these two settings might be of pivotal relevance in ME/CFS and POTS. I shall now briefly describe some of the measurements that can be collected during invasive cardiopulmonary exercise testing (iCPET) and their biological meaning. For a more accurate and in-depth account, please refer to (Maron BA et al. 2013), (Oldham WM et al. 2016). I have used these papers as the main reference for this paragraph, unless otherwise specified.

Gas exchange. A face mask collects the gases exchanged by the patient during the experiment and allows for monitoring of both oxygen uptake per unit of time (named $VO_2$) and carbon dioxide output ($VCO_2$), measured in mL/min. Gas exchange is particularly useful for the determination of the anaerobic threshold (AT), i.e. the point in time at which the diagram of $VCO_2$ as a function of $VO_2$ displays an abrupt increase in its derivative: at this workload, the patient starts relying more on her anaerobic energy metabolism (glycolysis, for the most part) with a build-up of lactic acid in tissues and blood (see Figure 1).

Oxygen uptake per unit of time at AT (called $VO_2$max) can be considered an integrated function of the patient’s muscular, pulmonary, and cardiac efficiency during exercise. It is abnormal when its value is below 80% of what is predicted according to the patient’s age, sex, and height. Importantly, according to some studies there might be no difference in $VO_2$max between ME/CFS patients and healthy controls, unless the exercise test is repeated the day after the first measure: in this case the $VO_2$max for patients is significantly lower than for controls (VanNess JM et al. 2007), (Snell CR et al. 2013).

Another measure derived from the assessment of gas exchange is minute ventilation (VE, measured in L/min), which represents the total volume of gas expired per minute. The link between VE and $VO_2$ is as follows:

$VO_2\;=\;VE\cdot(inspired\;O_2\; -\; expired\;O_2)$

Maximum voluntary ventilation (MVV) is the maximum volume of air that can be voluntarily expired at rest. During incremental exercise, a healthy person should be able to maintain her VE at a value of about 0.7·MVV, and it is assumed that if the ratio VE/MVV rises above 0.7, the patient has a pulmonary mechanical limit during exercise. If VE is normal, then an early AT suggests an inefficient transport of oxygen from the atmosphere to the muscles that is not due to pulmonary mechanics, and thus linked to either pulmonary vascular abnormalities or muscular/mitochondrial abnormalities. It is suggested that an abnormally high derivative of the diagram of VE as a function of $VCO_2$, and/or a high ratio VE/$VCO_2$ at AT (these are measures of how efficiently the system gets rid of $CO_2$), are indicators of poor pulmonary vascular function.

Respiratory exchange ratio (RER) is a measure of the effort that the patient puts into the exercise. It is measured as follows:

$RER=\frac{VCO_2}{VO_2}$

and an RER>1.05 indicates a sufficient level of effort. In this case the test can be considered valid.

Arterial catheters. A sensor is placed just outside the right ventricle (pulmonary artery, Figure 2) and another one is placed in the radial artery: they allow for measures of intracardiac hemodynamics and arterial blood gas data, respectively. By using this setting, it is possible to indirectly estimate cardiac output (Qt) by using Fick equation:

$Qt=\frac{VO_2}{arterial\;O_2 - venous\;O_2}$

where the $arterial\;O_2$ is measured by the radial artery catheter and the venous one is measured by the one in the pulmonary artery (ml/L). An estimation for an individual’s predicted maximum Qt (L/min) can be obtained by dividing her predicted $VO_2$max by the normal maximum value of  $arterial\;O_2 - venous\;O_2$ during exercise, which is 149 mL/L:

$predicted\; Qt\;max=\frac{predicted\; VO_{2}max}{149 \frac{mL}{L}}$

If during iCPET the measured Qt max is below 80% of the predicted maximum cardiac output (as measured above), associated with reduced $VO_2$max, then a cardiac abnormality might be suspected. Stroke volume (SV), defined as the volume of blood ejected by the left ventricle per beat, can be obtained from the Qt according to the following equation:

$Qt=SV\cdot HR\;\Rightarrow\;SV\;=\;\frac{Qt}{HR}\;=\;\frac{\frac{VO_2}{arterial\; O_2 - venous\; O_2}}{HR}$

where HR stands for heart rate. One obvious measure from the pulmonary catheter is the mean pulmonary artery pressure (mPAP). The right atrial pressure (RAP) is the blood pressure at the level of the right atrium. Pulmonary capillary wedge pressure (PCWP) is an estimation for the left atrial pressure. It is obtained by the pulmonary catheter. The mean arterial pressure (MAP) is the pressure measured by the radial artery catheter and it is a proxy for the pressure in the left atrium. RAP, mPAP, and PCWP are measured by the pulmonary catheter (the line in red) which from the right atrium goes through the tricuspid valve, enters the right ventricle, and then goes along the initial part of the pulmonary artery (figure 2).
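To make the chain of formulas concrete, here is a minimal Python sketch; the numbers are made up for illustration (they are not data from Systrom’s studies):

```python
def cardiac_output(vo2, avo2_diff):
    """Fick equation: Qt (L/min) from VO2 (mL/min) and the
    arteriovenous O2 content difference (mL/L)."""
    return vo2 / avo2_diff

def stroke_volume(qt, hr):
    """SV (L/beat) from cardiac output (L/min) and heart rate (beats/min)."""
    return qt / hr

# hypothetical subject: VO2 = 2100 mL/min, a-v O2 difference = 140 mL/L, HR = 150 bpm
qt = cardiac_output(2100, 140)   # 15 L/min
sv = stroke_volume(qt, 150)      # 0.1 L/beat, i.e. 100 mL/beat

# predicted maximum Qt from a (hypothetical) predicted VO2max of 2800 mL/min
qt_max_pred = 2800 / 149         # ~18.8 L/min
print(qt, sv, qt_max_pred)
```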

Derived parameters. As seen, Qt (cardiac output) is derived from actual direct measures collected by this experimental setting, by using a simple mathematical model (Fick equation). Another derived parameter is pulmonary vascular resistance (PVR) which is obtained using the particular solution of the Navier-Stokes equations (the dynamic equation for Newtonian fluids) that fits the geometry of a pipe with a circular section. This solution is called the Poiseuille flow, and it states that the difference in pressure between the extremities of a pipe with a circular cross-section A and a length L is given by

$\Delta\;P\;=\;\frac{8\pi\mu L}{A^2}Q$

where $\mu$ is a mechanical property of the fluid (called dynamic viscosity) and Q is the blood flow (Maccallini P. 2007). As the reader can recognize, this formula bears a close resemblance to Ohm’s law, with $\Delta P$ analogous to the potential difference, Q analogous to the current, and $\frac{8\pi\mu L}{A^2}$ analogous to the resistance. In the case of PVR, Q is given by Qt while $\Delta\;P\;=\;mPAP\;-\;PCWP$. Then we have:

$PVR\;=\;80\frac{\;mPAP\;-\;PCWP}{Qt}$

where the numeric coefficient is due to the fact that PVR is usually measured in $\frac{dyne\cdot s}{cm^5}$, and 1 dyne is $10^{-5}$ Newton while 1 mmHg is 133.3 N/m².
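The coefficient 80 can itself be recovered from the unit conversion, and the PVR formula is a one-liner; the hemodynamic values below are made up for illustration:

```python
# 1 mmHg*min/L expressed in dyn*s/cm^5:
# 1 mmHg = 133.322 Pa, 1 min = 60 s, 1 L = 1e-3 m^3, 1 dyn*s/cm^5 = 1e5 Pa*s/m^3
factor = 133.322 * 60 / 1e-3 / 1e5
print(factor)  # ~80

def pvr(mpap, pcwp, qt):
    """Pulmonary vascular resistance (dyn*s/cm^5), with mPAP and PCWP
    in mmHg and Qt in L/min."""
    return 80 * (mpap - pcwp) / qt

# hypothetical values: mPAP = 18 mmHg, PCWP = 12 mmHg, Qt = 5 L/min
print(pvr(18, 12, 5))  # 96.0, below the 120 dyn*s/cm^5 threshold, i.e. normal
```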

A subset of patients with exercise intolerance presents with preload-dependent limitations to cardiac output. This phenotype is called preload failure (PLF) and is defined as follows: RAP max < 8 mmHg, Qt and $VO_2$max < 80% predicted, with normal mPAP (<25 mmHg) and normal PVR (<120 $\frac{dyne\cdot s}{cm^5}$) (Maron BA et al. 2013). This condition seems prevalent in ME/CFS and POTS. Some of these patients have a positive cutaneous biopsy for small-fiber polyneuropathy (SFPN), even though there seems to be no correlation between hemodynamic parameters and the severity of SFPN. Intolerance to exercise in PLF seems to improve after pyridostigmine administration, mainly through potentiation of oxygen extraction in the periphery. A possible explanation for PLF in non-SFPN patients might be a more proximal lesion in the autonomic nervous system (Urbina MF et al. 2018), (Joseph P. et al. 2019). In particular, 72% of PLF patients fit the IOM criteria for ME/CFS and 27% meet the criteria for POTS. Among ME/CFS patients, 44% have a positive skin biopsy for SFPN. One possible cause of damage to the nervous system (both in the periphery and centrally) might be TNF-related apoptosis-inducing ligand (TRAIL), which has been linked to fatigue after radiation therapy; TRAIL increases during iCPET among ME/CFS patients (see video below).

3. Latest updates from David Systrom

During the virtual meeting, David Systrom reported on the results of iCPET in a set of ME/CFS patients. $VO_2$max is lower in patients vs controls (figure 3, top). As mentioned before, $VO_2$max is an index that includes contributions from cardiac performance, pulmonary efficiency, and oxygen extraction in the periphery. In other words, a low $VO_2$max by itself gives us no explanation of why it is low. This finding seems to be due to different reasons in different patients, even though the common denominator among all ME/CFS patients of this cohort is a low pressure in the right atrium during upright exercise (low RAP, figure 3, left). But then, if we look at the slope of Qt as a function of $VO_2$ (figure 3, right), we find three different phenotypes. Those with a high slope are defined as “high flow” (in red in figure 3). Then we have a group with a normal flow (green) and a group with a low flow (blue). If we then look at the ability of muscles to extract oxygen (figure 3, bottom), expressed by the ratio

$\frac{arterial\;O_2 - venous\;O_2}{Hb}$

we can see that the high flow patients reach the lowest score. In summary, all ME/CFS patients of this cohort present with poor $VO_2$max and preload failure. A subgroup, the high flow phenotype, has poor oxygen extraction capacity at the level of skeletal muscles.

Now the problem is: what is the reason for the preload failure? And in the high flow phenotype, why can’t the muscles properly extract oxygen from blood? As mentioned, about 44% of ME/CFS patients in this cohort have SFPN, but there is no correlation between the density of small fibers in the skin biopsies and the hemodynamic parameters. Eleven patients with poor oxygen extraction (high flow) had their muscle biopsy tested for mitochondrial function (figure 4), and all but one presented a reduction in the activity of citrate synthase (fourth column): this is the enzyme that catalyzes the first step of the Krebs cycle, and it is considered a global biomarker of mitochondrial function. Some patients also have defects in one or more steps of the electron transport chain (fifth column), associated with genetic alterations (sixth column). Another problem in high flow patients might be a dysfunctional vasculature at the interface between the vascular system and skeletal muscle (and this might be true for the brain too), rather than poor mitochondrial function.

The use of an acetylcholinesterase inhibitor (pyridostigmine) improved the ability to extract oxygen in the high flow group, without improving cardiac output, as measured with a CPET, after one year of continuous use of the drug. This might be due to better regulation of blood flow in the periphery. This paragraph is an overview of the following video:

4. Funding

The trial on the use of pyridostigmine in ME/CFS at the Brigham & Women’s Hospital by Dr. David Systrom is funded by the Open Medicine Foundation (R). This work is extremely important, as you have seen, both for developing diagnostic tools and for finding treatments for specific subgroups of patients. Please, consider a donation to the Open Medicine Foundation to speed up this research. See how to donate.

The equations of this blog post were written using $\LaTeX$ (see this article).

# My first time with LaTeX

Each mathematical formula I have used in my blog until now was an image: I wrote it in a Word document using Microsoft’s equation editor (which I like very much), then I captured the screen, saved it as an image, and uploaded it to my blog.

In the past, I tried to find a way to write them as text, without succeeding. But in fact, it is possible, by using the LaTeX syntax and a few HTML commands.

These are the first “true” mathematical formulae present in this blog. I will use this page to test the LaTeX syntax for mathematical expressions (a handbook is available here). In the table that follows, I present a series of relevant examples of LaTeX syntax. A useful resource for testing out LaTeX instructions is this website.


# Testing hypotheses

Introduction

My ME/CFS improves during summer, in the period of the year that goes from May/June to the end of September. I don’t know why. I have several hypotheses. One possible reason for the improvement in summer is an interaction between the light from the Sun and some part of my physiology, for instance the immune system. We know that ME/CFS tends to have an oscillating course in most patients (Chu L. et al. 2019), but the presence of a seasonal pattern in this patient population has not been investigated so far, to my knowledge. And yet, if you ask patients directly, many of them say that they feel better in summer. Unfortunately, we don’t have scientific data on that; this is an area worth investigating with a carefully designed survey.

Seasonal variation of the immune system

The immune system has a high degree of variation for several reasons (Brodin P et Davis MM 2017). In particular, there are studies about the seasonal fluctuations in the expression of some crucial genes of the immune response (Dopico XC et al. 2014).

How does this regulation happen? Different mechanisms are possible, and some of them might be related to changes in the light we receive from the Sun as the Earth revolves around it. We know that the length of the day has an effect on innate immunity: the more hours of light, the lower the power of the innate immune system (Pierre K. et al. 2016). We also know that ultraviolet radiation, particularly UVB, is an agonist of the aryl hydrocarbon receptor (AhR) (Navid F. et al. 2013). This receptor seems to reduce the expression of the major histocompatibility complex II (MHC II) in dendritic cells (DCs), thus reducing their antigen-presenting activity (Rothhammer V. et Quintana F.J. 2019). UVB might be able to reach dendritic cells when they are circulating near the skin during summer, thus inhibiting their antigen-presenting activity. Infrared radiation, on the other hand, seems to have an effect on energy metabolism: in Fall we lose a significant amount of infrared radiation in a wavelength range (0.7-1.0 µm) that is known to have an effect on mitochondrial activity (Nguyen L.M. et al. 2013), and it might perhaps have an indirect effect on immunity too.

As further proof of seasonal fluctuation in immunity, some immunological diseases have this kind of seasonality: Rheumatoid arthritis (Nagamine R. et al. 2014) and Rheumatic fever (Coelho Mota C.C. et al. 2010) are two examples. Moreover, the prevalence of Multiple Sclerosis is directly proportional to the latitude (Simpson S. et al. 2011). We also know that there is seasonal fluctuation in serum autoantibodies (Luong T.H. et al. 2001).

Of course, sunlight might be just one of the variables at play. The other aspect I am considering is the seasonal distribution of some common pathogens. Streptococcus, Enteroviruses and fungi of the genus Penicillium are known to have a seasonal distribution with a peak in Fall and/or Winter (Ana S.G. et al. 2006), (Salort-Pons M et al. 2018), (Coelho Mota C.C. et al. 2010). Common influenza has this pattern too. Rheumatic fever, a disease due to an abnormal immune response to Streptococcus, has its flares in Fall because Streptococcus is more common in that period of the year (Coelho Mota C.C. et al. 2010). Even the composition of the gut microbiota has a seasonal pattern (Koliada A. et al. 2020). I am currently investigating my immunosignature, measured with an array of 150.000 random peptides, to see if I can find some relevant pathogen in my case. You can find this study here.

(A few months after I wrote these notes, a pivotal study was published on these same topics, available here).

An experiment

I moved from Rome (Italy) to Rosario (Argentina) at the beginning of January. I was very sick and I steadily improved after about 40 days. I became a less severe ME/CFS patient and I could work several hours a day and care for myself, provided I did not overdo aerobic exercise. At the end of March, I started deteriorating, as usually happens at the end of September when I am in Rome. In order to study this phenomenon, I have built a complete model of solar radiation at sea level, which considers the inclination of sunrays as a function of the latitude and of the day of the year. It takes into account the effect of the atmosphere (both diffusion and absorption) and the eccentricity of the orbit (Maccallini P. 2019). If you look at the figure below (a byproduct of my mathematical model), you can see that when I started deteriorating in Rosario, the power of sunrays at noon in that city was still as high as it is in Rome during the summer solstice (this is due to the fact that the Earth is closer to the Sun in this period and that Rosario is closer to the Equator than Rome is).

So I have to discard the original idea that the power within the infrared, ultraviolet or visible range is responsible for my improvement in summer. If I still want to consider that sunlight has something to do with my improvement, I must conclude that the relevant parameter is the length of the day: I may need more than 12 hours of light to feel better. Why? Because the longer the day, the lower the strength of innate immunity. This is now my working hypothesis, and I will start from the following mathematical model to pursue this research: (Pierre K. et al. 2016).

I will also use full-spectrum lamps early in the morning and in the evening to reproduce a 15-hour day, so as to dampen down my innate immune system in a safe, drug-free way. I have to reproduce a day of 15 hours and see what happens. The figure below shows the hours of the day at dawn and at dusk and the length of the day in Rome, for each day of the year (this is also a plot from my model).
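A day-length curve like this one can also be approximated without the full model, using the standard sunrise equation; below is a minimal sketch in Python (the cosine formula for the solar declination and the day numbering are common textbook approximations, not the ones of my model):

```python
import math

def declination(day_of_year):
    """Solar declination in radians (simple cosine approximation)."""
    return math.radians(-23.45) * math.cos(2 * math.pi * (day_of_year + 10) / 365)

def day_length_hours(latitude_deg, day_of_year):
    """Hours of daylight from the sunrise equation cos(h0) = -tan(lat)*tan(decl)."""
    lat = math.radians(latitude_deg)
    x = -math.tan(lat) * math.tan(declination(day_of_year))
    x = max(-1.0, min(1.0, x))  # clamp: polar day (x <= -1) or polar night (x >= 1)
    return 2.0 * math.degrees(math.acos(x)) / 15.0  # 15 degrees of hour angle = 1 hour
```

For Rome (latitude about 41.9 N) this gives roughly 15 hours of light at the summer solstice and about 9 at the winter one, consistent with the plot from the full model.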

What follows is the script I have coded in order to plot the first figure of this post. More details on this model of solar radiation are here: (Maccallini P. 2019). As a further note, I would like to acknowledge that I started pursuing this avenue in the summer of 2009: I was building the mathematical model of solar radiation (see figure below, made in 2009), but as the summer finished, I turned into a statue and I had to stop working on it. When I improved, about a year later, I started working on the systematic analysis of the mechanical equilibrium of planar structures (it is a chapter of this book). I am proud of that analysis, but it has not been very useful for my health…

% file name = sun emissive power sea level Rosario vs Roma
% sun emissive power per unit area, per unit wavelength at sea level
clear all
% three parameters of the orbit
A = 6.69*( 10^(-9) ); % 1/km
B = 1.12*( 10^(-10) ); % 1/km
delta_orbit = pi*313/730; % phase of the orbit (radians)
% the two parameters of Planck's law
C_1 = 3.7415*( 10^(-16) ); % W*m^2
C_2 = 1.4388*( 10^(-2) ); % m*K
% Stefan-Boltzmann constant ( W/( (m^2)*(K^4) ) )
SB = 5.670*( 10^(-8) );
% radius of the photosphere (m)
R_S = 696*(10^6); % m
% temperature of the photosphere (K)
T_S = 5875;
% drawing parameters (not used in this script) and latitudes
N = 20; % dots for the equator
R = 3.8; % radius of the orbit
ro_E = 1.3; % radius of the earth
lambda_Rosario = -32*pi/180; % latitude of Rosario (radians)
lambda_Roma = 41*pi/180; % latitude of Rome (radians)
delta = 23.45*pi/180; % tilt angle of the Earth's axis (radians)
% the array of theta
theta(1) = 0; % winter solstice (21/22 December)
i_ws = 1;
day = 2*pi/365;
days = [1:1:366];
for i = 2:366
theta(i) = theta(i-1) + day;
if ( abs( theta(i) - (pi/2) ) <= day )
i_se = i; % spring equinox (20 March)
endif
if ( abs( theta(i) - pi ) <= day )
i_ss = i; % summer solstice (20/21 June)
endif
if ( abs( theta(i) - (3*pi/2) ) <= day )
i_ae = i; % autumn equinox (22/23 September)
endif
endfor
% the array of the Sun-Earth distance (m); note the orbital phase delta_orbit
for i=1:1:366
o_omega (i) = (10^3)/( A + ( B*sin(theta(i) + delta_orbit) ) ); % m
endfor
% the array of the wavelength in micron
N = 471;
L(1) = 0.3;
L(N) = 5.0;
delta_L = ( L(N) - L(1) )/(N-1);
for j = 2:N-1
L (j) = L(j-1) + delta_L;
endfor
% the array of beta*L: spectral optical thickness of the atmosphere;
% the full model (Maccallini P. 2019) includes diffusion and absorption;
% here a crude Rayleigh-scattering-only placeholder is used (L in microns)
for j = 1:N
S (j) = 0.008569/( L(j)^4 );
endfor
% the array of L in metres
L_m = L*( 10^(-6) );
% angle psi
psi(1) = 0;
minute = pi/(12*60);
for i = 2:(24*60)+1
psi(i) = psi(i-1) + minute;
endfor
% -----------------------------------------------------------------------------
% Rosario
lambda = lambda_Rosario
% angle between n and r at noon in Rosario
for i= [i_ws, i_ae, i_ss, i_se]
for j=1:(24*60) + 1
% scalar product between n and r
scalar_p(j) = [cos(lambda)*sin(psi(j))*cos(delta) + sin(lambda)*sin(delta)]*( -cos(theta(i)) )+ [(-1)*cos(lambda)*cos(psi(j))]*( -sin(theta(i)) );
endfor
% value of psi at noon
for j=1:(24*60) + 1
if ( ( scalar_p(j) ) == ( max( scalar_p ) ) )
j_noon = j;
psi_noon (i) = psi(j);
endif
endfor
% angle between n and r at noon
cos_gamma (i) = scalar_p(j_noon);
endfor
% the array of the emissive power (W/(m^2)*micron) in Rosario
for i = i_se:i_se
for j=1:N
num = C_1*( (R_S)^2 );
den = ( (L_m(j)^5)*( (e^(C_2/( L_m(j)*T_S ))) - 1)*( (o_omega(i))^2 ) )*10^6;
power(j,i) = ( num/den )*( e^(-S(j)/cos_gamma (i)) );
endfor
% plotting
plot (L (1:N), power(1:N,i), '-r', "linewidth", 2)
xlabel('wavelength ({\mu}m)');
ylabel('W/(m^{2} {\mu}m)');
axis ([0.3,5,0,1500])
grid on
endfor
hold on
% -----------------------------------------------------------------------------
% Rome
lambda = lambda_Roma
% angle between n and r at noon in Rome
for i= [i_ws, i_ae, i_ss, i_se]
for j=1:(24*60) + 1
% scalar product between n and r
scalar_p(j) = [cos(lambda)*sin(psi(j))*cos(delta) + sin(lambda)*sin(delta)]*( -cos(theta(i)) )+ [(-1)*cos(lambda)*cos(psi(j))]*( -sin(theta(i)) );
endfor
% value of psi at noon
for j=1:(24*60) + 1
if ( ( scalar_p(j) ) == ( max( scalar_p ) ) )
j_noon = j;
psi_noon (i) = psi(j);
endif
endfor
% angle between n and r at noon
cos_gamma (i) = scalar_p(j_noon);
endfor
% the array of the emissive power (W/(m^2)*micron) in Rome
for i = [i_ae, i_ss]
for j=1:N
num = C_1*( (R_S)^2 );
den = ( (L_m(j)^5)*( (e^(C_2/( L_m(j)*T_S ))) - 1)*( (o_omega(i))^2 ) )*10^6;
power(j,i) = ( num/den )*( e^(-S(j)/cos_gamma (i)) );
endfor
endfor
hold on
plot (L (1:N), power(1:N,i_ae), '-k', "linewidth", 2)
plot (L (1:N), power(1:N,i_ss), '--k', "linewidth", 2)
legend ('spring equinox in Rosario', 'autumn equinox in Rome', 'summer solstice in Rome', "location",'NORTHEAST')
hold on
plot ([0.4,0.4], [0,1500], '--k', "linewidth", 1)
plot ([0.7,0.7], [0,1500], '--k', "linewidth", 1)

# A Mathematical model for the diffusion of Coronavirus 19 among the Italian population

Abstract

In this document, I propose a distribution for the number of infected subjects in Italy during the outbreak of Coronavirus 19. To do that I find a logistic curve for the number of deaths due to this virus, in Italy. I also use a density of probability for the fatality rate and one for the number of days from infection to death. These functions have been built using recently published statistical data on the Chinese outbreak.
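The backbone of the model described in this abstract can be sketched in a few lines of Python. This is an illustration only: it uses a constant fatality rate and a fixed infection-to-death delay, whereas the article uses probability densities for both, and the parameter values below are placeholders, not the fitted ones.

```python
import math

def logistic(t, plateau, rate, t_infl):
    """Logistic curve: plateau is the asymptote, rate the growth constant,
    t_infl the day of the inflection point (where half the plateau is reached)."""
    return plateau / (1 + math.exp(-rate * (t - t_infl)))

def infected_at(t, deaths_curve, cfr, delay):
    """Back-calculate infections at day t from the deaths observed at day
    t + delay, assuming a constant case fatality rate cfr."""
    return deaths_curve(t + delay) / cfr

# illustrative numbers only: 1000 total deaths, inflection at day 10,
# 1% fatality rate, 17 days from infection to death
deaths = lambda t: logistic(t, 1000, 0.3, 10)
estimate = infected_at(0, deaths, 0.01, 17)
```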

Go to the article

# Brain Normalization with SPM12

Introduction

Brain normalization is the rigid rotation and/or deformation of a 3D brain scan so that it can match a template brain. This is a necessary step for the analysis of brain magnetic resonance data (whether it is morphological or functional) as well as of brain PET data. It allows introducing a set of spatial coordinates such that each triplet (x,y,z) identifies the same anatomical region both in the brain we are studying and in the template (Mandal P.K. et al. 2012). So, for instance, once the normalization has been performed on an fMRI of the brain of a patient and on a set of fMRIs from a suited control group, we can compare the BOLD activation of each anatomical region of the patient’s brain with the activation of the corresponding anatomical region of the control group.

Mathematical notes

This part can be skipped; it is not necessary to read these passages to understand the following paragraphs. If we assume that P is a point of the brain before normalization and we call S(P) its position after the normalization, we can consider the vectorial function:

$S = S(P)$

which gives the new position of each point of the brain, after normalization. If $P_0$ is a point of the brain before the normalization, then we can write:

$S(P) = S(P_0) + dS + o(\|P - P_0\|)$

and remembering the expression of the differential of a vectorial function, we have

$dS = J_S(P_0)\,(P - P_0)$

where $J_S(P_0)$ is the Jacobian matrix of $S$ in $P_0$. With a few passages (splitting the Jacobian into its antisymmetric part $\Omega$, which describes a rigid rotation, and its symmetric part $E$) we can write:

$S(P) = S(P_0) + \Omega\,(P - P_0) + E\,(P - P_0)$

From the above formula, we realize that in order to define the configuration of the brain after normalization, we have to define, for each point P, a set of 12 parameters. Of these parameters, 6 describe the rigid movement (the three components of $S(P_0)$ and the three independent coefficients of the antisymmetric matrix $\Omega$) and can be considered the same for each point. The other 6 (the independent coefficients of the symmetric matrix $E$) are those that describe the change of shape and size and, in general, they are different for each point. The job of SPM is to calculate these parameters.
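To make the parameter count concrete, here is a toy sketch (Python, not SPM code) of an affine map $S(P) = A\,P + b$: three translation components in $b$ plus nine matrix coefficients, which the passage above decomposes into three rotation parameters and six shape/size parameters.

```python
import math

def apply_affine(A, b, p):
    """Map the point p through S(P) = A*P + b (A is 3x3, b a 3-vector)."""
    return [sum(A[i][j] * p[j] for j in range(3)) + b[i] for i in range(3)]

def rotation_z(angle):
    """Rigid rotation about the z axis: rotations plus translations
    account for the 6 rigid-movement parameters."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

A rotation plus a translation preserves shape and size; the remaining coefficients of $A$ (the symmetric part) are the ones that deform the brain.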

Brain atlases

There are several templates (also called brain atlases) that have been developed across the years. The very first one was published in 1957 (Talairach J. et al. 1957). The same researcher then participated in building one of the most used brain atlases ever, based on a single female brain, published in 1988 (Talairach J. et Tournoux P. 1988). Another widely used template is the so-called MNI-152. It was built by adapting the 3D MRI brain scans of 152 healthy individuals to the Talairach and Tournoux template. The adaptation was achieved using both a rigid roto-translation and a deformation (Maintz J.B. et Viergever M.A. 1988). The second step is required to overcome the problem of the differences in brain shape and size that we encounter within the human population.

Limitations

Available brain atlases have some limitations. One of them is the fact that, although diseased brains are the most widely studied, they are also the most difficult to register to a template built from healthy individuals, because of usually marked differences in shape and/or size. This is true, for instance, for the brains of patients with Alzheimer’s disease (Mandal P.K. et al. 2012). Another important limitation is that registration algorithms perform poorly for the brainstem (particularly for the pons and the medulla) (Napadow V. et al. 2006). This might represent a problem for the study of diseases where an involvement of the brainstem is suspected, as is the case for ME/CFS (VanElzakker M. et al. 2019).

SPM12

The SPM software package is one of the most widely used instruments for the analysis of functional brain imaging data (web page). It is freely available for download, but it requires a copy of MatLab on your computer. Those who don’t have a MatLab license can request and install a standalone version of SPM12 by following the instructions on this page.

Importing a DICOM file

Once you have installed SPM12 on your computer, the first step in registering a brain is to convert the series of images from the format they are written in to a format that SPM12 can read. MRI images are usually in .dcm format (DICOM), while SPM12 reads .nii files. To do the conversion, click DICOM import (figure below, on the left, red ellipse), then click on DICOM files (on the right, red ellipse), then select your .dcm file and click DONE (below, red ellipse). If you then click DISPLAY (blue ellipse, left), you will see your MRI scan in another window (see next paragraph). A video tutorial on these operations is available here.

Setting the origin

To start the normalization process, it is highly recommended to set manually the origin of the coordinates. If this is done properly, the registration will not only take less time but, even more importantly, the chances of a successful normalization will increase. The origin is set at the level of the anterior commissure (figure below). To find this anatomical structure, you can follow this video. Once you have put the cross on the right place in the sagittal, coronal and axial windows, just click SET ORIGIN (red ellipse) and then save your work clicking REORIENT (blue ellipse).

Normalization estimate

In this step, SPM12 calculates the set of distortions that have to be applied to the source brain to adapt it to the template in MNI space. On the main menu, select NORMALIZE (ESTIMATE) (figure, on the left, red). This will open the batch editor, where you are asked to load the subject you want to apply normalization to (figure, right, red). You also have a set of estimation options (blue), which we leave as they are. Then you click the RUN button, the arrow on the top of the batch editor.

At this point, your PC will perform a set of calculations that will require a few minutes. At the end of this process, a new .nii file will be saved in the spm12 folder. This is the set of distortions that will allow your subject’s brain to be registered to the template.

Normalization writing

Now click on NORMALIZE (WRITE) on the main menu. The batch editor will then ask you for the deformation field, which is the file generated in the previous step, and for the images to write, which is the scan of your subject (figure below). Select them, then press the RUN button on the batch editor. A new .nii file will be written in the spm12 folder. This is the normalized brain!

In the next figure, you have the normalized brain on the left and the initial scan of the same subject on the right. As you can see, there is an overall change of shape.

Anatomical areas

Now that we have normalized our brain in the MNI space, we can easily find anatomical regions within its sections. We can, for instance, load the normalized brain with MRIcron and overlay a template with Brodmann’s areas highlighted in various colours (figure below).


# Immunosignature analysis of a ME/CFS patient. Part 1: viruses

“Each hypothesis suggests its own criteria, its own means of proof, its own methods of developing the truth; and if a group of hypotheses encompass the subject on all sides, the total outcome of means and of methods is full and rich.”

The purpose of the following analysis is to search for the viral epitopes that elicited – in a ME/CFS patient – IgGs against a set of 6 peptides, determined thanks to an array of 150.000 random peptides of 16 amino acids each. These peptides were used as query sequences in a BLAST search against viral proteins. No human virus was found. Three phages of bacterial human pathogens were identified, belonging to the classes Actinobacteria and γ-Proteobacteria. One of these bacteria, Serratia marcescens, was identified in a similar study on 21 ME/CFS cases.

(a commentary in Dutch is available here)

1. The quest for a pathogen

Scientists have been speculating about an infectious aetiology of ME/CFS for decades, without ever being able to link the disease to a specific pathogen. The idea that the disease might be triggered and/or maintained by an infection is due to the observation that for most patients the onset occurs after an infectious illness (Chu, L. et al. 2019). It has also been observed that after a major infection (whether parasitic, viral or bacterial) about 11% of those infected develop ME/CFS (Mørch K et al. 2013), (Hickie I. et al. 2006).

In recent years, the advent of new technologies for pathogen hunting has given renewed impulse to the search for ongoing infection in this patient population. A 2018 study, investigating the genetic profile of peripheral blood for prokaryotic and eukaryotic organisms, reported that most ME/CFS patients have DNA belonging to the eukaryotic genera Perkinsus and Spumella and to the prokaryotic class β-proteobacteria (alone or in combination) and that these organisms are statistically more present in patients than in controls (Ellis J.E. et al. 2018). Nevertheless, a previous metagenomic analysis of plasma by another group revealed no difference in the content of genetic material from bacteria and viruses between ME/CFS patients and healthy controls (Miller R.R. et al. 2016). Moreover, metagenomic analyses pursued in various samples from ME/CFS patients by both Stanford University and Columbia University have come up empty (data not published, R, R).

2. Immunological methods

Another way of investigating the presence of current and/or past infections that might be specific to this patient population is to extract the information contained in the adaptive immune response. This can be done in several ways, each of them having its own limits. One way would be to collect the repertoire of T cell receptors (TCRs) of each patient and see if they have been elicited by some particular microorganism. This is a very complex and time-consuming method that has been developed in recent years and that I have described in detail going through all the recent meaningful publications (R). The main limitation of this method is that, surprisingly, TCRs are not specific for a single epitope (Mason DA 1998), (Birnbaum ME et al. 2014), so their analysis is unlikely to reveal what agent selected them. On the other hand, the advantage of this method is that T cell epitopes are linear ones, so they are extremely well suited for BLAST searches against protein databases. An attempt at applying this method to ME/CFS is currently underway: it initially gave encouraging results (R), which were then rejected by further analysis.

Another possible avenue for accessing the information registered by adaptive immunity is to investigate the repertoire of antibodies. The use of a collection of thousands of short random peptides coated on a plate has recently been proposed as an efficient way to study the response of B cells to cancer (Stafford P. et al. 2014), infections (Navalkar K.A. et al. 2014), and immunization (Legutki JB et al. 2010). This same method has been applied to ME/CFS patients and it has shown the potential of identifying an immunosignature that can differentiate patients from controls (Singh S. et al. 2016), (Günther O.P. et al. 2019). But what about the antigens eliciting that antibody profile? Given a set of peptides one’s antibodies react to, a possible solution for interpreting the data is to use these peptides as query sequences in a BLAST search against proteins from all the microorganisms known to infect humans. This has been done for ME/CFS, and the analysis led to several matches among proteins from bacteria, viruses, endogenous retroviruses and even human proteins (in fact, both this method and the one previously described can detect autoimmunity as well) (Singh S. et al. 2016). There are several problems with this approach, though. First of all, the number of random peptides usually used in these arrays is not representative of the variety of possible epitopes of the same length present in nature. If we consider the paper by Günther O.P. and colleagues, for instance, they used an array of about $10^5$ random peptides with a length of 12 amino acids each, while the number of all the possible peptides of the same length is $20^{12} \approx 4\cdot10^{15}$. This means that many potential epitopes one has antibodies to are not represented in the array.
Another important limitation is that B cell epitopes are mainly conformational, which means that they are assembled by the folding of the proteins they belong to (Morris, 2007). The consequence of this is that the subset of random peptides one’s serum reacts to are in fact linear epitopes that mimic conformational ones (they are often called mimotopes) (Legutki JB et al. 2010). This means that a BLAST search of these peptides against a library of proteins from pathogens can lead to completely misleading results.
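The coverage argument is easy to verify with a trivial calculation (Python, order-of-magnitude figures only):

```python
n_peptides = 10**5        # random peptides on the array (order of magnitude)
epitope_space = 20**12    # all possible 12-mers over the 20 standard amino acids
coverage = n_peptides / epitope_space
# epitope_space is 4096000000000000, i.e. about 4*10^15, so the array
# samples only a vanishingly small fraction of the sequence space
```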

Recently, an array of overlapping peptides that covers the proteins of many known viruses has been successfully used for the study of acute flaccid myelitis (AFM). This technology, called VirScan, has succeeded in linking AFM to enteroviruses where metagenomics of the cerebrospinal fluid had failed (Shubert R.D. et al. 2019). This kind of approach is probably better suited for pathogen hunting than the one employing arrays of random peptides, the reason being that a set of only 150.000 random peptides is unlikely to collect a significant number of B cell epitopes from viruses, bacteria, etc. Random peptides are better suited for the establishment of immunosignatures.

3. My own analysis

I have recently got access to the results of a study I was enrolled in two years ago. My serum was diluted and applied to an array of 150.000 peptides, each made of 16 random amino acids (plus four amino acids used to link the peptides to the plate). The residues Threonine (T), Isoleucine (I) and Cysteine (C) were not included in the synthesis of the peptides. An anti-human-IgG antibody was employed as the secondary antibody. The set of peptides my IgGs reacted to has been filtered with several criteria, one of them being the subtraction of the immune response common to healthy controls, so as to obtain an immune signature that differentiates me from healthy controls. The end result of this process is the set of the following six peptides.

1. ALHHRHVGLRVQYDSG
2. ALHRHRVGPQLQSSGS
3. ALHRRQRVLSPVLGAS
4. ALHRVLSEQDPQLVLS
5. ALHVRVLSQKRPLQLG
6. ALHLHRHVLESQVNSL

Table 1. My immunosignature, as detected by an array of 150.000 random peptides 20-amino-acid long, four of which are used for fixing them to the plate and are not included here.

The purpose of the following analysis is to search for the viral epitopes that elicited this immune response. To overcome the limitations enumerated at the end of the previous paragraph, I have decided to search within the database of viral proteins for exact matches with a length of 7 amino acids. Why this choice? A survey of a set of validated B cell epitopes found that the average B cell epitope has a linear stretch of 5 amino acids (Kringelum et al. 2013); according to another similar work, the average linear epitope within a conformational one has a length of 4-7 amino acids (Andersen et al. 2006). To reduce the number of matches due to chance, I opted for the upper limit of this range, and I excluded longer matches to limit the number of mimotopes for conformational epitopes. Moreover, I decided to look only for perfect matches (excluding the possibility of gaps and substitutions) so as to simplify the analysis. It is worth mentioning that a study of cross-reactive peptides performed for previous work (Maccallini P. 2016), (Maccallini P. et al. 2018) led me to the conclusion that cross-reactive 7-amino-acid long peptides might often have 100% identity.

So, to recap, I used the following method: a BLAST search (blastp algorithm) against viral proteins (taxid 10239), accepting only perfect matches (100% identity) of at least 7 amino acids (≥43% query cover), with max target sequences set to 1000 and substitution matrix BLOSUM62.
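The exact-match criterion can be illustrated with a toy stand-in for BLAST (the real search was done with blastp against the viral protein database; the subject sequences below are made up for illustration):

```python
def exact_kmer_matches(query, subject, k=7):
    """All exact k-mer matches of the query peptide inside the subject protein,
    returned as (query_start, kmer, subject_start) with 1-based positions,
    mirroring the 'perfect match of at least 7 residues' criterion."""
    hits = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        j = subject.find(kmer)
        while j != -1:
            hits.append((i + 1, kmer, j + 1))
            j = subject.find(kmer, j + 1)
    return hits

# e.g. the first peptide of table 1 against a made-up subject sequence
hits = exact_kmer_matches("ALHHRHVGLRVQYDSG", "XXLRVQYDSXX")
```

A hit like (9, "LRVQYDS", 3) corresponds to the notation 9-LRVQYDS-15 used in table 2: query positions 9 to 15.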

4. Results

Table 2 is a collection of the matches I found with the method described above. You can look at figure 1 to see how to read the table.

| Query peptide (array ID) | Epitope (query positions) | Viral protein (positions) | Virus | Host |
| --- | --- | --- | --- | --- |
| ALHHRHVGLRVQYDSG (102_1_F_viruses) | 9-LRVQYDS-15 | QDP64279.1 (29-35) | Prokaryotic dsDNA virus sp. | Archaea, ocean |
| | 8-GLRVQYD-14 | AYV76690.1 (358-364) | Terrestrivirus sp. | Amoeba, forest soil |
| ALHRHRVGPQLQSSGS (102_2_F_viruses) | 2-LHRHRVG-8 | YP_009619965.1 (63-69) | Stenotrophomonas phage vB_SmaS_DLP_5 | Stenotrophomonas maltophilia (HP) |
| ALHRRQRVLSPVLGAS (102_3_F_viruses) | 2-LHRRQRV-8 | QHN71154.1 (288-294) | Mollivirus kamchatka | Protozoa (R) |
| | 8-VLSPVLG-14 | QDB71078.1 (24-30) | Serratia phage Moabite | Serratia marcescens (HP) |
| ALHRVLSEQDPQLVLS (102_4_F_viruses) | 7-SEQDPQL-13 | BAR30981.1 (151-157) | uncultured Mediterranean phage uvMED | Archaea and Bacteria, Mediterranean sea |
| | 3-HRVLSEQ-9 | AXS67723.1 (494-500) | Cryptophlebia peltastica nucleopolyhedrovirus | invertebrates |
| | 2-LHRVLSE-8 | YP_009362111.1 (74-80) | Marco virus | Ameiva ameiva |
| ALHLHRHVLESQVNSL (102_6_F_viruses) | 2-LHLHRHV-8 | YP_009119106.1 (510-516) | Pandoravirus inopinatum | Acanthamoeba |
| | 4-LHRHVLE-10 | ASZ74651.1 (61-67) | Mycobacterium phage Phabba | Mycobacterium smegmatis mc²155 (HP) |

Table 2. Collection of the matches for the BLAST search of my unique set of peptides against viral proteins (taxid 10239). HP: human pathogen. See figure 1 for how to read the table.

5. Discussion

There are no human viruses detected by this search. There are some bacteriophages and three of them have as hosts bacteria that are known to be human pathogens. Bacteriophages (also known as phages) are viruses that use the metabolic machinery of prokaryotic organisms to replicate (figure 2). It is well known that bacteriophages can elicit specific antibodies in humans: circulating IgGs to naturally occurring bacteriophages have been detected (Dąbrowska K. et al. 2014) as well as specific antibodies to phages injected for medical or experimental reasons (Shearer WT et al. 2001), as reviewed here: (Jonas D. Van Belleghem et al. 2019). According to these observations, one might expect that when a person is infected by a bacterium, this subject will develop antibodies not only to the bacterium itself but also to its phages.

If that is the case, we can use our data in table 2 to infer a possible exposure of our patient to the following bacterial pathogens: Stenotrophomonas maltophilia (HP), Serratia marcescens (HP), Mycobacterium smegmatis mc²155 (HP). In brackets, there are links to research about the pathogenicity for humans of each species. M. smegmatis belongs to the class Actinobacteria, while S. maltophilia and S. marcescens are included in the class γ-Proteobacteria.

Interestingly enough, Serratia marcescens was identified as one of the possible bacterial triggers for the immunosignature of a group of 21 ME/CFS patients, in a study that employed an array of 125.000 random peptides (Singh S. et al. 2016). This bacterium accounts for rare nosocomial infections of the respiratory tract, the urinary tract, surgical wounds and soft tissues. Meningitis caused by Serratia marcescens has been reported in the pediatric population (Ashish Khanna et al. 2013).

Mollivirus kamchatka is a recently discovered giant virus whose hosts are presumed to be protozoa that inhabit the soil of subarctic environments (Christo-Fourox E. et al. 2020). I am not sure what this finding might mean in this context.

6. Next step

The next step will be to perform a similar BLAST search against bacterial proteins to see, among other things, if I can find matches with the three bacterial pathogens identified by the present analysis. A further step will be to pursue an analogous study for eukaryotic microorganisms and for human proteins (in search of autoantibodies).

# Maximum of a normal random vector

When Ettore Majorana first met Enrico Fermi, between the end of 1927 and the beginning of 1928, Fermi – who was already an acclaimed scientist in the field of nuclear physics – had just solved an ordinary differential equation of the second order (whose solution is now commonly named the Thomas-Fermi function) by numerical integration. It had required a week of assiduous work, with the aid of a hand calculator. Fermi showed the results (a table with several numbers) to Majorana, a 21-year-old student of electrical engineering who had some vague idea of switching from the field of boring ways of providing electrical energy for boring human activities to the quest for the intimate structure of matter, under the guidance of Fermi, the brightest Italian scientific star of that period.

Majorana looked at the numerical table, as I wrote, and said nothing. After 24 hours he came back to Fermi’s lab and compared his own results with the table made by Fermi: he concluded that Fermi didn’t make any mistake, and he decided that it could be worth working with him, so he switched from engineering to physics (Segrè E. 1995, page 69-70).

Only recently has it been possible to clarify what kind of approach to the equation Majorana had in those hours. It is worth mentioning that not only was he able to prove that the equation admitted in fact a unique solution (this is not trivial, because the theorem of Cauchy cannot be applied directly to that equation, and Fermi did not care to solve this problem); he also solved the equation in a semianalytic way (by series expansion), with a method that has the potential to be generalized to a whole family of differential equations and that was published only 75 years later (Esposito S. 2002), after the discovery of the notes Majorana wrote in those hours. He used this solution, not a numerical one, to derive a table that could be compared with Fermi's. This mathematical discovery has been possible only because the notes that Majorana wrote in those two days have been found and studied by Salvatore Esposito, with the help of other physicists.

I won’t mention here the merits that Majorana has in theoretical physics, mainly because I am very very far from understanding even a bit of his work. But as Erasmo Recami wrote in his biography of Majorana (R), a paper published by Majorana in 1932 about the relativistic theory of particles with arbitrary spin (Majorana E. 1932) contained a mathematical discovery that has been made independently in a series of papers by Russian mathematicians only in the years between 1948 and 1958, while the application to physics of that method – described by Majorana in 1932 – has been recognized only years later. The fame of Majorana has been constantly growing for the last decades.

The notes that Majorana took between 1927 and 1932 (in his early twenties) were studied and published only in 2002 (Esposito S. et al. 2003). These are the notes, by the way, in which the solution of the above-mentioned differential equation was discovered. In these 500 pages, there are several brilliant calculations that span from engineering to statistics, from advanced mathematical methods for physics to, of course, theoretical physics. Some notes on classical mechanics are also included. In what follows I will go through what is probably the least difficult page among them, the one where Majorana presents an approximate expression for the maximum of the density of the largest component of a normal random vector. I have already written in this blog some notes about the multivariate normal distribution (R). But how can we find the largest component of such a vector and how does it behave? Let’s assume that the components are independent and that each one has a mean of zero and a standard deviation of one. Then we easily find that the analytical expressions of the cumulative distribution function and of the density of the largest component (let’s say Z) of an m-dimensional random vector are

$F_Z(t)\,=\,[\Phi(t)]^m\,=\,\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2}}\,dx\right]^m,\;\;\;\;f_Z(t)\,=\,\frac{m}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}}\,[\Phi(t)]^{m-1}$
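These two expressions can be checked numerically. Below is a minimal Python sketch (not part of the original notes; it assumes independent standard normal components, and `Phi`, `phi`, `F_Z`, `f_Z` are names I introduce here) that verifies that the density of Z integrates to one:

```python
import math

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def F_Z(t, m):
    """Cumulative distribution of the largest of m independent components."""
    return Phi(t) ** m

def f_Z(t, m):
    """Density of the largest of m independent components."""
    return m * phi(t) * Phi(t) ** (m - 1)

# sanity check: the density of Z must integrate to one (Riemann sum on [-10, 10])
m = 50
dt = 0.001
total = sum(f_Z(-10.0 + k * dt, m) for k in range(20001)) * dt
print(round(total, 3))  # ≈ 1.0
```

For m = 1 the two functions reduce, as they must, to the ordinary normal distribution and density.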

We don’t have a closed-form expression for the integral, but it is relatively easy to use Simpson’s method (see the code at the end of this paragraph) to integrate these expressions numerically and to plot their surfaces (figure 1).

Now, what about the maximum reached by the density of the largest among the m components? It is easy, again, using our code, to plot both the maximum and the value of t at which it is reached, as functions of m (figure 2, dotted lines). I spent probably half an hour writing the code that gives these results, but we usually forget how fortunate we are in having powerful computers on our desks. We forget that there was a time in which having an analytical solution was almost the only way to get mathematical work done. Now we will see how Majorana obtained the two functions in figure 2 (solid lines), in just a few passages (a few in his notes, many more in mine).

% file name = massimo_vettore_normale
% distribution and density of the largest component of an m-dimensional
% standard normal vector, by numerical integration (Simpson's rule)

clear all
delta = 0.01;          % integration step
n(1) = 0.;

% grid of points t in [0, 3], and its mirror image for negative t
for i=2:1:301
n(i) = delta + n(i-1);
n_2(i) = - n(i);
end

% standard normal density (0.39894228 = 1/sqrt(2*pi))
for i=1:1:301
f(i) = 0.39894228*exp( (-0.5)*( n(i)^2 ) );
end

% Simpson's rule for the integral of the density from 0 to n(i)
sigma(1) = 0.;
sigma(3) = sigma(1) + delta*( f(1) + ( 4*f(2) ) + f(3) )/3;
sigma(2) = sigma(3)*0.5;
for j=2:1:299
sigma(j+2) = sigma(j) + delta*( f(j) + ( 4*f(j+1) ) + f(j+2) )/3;
end

% cumulative distribution of a single component, for t >= 0 and t < 0
for i=1:1:301
F(i) = 0.5 + sigma(i);
F_2(i) = 1 - F(i);
end

for i=1:1:100
m(i) = i;
end

% distribution and density of the largest of m components
for i=1:1:301
for j=1:1:100
F_Z (i,j) = F(i)^j;
F_Z_2 (i,j) = F_2(i)^j;
f_Z (i,j) = 0.39894228*j*( F(i)^(j-1) )*exp( (-0.5)*( n(i)^2 ) );
f_Z_2 (i,j) = 0.39894228*j*( F_2(i)^(j-1) )*exp( (-0.5)*( n(i)^2 ) );
end
end

figure (1)
mesh(m(1:2:100),n(1:10:301),F_Z(1:10:301,1:2:100));
grid on
hold on
mesh(m(1:2:100),n_2(2:10:301),F_Z_2(2:10:301,1:2:100));
xlabel('m');
ylabel('t');
legend('F',"location","northeast");

figure (2)
mesh(m(1:2:100),n(1:10:301),f_Z(1:10:301,1:2:100));
grid on
hold on
mesh(m(1:2:100),n_2(2:10:301),f_Z_2(2:10:301,1:2:100));
xlabel('m');
ylabel('t');
legend('f',"location","northeast");


Asymptotic series

I have always been fascinated by integrals since I encountered them a lifetime ago. I can still remember the first time I learned the rule of integration by parts. I was caring for my mother who was dying. That night I was in the hospital with her, but she couldn’t feel my presence, she had a tumour in her brain and she was deteriorating. And yet I was not alone, because I had my book of mathematics and several problems to solve. But when my mind was hit by the disease for the first time, about a year later, and I lost the ability to solve problems, then real loneliness knocked at my door.

Now, why am I talking about integration by parts? Well, I discovered a few days ago, while studying Majorana’s notes, that integration by parts – well known by students as a path towards recursive integrations that usually leads nowhere – is, in fact, a method that can be useful for developing series that approximate a function for large values of x (remember that Taylor polynomials can approximate a function only for values of x close to a finite value $x_0$, so we can’t use them when x goes to ∞). Majorana used one such series for the error function. He developed a general method, which I tried to understand for some time, without being able to grasp what he was talking about. His reasoning remained in the back of my mind for days, while I moved from Rome to Turin, where I delivered a speech about a paper on the measurement of electrical impedance in the blood of ME/CFS patients; and when I cried, some minutes later, looking at my drawings projected on the screen of a cinema, Majorana was with me, with his silence trapped behind dark eyes. A couple of days later, I moved to a conference in London, searching for a cure that could perhaps allow my brain to be normal again, and I talked with a famous scientist who worked on the human genome project. Majorana was there too, in that beautiful room (just a few metres from Parliament Square), sitting next to me. I could feel his disappointment; I knew that he would have found a cure, had he had the chance to examine that problem. Because as Fermi once said to Bruno Pontecorvo, “If a problem has been proposed, no one in the world can resolve it better than Majorana” (Esposito S. et al. 2003). Back in Rome, I gave up on Majorana’s general method and found the way to calculate the series in another book. The first tip is to write the error function (here, in the form of the Gaussian tail integral) as follows:

$\int_x^{+\infty} e^{-\frac{t^2}{2}}\,dt\,=\,\int_x^{+\infty} \frac{1}{t}\;t\,e^{-\frac{t^2}{2}}\,dt$

Now by integrating by parts, one gets

$\int_x^{+\infty} e^{-\frac{t^2}{2}}\,dt\,=\,\frac{e^{-\frac{x^2}{2}}}{x}\,-\,\int_x^{+\infty}\frac{e^{-\frac{t^2}{2}}}{t^2}\,dt$

But we can integrate by parts one more time, and we get

$\int_x^{+\infty} e^{-\frac{t^2}{2}}\,dt\,=\,\frac{e^{-\frac{x^2}{2}}}{x}\,-\,\frac{e^{-\frac{x^2}{2}}}{x^3}\,+\,3\int_x^{+\infty}\frac{e^{-\frac{t^2}{2}}}{t^4}\,dt$

And we can go on and on with integration by parts. This algorithm leads to the series

$\int_x^{+\infty} e^{-\frac{t^2}{2}}\,dt\,\sim\,\frac{e^{-\frac{x^2}{2}}}{x}\left(1\,-\,\frac{1}{x^2}\,+\,\frac{1\cdot 3}{x^4}\,-\,\dots\,+\,(-1)^k\,\frac{(2k-1)!!}{x^{2k}}\right)$

whose main property is that each addend is smaller (in absolute value) than the previous one, as long as $2k-1<x^2$. And even though this series does not converge (this is easily seen by noting that, for fixed x, the absolute value of its generic addend does not go to zero as k goes to ∞, so Cauchy’s criterion for convergence is not satisfied), it gives a good approximation for the error function when x is large. From this series, it is easy to calculate a series for the Gaussian cumulative distribution function (which is what we are interested in):

$\Phi(x)\,=\,1\,-\,\frac{1}{\sqrt{2\pi}}\int_x^{+\infty} e^{-\frac{t^2}{2}}\,dt\,\sim\,1\,-\,\frac{e^{-\frac{x^2}{2}}}{x\sqrt{2\pi}}\left(1\,-\,\frac{1}{x^2}\,+\,\frac{3}{x^4}\,-\,\dots\right)$
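It is worth checking numerically how good this truncated series is. The following Python sketch is my own (not from the notes; `normal_tail_exact` and `normal_tail_series` are helper names I introduce here): it compares the truncated series for the Gaussian tail with the value obtained from `math.erfc`.

```python
import math

def normal_tail_exact(x):
    """P(X > x) for a standard normal X, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def normal_tail_series(x, n_terms):
    """Truncated asymptotic series: (phi(x)/x) * (1 - 1/x^2 + 3/x^4 - ...)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    s, term = 0.0, 1.0
    for k in range(n_terms):
        s += term
        term *= -(2.0 * k + 1.0) / (x * x)  # ratio of consecutive addends
    return (phi / x) * s

x = 4.0
exact = normal_tail_exact(x)
approx = normal_tail_series(x, 4)
print(abs(approx - exact) / exact)  # relative error well below 1% for x = 4
```

With just four addends the relative error at x = 4 is already well under one percent, which is what makes the divergent series usable in practice.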

A clever way to solve a transcendental equation if you don’t want to disturb Newton

Taking only the first two terms of the series, we have for the cumulative distribution function of Z the expression:

$F_Z(x)\,=\,[\Phi(x)]^m\,\approx\,\left(1\,-\,\frac{e^{-\frac{x^2}{2}}}{x\sqrt{2\pi}}\right)^{m}\,\approx\,e^{-\frac{m\,e^{-\frac{x^2}{2}}}{x\sqrt{2\pi}}}$

The further approximation on the right is interesting; I think that it comes from a well-known limit:

$\lim_{n\to\infty}\left(1\,+\,\frac{x}{n}\right)^{n}\,=\,e^{x}$

Now we can easily calculate the density of Z by differentiating the cumulative distribution function:

$f_Z(x)\,=\,\frac{dF_Z}{dx}\,\approx\,\frac{m}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\left(1\,+\,\frac{1}{x^2}\right)e^{-\frac{m\,e^{-\frac{x^2}{2}}}{x\sqrt{2\pi}}}$

With a further obvious approximation ($1+\frac{1}{x^2}\approx 1$ for large x), we get:

$f_Z(x)\,\approx\,\frac{m}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,e^{-\frac{m\,e^{-\frac{x^2}{2}}}{x\sqrt{2\pi}}}$

In order to find the value of x at which this density reaches its largest value, we have to search for the value of x at which its derivative is zero. So we have to solve the following equation:

$\frac{df_Z}{dx}\,\approx\,\frac{m}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,e^{-\frac{m\,e^{-\frac{x^2}{2}}}{x\sqrt{2\pi}}}\left(-\,x\,+\,\frac{m}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\right)\,=\,0$

Which means that we have to solve the transcendental equation:

$e^{\frac{x^2}{2}}\,=\,\frac{m}{x\sqrt{2\pi}}$

Majorana truncated the second member of the equation on the right and proposed as a solution the following one:

$x\,=\,\sqrt{\log m^2}\,+\,\varepsilon$

Then he substituted this solution back into the equation, in order to find ε:

$e^{\frac{\left(\sqrt{\log m^2}\,+\,\varepsilon\right)^2}{2}}\,=\,\frac{m}{\left(\sqrt{\log m^2}\,+\,\varepsilon\right)\sqrt{2\pi}}$

With some further approximations, we have

$\varepsilon\,\approx\,-\,\frac{\log\sqrt{2\pi\log m^2}}{\sqrt{\log m^2}}$

So Majorana’s expression for the value of x at which the density of Z reaches its maximum value is

$x_M\,\approx\,\sqrt{\log m^2}\,-\,\frac{\log\sqrt{2\pi\log m^2}}{\sqrt{\log m^2}}$

I have tried to solve the transcendental equation with Newton’s method (see the code below) and I found that Majorana’s solution is a very good one (as you can see from figure 3). Now, if we compare Majorana’s approximation with what I obtained using numerical integration at the beginning (figure 2), we see that Majorana found a very good solution, particularly for the value of $x_M$. Note: the transcendental equation solved here looks like the one whose solution is the Lambert W function, but it is not the same!

% file name = tangenti

clear all

for i=1:1:100
m(i) = i;
end

% Newton's method for exp(x^2/2) - m/( x*sqrt(2*pi) ) = 0
for i=1:1:100
x(1) = 1;            % the initial guess
for j = 2:1:1000
f = exp( 0.5*( x(j-1)^2 ) ) - ( m(i)/( x(j-1)*sqrt(2*pi) ) );
f_p = x(j-1)*exp( 0.5*( x(j-1)^2 ) ) + ( m(i)/( (x(j-1)^2)*sqrt(2*pi) ) );
x(j) = x(j-1) - ( f/f_p );
max_t (i) = x(j);
if ( abs( x(j) - x(j-1) ) < 0.001 )   % stop when the iterates converge
break;
end
end
end

% the approximation by Majorana

for j=1:1:100
max_t_M (j) = sqrt(log(j^2)) - ( log(sqrt(2*pi*log(j^2)))/sqrt(log(j^2)) );
end

% it plots the diagrams

plot(m(1:1:100),max_t (1:1:100),'.k','Linewidth', 1)
xlabel('m')
ylabel('time for maximum value')
grid on
hold on
plot(m(1:1:100),max_t_M (1:1:100),'-k','Linewidth', 1)

legend('numerical integration',"Majorana's approximation", "location", 'southeast')
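For readers who prefer Python, the same comparison can be sketched in a few lines (a minimal re-implementation of the idea in the Octave script above; `newton_root` and `majorana_x_max` are names I introduce here):

```python
import math

def newton_root(m, x0=1.0, tol=1e-10, max_iter=200):
    """Newton's method for f(x) = exp(x^2/2) - m/(x*sqrt(2*pi)) = 0."""
    x = x0
    for _ in range(max_iter):
        f = math.exp(0.5 * x * x) - m / (x * math.sqrt(2.0 * math.pi))
        fp = x * math.exp(0.5 * x * x) + m / (x * x * math.sqrt(2.0 * math.pi))
        x_new = x - f / fp
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def majorana_x_max(m):
    """Majorana's closed-form approximation for the location of the maximum."""
    L = math.log(m * m)
    return math.sqrt(L) - math.log(math.sqrt(2.0 * math.pi * L)) / math.sqrt(L)

for m in (10, 100):
    # Majorana's value stays within a few percent of the Newton root
    print(m, round(newton_root(m), 3), round(majorana_x_max(m), 3))
```

For m = 100 the Newton root and the closed-form value agree to within about half a percent, which is consistent with the two curves in figure 3 being hard to tell apart.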

Epilogue

From 1934 to 1938 Majorana continued his studies in a variety of different fields (from game theory to biology, from economics to quantum electrodynamics), but he never published again (R), with the exception of a work on the symmetric theory of electrons and anti-electrons (Majorana E. 1937). But biographers have concluded that the discoveries behind that work were made by Majorana about five years earlier and never shared with the scientific community until 1937 (Esposito S. et al. 2003). And on a spring day of the year 1938, while Mussolini was trying his best to impress the world with his facial expressions, Ettore became a subatomic particle: his coordinates in space and their derivatives with respect to time became indeterminate. Whether he lived in a monastery in the south of Italy or helped the state of Uruguay in building its first nuclear reactor; whether he saw the boundless landscapes of Argentina or the frozen depth of the abyss, I hope that he found, at last, what he was so desperately searching for.

He had given his contribution to humanity, so whatever his choice has been, his soul was already safe. And as I try to save my own soul, going back and forth from mathematics to biology, in order to find a cure, I can feel his presence. The eloquence of his silence trapped behind dark eyes can always be clearly heard if we put aside the noise of the outside world. And it tells us that Nature has a beautiful but elusive mathematical structure which can nevertheless be understood if we try very hard.

In the meanwhile, I write these short stories, like a mathematical proof of my own existence, in case I don’t have further chances to use my brain.

Until time catches me.