I am currently trying to learn as much as I can about basic statistics; and one of the reasons for that is the need to gain access to the tools useful to better understand statistical analyses of medical data that I find in all the research papers on ME/CFS and related topics, which I read almost on a daily basis. Hopefully, I should also be able, at the end of this path, to perform my own statistical analyses of biological data.
When I study a subject where it is required the use of computer-aided calculation, I usually code my own software, even if there are programs ready to use for that purpose. The main reason for that is that in order to write a program that really works, you have to reach a complete understanding of the argument. So, it is a test in some way: if your program works properly, then you have learned something new. The other reason is that I like this activity, even though I can’t perform it for most of the time, because of the cruel encephalopathy due to my mysterious disease.
To give you an example of this attitude that I have, when I decided to study molecular mimicry between B-cell epitopes from a mathematical point of view, I found several programs for peptides alignments; nevertheless I decided to write my own software, and I spent several months of pure joy (but also of profound frustration because I was often too sick to write even a single line of code), learning about substitution matrices, dynamic programming and performing my own in silico experiments of cross-reactivity between enolases (a glycolitic enzyme evolutionary conserved across species) from different organisms. This work was then used in a paper recently published in the scientific journal Medical Hypotheses (Maccallini P et al. 2018). In figure 1 the flow diagram that I drew, while here and here you can download the main program and the subroutine, respectively, that I coded in Octave in order to perform comparisons between proteins. But this is another story… So, let’s come back to statistics.
Many measurements of physical or biological phenomena obey to what is known as normal distribution: indicated with x the value of a specific measure, we have that values near the mean (also known as expectation and indicated with the letter μ) are more frequently assumed by x. In particular, if we indicate f(t) a function (named density) such that f(t)dt represents the probability that x takes a value between t and t+dt, then f(t) is the well known bell-shaped curve in figure 2 (on the left). As you can see, the higher the standard deviation σ is, the lower the bell: this means that standard deviation represents a measure of how much the values taken by x are spread on the x-axis.
Figure 2 has been obtained with a code that I have written in Octave, which can be downloaded here. So, what is the probability that x takes a value below a? This probability is given by the integral of f(t) between -∞ and a. This function is named distribution function, Φ(x). In figure 2, on the right, Φ(x) is plotted for different values of expectation and standard deviation. I have calculated these integrals using the numerical method known as Simpson’s rule. For μ=0 and σ=1 we have a particular type of normal distribution, usually called standard normal distribution. Some values of its distribution function are collected in the table in figure 3, that I have derived with this code in Octave. A table like this is available in any book of statistics, but I found useful to calculate my own table, for the reasons we have already mentioned.
Now, what about the distribution function for a normal non-standard distribution? Of course, you can calculate Φ(x) for any given values of μ and σ using available tools (like, for instance, function normcdf(x, m, s) of Octave). But, as always, I have coded my own software (in Fortran in this case). You can download the executable file here. Put it in a folder, then run it: it will ask you to type the values of μ, σ and it will give as output two diagrams (one for f, the other for Φ) and a .txt file which contains several values of Φ that can be used for applications. For μ=2, σ=0.5 we obtain the diagrams in figure 4, and a table whose first lines are in figure 5.