Summer simulation, a correlation study

Summer simulation, a correlation study

Abstract

I report here on a successful attempt at reproducing the improvement in my ME/CFS symptoms that I usually experience during summer by means of artificially manipulating the temperature of the air, humidity, and the infrared radiation inside a room I spent five months in, during the period from December to April 2022. I show that the improvement is not correlated with the physical parameters of the air (temperature, relative and absolute humidity, atmospheric pressure, air density, dry air density, saturation vapor pressure, and partial vapor pressure). I hypothesize that it is driven instead by either the mid-wave infrared radiation emitted by a lamp used in the experiment or by the longwave infrared radiation emitted by the walls of the room in which I spent the months of the experiment.

1) Introduction

My ME/CFS improves during summer, in the period of the year that goes from May/June to the end of September (when I am in Rome, Italy). I don’t know why. In a previous study (Maccallini P. 2022), I analyzed the correlation between my symptoms and several environmental parameters, during three periods covering a total of 190 days: in one, I improved while the season was going toward summer, in another one we were in full-blown summer, and in the third one my functioning declined while summer was fading away. The statistical analysis showed a positive correlation between my functionality score and the temperature of the air and a negative correlation between my functionality score and air density and dry air density (remember that the higher the functionality score, the better I feel). No correlation was found between my functionality score and: relative humidity, absolute humidity, atmospheric pressure, and the concentration of particulate with a diameter below or equal to 2.5\,\mu m. While the absence of correlation can be used to rule out causation, the presence of correlation does not mean automatically causation. Moreover, since air density and dry air density are strongly inversely correlated with air temperature, the correlations with air density might be driven by the correlation with air temperature (and vice-versa). In order to further investigate the nature of these correlations, I planned a second study, this time performed inside a chamber where I could manipulate the temperature of the air, the content of water in the air, and the amount of infrared radiation I was hit by. I called this chamber “Summer Simulator” because I used several devices to obtain, during the months from December to May, the conditions that we usually have in Rome during summer. I lived inside the simulator while I collected a functionality score, three times per day, and while I measured (every 10 minutes) the temperature of the air, relative humidity, and atmospheric pressure. The data for the months from January to April were used to study how my symptoms correlated with the environmental parameters I could directly measure and those that I could mathematically derive.

2) Methods

2.1) Data collection

I used a Bresser WiFi 4-in-1 Weather Station (see figure 1) to measure the temperature of the air, pressure, and relative humidity inside the room I spent almost all my time during the experiment. Measures were registered every 10 minutes and sent to the console which uploaded them to my Weathercloud account, via WiFi. The barometer was calibrated by using the closest weather station of the meteorological network of Rome (R). I registered a functionality score three times a day (value from 3 to 7, the higher the better) to use for the search for correlations with environmental data. The data that were employed were those of January, February, March, and April 2022. I also used the daily functionality score collected in 2017, during the same months, for testing the effect of the Summer Simulator on my symptoms.

Figure 1. Bresser 4-in-1 sensor: I used it to measure the temperature of the air, atmospheric pressure, and relative humidity inside the simulator. In the second image, you see the interface of the weather station (left) and a CO2 sensor I used to ensure that the air exchange in the room was adequate.

2.2) Summer Simulator

Simulation of summer was attempted with the following settings. I lived for 5 months in a room of 9\,m^2 . The temperature was controlled by a heat pump (Electrolux EXP26U558HW) that collects air from inside the room, divides it into two fluxes, transfers heat from one to the other (by cycles of compression-evaporation of refrigerant R290), and releases the warmer flux inside and the other one outside the window. The humidity was controlled by two/three ultrasonic mist humidifiers that release mist which is then turned into vapor in a fraction that depends on the temperature of the air (Figure 2).

Figure 2. Ultrasonic mist humidifiers.

To simulate thermal radiation, I used an infrared heater (Turbo TN35E, figure 3) coupled with a chamber with walls covered by mylar, a material that reflects most of the thermal radiation; the chamber was built around the bed, and it had only one side open, to allow for air circulation (figure 4).

Figure 3. Infrared heater, Turbo TN35E.

Figure 4. The infrared chamber with walls made up of mylar, a material that reflects thermal radiation. On the right, you can see the heat pump (white) and the pipeline of the exhausted air, behind the curtain, that leads outside, through a hole in the window.

The infrared heater was, on average, at a distance of 0.8 m from the left side of the bed. It emits most of its power in the mid-wave infrared range (1-4 \mu m).

2.3) Mathematical model for the infrared radiation

I built a mathematical model for the radiation from the lamp (Maccallini P. 2022) which predicts that the power per unit of area of a plane parallel to yz (see figure 4) centered in P\equiv(x_P,\,y_P,\,z_P) is given by

1) W_\bot(x_P, y_P,z_P)\,=\,I_\xi\lg{\frac{\sqrt{(\frac{L}{2}-y_P)^2+x_P^2 +z_P^2 }+\frac{L}{2}-y_P}{\sqrt{(\frac{L}{2}-y_P)^2+x_P^2 +z_P^2 }-(\frac{L}{2}+y_P)}}

where L = 0.53 m is the length of the emitting surface of the lamp on the y axis and where

2) I_\xi\,=\,\frac{e\Pi(1-4b^2)\sqrt{1-4c^2}}{8\pi hLbc[F(\arctan\sqrt{\frac{1-4c^2}{4c^2}},\sqrt{\frac{1-4b^2}{1-4c^2}})-E(\arctan\sqrt{\frac{1-4c^2}{4c^2}},\sqrt{\frac{1-4b^2}{1-4c^2}})]}

In this formula, e is the efficiency of the conversion of electrical power into mid-wave infrared emission (it is 93% according to the manufacturer) while \Pi is the power absorbed by the device (about 800 W, according to the measurements performed by a power meter) and where F and E are the elliptic functions of the first and second kind, respectively (Maccallini P 2013). Parameters b and c (the semi-axes of the ellipsoid in figure 5, in the directions y and z, respectively) depend on how the flux of radiant power is concentrated by the reflector of the lamp. In my model, they are undetermined and must be derived by a minimum of two independent measures of power per unit area. But in the absence of these measures, we can use the values for which we have the absolute minimum for the function I_\xi (b, c) which is reached for b = 0.49, c = 0.36 (assuming c < b < 0.5).

Figure 5. On the left, the systems of coordinates used for building the mathematical model of the infrared heater. On the right, the detail of the ellipsoid that describes the concentration of the radiant power that exits the heater as a result of the power released by the bulb and the power reflected by the reflector behind the bulb; this ellipsoid is represented as a sphere on the left side of this figure. All the details of the model can be found in (Maccallini P. 2022).

The solution of the model (equations 1 and 2) was determined analytically, the values for the elliptic functions were calculated by employing Wolfram Mathematica 5.2 as a function of b and c and saved in a csv file, then the output of the lamp was calculated by a custom script written in Octave (Maccallini P. 2022) that used the output from the Mathematica script. Note that Octave does not have a built-in function for the elliptic functions and a numerical integration by Simpson’s rule led to unsatisfactory results, as detailed in the already mentioned reference. The prediction for a plane parallel to yz at a distance x_P\,=\,0.8\,m from O, is reported in figure 6, where it can be seen that the maximum of W_\bot is reached for P\equiv(0.8,\,0,\,0) and its value is about 376 \frac{W}{m^2}.

Figure 6. The power per unit of area orthogonal to the plane zy (W_\bot), as a function of x_P, y_P, z_P, for x_P\,=\,0.8\,m.

2.4) Derived environmental parameters

The weather station only measured air temperature, atmospheric pressure, and relative humidity, every 10 minutes. The measures were then uploaded via WiFi to a server and downloaded to my computer from my Weathercloud account. To calculate the daily means of these parameters I designed an algorithm that takes into account missing data, as follows: it calculates an hourly mean measure when at least one measure is available for that hour; it fills isolate missing hourly means with the mean between the measure of the hour before and the one of the following hour; it then calculates the daily mean if the hourly means of at least 21 hours are present for each day, otherwise the daily mean is considered as a missing value. From the hourly means of temperature, relative humidity, and pressure, I calculated the hourly means for saturation vapor pressure (SVP), partial vapor pressure (PVP), absolute humidity (AH), dry air density (DAD), and air density (AD), by means of the following formulae:

3) SVP\;=\;6.112e^{\frac{17.62t}{243.12+t}}

4) PVP\,=\,RH\frac{SVP}{100}

5) AH\,=\,100\mu_w\frac{PVP}{R(273.15\,+\,t)}

6) DAD\,=\,100\mu_{dry}\frac{p\,-\,PVP}{R(273.15\,+\,t)}

7) AD\,=\,AH\,+\,DAD

where t, p, RH are air temperature, barometric pressure, and relative humidity, respectively, while \mu_w\,=\,18.015\frac{g}{mol} is the molecular weight of water, \mu_{dry}\,=\,28.97\frac{g}{mol} is the molecular weight of dry air, R\,=\,8.31\frac{J}{K\cdot mol} is the constant of gasses, and where we express temperature in °C and pressure in hPa. These formulae are taken from chapter 4 of the Guide to Instruments and Methods of Observation by the World Meteorological Organization (R) and are derived and discussed in (Maccallini P. 2021). Once obtained the hourly means for the derived environmental measures, the corresponding daily means were computed by using the same method described for t, p, and RH. In figure 8, I reported, as an example, the plots for the environmental parameters (daily means) and the mean daily score for the month of March. All the other plots can be obtained by running the R script reported at the end of this article, in the same folder containing the raw data (that are available, just before the script, for download).

2.5) Statistical analysis

To assess the effect of the summer simulation on my symptoms, I compared the functionality score collected in 2017 (from January to April) with the score collected during the same months of 2022, while living inside the Summer Simulator. The null hypothesis was tested by running the Wilcox test. The correlations between the daily mean score and the environmental parameters were assessed by Spearman’s correlation coefficient and the significance of each correlation was also tested. All the calculations were performed by a custom R script written by myself, available, along with the row data, at the bottom of this page.

3) Results

The comparison between the mean daily scores for the months of January, February, March, and April 2017 and the same parameter for the same months of the year 2022 show that the Summer Simulator has raised the mean daily score in a statistically significant fashion: p value = 1.897\cdot10^{-10}; see figure 7.

Figure 7. Comparison between the mean daily functionality score in 2017 (from January to April), on the right, and the same score in 2022 (same months), while living inside the Summer Simulator, on the left. The difference is statistically significant, reaching a p value of 1.897\cdot10^{-10}. In 2017 I was following a therapy that I judged mildly effective, which allowed me to take the functionality score three times a day during the cold months, which is something that I usually stop doing when I get worse in Autumn, also because the score would always be the lowest one. The diagram is a box and whisker plot, with the indication of the 1st quantile, the median (2nd quantile), and the 3rd quantile.

Figure 8. Daily mean values for each one of the environmental parameters considered in this study and for the functionality score, for the month of March.

The correlations between the mean daily score and the mean daily values of the environmental parameters, and the corresponding p values, are collected in Table 1. After correction for multiple comparisons (Benjamini-Hochberg method), none of the correlations reaches statistical significance.

Parameter
(daily mean)
\rhopp’
Temperature-0.21960.020590.1421
Relative humidity-0.18600.050680.1928
Pressure0.057640.54970.5497
Saturation vapor pressure-0.22140.019550.1421
Partial vapor pressure-0.20810.028420.1421
Absolute humidity-0.20980.027110.1421
Dry air density0.16860.078330.1928
Air density0.15930.096410.1928

Table 1. Correlations between the mean daily functionality score and several environmental parameters (Spearman’s rank correlation coefficient) and the corresponding p values before and after correction for multiple comparisons (Benjamini-Hochberg method).

I run a similar analysis with another method, too: I compared the daily mean temperatures for days with a mean functionality score above 5 and the daily mean temperature for the days with a score below 5, by running the Wilcox test; I repeated this same analysis for all the other environmental parameters (see Table 2 and figure 8).

Figure 8. For each environmental parameter, you find here the comparison between the mean daily value corresponding to days with a mean functionality score above 5, equal to 5, and below 5. Box and whisker plots, with also the indication of the means, as red dots. The p values for the comparisons between days with a mean functionality score below 5 and above 5 are collected in table 2.

Parameterpp
Temperature0.026940.1813
Relative humidity0.1330.2112
Pressure0.21120.2112
Saturation vapor pressure0.025350.1813
Partial vapor pressure0.063240.1897
Absolute humidity0.062130.1897
Dry air density0.030380.1813
Air density0.036250.1813

Table 2. For each environmental parameter, I compared the daily mean value corresponding to days with a functionality score above 5, with the same value for days with a functionality score below 5 (Wilcox test) and I corrected for multiple comparisons (Benjamini-Hochberg method).

4) Discussion

Simulation of summer by raising the air temperature, absolute humidity, and infrared radiation did show to improve my symptoms when I compared my daily functionality score with the score collected for the same period of the year, but in 2017, while I was following another therapy that I judged mildly effective (figure 7). Yet, no correlation reached statistical significance between my functionality score and the environmental parameters that could be measured or derived (temperature, relative and absolute humidity, atmospheric pressure, saturation vapor pressure, partial vapor pressure, dry air density, air density). If we accept that the summer simulation has really improved my condition (i.e. if we rule out a placebo effect), we must admit that the improvement was due to either some combination of the aforementioned parameters or something else. The only other parameter I could exert my control on was infrared radiation by an infrared heater that delivered a power of about 380\,\frac{W}{m^2} on the vertical plane by the left side of my bed, according to a mathematical model of the lamp (figure 6). This radiation was then partially reflected by the surfaces covered in mylar of my infrared chamber, so that it hit my body from other directions too, even though I have not built a model for the whole infrared chamber, yet.

One possible explanation, to my understanding, for the results of this study, is then that the improvement was driven by some effect of the radiation from the lamp, either directly on my body or indirectly to some other parameter of the environment which in turn affected my biology. Importantly, the absence of correlation with air temperature, air density, and dry air density reported by the present study suggests that the significant correlations found in the previous study (Maccallini P. 2022) between my functionality score and the three mentioned parameters were driven by something else that correlates with temperature (which negatively correlates with air density and dry air density). We know that there is an almost linear positive correlation between the temperature of the air and longwave infrared radiation from the Earth into space (Koll DDB and Cronin TW 2018), while the emission of the surface of the Earth within a short distance from it must follow the Stefan-Boltzman law so that it is proportional to T^4, where T is the temperature of the air at the level of the surface, expressed in Kelvin. But it should be noted that while 99% of the power emitted by the Earth as infrared radiation has a wavelength above 5\,\mu m, my infrared heater emits in the range 1-4\,\mu m. So, my artificial source of radiation does not simulate the radiation from the Earth, if we consider the wavelength (while it does from the point of view of the total power, as we will see in a following study, already concluded). But it could be considered that the radiation from the lamp did end up hitting the walls of my room, which in turn released their thermic energy as long wave infrared radiation. It is conceivable then that both in the Summer Simulator and in real summer, my improvements might be driven by far infrared radiation. Another possibility is that the improvement was due to the infrared radiation emitted by my body (long wave infrared radiation) and reflected back to me by the mylar coating of the infrared chamber.

I can’t rule out a placebo effect of the Summer Simulator, even though it showed greater effectiveness than another therapy I was following in 2017, which I considered mildly effective. The difficulty of assessing the effectiveness of a treatment by a subjective score can hardly be overestimated. The identification of objective measures that can be routinely used to score the level of symptoms is of paramount importance for correlation studies like the present one.

5) Conclusions

By simulating summer during the months from January to April of 2022, I experienced an improvement in my symptoms similar to the one I usually experience during summer (although of a lesser magnitude). The simulation was achieved by increasing air temperature and absolute humidity, by using an infrared heater, and by living inside a chamber covered with mylar, a material that reflects most of the thermal radiation, so to avoid dispersion of the radiation from the heater. The improvement does not correlate with any of the parameters of the air (temperature, absolute and relative humidity, saturation vapor pressure, partial vapor pressure, density of the air, density of dry air, atmospheric pressure). One possible explanation for my improvement while living inside the Summer Simulator is that it was due to either the direct radiation from the heater (which is mid-wave infrared radiation) or the radiation from the walls heated by the lamp and the air. The latter radiation is far infrared radiation, the same radiation emitted by the Earth, which linearly correlates with air temperature and which might drive the improvements I experience during summer and the correlation between my functionality score and air temperature reported in a previous study.

6) Funding

This study was supported by those who donated to this website.

7) Supplementary material

The environmental measures and the symptom scores used for this study are available for download:

Score_control_Summer_Simulator.csv

Score_Summer_Simulator.csv

Weathercloud_2022_1.csv

Weathercloud_2022_2.csv

Weathercloud_2022_3.csv

Weathercloud_2022_4.csv

The mathematical model of the infrared heater is available here for download. The script I wrote in R for the statistical analyses is the following one.

# file name: summer_simulator
#
# year: 2022
# month: 08
# day: 31
#
#-------------------------------------------------------------------------------
# We read the measures from Bresser Weather Station
#-------------------------------------------------------------------------------
#
# We set the names of the columns for the data.frames containing environmental data
#
cn<-c("Date","Tempin","Temp","Chill","Dewin","Dew","Heatin","Heat","Humin","Hum",
"Wspdhi","Wspdavg","Wdiravg","Bar", "Rain", "Rainrate","Solarrad","Uvi")
#
# We read the environmental data
#
Jan_2022<-read.csv("Weathercloud Paolo WS 2022-01.csv", sep = ";", col.names = cn, header = T)
Feb_2022<-read.csv("Weathercloud Paolo WS 2022-02.csv", sep = ";", col.names = cn, header = T)
Mar_2022<-read.csv("Weathercloud Paolo WS 2022-03.csv", sep = ";", col.names = cn, header = T)
Apr_2022<-read.csv("Weathercloud Paolo WS 2022-04.csv", sep = ";", col.names = cn, header = T)
List_2022<-list(Jan_2022, Feb_2022, Mar_2022, Apr_2022)
#
# We set some parameters
#
mis<-3 # the highest number of missing hours accepted for a day
muw<-18.01528 # molecular weight of water (g/mol)
mud<-28.97 # molecular weight of dry air (g/mol)
R<-8.31 # universal constant of gasses (J/(K*mole))
month_name<-c("January", "February", "March", "April")
#
#-------------------------------------------------------------------------------
# We work on the score
#-------------------------------------------------------------------------------
#
# We set the names of the columns for the data.frame containing the score
#
cn<-c("Year","Month","Day","score_1","score_2","score_3")
#
# We read the scores
#
Score<-read.csv("Score_Summer_Simulator.csv", sep = ";", col.names = cn, header = T)
Score_control<-read.csv("Score_control_Summer_Simulator.csv", sep = ";", col.names = cn, header = T)
#
attach(Score)
#
ms<-c(rep(0,31*4))
#
# We calculate the daily mean score
#
for (d in 1:length(Year)) {
if (is.na(score_1[d])==F&is.na(score_2[d])==F&is.na(score_3[d])==F) {
ms[d]<-(score_1[d] + score_2[d] + score_3[d])/3
} else ms[d]<-NA
}
#
attach(Score_control)
#
ms_c<-c(rep(0,length(Year)))
#
# We calculate the daily mean score
#
for (d in 1:length(Year)) {
if (is.na(score_1[d])==F&is.na(score_2[d])==F&is.na(score_3[d])==F) {
ms_c[d]<-(score_1[d] + score_2[d] + score_3[d])/3
} else ms[d]<-NA
}
#
# We test the effect of the simulator
#
wilcox.test(ms, ms_c)
boxplot(ms, ms_c, ylab = "score", names = c("summer simulator","other"))
#
#-------------------------------------------------------------------------------
# We calculate mean values for temp (°C), relative humidity (%), and atmospheric
# pressure (hPa)
#-------------------------------------------------------------------------------
#
# We initialize the arrays that will contain the daily means of the environmental
# data
#
t<-c(rep(0,31*4))
rh<-c(rep(0,31*4))
p<-c(rep(0,31*4))
svp<-c(rep(0,31*4))
pvp<-c(rep(0,31*4))
ah<-c(rep(0,31*4))
dad<-c(rep(0,31*4))
ad<-c(rep(0,31*4))
#
#-------------------------------------------------------------------------------
# We work on each month
#-------------------------------------------------------------------------------
#
for (q in 1:4) {
index<-q
add<-31*(q-1)
attach(List_2022[[q]])
#
# We extract the day of the month, the year, the hour from column Date
#
DateR<-read.table(text=Jan_2022[,1], sep ="/")
DateR2<-read.table(text=DateR[,3], sep =" ")
DateR3<-read.table(text=DateR2[,2], sep =":")
#
#-------------------------------------------------------------------------------
#
first_day<-DateR[1,1]
last_day<-DateR[length(DateR[,1]),1]
#
# We calculate the hourly mean for temperature (°C)
#
Temp_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
num<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
for (j in 1:length(DateR[,1])) {
if ( DateR[j,1]==d & DateR3[j,1]== h-1) {
if (is.na(Temp[j])==F) {
Temp_mean[d,h]<-(Temp_mean[d,h] + Temp[j])
num[d,h]<-(num[d,h]+1)
}
}
}
}
}
for (d in first_day:last_day) {
for (h in 1:24) {
if (num[d,h]>0) Temp_mean[d,h]<-Temp_mean[d,h]/num[d,h] else Temp_mean[d,h]<-NA
}
}
#
# We fill isolate missing values
#
for (d in first_day:last_day) {
for (h in 2:23) {
if (is.na(Temp_mean[d,(h-1)])==F&is.na(Temp_mean[d,h])==T&is.na(Temp_mean[d,(h-1)])==F) {
Temp_mean[d,h]<-(Temp_mean[d,(h-1)]+Temp_mean[d,(h+1)])/2
}
}
}
for (d in (first_day+1):(last_day-1)) {
if (is.na(Temp_mean[(d-1),(24)])==F&is.na(Temp_mean[d,1])==T&is.na(Temp_mean[d,2])==F) {
Temp_mean[d,1]<-(Temp_mean[(d-1),24]+Temp_mean[d,2])/2
}
}
#
# We calculate the daily mean for temperature (°C)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(Temp_mean[d,h])==T) count<-count+1
}
if (count<=mis) t[add+d]<-mean(Temp_mean[d,],na.rm = T) else t[add+d]<-NA
}
#
# We calculate the hourly mean relative humidity (%)
#
RH_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
num<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
for (j in 1:length(DateR[,1])) {
if ( DateR[j,1]==d & DateR3[j,1]== h-1) {
if (is.na(Hum[j])==F) {
RH_mean[d,h]<-(RH_mean[d,h] + Hum[j])
num[d,h]<-(num[d,h]+1)
}
}
}
}
}
for (d in first_day:last_day) {
for (h in 1:24) {
if (num[d,h]>0) RH_mean[d,h]<-RH_mean[d,h]/num[d,h] else RH_mean[d,h]<-NA
}
}
#
# We fill isolate missing values
#
for (d in first_day:last_day) {
for (h in 2:23) {
if (is.na(RH_mean[d,(h-1)])==F&is.na(RH_mean[d,h])==T&is.na(RH_mean[d,(h-1)])==F) {
RH_mean[d,h]<-(RH_mean[d,(h-1)]+RH_mean[d,(h+1)])/2
}
}
}
for (d in (first_day+1):(last_day-1)) {
if (is.na(RH_mean[(d-1),(24)])==F&is.na(RH_mean[d,1])==T&is.na(RH_mean[d,2])==F) {
RH_mean[d,1]<-(RH_mean[(d-1),24]+RH_mean[d,2])/2
}
}
#
# We calculate the daily mean relative humidity (%)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(RH_mean[d,h])==T) count<-count+1
}
if (count<=mis) rh[add+d]<-mean(RH_mean[d,],na.rm = T) else rh[add+d]<-NA
}
#
# We calculate the hourly mean pressure (hPa)
#
P_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
num<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
for (j in 1:length(DateR[,1])) {
if ( DateR[j,1]==d & DateR3[j,1]== h-1) {
if (is.na(Hum[j])==F) {
P_mean[d,h]<-(P_mean[d,h] + Bar[j])
num[d,h]<-(num[d,h]+1)
}
}
}
}
}
for (d in first_day:last_day) {
for (h in 1:24) {
if (num[d,h]>0) P_mean[d,h]<-P_mean[d,h]/num[d,h] else P_mean[d,h]<-NA
}
}
#
# We fill isolate missing values
#
for (d in first_day:last_day) {
for (h in 2:23) {
if (is.na(P_mean[d,(h-1)])==F&is.na(P_mean[d,h])==T&is.na(P_mean[d,(h-1)])==F) {
P_mean[d,h]<-(P_mean[d,(h-1)]+P_mean[d,(h+1)])/2
}
}
}
for (d in (first_day+1):(last_day-1)) {
if (is.na(P_mean[(d-1),(24)])==F&is.na(P_mean[d,1])==T&is.na(P_mean[d,2])==F) {
P_mean[d,1]<-(P_mean[(d-1),24]+P_mean[d,2])/2
}
}
#
# We calculate the daily mean pressure (hPa)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(P_mean[d,h])==T) count<-count+1
}
if (count<=mis) p[add+d]<-mean(P_mean[d,],na.rm = T) else p[add+d]<-NA
}
#
# We calculate hourly mean saturation vapor pressure (hPa)
#
Svp_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
if (is.na(Temp_mean[d,h])==F) {
Svp_mean[d,h]<-6.112*exp((17.62*Temp_mean[d,h])/(243.12 + Temp_mean[d,h]))
} else Svp_mean[d,h]<-NA
}
}
#
# We calculate daily mean saturation vapor pressure (hPa)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(Svp_mean[d,h])==T) count<-count+1
}
if (count<=mis) svp[add+d]<-mean(Svp_mean[d,],na.rm = T) else svp[add+d]<-NA
}
#
# We calculate hourly partial vapor pressure (hPa)
#
Pvp_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
if (is.na(Svp_mean[d,h])==F&is.na(RH_mean[d,h])==F) {
Pvp_mean[d,h]<-RH_mean[d,h]*Svp_mean[d,h]/100
} else Pvp_mean[d,h]<-NA
}
}
#
# We calculate daily mean partial vapor pressure (hPa)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(Pvp_mean[d,h])==T) count<-count+1
}
if (count<=mis) pvp[add+d]<-mean(Pvp_mean[d,],na.rm = T) else pvp[add+d]<-NA
}
#
# We calculate hourly absolute humidity (g/m^3)
#
AH_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
if (is.na(Pvp_mean[d,h])==F&is.na(Temp_mean[d,h])==F) {
AH_mean[d,h]<-100*Pvp_mean[d,h]*muw/( R*(273.15 + Temp_mean[d,h]) )
} else AH_mean[d,h]<-NA
}
}
#
# We calculate daily mean absolute humidity (g/m^3)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(AH_mean[d,h])==T) count<-count+1
}
if (count<=mis) ah[add+d]<-mean(AH_mean[d,],na.rm = T) else ah[add+d]<-NA
}
#
# We calculate hourly mean dry air density (g/m^3)
#
DAD_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
if (is.na(Pvp_mean[d,h])==F&is.na(Temp_mean[d,h])==F&is.na(P_mean[d,h])==F) {
DAD_mean[d,h]<-100*(P_mean[d,h]-Pvp_mean[d,h])*mud/( R*(273.15 + Temp_mean[d,h]) )
} else DAD_mean[d,h]<-NA
}
}
#
# We calculate daily mean dry air density (g/m^3)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(DAD_mean[d,h])==T) count<-count+1
}
if (count<=mis) dad[add+d]<-mean(DAD_mean[d,],na.rm = T) else dad[add+d]<-NA
}
#
# We calculate hourly mean air density (g/m^3)
#
AD_mean<-matrix(data=c(rep(0,31*24)), nrow = 31, ncol = 24)
#
for (d in first_day:last_day) {
for (h in 1:24) {
if (is.na(DAD_mean[d,h])==F&is.na(AH_mean[d,h])==F&is.na(P_mean[d,h])==F) {
AD_mean[d,h]<-AH_mean[d,h] + DAD_mean[d,h]
} else AD_mean[d,h]<-NA
}
}
#
# We calculate daily mean dry air density (g/m^3)
#
for (d in first_day:last_day) {
count<-0
for (h in 1:24) {
if (is.na(AD_mean[d,h])==T) count<-count+1
}
if (count<=mis) ad[add+d]<-mean(AD_mean[d,],na.rm = T) else ad[add+d]<-NA
}
#
# We set as NA the days without measures, building arrays of 31 days
#
for (d in 1:31) if (d<first_day | d>last_day) {
t[add+d]<-NA
rh[add+d]<-NA
p[add+d]<-NA
svp[add+d]<-NA
pvp[add+d]<-NA
ah[add+d]<-NA
dad[add+d]<-NA
ad[add+d]<-NA
}
#
# We plot mean temperature, relative humidity, pressure, saturation vapor pressure
#
plot (c((add+1):(add+31)),t[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "temperature (°C)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
plot (c((add+1):(add+31)),rh[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "relative humidity (%)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
plot (c((add+1):(add+31)),p[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "atmospheric pressure (hPa)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
plot (c((add+1):(add+31)),svp[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "saturation vapor pressure (hPa)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
plot (c((add+1):(add+31)),pvp[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "partial vapor pressure (hPa)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
plot (c((add+1):(add+31)),ah[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "absolute humidity (g/m^3)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
plot (c((add+1):(add+31)),dad[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "dry air density (g/m^3)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
plot (c((add+1):(add+31)),ad[(add+1):(add+31)], type = "b", main =
month_name[index], xlab = "day", ylab = "air density (g/m^3)", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
#
}
#
#-------------------------------------------------------------------------------
# We build a data frame that contains all the data of this study
#-------------------------------------------------------------------------------
#
attach(Score)
mydata<-data.frame(year = Year, month = Month, day = Day,
temperature = t, relative_hum = rh, pres = p, sat_vap_p = svp,
par_vap_p = pvp, absolute_hum = ah, dry_air_d = dad, air_d = ad,
score = ms)
attach(mydata)
#
#-------------------------------------------------------------------------------
# Spearman's correlation ranks
#-------------------------------------------------------------------------------
#
correlations<-list()
correlations[[1]]<-cor.test(score, temperature, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[2]]<-cor.test(score, relative_hum, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[3]]<-cor.test(score, pres, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[4]]<-cor.test(score, sat_vap_p, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[5]]<-cor.test(score, par_vap_p, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[6]]<-cor.test(score, absolute_hum, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[7]]<-cor.test(score, dry_air_d, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[8]]<-cor.test(score, air_d, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[9]]<-cor.test(temperature, air_d, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
correlations[[10]]<-cor.test(temperature, dry_air_d, alternative = "two.sided", method = "spearman",
exact = T, conf.level = 0.95, continuity = T)
#
#-------------------------------------------------------------------------------
# We search for the optimal environmental parameters
#-------------------------------------------------------------------------------
#
low_score<-subset.data.frame(mydata, score<5)
mean_score<-subset.data.frame(mydata, score==5)
high_score<-subset.data.frame(mydata, score>5)
#
# Hypotheses testing
#
# temperature
wilcox.test(low_score[,4], high_score[,4])
# relative humidity
wilcox.test(low_score[,5], high_score[,5])
# atm. pressure
wilcox.test(low_score[,6], high_score[,6])
# saturation vapor pressure
wilcox.test(low_score[,7], high_score[,7])
# partial vapor pressure
wilcox.test(low_score[,8], high_score[,8])
# absolute humidity
wilcox.test(low_score[,9], high_score[,9])
# dry air density
wilcox.test(low_score[,10], high_score[,10])
# air density
wilcox.test(low_score[,11], high_score[,11])
#
# Plotting
#
labels = c("temperature (°C)", "relative humidity (%)", "atm. pressure (hPa)",
"saturation vapor pressure (hPa)", "partial vapor pressure (hPa)",
"absolute humidity (g/m^3)", "dry air density (g/m^3)", "air density (g/m^3)")
for (i in 4:11) {
boxplot(low_score[,i], mean_score[,i], high_score[,i], ylab = labels[i-3], names =
c("score < 5", "score = 5", "score > 5") )
points(1:3, c(mean(low_score[,i], na.rm = T), mean(mean_score[,i], na.rm = T),
mean(high_score[,i], na.rm = T)), col = "red", pch = 18)
grid(col="darkgray")
}
#
#-------------------------------------------------------------------------------
# We plot the mean score for each month
#-------------------------------------------------------------------------------
#
for (i in 1:4) {
score_month<-subset.data.frame(mydata, month == i)
plot (c(1:31),score_month[,12], type = "b", main =
month_name[i], xlab = "day", ylab = "score", pch = 19,
lty = 2, col = "red")
grid(col="darkgray")
}
Advertisement

Seasonal pattern in my symptoms, a correlation study

Seasonal pattern in my symptoms, a correlation study

Abstract

I report here on the correlation between my ME/CFS symptoms and seven environmental parameters. I experience a regular improvement in my condition during the warmer months and I show here that the improvements directly correlate with air temperature and inversely correlate with air density and dry air density. No correlation could be found between my symptoms and relative humidity, absolute humidity, atmospheric pressure, and atmospheric particulate with a diameter \leq 2.5 {\mu}m (PM2.5). The correlation between symptoms and air density and dry air density might be driven by the correlation with temperature because density is strongly inversely correlated with the temperature of the air.

Introduction

My ME/CFS improves during summer, in the period of the year that goes from May/June to the end of September (when I am in Rome, Italy). I don’t know why. We know that ME/CFS tends to have an oscillating course in most patients (Chu L. et al. 2019), but the presence of a seasonal pattern in this patient population has not been investigated so far, to my knowledge. And yet, if you ask directly to patients, some of them say that they feel better in summer. Unfortunately, we don’t have scientific data on that, this is an area worth investigating with some carefully done surveys. I collected a functionality score (the higher, the more I function) for 190 days and I collected seven environmental parameters during these same days; I then calculated Spearman’s rank correlation coefficient between the score and each one of these parameters.

Data collection

I registered a functionality score three times a day (value from 3 to 7, the higher the better) for 199 days. I retrieved four environmental parameters for the same days (if available): mean temperature (T), mean atmospheric pressure (P), mean relative humidity (RH), and mean PM 2.5. While T, P, and RH were collected from Wunderground, PM 2.5 was retrieved from aqicn.

Data were collected during three separate periods, in three separate geographical regions: from March 1st to June 9th, 2021 in Arona (Tenerife, Spain); from February 1st to March 7th, 2020 in Rosario (Santa Fe, Argentina); from September 1st to October 31st, 2020 in Rome (Italy). These three periods were chosen for the following reasons: they are periods in which there is no big difference between the parameters of the air indoors and outdoors; they are periods in which my symptoms had their seasonal switch (I regularly improve during summer, from June to September, when I live in Rome). The closest weather station was used for each period, and when data were missing from one weather station, another one nearby was selected.

Calculation of derived parameters

The PM 2.5 value (particulate matter with a diameter equal to or below 2.5\mu m) was converted from AQI (air quality index) to \mu/m^{3} by solving the following equation with respect to C:

1)  AQI\,=\,\frac{I_{high}\,-\,I_{low}}{C_{high}\,-\,C_{low}}(C\,-\,C_{low})\,+\,I_{low}

Where the intervals I_{low}\,-\,I_{high} and C_{low}\,-\,C_{high} are the ones collected in Table 1 and taken from (Mintz D, 2018). So for example, if we have a value of PM 2.5 of 55 AQI, we select I_{low}\,=\,51\,I_{high}\,=100\,C_{low}\,=12.1\,C_{high}\,=\,35.4 and then we solve Eq. 1 with respect to C to get the corresponding value of PM 2.5 in (\mu/m^{3}).

C_{low}\,-\,C_{high}I_{low}\,-\,I_{high}
0.0–12.0 0–50
12.1–35.4 51–100
35.5–55.4101–150
55.5–150.4 151–200
150.5–250.4 201–300
250.5–350.4301–400
350.5–500.4 401–500
Table 1. Parameters to be used for solving Eq. 1 with respect to C.

Relative humidity is defined as follows

RH\;=\;100\frac{e'}{e'_w}

where {e'} is the partial pressure of water vapor in the air and e'_w is the saturation vapor pressure with respect to water (the vapor pressure in the air in equilibrium with the surface of the water). We then have

e'_w(p,t)\;=\;(1.0016+\frac{3.15}{10^{6}}p-\frac{0.074}{p})6.112e^{\frac{17.62t}{243.12+t}}

Here and in what follows we express pressure in hPa, with hPa\;=\;10^2 Pa, while t is the temperature in °C and T is the temperature in K, with T\;=\;273.15+t. Absolute humidity is simply given by

{\rho}_v\;=\;\frac{m_v}{V}

where m_v is the mass of vapor and V is the total volume occupied by the mixture (Ref, chapter 4). From the law of perfect gasses and considering that, according to Dalton’s law, in moisture, the partial pressure of each gas is the pressure it would have if it occupied the whole volume (Ref), we can also write

e'100\;=\;\frac{m_v}{\mu_{v}}R\frac{273,15 + t}{V}

with R\,=\,8.31\,\frac{J}{K\cdot mole},\,\mu_{v}\,=\,18.02\,\frac{g}{mole}. Then, by substituting e' and m_v in {\rho}_v, we have

2)  AH\,=\,\rho_v\,=\,\frac{6.112RH\mu_{v}}{R(273.15+t)}(1.0016+\frac{3.15}{10^{6}}p-\frac{0.074}{p})e^{\frac{17.62t}{243.12+t}} 

It is also possible to show that the density of dry air (which is due to all the gasses that constitute the air, but water vapor) is given by

3)  DAD\,=\,{\rho}_{dry}\,=\,100\frac{(p\,-\,e')\mu_{dry}}{R(273.15\,+\,t)}

Where we assume \mu_{dry}\,=\,28.97\,\frac{g}{mole}. These calculations are reviewed in more detail in (Maccallini P, 2021). We can then find AD:

4)  AD\,=\,{\rho}_{dry}+{\rho}_{v}

Statistical analysis

The correlation between each environmental parameter and the mean daily score of functionality was measured by the Spearman’s rank correlation coefficient, calculated as:

5)  \rho_{S,Y}\,=\,\frac{Cov(R(S)R(Y)}{\sqrt{Var(R(S))Var(R(Y))}}

where S is the score of functionality and R(S) is its rank, Y is one of the environmental parameters and R(Y) is its rank. The statistical significance for the correlation parameters was calculated by assuming that the null distribution of Spearman’s coefficient is approximated by a symmetric law of the second type, therefore it is possible to show that

6)  t\,=\,\rho_{S,Y}\sqrt{\frac{n-2}{1-\rho_{S,Y}^2}}  

follows a t-student distribution with n – 2 degrees of freedom, where n is the number of measures available for S, Y (the number of days for which we have both S and Y) (Kendall MG, “The Advanced Theory of Statistics”, vol 2, par 31.19), (Maccallini P. 2021). The correction for multiple comparisons was performed according to the Benjamini-Hochberg test. All the calculations and plotting were performed by a custom Octave script (see at the bottom of the article). R was used for the correction for multiple comparisons, by using the following commands:

pval<-c(0.000357, 0.5356, 0.5151, 0.238, 0.00201, 0.00536, 0.1428)
p.adjust(pval, method = "hochberg") 
Parameter
(daily mean)
\rhotpp’
T (°C)0.25633.6373.57\cdot 10^{-4} 0.0025
P (mbar)-0.04522-0.62060.53560.54
RH (%)-0.04752-0.65220.51510.54
AH (g/m^{3})0.085951.1830.2380.54
AD (g/m^{3})-0.223-3.1332.01\cdot 10^{-3} 0.012
DAD (g/m^{3}) -0.2013-2.8175.36\cdot 10^{-3} 0.027
PM2.5 (\mu/m^{3}) -0.1067-1.4720.14280.54

Table 2. Spearman correlation between my daily functionality score and several atmospheric parameters. T: temperature; P: pressure; RH: relative humidity; AH: absolute humidity; AD: air density; DAD: dry air density. P values after correction for multiple comparisons (Benjamini-Hochberg) are collected in the last column.

Results

The Spearman’s correlation coefficients between my score of functionality and each one of the environmental parameters considered are reported in Table 2. The only parameters to show a statistically significant correlation with my symptoms are temperature (T), air density (AD), and dry air density (DAD). While temperature is positively correlated with my score (the higher the temperature, the better I feel), air density and dry air density are inversely correlated (the less the density, the better I feel). The statistical significance of these three correlations survives after correction for multiple comparisons. No correlation was observed between my symptoms and atmospheric pressure, relative humidity (RH), absolute humidity (AH), and the concentration of atmospheric particulate with diameter \leq 2.5 {\mu}m (PM2.5). The distribution of good days (green) and bad days (red) on the plane AH-T and AD-T is reported in Figures 1, and 2. In Figures 3 and 4, it is caught an improvement and its relationship with temperature and air density, respectively. In Figures 5 and 6 a worsening of symptoms can be seen while temperature decreases and air density increases.

Figure 1. The mean daily temperature is on the x-axis while absolute humidity is on the y-axis. Points at the same relative humidity are indicated by the tick black curves. Green dots indicate days with a functionality score above 5; red dots indicate days with a score below 5; blue dots indicate days with a score of 5.

Figure 2. The mean daily temperature is on the x-axis while air density is on the y-axis. Green dots indicate days with a functionality score above 5; red dots indicate days with a score below 5; blue dots indicate days with a score of 5.

Figure 3. Improvement with increasing temperature. Green dots indicate days with a functionality score above 5; red dots indicate days with a score below 5; blue dots indicate days with a score of 5. Vertical dashed lines indicate the end-beginning of months.

Figure 4. Improvement with decreasing air density. Green dots indicate days with a well-being score above 5; red dots indicate days with a score below 5; blue dots indicate days with a score of 5. Vertical dashed lines indicate the end-beginning of months.

Figure 5. Worsening of symptoms with decreasing temperature. Green dots indicate days with a functionality score above 5; red dots indicate days with a score below 5; blue dots indicate days with a score of 5. Vertical dashed lines indicate the end-beginning of months.

Figure 6. Worsening of symptoms with increasing air density. Green dots indicate days with a functionality score above 5; red dots indicate days with a score below 5; blue dots indicate days with a score of 5. Vertical dashed lines indicate the end-beginning of months.

Discussion

The direct correlation between my degree of functionality and air temperature is the strongest (even if the correlation is low) and the most significant. Since air density and dry air density have a strong inverse dependence on temperature (see Eq. 3, 4), it seems likely that the correlation between my symptoms and AD and DAD is driven by the correlation with temperature. A more accurate statistical analysis (multinomial regression) might elucidate this point. Yet, further experiments (not shown here) have shown that increasing air temperature is not able to induce, by itself, an improvement in my symptoms, so this correlation might very well not indicate causation. On the other hand, a more complex experimental setting has allowed inducing, during winter, the improvement of symptoms that I would normally experience during summer. So I am currently designing better experiments in order to define the combination of parameters that are responsible for the seasonal pattern of my disease. The observation that the significant correlations here reported have very low coefficients is perhaps due to the fact that my score of functionality is particularly flat, and does not depict accurately the range of variability of my symptoms.

Notes

We calculated the daily mean absolute humidity by substituting in Eq. 2 the daily mean values for temperature, relative humidity, and pressure. In general, given a function x = x(t) and a function y = y(x), it is not true that the mean over t of y is given by y(mean of x over t), as it is easy to verify. But in our case, there is no big difference between the daily mean absolute humidity calculated by using Eq. 2 with daily mean values of T, P, and RH and the daily mean calculated from data taken every half an hour, as I verified for Rome, 2019 (see Figure 7). The same is true for air density (Figure 8). For further details see (Maccallini P. 2021)

Figure 7. Absolute humidity as a function of the day of the year for the year 2019, for the city of Rome (2019). The black line is the mean daily value calculated from measures of T, P, and RH taken every half an hour. The blue one is the value calculated from the mean daily values of T, P, and RH.

Figure 8. Air density as a function of the day of the year for the year 2019, for the city of Rome (2019). Black line: the daily mean of the values calculated from the actual measures for T, P, and RH. Blue line: the value calculated by using the daily means for T, P, and RH.

Supplementary material

The environmental measures and the symptom scores used for this study are available for download:

The script

Note: this script is pretty long for the following reasons: Octave is less concise than R in IO-related instructions; it plots and calculates more output than what is shown above; I calculate the p values without using a built-in function for that (I tend to avoid built-in functions, but I am now changing my mind).

% file name = Correlation
% date of creation = 08/11/2021
%-------------------------------------------------------------------------------
% It uses file correlation.xlsx
%-------------------------------------------------------------------------------
clear all
close all
pkg load io
missing = 999999999; % missing measure
%-------------------------------------------------------------------------------
% Physical parameters
%-------------------------------------------------------------------------------
R = 8.31; % constant for ideal gasses (J/K*mole)
mu_v = 18.01528; % molecular weight of water (g/mole)
mu_d = 28.97; % molecular weight of dry air (g/mole)
mu_N2 = 28.013; % molecular weight of N2 (g/mole)
mu_O2 = 31.999; % molecular weight of O2 (g/mole)
mu_CO2 = 44.010; % molecular weight of CO2 (g/mole)
mf_N2 = 0.78084; % molar fraction of N2
mf_O2 = 0.20946 % molar fraction of O2
mf_CO2 = 0.00039445 % molar fraction of CO2
%-------------------------------------------------------------------------------
% We read the name of the sheet and its number of lines directly from the .xlsx file
%-------------------------------------------------------------------------------
file_name = 'Correlation.xlsx';
[filetype, sh_names, fformat, nmranges] = xlsfinfo (file_name);
sheet = char(sh_names(1,1)); % name of the first sheet
n_str = substr(char(sh_names(1,2)),5) % number of lines as string
n_num = str2num(n_str) % number of lines as number
%-------------------------------------------------------------------------------
% We read column F (mean daily temperature, Celsius)
%-------------------------------------------------------------------------------
range = strcat('F2:F',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
t = num;
%-------------------------------------------------------------------------------
% We read column I (mean daily relative humidity, percentage)
%-------------------------------------------------------------------------------
range = strcat('I2:I',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
RH = num;
%-------------------------------------------------------------------------------
% We read column K (max daily pressure, mbar)
%-------------------------------------------------------------------------------
range = strcat('K2:K',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
pmax = num;
%-------------------------------------------------------------------------------
% We read column L (min daily pressure, mbar)
%-------------------------------------------------------------------------------
range = strcat('L2:L',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
pmin = num;
%-------------------------------------------------------------------------------
% We read column M, N, O (score 1, 2, 3 respectively)
%-------------------------------------------------------------------------------
range = strcat('M2:M',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
score1 = num;
range = strcat('N2:N',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
score2 = num;
range = strcat('O2:O',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
score3 = num;
%-------------------------------------------------------------------------------
% We read column P (min daily PM 2.5, AQI)
%-------------------------------------------------------------------------------
range = strcat('P2:P',n_str)
[num,tex,raw,limits] = xlsread (file_name, sheet, range);
pm25 = num;
%-------------------------------------------------------------------------------
% We calculate mean daily pressure, absolute humidity, air density
%-------------------------------------------------------------------------------
for i = 1:n_num-1
if ( t(i)!= missing )
p(i) = ( pmin(i) + pmax(i) )/2;
score(i) = ( score1(i) + score2(i) + score3(i) )/3;
AH(i) = 1000*(mu_v/1000)*(RH(i)/(R*(273.15+t(i))))*(1.0016+3.15*10^(-6)*p(i)-0.074/p(i))*6.112*e^(17.62*t(i)/(243.12+t(i)));
ep (i) = (RH(i)/100)*(1.0016+3.15*10^(-6)*p(i)-0.074/p(i))*6.112*e^(17.62*t(i)/(243.12+t(i)));
AD_dry (i) = 1000*100*(p(i) - ep(i))*(mu_d/1000)/(R*(273.15+t(i)));
AD (i) = AH(i) + AD_dry (i);
N2D(i) = 1000*100*(p(i) - ep(i))*mf_N2*(mu_N2/1000)/(R*(273.15+t(i)));
O2D(i) = 1000*100*(p(i) - ep(i))*mf_O2*(mu_O2/1000)/(R*(273.15+t(i)));
CO2D(i) = 1000*100*(p(i) - ep(i))*mf_CO2*(mu_CO2/1000)/(R*(273.15+t(i)));
else
p(i) = missing;
score(i) = missing;
AH(i) = missing;
AD_dry(i) = missing;
AD (i) = missing;
N2D(i) = missing;
O2D(i) = missing;
CO2D(i) = missing;
endif
endfor
%-------------------------------------------------------------------------------
% We put all these arrays into a single array: this will help us to write a more
% compact script
%-------------------------------------------------------------------------------
k = 1;
for i = 1:n_num-1
if ( t(i)!= missing )
array(1,k) = t(i);
array(2,k) = p(i);
array(3,k) = pmin(i);
array(4,k) = pmax(i);
array(5,k) = RH(i);
array(6,k) = AH(i);
array(7,k) = AD(i);
array(8,k) = N2D(i);
array(9,k) = O2D(i);
array(10,k) = CO2D(i);
array(11,k) = score(i);
k = k+1;
endif
endfor
dim = size(array);
n = k - 1;
k = 1;
for i = 1:n_num-1
if ( pm25(i)!= missing )
array2(1,k) = score(i);
array2(2,k) = pm25(i);
k = k+1;
endif
endfor
m = k - 1;
%clear p score AD N2D O2D CO2D pm25 % we free some memory
%-------------------------------------------------------------------------------
% We calculate mean daily PM 2.5 density (microg/m^3) from daily PM 2.5 AQI
%-------------------------------------------------------------------------------
CLH = [0, 12, 12.1, 35.4, 35.5, 55.4, 55.5, 150.4, 150.5, 250.4, 250.5, 350.4, 350.5, 500.4];
ILH = [0, 50, 51, 100, 101, 150, 151, 200, 201, 300, 301, 400, 401, 500];
for k=1:m % k defines the day
for i=1:5005 % i defines the mass
mass = (i-1)/10;
for j=1:13 % j defines the index
if ( and(mass>=CLH(j),mass<=CLH(j+1) ) )
Itemp = ( ( ( ILH(j+1) - ILH(j) )/( CLH(j+1) - CLH(j) ) )*( mass - CLH(j) ) ) + ILH(j);
endif
endfor
if (abs(Itemp - array2(2,k))<=1)
array2(3,k) = mass;
endif
endfor
endfor
%-------------------------------------------------------------------------------
% We Compute Spearman’s rank correlation coefficient rho between each mean daily
% athmosferic parameter and the mean daily score for the same day. In order to do
% that, we need to create arrays without the days with missing measures. tvalue
% has a student's t-distribution with n-2 degrees of freedom (see par 31.19 of Kendall Vol. 2)
% We Compute also the coefficients a and b of the regression line
%-------------------------------------------------------------------------------
x = array(dim(1),1:n);
for h = 1:dim(1)-1
y = array(h,1:n);
rho(h) = spearman (x, y);
tvalue(h) = rho(h)*sqrt( (n - 2)/(1 - rho(h)^2) );
a(h) = cov(x,y)/var(x);
b(h) = mean(y) - a(h)*mean(x);
endfor
clear x
x = array2(1,1:m);
for h = dim(1):dim(1)+1
y = array2(h-dim(1)+2,1:m);
rho(h) = spearman (x, y);
tvalue(h) = rho(h)*sqrt( (n - 2)/(1 - rho(h)^2) );
a(h) = cov(x,y)/var(x);
b(h) = mean(y) - a(h)*mean(x);
endfor
%-------------------------------------------------------------------------------
% We Compute the p value (two-tailed) related to each rho
%-------------------------------------------------------------------------------
ni = n - 2; % degrees of freedom of t-distribution
for h = 1:dim(1)+1
s(h) = ni/(tvalue(h)^2 + ni); % betainc computes B(x,a,b)/B(a,b)
pval(h) = 2*( 0.5*betainc(s(h), ni/2, 1/2) ); % so it is in fact what is usually referred to as
endfor % regularized incomplete beta function!
%-------------------------------------------------------------------------------
% We plot the physical parameters against the mean score
%-------------------------------------------------------------------------------
label_y = cell(dim(1)+1);
label_y(1) = 'daily mean temperature (°C)';
label_y(2) = 'daily mean pressure (mbar)';
label_y(3) = 'daily min pressure (mbar)';
label_y(4) = 'daily max pressure (mbar)';
label_y(5) = 'daily mean relative humidity (%)';
label_y(6) = 'daily mean absolute humidity (g/m^{3})';
label_y(7) = 'daily mean air density (g/m^{3})';
label_y(8) = 'daily mean N_{2} (g/m^{3})';
label_y(9) = 'daily mean O_{2} (g/m^{3})';
label_y(10) = 'daily mean CO_{2} (g/m^{3})';
label_y(11) = 'daily mean PM 2.5 (AQI)';
label_y(12) = 'daily mean PM 2.5 ({\mu}g/m^{3})';
for h = 1:dim(1)-1
figure(h)
for i=1:n
plot (array(dim(1),i), array(h,i), 'o','markersize',5, 'markerfacecolor', 'r')
hold on
endfor
xlabel ('daily mean symptom score', 'fontsize',15)
ylabel (label_y(h), 'fontsize',15)
rhostr = num2str(rho(h));
pvalstr = num2str(pval(h));
tvaluestr = num2str(tvalue(h));
titlestr = strcat('Spearman correlation {\rho} = ', rhostr,', p = ', pvalstr, ', t = ', tvaluestr, ', delay = 0');
title (titlestr,'fontsize',15)
grid on
val2 = max(array(h,:));
val1 = min(array(h,:));
axis ([3.5,6.5,val1,val2])
plot ([3.5,6.5], [a(h)*3.5 + b(h), a(h)*6.5 + b(h)], '--b', 'linewidth', 2)
endfor
for h = dim(1):dim(1)+1
figure(h)
for i=1:m
plot (array2(1,i), array2(h-dim(1)+2,i), 'o','markersize',5, 'markerfacecolor', 'r')
hold on
endfor
xlabel ('daily mean symptom score', 'fontsize',15)
ylabel (label_y(h), 'fontsize',15)
rhostr = num2str(rho(h));
pvalstr = num2str(pval(h));
tvaluestr = num2str(tvalue(h));
titlestr = strcat('Spearman correlation {\rho} = ', rhostr,', p = ', pvalstr, ', t = ', tvaluestr, ', delay = 0');
title (titlestr,'fontsize',15)
grid on
val2 = max(array2(h-dim(1)+2,:));
val1 = min(array2(h-dim(1)+2,:));
axis ([3.5,6.5,val1,val2])
plot ([3.5,6.5], [a(h)*3.5 + b(h), a(h)*6.5 + b(h)], '--b', 'linewidth', 2)
endfor
%-------------------------------------------------------------------------------
% We plot the chart temp-absolute humidity, with iso-relative humidity curves at
% an athmospheric pressure of 1,013.25 mbar
%-------------------------------------------------------------------------------
% molecular weight of water
mu_v = 18.01528; % g/mol
% array of relative humidity (%)
RH2(1) = 10;
for i=2:20
RH2(i) = RH2(i-1)+5;
endfor
% array of temperature (celsius)
tmin= 10;
tmax= 30;
for j=tmin:tmax
t2(j) = j-1;
endfor
% array of pressure (hPa=mbar)
p_atm = 1013.25;
% array of absolute humidity (grams/m^3)----------------------------------------
for k=1:7
for i=1:20
for j=tmin:tmax
AH2(k,i,j) = 1000*(mu_v/1000)*(RH2(i)/(R*(273.15+t2(j))))*(1.0016+3.15*10^(-6)*p_atm-0.074/p_atm)*6.112*e^(17.62*t2(j)/(243.12+t2(j)));
endfor
endfor
endfor
f = h;
f = f + 1;
figure(f)
for i=1:2:20
plot (t2(tmin:tmax), AH2(k,i,tmin:tmax),'-k','LineWidth',2)
label = num2str(RH2(i));
label = strjoin ({label,'%'}, ' ');
text (tmax+0.5, AH2(k,i,tmax)+0.5,label,'fontsize',16)
hold on
endfor
axis ([tmin,tmax],"tic[xyz]", "label[xyz]")
xticks (t2(tmin:tmax))
yticks ([1:1:50])
xlabel ('temperature (°C)')
ylabel ('absolute humidity (g/{m^3})')
grid on
pressure_str = num2str(p_atm);
title_str = strcat('air pressure = ',pressure_str," mbar");
title (title_str)
%-------------------------------------------------------------------------------
% On the chart we have just plotted, we plot a red dot for a score of 4, a blue dot
% for a score of 5, a green dot for a score of 6. We consider score1, score2, score3
% instead of the mean scores. Scores are plotted assuming as coordinates temperature
% and absolute humidity of the day they refer to.
%-------------------------------------------------------------------------------
for i=1:n_num-1
if ( t(i)!= missing )
if (score1(i) == 4)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
endif
if (score1(i) == 5)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
endif
if (score1(i) == 6)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
endif
if (score2(i) == 4)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
endif
if (score2(i) == 5)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
endif
if (score2(i) == 6)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
endif
if (score3(i) == 4)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
endif
if (score3(i) == 5)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
endif
if (score3(i) == 6)
plot (t(i),AH(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
endif
endif
endfor
%-------------------------------------------------------------------------------
% On the plane temp-AD, we plot a red dot for a score of 4, a blue dot
% for a score of 5, a green dot for a score of 6. We consider score1, score2, score3
% instead of the mean scores. Scores are plotted assuming as coordinates temperature
% and absolute humidity of the day they refer to.
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=1:n_num-1
if ( t(i)!= missing )
if (score1(i) == 4)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
if (score1(i) == 5)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
endif
if (score1(i) == 6)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
endif
if (score2(i) == 4)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
if (score2(i) == 5)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
endif
if (score2(i) == 6)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
endif
if (score3(i) == 4)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
if (score3(i) == 5)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
endif
if (score3(i) == 6)
plot (t(i),AD(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
endif
endif
endfor
axis ([tmin,tmax])
xlabel (label_y(1), 'fontsize',15)
ylabel (label_y(7), 'fontsize',15)
grid on
grid minor
%-------------------------------------------------------------------------------
% We plot the environmental data for the period March 1st to June 9th 2021
% (Canary Island)
%-------------------------------------------------------------------------------
% temperature
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=1:102-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,t(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,t(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,t(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([31,31],[min(array(1,:)),max(array(1,:))],'--k','linewidth',1)
text(31-20,max(array(1,:))-2,'March', 'fontsize',15)
hold on
plot([62,62],[min(array(1,:)),max(array(1,:))],'--k','linewidth',1)
text(62-20,max(array(1,:))-2,'April', 'fontsize',15)
hold on
plot([93,93],[min(array(1,:)),max(array(1,:))],'--k','linewidth',1)
text(93-20,max(array(1,:))-2,'May', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(1), 'fontsize',15)
title ('Canary Islands, Mar-Jun 2021; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([1,101,min(array(1,:)),max(array(1,:))])
%-------------------------------------------------------------------------------
% absolute humidity
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=1:102-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,AH(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,AH(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,AH(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([31,31],[min(array(6,:)),max(array(6,:))],'--k','linewidth',1)
text(31-20,max(array(6,:))-2,'March', 'fontsize',15)
hold on
plot([62,62],[min(array(6,:)),max(array(6,:))],'--k','linewidth',1)
text(62-20,max(array(6,:))-2,'April', 'fontsize',15)
hold on
plot([93,93],[min(array(6,:)),max(array(6,:))],'--k','linewidth',1)
text(93-20,max(array(6,:))-2,'May', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(6), 'fontsize',15)
title ('Canary Islands, Mar-Jun 2021; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([1,101,min(array(6,:)),max(array(6,:))])
%-------------------------------------------------------------------------------
% air density
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=1:102-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,AD(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,AD(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,AD(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([31,31],[min(array(7,:)),max(array(7,:))],'--k','linewidth',1)
text(31-20,max(array(7,:))-2,'March', 'fontsize',15)
hold on
plot([62,62],[min(array(7,:)),max(array(7,:))],'--k','linewidth',1)
text(62-20,max(array(7,:))-2,'April', 'fontsize',15)
hold on
plot([93,93],[min(array(7,:)),max(array(7,:))],'--k','linewidth',1)
text(93-20,max(array(7,:))-2,'May', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(7), 'fontsize',15)
title ('Canary Islands, Mar-Jun 2021; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([1,101,min(array(7,:)),max(array(7,:))])
%-------------------------------------------------------------------------------
% We plot the environmental data for the period February to March 2020
% (Rosario, santa Fe)
%-------------------------------------------------------------------------------
% temperature
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=103-1:138-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,t(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,t(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,t(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([131,131],[min(array(1,:)),max(array(1,:))],'--k','linewidth',1)
text(131-20,max(array(1,:))-2,'Feb', 'fontsize',15)
hold on
text(131+10,max(array(1,:))-2,'March', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(1), 'fontsize',15)
title ('Rosario, Feb-Mar 2020; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([103-1,138-1,min(array(1,:)),max(array(1,:))])
%-------------------------------------------------------------------------------
% absolute humidity
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=103-1:138-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,AH(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,AH(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,AH(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([131,131],[min(array(6,:)),max(array(6,:))],'--k','linewidth',1)
text(131-20,max(array(6,:))-2,'Feb', 'fontsize',15)
hold on
text(131+10,max(array(6,:))-2,'March', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(6), 'fontsize',15)
title ('Rosario, Feb-Mar 2020; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([103-1,138-1,min(array(6,:)),max(array(6,:))])
%-------------------------------------------------------------------------------
% air density
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=103-1:138-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,AD(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,AD(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,AD(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([131,131],[min(array(7,:)),max(array(7,:))],'--k','linewidth',1)
text(131-20,max(array(7,:))-2,'Feb', 'fontsize',15)
hold on
text(131+10,max(array(7,:))-2,'March', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(7), 'fontsize',15)
title ('Rosario, Feb-Mar 2020; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([103-1,138-1,min(array(7,:)),max(array(7,:))])
%-------------------------------------------------------------------------------
% We plot the environmental data for the period Sept to Oct 2020
% (Rome)
%-------------------------------------------------------------------------------
% temperature
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=139-1:199-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,t(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,t(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,t(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([168,168],[min(array(1,:)),max(array(1,:))],'--k','linewidth',1)
text(138+10,max(array(1,:))-2,'Sep', 'fontsize',15)
hold on
text(168+5,max(array(1,:))-2,'Oct', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(1), 'fontsize',15)
title ('Rome, Sep-Oct 2020; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([139-1,199-1,min(array(1,:)),max(array(1,:))])
%-------------------------------------------------------------------------------
% absolute humidity
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=139-1:199-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,AH(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,AH(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,AH(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([168,168],[min(array(6,:)),max(array(6,:))],'--k','linewidth',1)
text(138+10,max(array(6,:))-2,'Sep', 'fontsize',15)
hold on
text(168+5,max(array(6,:))-2,'Oct', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(6), 'fontsize',15)
title ('Rome, Sep-Oct 2020; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([139-1,199-1,min(array(6,:)),max(array(6,:))])
%-------------------------------------------------------------------------------
% air density
%-------------------------------------------------------------------------------
f = f + 1;
figure(f)
for i=139-1:199-1
if ( t(i)!= missing )
if (score(i) > 5)
plot (i,AD(i),'o','markersize',4,'markerfacecolor','g','markeredgecolor','g')
hold on
elseif (score(i) == 5 )
plot (i,AD(i),'o','markersize',4,'markerfacecolor','b','markeredgecolor','b')
hold on
else
plot (i,AD(i),'o','markersize',4,'markerfacecolor','r','markeredgecolor','r')
hold on
endif
endif
endfor
plot([168,168],[min(array(7,:)),max(array(7,:))],'--k','linewidth',1)
text(138+10,max(array(7,:))-2,'Sep', 'fontsize',15)
hold on
text(168+5,max(array(7,:))-2,'Oct', 'fontsize',15)
grid on
grid minor
xlabel ('day', 'fontsize',15)
ylabel (label_y(7), 'fontsize',15)
title ('Rome, Sep-Oct 2020; green: mean score > 5; blue: mean score = 5; red: mean score < 5','fontsize',15)
axis([139-1,199-1,min(array(7,:)),max(array(7,:))])

Update on the UK Biobank: analysis on variants with a minor allele frequency between 0.01 and 0.05

Update on the UK Biobank: analysis on variants with a minor allele frequency between 0.01 and 0.05

Abstract

I report on the association between variants with a minor allele frequency included between 1% and 5% and the phenotypes: self-reported Chronic Fatigue Syndrome and chronic asthenia not otherwise specified (ICD10: R53). The analysis was performed on a metadata file generated from the UK Biobank database. Patients with self-reported CFS (both sexes) showed a significant increase (p = 2.26\cdot10^{-9}) in the frequency of the alternative allele of variant rs148723539, 10:131677233:G:A on h19, an SNP of the intronic region of gene EBF3. Female patients with R53 showed a significant association (p = 5.40\cdot10^{-9}) with variant rs182581502, 7:56492485:G:T on h19, which belongs to the intronic region of pseudogene RP13-492C18.2 (gene_id: ENSG00000237268.2). This analysis completes a previous one (here) performed only on common variants (minor allele frequency above 5%).

Introduction

In a previous blog post (here), I reported on the association between self-reported Chronic Fatigue Syndrome and a region of chromosome 13, in particular variant rs11147812. This result was obtained by performing a set of analyses on a metadata file generated by Neale’s Lab from the UK Biobank dataset. The analysis included about 14 million variants, 1659 patients (451 males), and 300k controls. I focused on common variants (minor allele frequency above 0.05) and I used a cut-off for statistical significance of 5.0\cdot10^{-8}. I performed a similar analysis on chronic asthenia not otherwise specified (ICD10: R53) and I found an association between male patients and variant rs62092652.

Here, I report on the results of the same set of analyses for variants with a minor allele frequency below 0.05 and above 0.01. I used a more stringent cut-off for statistical significance of 1.0\cdot10^{-8} (read this blog post for an introduction to the p value in GWAS).

Methods

I considered again the phenotypes: self-reported CFS and R53. I filtered out variants too far from the Hardy-Weinberg equilibrium (in the total UK Biobank sample) by using a cut-off of 1.0\cdot10^{-6} when testing the null hypothesis: the alleles are in HWE. I filtered out variants with expected minor allele count below 25, in cases.

Results

When considering self-reported CFS, the only statistically significant association was between variant rs148723539 (10:131677233:G:A on h19) and the group of patients that included both sexes (Table 1). In particular, the association is between the alternative allele and patients (in other words, patients have a higher frequency of the alternative allele A than controls). This is an intronic variant of gene EBF3. No association was found after stratification by sex.

variantconsmin AFp valgene
rs148723539intron
variant
0.01012.26e-09EBF3
Table 1. Self-reported Chronic Fatigue Syndrome, both sexes (n = 1659 patients).

When considering chronic asthenia not otherwise specified (ICD10: R53), the analysis for the whole sample (both sexes) gave no associations. When considering females only, variant rs182581502 (7:56492485:G:T on h19) reached statistical significance with the alternative allele more frequent in patients than in controls (Table 2). This is an intronic variant of pseudogene RP13-492C18.2 (gene_id: ENSG00000237268.2). No associations were found for males only.

rsidconsmin AFp valgene
rs182581502intron
variant
0.02865.40e-09RP13-492C18.2
Table 2. Chronic asthenia not otherwise specified (ICD10: R53), females only (n = 456).

Variant rs182581502 is associated with changes in gene expression of gene CHCHD2 in several tissues (see this page). The alternative allele (the one more frequent in female R53 patients) is associated with an increase in the expression of CHCHD2 in the hippocampus and basal ganglia, for instance. No effects on gene expression have been found for rs148723539.

A concise summary of the results of my analysis of the UK Biobank database is presented in the following table.

Analysis of a summary statistics generated from the UK Biobank database: new potential risk loci for ME/CFS and not otherwise specified chronic asthenia

Analysis of a summary statistics generated from the UK Biobank database: new potential risk loci for ME/CFS and not otherwise specified chronic asthenia

… the only thing that is arbitrary about the various orders of a set of terms is our attention, for the terms themselves have always all the orders of which they are capable.

Bertrand Russell, Introduction to Mathematical Philosophy

Full-text article

I present here an analysis of the summary statistics elaborated by Neale’s Lab from the raw data released by the UK Biobank, a large-scale biomedical database that includes genetic data on 500,000 UK citizens. I filtered out variants with a minor allele frequency below 0.05, I included only variants that reached a p value below 5⋅10^−7 (even though only p < 5⋅10^−8 was considered for definitive associations), and I filtered out variants with a p value for the Hardy-Weinberg equilibrium below 10^−6.

Females with self-reported Chronic Fatigue Syndrome are associated with a region of chromosome 13 with high linkage disequilibrium, from position 41353297 to 41404706 (on h19) which includes gene SLC25A15. After applying statistical methods of fine-mapping (approximation of the joint model and posterior inclusion probability, PIP), I found rs11147812 (13:41404706:T:C) as the possible causal variant within this region. This same variant is associated with the whole sample of patients (males + females) at 5⋅10^−8 < p < 5⋅10^−7. The relaxed cutoff for statistical significance of 5⋅10^−7 has been recently proposed as a way to increase the rate of true positives without a substantial increase in false positive associations, in studies with a sample of 130,000. Patients presented a reduced frequency of the alternative allele at position 13:41404706, and this is expected to reduce the expression of SLC25A15 in adipose tissue and other tissues. Gene SLC25A15 encodes mitochondrial ornithine transporter I which is involved in the transport of ornithine inside mitochondria.

Males with chronic asthenia not otherwise specified (ICD10: R53) showed an association with a region in high LD of chromosome 18 (18:55452281 to 18:55460845). The only variant from this region with p < 5⋅10^−8 is rs62092652 (18:55454761:G:A) and it is confirmed as the causal variant in this region by PIP calculation. The alternative allele at this position is associated with a reduced expression of gene ATP8B1 in the brain. Since the alternative allele is more frequent in patients than in controls (frequency of 0.3501 and 0.2422, respectively), we may expect to find a reduced expression of ATP8B1 in the brain of patients. ATP8B1 encodes a member of the P-type cation transport ATPase family, which transport phosphatidylserine and phosphatidylethanolamine from one side of a bilayer to another. These two molecules are both phospholipids found in biological membranes.

Genetic data from the UK Biobank were also used to test previously proposed associations between ME/CFS and variants on TNF: none of the proposed associations was confirmed.

No overlap of the genetic signal from self-reported CFS and R53 was found with the psychiatric traits: bipolar II and depression.

This document also includes an introduction to some of the methods currently employed in the study of GWAS summary statistics. 

Full-text article

Tryptophan metabolites in the cerebrospinal fluid of ME/CFS patients

Tryptophan metabolites in the cerebrospinal fluid of ME/CFS patients

Abstract

I present here a comparison between the levels of tryptophan (Trp), kynurenine (Kyn), serotonin (Ser), Kyn/Trp, and Ser/Trp in cerebrospinal fluid of 44 ME/CFS patients vs 21 sedentary controls. Raw data were retrieved from (Baraniuk JN et al. 2021). Stratification by sex has been included in the analysis. No differences can be found between patients and controls.

Methods

Once downloaded the raw data of (Baraniuk JN et al. 2021), I searched for metabolites of tryptophan metabolism by using the keywords: try, kyn, ser, quinolinic, picolinic, anthranilic, xanthurenic, melatonin. I retrieved the data for tryptophan, kynurenine, and serotonin, and I used them to calculate the ratios kynurenine/tryptophan and serotonin/tryptophan, for each patient. I then organized the data in three csv files (see supplementary material), one including the whole sample, one including females only, and one with males only. For each of the five variables Trp, Kyn, Ser, Kyn/Trp, Ser/Trp a statistical test was performed by the R script displayed at the bottom of this page (supplementary material). For the sample including both sexes, I run both the Student’s t-test and the Wilcoxon test, while for the other two samples only the latter test was performed. I removed a female patient from the data because of her unusually high level of serotonin (0.202), but this does not change the conclusion of this analysis.

ME/CFS (44)Sedentary control (21)t-test
p value
Wilcoxon
p value
Trp2.55 (0.652)2.42 (0.688)0.470.60
Kyn0.0234 (0.0271)0.0154 (0.0200)0.240.27
Ser0.0110 (0.00597)0.00805 (0.00356)0.0410.028
Kyn/Trp0.00936 (0.0113)0.00580 (0.00789)0.2000.27
Ser/Trp0.00448 (0.00227)0.00349 (0.00188)0.0870.037
Table 1. Comparison between the levels of tryptophan (Trp), kynurenine (Kyn), serotonin (Ser), Kyn/Trp, and Ser/Trp in cerebrospinal fluid of 44 ME/CFS patients and 21 controls. Both the two-tailed Student’s t-test and the Wilcoxon test were employed. No correction for multiple comparisons is applied. Levels are reported as: mean (standard deviation of the sample).

Results

Serotonin and Ser/Trp are significantly elevated in ME/CFS patients vs sedentary controls (p = 0.028 and 0.037, respectively), Table 1, Figure 1. But once we apply a correction for three independent variables, these differences are no more statistically significant. When we stratified by sex, none of the comparisons led to a statistically significant difference between patients and controls (Figure 2, Figure 3). The conclusion is that no difference can be found in the level of Trp, Kyn, Ser, Kyn/Trp, and Ser/Trp in cerebrospinal fluid of ME/CFS patients when compared to sedentary controls. One female patient presented a very elevated serotonin level (she was removed from the analysis), another female patient showed elevated kynurenine, one male with ME/CFS had elevated tryptophan, and another male patient showed elevated kynurenine. The larger number of patients compared to controls might explain the presence, in patients, of some extreme values of the parameters considered, when compared to controls.

Figure 1. Comparison between the levels of tryptophan (Trp), kynurenine (Kyn), serotonin (Ser), Kyn/Trp, and Ser/Trp in cerebrospinal fluid of 44 ME/CFS patients and 21 controls. For each variable, the median, the 1st, and the 3rd quartile are indicated.
Figure 2. Comparison between the levels of tryptophan (Trp), kynurenine (Kyn), serotonin (Ser), Kyn/Trp, and Ser/Trp in cerebrospinal fluid of 35 female ME/CFS patients and 11 female controls. For each variable, the median, the 1st, and the 3rd quartile are indicated. For each comparison, the p value from the Wilcoxon test is displayed.
Figure 3. Comparison between the levels of tryptophan (Trp), kynurenine (Kyn), serotonin (Ser), Kyn/Trp, and Ser/Trp in cerebrospinal fluid of 9 male ME/CFS patients and 10 male controls. For each variable, the median, the 1st, and the 3rd quartile are indicated. For each comparison, the p value from the Wilcoxon test is displayed.

Supplementary material

The following R script calculates Table 1 and plots Figures 1, 2, and 3. It reads the three csv files below (click to download), one with both sexes lumped together, one with males, and one with females, respectively.

Trp_met.csv

Trp_met_females.csv

Trp_met_males.csv

# file name: tryptophan_wilcox_both_sexes
#
samples<-read.csv("Trp_met.csv", sep=";", header = TRUE) # we read the data
attach(samples) # this allows us to refer to the labels as variables 
#
# we define the number of patients and controls
#
ncfs<-length(samples[Group=="cfs0",1])
nsc<-length(samples[Group=="sc0",1])
#
# we store the measures of each metabolite for patients
#
kyn_cfs<-Kyn[1:ncfs]
trp_cfs<-Trp[1:ncfs]
ser_cfs<-Ser[1:ncfs]
#
# we store the measures of each metabolite for controls
#
a<-ncfs+1
kyn_sc<-Kyn[a:length(Kyn)]
trp_sc<-Trp[a:length(Kyn)]
ser_sc<-Ser[a:length(Kyn)]
#
# we calculate the ratios kyn/trp and ser/trp for patients
#
kyn_trp_cfs<-kyn_cfs/trp_cfs
ser_trp_cfs<-ser_cfs/trp_cfs
#
# we calculate the ratios kyn/trp and ser/trp for controls
#
kyn_trp_sc<-kyn_sc/trp_sc
ser_trp_sc<-ser_sc/trp_sc
#
# we plot the data for each metabolite: box-plots with points
#
plot(factor(Group),Kyn,xlab="groups",ylab="Kynurenine")
points(factor(Group),Kyn,pch=21,bg="red")
plot(factor(Group),Trp,xlab="groups",ylab="Tryptophan")
points(factor(Group),Trp,pch=21,bg="red")
plot(factor(Group),Ser,xlab="groups",ylab="Serotonin")
points(factor(Group),Ser,pch=21,bg="red")
#
# we plot the data for the two ratios
#
plot(factor(Group),c(kyn_trp_cfs,kyn_trp_sc),xlab="groups",ylab="Kyn/Trp")
points(factor(Group),c(kyn_trp_cfs,kyn_trp_sc),pch=21,bg="red")
plot(factor(Group),c(ser_trp_cfs,ser_trp_sc),xlab="groups",ylab="Ser/Trp")
points(factor(Group),c(ser_trp_cfs,ser_trp_sc),pch=21,bg="red")
#
# we run the Wilcoxon test for each metabolite and ratio
#
wilcox.test(kyn_cfs,kyn_sc)
wilcox.test(trp_cfs,trp_sc)
wilcox.test(ser_cfs,ser_sc)
wilcox.test(kyn_trp_cfs,kyn_trp_sc)
wilcox.test(ser_trp_cfs,ser_trp_sc) 

An update on the UK Biobank: fine-mapping

An update on the UK Biobank: fine-mapping

The man who does not make his own tools does not make his own sculpture.

Irving Stone, The Agony and The Ecstasy

An analysis of the UK Biobank database carried out by Neale’s Lab pointed out a significant correlation between a region of chromosome 13 and females with self-reported Chronic Fatigue Syndrome (see this blog post). This result was further analyzed in a subsequent paper in which it was suggested that the causal variant within this region may be rs7337312 (position 13:41353297:G:A on reference genome GRCh37, forward strand) (Dibble J. et al. 2020). It is included in a regulatory region of gene SLC25A15, which encodes mitochondrial ornithine transporter I, and the potential biological effects of this variant have been discussed in the aforementioned study. I made an attempt at my own discussion of this variant (here).

It is not unusual for a GWAS study to associate a trait with a long sequence of variants belonging to the same chromosome and in a close physical relationship one with the other, due to high linkage disequilibrium often present between variants close to one another (see this blog post). Several methods have been developed in the last years to further refine the association and find the variant that causes the disease and drives the association of its nearby SNPs. These methods are based on both purely mathematical considerations and on the analysis of the known consequences of the variants, and we refer to them by the evocative name of fine-mapping (Schaid DJ et al. 2018).

I tried to develop my own method and software for a purely mathematical approach to fine-mapping and I applied it to the region associated with ME/CFS in females, and I found variant rs11147812 (13:41404706:T:C on h19) as the most likely causal one. The detailed discussion of the method I followed and the scripts I wrote are available here for download. It might be noted that this variant belongs to a genetic entity (pseudogene TPTE2P5) that harbors variants that have been strongly associated with the following hematic parameters (see GWAS catalog).

The alternative allele of rs11147812 has been reported to increase (effect size > 0) the expression of gene SLC25A15 in fat (see this page) with a p value of 2.85\cdot10^{-20}. Since the alternative variant is less frequent in patients than in controls, the prediction is that we have a reduced expression of SLC25A15 in fat, in female CFS patients.

Diastolic blood pressure (reduction)Evangelou E et al. 2018
Blood volume occupied by platelets (increase)Astle WG et al. 2016
Basophil count (decrease)Masahiro K et al. 2018
Table. Significant associations between traits and variants of TPTE2P5.

Note. The header image shows the convergence of the linear method used by my script for fine-mapping the region on chromosome 13. The detailed discussion of the method I followed and the scripts I wrote are available here for download.

A second glance at the UK Biobank Database

FeaturedA second glance at the UK Biobank Database

Note. This blog post should be read on a laptop. With smaller screens, some of the tables cannot be correctly formatted.

Introduction: what has been found

The UK Biobank is a large-scale biomedical database that includes genetic data on 500,000 UK citizens (Sudlow, C et al. 2015). Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is among the phenotypes reported by participants. The total number of ME/CFS patients included is 1208 females and 451 males, with control groups of more than 150 k individuals each. For each individual 805,426 variants (both SNPs and short InDels) were genotyped. From them, up to 30 million variant can be imputed using haplotype maps. This is the biggest GWAS study on ME/CFS patients ever made, so far. Many research groups have analyzed this huge quantity of data and the conclusion, according to (Dibble J. et al. 2020), is that the only variants to be significantly different between patients and controls are included in gene SLC25A15, which encodes mitochondrial ornithine transporter I, particularly variant rs7337312 (position13:41353297:G:A on reference genome GRCh37, forward strand). But this difference is statistically significant only when we consider female patients vs female controls (Table 1). For a detailed discussion of this result, the best reference is the already mentioned (Dibble J. et al. 2020). I wrote an analysis myself on this same result and on its possible implications, available here. My focus, while writing that document, was also understanding the architecture of the metadata generated by using the UK Biobank, by Neale’s Lab (R), one of the research groups that performed the statistical analyses. It was not obvious, at the beginning, how I could derive for each variant genotyped a table like Table 1. In fact, it is the result of the elaboration of five files, four of them of considerable size. Once I realized how to read and manipulate these files, I manually derived the data I was interested in, at first. Then I decide to write a code that could do the work in an automated fashion (see below).

Table 1. Frequencies and allele counts for variant rs7337312 (gene SLC25A15 ) among 1208 CFS female patients, 451 CFS male patients, 192945 female controls, 166537 male controls, total UK Biobank sample, and general population (global genome AD). P values for patients vs controls have been calculated for both sexes. I derived this table from metadata by Neale’s Lab (R).

What has not been found

Now, even if we are certainly disappointed by the fact that the positive result of these analyses does not add much to what we already knew about ME/CFS, it is important to stress that negative results are also relevant. Why? Because we can rule out the huge amount of genetic associations made through the years between fatigue in general, and CFS in particular, and genetic variants (see for instance the encyclopedic review Whang T. et al. 2017). As an example, I will focus on the variants of TNF alpha gene collected in Table 2. In 2006, an Italian group found that the minor allele of variant rs1799724 (TNF-857) was significantly more frequent in 54 ME/CFS patients than in 224 controls ( Carlo-Stella N. et al. 2006). In 2020, a group from Germany reported a statistically significant difference for the same variant, but only when ME/CFS patients without infectious triggered onset were considered (Steiner S. et al. 2020). Two other variants of gene TNF alpha were associated with morning fatigue among oncology outpatients with breast, prostate, lung, or brain cancer ( Dhruva A. et al. 2015). Are these variants different between the 1659 ME/CFS patients and the about 300 k controls in the UK Biobank database?

VariantPosition on GRCh37 (fs) Disease p val Reference
rs1799724 ch6:31,542,482 ME/CFS
ME/CFS w/o ITO
0.0028
0.0430
Carlo-Stella N. et al. 2006
Steiner S. et al. 2020
rs1800629
rs3093662
ch6:31,543,031
ch6:31,544,189
Morning fatigue in oncology outpatients
with breast, prostate, lung, or brain cancer
0.0250
0.0040
Dhruva A. et al. 2015
Table 2. Variants on gene TNF associated with ME/CFS, with ME/CFS without Infectious Triggered Onset (ITO), and with morning fatigue in oncology outpatients with breast, prostate, lung, or brain cancer.

To answer this question I have selected a suitable region of chromosome 6 (ch6:31,539,768 – ch6:31,546,495) that includes not only all the variants in Table 2, but the whole TNF gene. I then analyzed this region with a code I wrote in GNU Octave, version 6.3.0. This code generates three .csv files (one for each of the following groups: females only, males only, both sexes) with information on each one of the variants genotyped within the region mentioned (see Supplementary Material). An extract from these files, with only the variants collected in Table 2, can be seen in Table 3, Table 4, and Table 5. None of these variants reaches the statistically significant threshold, which for this kind of studies is considered to be p<5×10^(-8), for variants with a minor allele frequency above 5% (even lower for less common variants) (Fadista J et al. 2016).

rsidminref altmin_AFp valalt
cases
alt
contr
1799724TCT0.0720.1500.0770.072
1800629AGA0.1970.7060.1950.197
3093662GAG0.0760.5900.0730.076
Table 3. Frequencies of the same variants of Table 2 from Neale’s Lab Metadata. This is the comparison between 1656 ME/CFS patients (females + males) and 359482 controls (females + males). min = minor allele; ref = reference allele; alt = alternate allele; min_AF = minor allele frequency. alt cases = frequency of alternative allele in cases; alt contr = frequency of alternate allele in controls. Elaboration made by the code TNF.m I wrote in Octave (see supplementary material).

rsidminrefaltmin_AFp valalt
cases
alt
contr
1799724TCT0.0720.1670.0780.071
1800629AGA0.1970.8570.1980.196
3093662GAG0.0760.6030.0730.076
Table 4. As in Table 3, but in this case the comparison is between 1208 female ME/CFS patients and 192945 female controls. min = minor allele; ref = reference allele; alt = alternate allele; min_AF = minor allele frequency. alt cases = frequency of alternative allele in cases; alt contr = frequency of alternate allele in controls. Elaboration made by the code TNF.m I wrote in Octave (see supplementary material).

rsidminrefaltmin_AFp valalt
cases
alt
contr
1799724TCT0.0720.6090.0750.072
1800629AGA0.1970.3140.1860.199
3093662GAG0.0760.8560.0740.075
Table 5. As in Table 3 and Table 4, but in this case the comparison is between 451 male ME/CFS patients and 166537 male controls. min = minor allele; ref = reference allele; alt = alternate allele; min_AF = minor allele frequency. alt cases = frequency of alternative allele in cases; alt contr = frequency of alternate allele in controls. Elaboration made by the code TNF.m I wrote in Octave (see supplementary material).

Another way to conveniently visualize the differences between patients and controls (if any) is to plot p values in function of the position of the variants they refer to. We actually plot the log10 of 1/p or, in other words, -log10(p). This kind of diagram is known as Manhattan plot and in Figure 1 you see the manhattan plot for region ch6:31,539,768 – ch6:31,546,495, in the case of females + males. As you can see, none of the variants has a p value below 0.01. In Figure 2 you find the same plot, this time for females only (left) and males only (right).

Figure 1. Manhattan plot for ch6:31,539,768 – ch6:31,546,495, in the case of female + male CFS patients vs controls. Red dot is rs1799724, blue dot is rs1800629, yellow dot is rs3093662. None of these three variants reaches a p value below 10^-1. Plotted by manhattan_plot.m (see supplementary material).
Figure 2. Manhattan plot for ch6:31,539,768 – ch6:31,546,495, in the case of female CFS patients vs female controls (left) and male CFS patients vs male controls (right). Red dot is rs1799724, blue dot is rs1800629, yellow dot is rs3093662. None of these three variants reaches a p value below 10^-1.

Conclusions

In conclusion, these codes of mine provide as output the frequency of both the minor and the major allele among 1659 ME/CFS patients, stratified by sex, for about 15 million variants (including SNPs and InDels). They also provide the p value (beta test) for the comparison of these patients with about 300 k controls, along with general information on the variants (such as position, minor allele frequency among 500 k UK citizens, predicted consequences, etc). These codes do so by analyzing the metadata generated by Neale’s Lab, a research group that worked on the UK Biobank dataset. In this blog post, I have shown how to use these data to test previously published genetic associations between ME/CFS and gene TNF alpha.

Supplementary material

The files that I used as input for my codes are the following ones (click on them to download):

variants_TNF

20002_1482.gwas.imputed_v3.both_sexes_TNF

20002_1482.gwas.imputed_v3.females_TNF

20002_1482.gwas.imputed_v3.males_TNF

The output generated by TNF.m (see below) consists in the following files (click on them to download) which were used to build tables from 3 to 5:

TNF_total_both_sexes

TNF_total_females

TNF_total_males

Details on these files can be found in this PDF. The code that generates the plots is pretty simple and can be downloaded here. The code that generates the three output files from the four input ones is the one that follows.

% file name = TNF
% date of creation = 25/08/2021
% it generates the file TNF_tot by using the package IO
% see https://octave.sourceforge.io/io/ for a description
clear all
close all
pkg load io
%-------------------------------------------------------------------------------
% Data source
%-------------------------------------------------------------------------------
% The collection of the analysis generated by Neale’s Lab with the UK Biobank
% data is available here:
% https://docs.google.com/spreadsheets/d/1kvPoupSzsSFBNSztMzl04xMoSC3Kcx3CrjVf4yBmESU/edit?ts=5b5f17db#gid=178908679
% In line 2 of that file, you can download an explanation of the labels (README).
% At lines 9 and 10 you can download a collection of the phenotypes and of some
% of the attributes of the relative sample, for females and males respectively
% (phenotypes.female.tsv, phenotypes.male.tsv). We are interested in phenotype 20002_1482
% (Non-cancer illness code, self-reported: chronic fatigue syndrome). In line 11
% you find a file that contains the details on each variant (variants.tsv). You
% must unzip the files before opening them. I have used File viewer plus for reading
% this file (Excel doesn’t seem to open it).
%-------------------------------------------------------------------------------
% Length and position of TNF
%-------------------------------------------------------------------------------
% The position of TNF goes from ch6:31,543,342 to ch6:31,546,113 on the reference
% genome GRCh37 (alias h19), according to NCBI (https://www.ncbi.nlm.nih.gov/gene/7124).
% The two variants we are interested in has a position ch6:31,542,482 (rs1799724, TNF-857)
% and ch6:31,543,031 (rs1800629), on GRCh37. So we must include them, along with
% the whole TNF gene. Since rs1799724 falls also in gene LTA, I include this gene
% as well (LTA goes from ch6:31,539,876 to ch6:31,542,101). Then a suitable interval
% in this dataset is ch6:31,539,768 to ch6:31,546,495.
%-------------------------------------------------------------------------------
% numbers of cases and controls
%-------------------------------------------------------------------------------
n_cases_f = 1208
n_controls_f = 192945
n_cases_m = 451
n_controls_m = 166537
n_cases_b = n_cases_f + n_cases_m
n_controls_b = n_controls_f + n_controls_m
%-------------------------------------------------------------------------------
% name of the gene
%-------------------------------------------------------------------------------
gene = 'TNF';
%-------------------------------------------------------------------------------
% number of variants = upper limit for range - 1
%-------------------------------------------------------------------------------
file_name = strcat('variants_',gene,'.xlsx')
[filetype, sh_names, fformat, nmranges] = xlsfinfo (file_name);
strn = char(sh_names(1,2));
strn = substr(strn,5) % upper limit for range as a string
n = str2num(strn) % upper limit for range as a number
%-------------------------------------------------------------------------------
% reading and storing variants_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('variants_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant = text;
% reading and storing the chromosome
range = strcat('B2:B',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
chr = num;
% reading and storing the position on chromosome
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pos = int32(num);
% reading and storing the ref allele
range = strcat('D2:D',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ref = text;
% reading and storing the alt allele
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
alt = text;
% reading and storing the rsid
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
rsid = raw;
% reading and storing the consequence
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
consequence = text;
% reading and storing the consequence category
range = strcat('I2:I',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
consequence_category = text;
% reading and storing the Alternate allele count (calculated using hardcall genotypes)
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC = int32(num);
% reading and storing the Alternate allele frequency (calculated using hardcall genotypes)
range = strcat('M2:M',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AF = double(num);
% reading and storing the Minor allele (equal to ref allele when AF > 0.5, otherwise equal to alt allele)
range = strcat('N2:N',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_allele = text;
% reading and storing the Minor allele frequency (calculated using hardcall genotypes)
range = strcat('O2:O',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF = double(num);
% reading and storing the Hardy-Weinberg p-value
range = strcat('P2:P',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
p_hwe = double(num);
% reading and storing the Number of samples with defined genotype at this variant
range = strcat('Q2:Q',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_called = int32(num);
% reading and storing the Number of samples without a defined genotype at this variant
range = strcat('R2:R',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_not_called = int32(num);
% testing by writing some of the data
i = 2
variant (i)
chr (i)
pos (i)
ref (i)
alt (i)
rsid (i)
consequence (i)
consequence_category (i)
AC (i)
AF (i)
minor_allele (i)
minor_AF (i)
p_hwe (i)
n_called (i)
n_not_called(i)
%-------------------------------------------------------------------------------
% reading and storing 20002_1482.gwas.imputed_v3.females_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.females_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_f = text;
% reading and storing the Minor allele frequency in n_complete_samples_f
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_f = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_f = raw;
% reading and storing the numeber of individuala genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_f = double(num);
% reading and storing the Alternate allele count within n_complete_samples_f
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_f = double(num);
% reading and storing the number of alternative alleles in n_cases_f
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_f = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_f = double(num);
% testing by writing some of the data
i = 2
variant_f(i)
minor_AF_f(i)
low_confidence_variant_f(i)
n_complete_samples_f(i)
AC_f(i)
ytx_f(i)
pval_f(i)
%-------------------------------------------------------------------------------
% reading and storing 20002_1482.gwas.imputed_v3.males_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.males_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_m = text;
% reading and storing the Minor allele frequency in n_complete_samples_f
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_m = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_m = raw;
% reading and storing the numeber of individuala genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_m = double(num);
% reading and storing the Alternate allele count within n_complete_samples_f
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_m = double(num);
% reading and storing the number of alternative alleles in n_cases_f
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_m = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_m = double(num);
% testing by writing some of the data
i = 2
variant_m(i)
minor_AF_m(i)
low_confidence_variant_m(i)
n_complete_samples_m(i)
AC_m(i)
ytx_m(i)
pval_m(i)
%-------------------------------------------------------------------------------
% reading and storing 20002_1482.gwas.imputed_v3.both_sexes_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.both_sexes_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_b = text;
% reading and storing the Minor allele frequency in n_complete_samples_f
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_b = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_b = raw;
% reading and storing the numeber of individuala genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_b = double(num);
% reading and storing the Alternate allele count within n_complete_samples_f
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_b = double(num);
% reading and storing the number of alternative alleles in n_cases_f
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_b = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_b = double(num);
% testing by writing some of the data
i = 2
variant_b(i)
minor_AF_b(i)
low_confidence_variant_b(i)
n_complete_samples_b(i)
AC_b(i)
ytx_b(i)
pval_b(i)
%-------------------------------------------------------------------------------
% checking for wrong alignments between these 4 files
%-------------------------------------------------------------------------------
error = 0.;
for i=1:n-1
a = variant(i);
b = variant_f(i);
c = variant_m(i);
d = variant_b(i);
% function strcmp (string_1, strin_2) returns 1 if the two strings are identical
% it resturns 0 if the two strings have at least a different character in the same position
test = strcmp(a,b);
if ( (test==0) );
error = error +1;
endif
test = strcmp(a,c);
if ( (test==0) );
error = error +1;
endif
test = strcmp(a,d);
if ( (test==0) );
error = error +1;
endif
endfor
error
%-------------------------------------------------------------------------------
% defining major and minor alleles among all the variants
%-------------------------------------------------------------------------------
for i=1:n-1
frequency = AF(i); %frequency of the alternative allele
if ( frequency <= 0.5 )
minor_allele2(i)=alt(i);
major_allele(i)=ref(i);
else
minor_allele2(i)=ref(i);
major_allele(i)=alt(i);
endif
endfor
%-------------------------------------------------------------------------------
% defining alternate allele frequencies in case groups
%-------------------------------------------------------------------------------
for i=1:n-1
alt_AF_f(i) = ytx_f(i)/(2n_cases_f); alt_AF_m(i) = ytx_m(i)/(2n_cases_m);
alt_AF_b(i) = ytx_b(i)/(2n_cases_b); endfor
%------------------------------------------------------------------------------- 
% defining alternate allele frequencies in control groups %------------------------------------------------------------------------------- 
for i=1:n-1 alt_AF_f_controls(i) = ( AC_f(i)- ytx_f(i) )/(2n_controls_f);
alt_AF_m_controls(i) = ( AC_m(i)- ytx_m(i) )/(2n_controls_m); alt_AF_b_controls(i) = ( AC_b(i)- ytx_b(i) )/(2n_controls_b);
endfor
%-------------------------------------------------------------------------------
% defining reference allele frequencies in case groups
%-------------------------------------------------------------------------------
for i=1:n-1
ref_AF_f(i) = 1-alt_AF_f(i);
ref_AF_m(i) = 1-alt_AF_m(i);
ref_AF_b(i) = 1-alt_AF_b(i);
endfor
%-------------------------------------------------------------------------------
% defining reference allele frequencies in control groups
%-------------------------------------------------------------------------------
for i=1:n-1
ref_AF_f_controls(i) = 1-alt_AF_f_controls(i);
ref_AF_m_controls(i) = 1-alt_AF_m_controls(i);
ref_AF_b_controls(i) = 1-alt_AF_b_controls(i);
endfor
%-------------------------------------------------------------------------------
% checking for errors by comparing minor_allele with minor_allele2
%-------------------------------------------------------------------------------
error2 = 0.;
for i=1:n-1
a = minor_allele(i);
b = minor_allele2(i);
% function strcmp (string_1, strin_2) returns 1 if the two strings are identical
% it resturns 0 if the two strings have at least a different character in the same position
test = strcmp(a,b);
if ( (test==0) );
error2 = error2 +1;
endif
endfor
error2
%-------------------------------------------------------------------------------
% writing of GENE_total_females.csv
%-------------------------------------------------------------------------------
columns = cell(n,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_f{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_f(i);
columns{i+1,15} = alt_AF_f(i);
columns{i+1,16} = alt_AF_f_controls(i);
columns{i+1,17} = ref_AF_f(i);
columns{i+1,18} = ref_AF_f_controls(i);
columns{i+1,19} = n_cases_f;
columns{i+1,20} = n_controls_f;
endfor
file_name = strcat(gene,'_total_females.csv')
cell2csv (file_name, columns," ")
%-------------------------------------------------------------------------------
% writing of GENE_total_males.csv
%-------------------------------------------------------------------------------
columns = cell(n,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_m{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_m(i);
columns{i+1,15} = alt_AF_m(i);
columns{i+1,16} = alt_AF_m_controls(i);
columns{i+1,17} = ref_AF_m(i);
columns{i+1,18} = ref_AF_m_controls(i);
columns{i+1,19} = n_cases_m;
columns{i+1,20} = n_controls_m;
endfor
file_name = strcat(gene,'_total_males.csv')
cell2csv (file_name, columns," ")
%-------------------------------------------------------------------------------
% writing of CLYBL_total_both_sexes.csv
%-------------------------------------------------------------------------------
columns = cell(1523,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_b{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_b(i);
columns{i+1,15} = alt_AF_b(i);
columns{i+1,16} = alt_AF_b_controls(i);
columns{i+1,17} = ref_AF_b(i);
columns{i+1,18} = ref_AF_b_controls(i);
columns{i+1,19} = n_cases_b;
columns{i+1,20} = n_controls_b;
endfor
file_name = strcat(gene,'_total_both_sexes.csv')
cell2csv (file_name, columns," ")

A complete (preload) failure

FeaturedA complete (preload) failure

Introduction

Some days ago, David Systrom offered an overview of his work on cardiopulmonary testing in ME/CFS during a virtual meeting hosted by the Massachusetts ME/CFS & FM Association and the Open Medicine Foundation. In this blog post, I present an introduction to the experimental setting used for Systrom’s work (paragraph 1), a brief presentation of his previous findings (paragraph 2), and an explanation of his more recent discoveries in his cohort of patients (paragraph 3). In paragraph 4 you’ll find a note on how to support his research.

1. Invasive Cardiopulmonary Exercise Testing

It is a test that allows for the determination of pulmonary, cardiac, and metabolic parameters in response to physical exertion of increasing workload. It is, mutatis mutandis, the human equivalent of an engine test stand. A stationary bike with a mechanical resistance that increases by 10 to 50 Watts for minute is usually employed for assessing the patient in a upright position, but a recumbent bike can also be used in some instances. Distinguishing between these two different settings might be of pivotal relevance in ME/CFS and POTS. I shall now briefly describe some of the measurements that can be collected during invasive cardiopulmonary exercise testing (iCPET) and their biological meaning. For a more accurate and in-depth account, please refer to (Maron BA et al. 2013), (Oldham WM et al. 2016). I have used these papers as the main reference for this paragraph, unless otherwise specified.

Gas exchange. A face mask collects the gasses exchanged by the patient during the experiment and allows for monitoring of both oxygen uptake per unit of time (named VO_2) and carbon dioxide output (VCO_2), measured in mL/min. Gas exchange is particularly useful for the determination of the anaerobic threshold (AT), i.e. the point in time at which the diagram of VCO_2 in function of VO_2 displays an abrupt increase in its derivative: at this workload, the patient starts relying more on her anaerobic energy metabolism (glycolysis, for the most part) with a build-up of lactic acid in tissues and blood (see Figure 1).

Figure 1. Diagram of VCO_2 in function of VO_2. The point in which there is a change in the derivative with respect to VO_2 is called “anaerobic threshold” (AT). AT is highlighted with a vertical line in this picture. This diagram is from an actual CPET of a patient.

Oxygen uptake for unit of time at AT (called VO_2max) can be considered an integrated function of patient’s muscular, pulmonary, and cardiac efficiency during exercise. It is abnormal when its value is below 80% of what predicted according to patient’s age, sex, and height. Importantly, according to some studies there might be no difference in VO_2max between ME/CFS patients and healthy controls, unless the exercise test is repeated a day after the first measure: in this case the value maxVO_2 for patients is significantly lower than for controls (VanNess JM et al. 2007), (Snell CR and al. 2013).

Another measure derived from the assessing of gas exchange is minute ventilation (VE, measured in L/min) which represents the total volume of gas expired per minute. The link between VE and VO_2 is as follows:

VO_2\;=\;VE\cdot(inspired\;VO_2\; -\; expired\;VO_2)

Maximum voluntary ventilation (MVV) is the maximum volume of air that is voluntarily expired at rest. During incremental exercise, a healthy person should be able to maintain her VE at a value ∼0.7 MVV and it is assumed that if the ratio VE/MVV is above 0.7, then the patient has a pulmonary mechanical limit during exercise. If VE is normal, then an early AT suggests an inefficient transport of oxygen from the atmosphere to muscles, not due to pulmonary mechanics, thus linked to either pulmonary vascular abnormalities or muscular/mitochondrial abnormalities. It is suggested that an abnormally high derivative of the diagram of VE in function of VCO_2 and/or a high ratio VE/VCO_2 at AT (these are measures of how efficiently the system gets rid of CO_2) are an indicator of poor pulmonary vascular function.

Respiratory exchange ratio (RER) is a measure of the effort that the patient puts into the exercise. It is measured as follows:

RER=\frac{VCO_2}{VO_2}

and an RER>1.05 indicates a sufficient level of effort. In this case the test can be considered valid.

Arterial catheters. A sensor is placed just outside the right ventricle (pulmonary artery, Figure 2) and another one is placed in the radial artery: they allow for measures of intracardiac hemodynamics and arterial blood gas data, respectively. By using this setting, it is possible to indirectly estimate cardiac output (Qt) by using Fick equation:

Qt=\frac{VO_2}{arterial\;O_2 - venous\;O_2}

where the arterial\;O_2 is measured by the radial artery catheter and the venous one is measured by the one in the pulmonary artery (ml/L). An estimation for an individual’s predicted maximum Qt (L/min) can be obtained by dividing her predicted VO_2max by the normal maximum value of  arterial\;O_2 - venous\;O_2 during exercise, which is 149 mL/L:

predicted\; Qt\;max=\frac{predicted\; VO_{2}max}{149 \frac{mL}{L}}

If during iCPET the measured Qt max is below 80% of the predicted maximum cardiac output (as measured above), associated with reduced VO_2max, then a cardiac abnormality might be suspected. Stroke volume (SV), defined as the volume of blood ejected by the left ventricle per beat, can be obtained from the Qt according to the following equation:

Qt=SV\cdot HR\;\xrightarrow\;SV\;=\;\frac{Qt}{HR}\;=\;\frac{\frac{VO_2}{arterial\; O_2 - venous\; O_2}}{HR}

where HR stands for heart rate. One obvious measure from the pulmonary catheter is the mean pulmonary artery pressure (mPAP). The right atrial pressure (RAP) is the blood pressure at the level of the right atrium. Pulmonary capillary wedge pressure (PCWP) is an estimation for the left atrial pressure. It is obtained by the pulmonary catheter. The mean arterial pressure (MAP) is the pressure measured by the radial artery catheter and it is a proxy for the pressure in the left atrium. RAP, mPAP, and PCWP are measured by the pulmonary catheter (the line in red) which from the right atrium goes through the tricuspid valve, enters the right ventricle, and then goes along the initial part of the pulmonary artery (figure 2).

Figure 2. Right atrial pressure (RAP) is the pressure of the right atrium, mean pulmonary arterial pressure (mPAP) is the pressure of the right ventricle, pulmonary capillary wedge pressure (PCWP) is an estimation of the pressure of the left atrium. Mean arterial pressure gives a measure of the pressure of the left ventricle. RAP, mPAP, and PCWP are measured by the pulmonary catheter (the line in red) which from the right atrium goes through the tricuspid valve, enters the right ventricle, and then goes across the initial part of the pulmonary artery (R).

Derived parameters. As seen, Qt (cardiac output) is derived from actual direct measures collected by this experimental setting, by using a simple mathematical model (Fick equation). Another derived parameter is pulmonary vascular resistance (PVR) which is obtained using the particular solution of the Navier-Stokes equations (the dynamic equation for Newtonian fluids) that fits the geometry of a pipe with a circular section. This solution is called the Poiseuille flow, and it states that the difference in pressure between the extremities of a pipe with a circular cross-section A and a length L is given by

\Delta\;P\;=\;\frac{8\pi\mu L}{A^2}Q

where \mu is a mechanical property of the fluid (called dynamic viscosity) and Q is the blood flow (Maccallini P. 2007). As the reader can recognize, this formula has a close resemblance with Ohm’s law, with P analogous to the electric potential, Q analogous to the current, and \frac{8\pi\mu L}{A^2} analogous to the resistance. In the case of PVR, Q is given by Qt while \Delta\;P\;=\;mPAP\;-\;PCWP. Then we have:

PVR\;=\;80\frac{\;mPAP\;-\;PCWP}{Qt}

where the numeric coefficient is due to the fact that PVR is usually measured in \frac{dyne\cdot s}{cm^5} and 1 dyne is 10^5 Newton while 1 mmHg is 1333 N/m².

2. Preload failure

A subset of patients with exercise intolerance presents with preload-dependent limitations to cardiac output. This phenotype is called preload failure  (PLF) and is defined as follows: RAP max < 8 mmHg, Qt and VO_2max <80% predicted, with normal mPAP (<25 mmHg) and normal PVR (<120 \frac{dyne\cdot s}{cm^5}) (Maron BA et al. 2013). This condition seems prevalent in ME/CFS and POTS. Some of these patients have a positive cutaneous biopsy for small-fiber polyneuropathy (SFPN), even though there seems to be no correlation between hemodynamic parameters and the severity of SFPN. Intolerance to exercise in PLF seems to improve after pyridostigmine administration, mainly through potentiation of oxygen extraction in the periphery. A possible explanation for PLF in non-SFPN patients might be a more proximal lesion in the autonomic nervous system (Urbina MF et al. 2018), (Joseph P. et al. 2019). In particular, 72% of PLF patients fits the IOM criteria for ME/CFS and 27% meets the criteria for POTS. Among ME/CFS patients, 44% has a positive skin biopsy for SFPN. One possible cause for damage to the nervous system (both in the periphery and centrally) might be TNF-related apoptosis-inducing ligand (TRAIL) which has been linked to fatigue after radiation therapy; TRAIL increases during iCPET among ME/CFS patients (see video below).

3. Latest updates from David Systrom

During the Massachusetts ME/CFS & FM Association and Open Medicine Foundation Fall 2020 Event on Zoom, David Systrom reported on the results of iCPET in a set of ME/CFS patients. The VO_2max is lower in patients vs controls (figure 3, up). As mentioned before, VO_2max is an index that includes contributions from cardiac performances, pulmonary efficiency, and oxygen extraction rate in the periphery. In other words, a low VO_2max gives us no explanation on why it is low. This finding seems to be due to different reasons in different patients even though the common denominator among all ME/CFS patients of this cohort is a low pressure in the right atrium during upright exercise (low RAP, figure 3, left). But then, if we look at the slope of Qt in function of VO_2 (figure 3, right) we find three different phenotypes. Those with a high slope are defined “high flow” (in red in figure 3). Then we have a group with a normal flow (green) and a group with a low flow (blue). If we look then at the ability to extract oxygen by muscles (figure 3, below) expressed by the ratio

\frac{arterial\;O_2 - venous\;O_2}{HB}

we can see that the high flow patients reach the lowest score. In summary, all ME/CFS patients of this cohort present with poor VO_2max and preload failure. A subgroup, the high flow phenotype, has poor oxygen extraction capacity at the level of skeletal muscles.

Figure 3. The results presented by David Systrom are here displayed around a schematic representation of the circulatory system. VO_2 is a global measure of the efficiency of the circulatory system. CO, which stands for cardiac output (indicated Qt in this blog post) is related to the output of the left half of the heart. RAP is the pressure of the right atrium. By Paolo Maccallini.

Now the problem is: what is the reason for the preload failure? And in the high flow phenotype, why the muscles can’t properly extract oxygen from blood? As mentioned, about 44% of ME/CFS patients in this cohort has SFPN but there is no correlation between the density of small-fibers in the skin biopsies and the hemodynamic parameters. Eleven patients with poor oxygen extraction (high flow) had their muscle biopsy tested for mitochondrial function (figure 4) and all but one presented a reduction in the activity of citrate synthase (fourth column): this is the enzyme that catalyzes the last/first step of Krebs cycle and it is considered a global biomarker for mitochondrial function. Some patients also have defects in one or more steps of the electron transport chain (fifth column) associated with genetic alterations (sixth column). Another problem in high flow patients might be a dysfunctional vasculature at the interface between the vascular system and skeletal muscles (but this might be true for the brain too), rather than poor mitochondrial function.

Figure 4. Eleven patients with high flow (poor oxygen extraction) underwent a muscle biopsy. Mitochondrial function has been assessed in these samples and all the patients but one presented a reduced activity for the enzyme citrate synthase (4th column). Defects in the oxygen transport chain and in the mitochondrial chromosome have also been documented in 4 of them (column 5th and column 6th).

The use of an acetylcholinesterase inhibitor (pyridostigmine) improved the ability to extract oxygen in the high flow group, without improving cardiac output, as measured with a CPET, after one year of continuous use of the drug. This might be due to better regulation of blood flow in the periphery. This paragraph is an overview of the following video:

4. Funding

The trial on the use of pyridostigmine in ME/CFS at the Brigham & Women’s Hospital by Dr. David Systrom is funded by the Open Medicine Foundation (R). This work is extremely important, as you have seen, both for developing diagnostic tools and for finding treatments for specific subgroups of patients. Please, consider a donation to the Open Medicine Foundation to speed up this research. See how to donate.


The equations of this blog post were written using \LaTeX (see this article).

Is it that bad?

Is it that bad?

When I got sick, about 20 years ago, for the first time I started thinking about diseases and loss of health. And I remember coming to the conclusion of how fortunate I was, from a physical standpoint. Not of how fortunate I had been in the past (that was obvious), when I could conquer mountains, running on rocks with my 15 Kg bike on the shoulders; no, of how fortunate I was in that very moment, while confined at home, mostly lying horizontally. I was fortunate, I could still move my hands, I had still all my body, even if I couldn’t use it in the outside world, even if most of the usual activities of a 20 years old man were far beyond reaching, I realized how fortunate I was. And after 20 years I still say that, as for the physical functioning, I am a very lucky man. There are even some years, during the core of the summer, in which I can run for some minutes in the sun. I am blessed.

I am aware that this might offend some patients, but I think that those who have ME/CFS in the vast majority of cases are fortunate too, from a physical standpoint. Yes, it is annoying to need help from others for so many things, but with assistance and some arrangements, you can go on with a productive life… unless you have cognitive impairment.

Actually, I never felt fortunate concerning my cognitive functioning. That has been a true tragedy, I have lost my entire life because of the cognitive damage, by any means because of the limitations of my body. I would have had a meaningful life (according to any standard) even with physical limitations way worse than mine if I had had a functioning brain. But I lost it when I was 20 before I could get the best from it and that is the only real tragedy.

As an example of what I am saying, consider a person like Stephen Hawking: he has been a prolific scientist, he had a stellar career (literally), he shaped our culture with his books for the general public, he was a father of three, and so forth. And yet for most of his life, while he was doing all these things, his physical functioning was way worse than the one of the average ME/CFS patient (worse even of a very severe patient, probably). For many years he could use only the muscles of the head. That’s it. Think about that. Was he missing or invisible? Definitely not!

Was he a disabled man? Yes, technically speaking he was, but in fact, he has never really been disabled, if we judge from his accomplishments. He once said that the disease somehow even helped him in his work, because he could concentrate better on his quantum-relativistic equations¹.

So what I am saying is that if we consider the physical functioning in ME/CFS, it is a negligible problem in most of the cases. The only thing that matters is the impairment in cognition (in the cases in which it is present), especially if it starts at a young age (consider that after you reach your thirties there is very little chance that your brain will give a significant contribution to humanity, even if you are perfectly healthy, so it is not a big loss if you get sick after that age). That is disastrous and there is no wheelchair you can use for it. For all the other symptoms you can find a way to adjust, just as Hawking did in a far worse situation.

This might be one of the reasons for the bad reputation of ME/CFS: there isn’t awareness about cognitive issues, no one talk about them (and in some cases, they are in fact not present at all). But I am pretty sure that the real source of disability in these patients, the lack of productivity, is due to their cognitive problems (when present). Also because in a world like ours, you can work even without using most of your muscles. It is not impossible, I would say it is the rule for a big chunk of the population.

The following one is an interview with Norwegian neurologist Kristian Sommerfelt, in which he points out some analogous considerations. He has done some research on ME/CFS with the group of Fluge and Mella, including the well-known study on pyruvate dehydrogenase. From the subtitles of the video (minute 3:38):

“This [the cognitive problem] is a very typical ME symptom and some of what I believe causes the main limitation. I don’t think the main limitation is that they’re becoming fatigued and exhausted by moving around, walking, running, or having to sit still. If it were just that, I think many ME-patients could have had a much better life. But the problem is that just actively using the mind leads to problems with exactly that, using the mind. It comes to a stop, or slow down, depending on how ill they are.”


¹ It seems that Stephen Hawking had a very unusual presentation of Amyotrophic Lateral Sclerosis (ALS): one half of those with this disease die within 30 months from the first symptom; moreover one out of two ALS patients has a form of cognitive impairment which in some cases can be diagnosed as frontotemporal dementia [R]. So, Stephen Hawking was somehow lucky, in his tragedy, and he doesn’t represent the average ALS patient. I mentioned his case as an example of a person with very severe physical impairment and no apparent cognitive decline, not as an example of the average ALS patient. 

Testing hypotheses

Testing hypotheses

Introduction

My ME/CFS improves during summer, in the period of the year that goes from May/June to the end of September. I don’t know why. I have several hypotheses. One possible reason for the improvement in summer is an interaction between the light from the Sun and some parts of my physiology, the immune system for instance. We know that ME/CFS tends to have an oscillating course in most of the patients (Chu L. et al. 2019), but the presence of a seasonal pattern in this patient population has not been investigated so far, to my knowledge. And yet, if you ask directly to patients, many of them say that they feel better in summer. Unfortunately, we don’t have scientific data on that, this is an area worth investigating with some carefully done survey.

Seasonal variation of the immune system

The immune system has a high degree of variation for several reasons (Brodin P et Davis MM 2017). In particular, there are studies about the seasonal fluctuations in the expression of some crucial genes of the immune response (Dopico XC et al. 2014).

How does this regulation happen? Different mechanisms are possible, some of them might be related to changes in the light we receive from the Sun as the Earth rotates around it. We know that the length of the day has an effect on innate immunity: the more the hours of light, the lower the power of the innate immune system (Pierre K. et al. 2016). We also know that ultraviolet radiation, particularly UVB, is an agonist for the aryl hydrocarbon receptor (AhR) (Navid F. et al. 2013). This receptor seems to reduce the expression of the major histocompatibility complex II (MHC II) in dendritic cells (DCs), thus reducing their antigen-presenting activity (Rothhammer V. et Quintana F.J. 2019). UVB might be able to reach dendritic cells when they are circulating near the skin, during summer, thus inhibiting their antigen-presenting activity. Infrared radiation, on the other hand, seems to have an effect on energy metabolism: in Fall we lose a significant amount of infrared radiation in a wavelength range (0.7-1.0 nm) that is known to have an effect on mitochondrial activity (Nguyen L.M. et al. 2013) and it might perhaps have an indirect effect on immunity too.

As further proof of seasonal fluctuation in immunity, some immunological diseases have this kind of seasonality: Rheumatoid arthritis (Nagamine R. et al. 2014) and Rheumatic fever (Coelho Mota C.C. et al. 2010) are two examples. Moreover, the prevalence of Multiple Sclerosis is directly proportional to the latitude (Simpson S. et al. 2011). We also know that there is seasonal fluctuation in serum autoantibodies (Luong T.H. et al. 2001).

Of course, sunlight might be just one of the variables into play. The other aspect I am considering is the seasonal distribution of some common pathogens. Streptococcus, Enteroviruses and Fungi of the genus Penicillium are known to have a seasonal distribution with a peak in Fall and/or Winter (Ana S.G. et al. 2006), (Salort-Pons M et al. 2018), (Coelho Mota C.C. et al. 2010). Common influenza has this pattern too. Rheumatic fever, a disease due to an abnormal immune response to Streptococcus, has its flares in Fall because Streptococcus is more common in that period of the year (Coelho Mota C.C. et al. 2010). Even the composition of the gut microbiota has a seasonal pattern (Koliada A. et al. 2020). I am currently investigating my immunosignature, measured with an array of 150.000 random peptides, to see if I can find some relevant pathogen in my case. You can find this study here.

(A few months after I wrote these notes a pivotal study has been published on these same topics, avalilable here).

An experiment

I moved from Rome (Italy) to Rosario (Argentina) at the beginning of January. I was very sick and I steadily improved after about 40 days. I became a less severe ME/CFS patients and I could work several hours a day and care for myself, granted that I did not exceed with aerobic exercise. At the end of March, I started deteriorating as it usually happens at the end of September, when I am in Rome. In order to study this phenomenon, I have built a complete model of solar radiation at sea level, which considers the inclination of sunrays in function of the latitude and of the day of the year. It takes into account the effect of the atmosphere (both diffusion and absorption) and the eccentricity of the orbit (Maccallini P. 2019). If you look at the figure below (a byproduct of my mathematical model) you can see that when I started deteriorating in Rosario, the power of sunrays at noon in that city was still as high as it is in Rome during the summer solstice (this is due to the fact that the Earth is closer to the Sun in this period and to the fact that Rosario is closer to the Equator than Rome is).

Rome_vs_Rosario

So I have to discard the original idea that the power within the infrared range, or the ultraviolet radiation, or the visible one is responsible for my improvement in summer. If I still have to consider that sunlight has something to do with my improvement, I must conclude that it is the length of the day the relevant parameter: I may need more than 12 hours of light to feel better. Why? Because the longer the day, the lower the strength of the innate immunity. This is now my working hypothesis and I will start from the following mathematical model to pursue this research: (Pierre K. et al. 2016).

I will also use full-spectrum lamps early in the morning and in the evening to reproduce a 15 hours day, so to dampen down my innate immune system in a safe, drug-free way. I have to reproduce a day of 15 hours and see what happens. In the figure below the hours of the day at dawn and at dusk and the length of the day for Rome, for each day of the year (this is also a plot from my model).

ore di luce

What follows is the script I have coded in order to plot the first figure of this post. More details on this model of solar radiation are here: (Maccallini P. 2019). As a further note, I would like to acknowledge that I started pursuing this avenue in the summer of 2009: I was building the mathematical model of solar radiation (see figure below, made in 2009) but as the summer finished, I turned into a statue and I had to stop working on it. When I improved, about a year later I started working on the systematic analysis of the mechanical equilibrium of planar structures (it is a chapter of this book). I am proud of that analysis, but it has not been very useful for my health…

trigonometria sferica 2

% file name = sun emissive power sea level Rosario vs Roma
% date of creation = 07/11/2019
% sun emissive power per unit area, per unit wavelength at sea level
clear all
% three parameters of the orbit
A = 6.69*( 10^(-9) ); % 1/km
B = 1.12*( 10^(-10) ); % 1/km
delta = pi*313/730;
% the two parameters of Plunk's law
C_1 = 3.7415*( 10^(-16) ); % W*m^2
C_2 = 1.4388*( 10^(-2) ); % mK
% Stefan-Boltzmann parameter ( W/( (m^2)*(K^4) ) )
SB = 5.670*( 10^(-8) );
% radius of the photosphere (m)
R_S = 696*(10^6); % m
% temperature of the photosphere (K)
T_S = 5875;
% conversion of units of measurments
N = 20; % dots for the equator
R = 3.8; % radius of the orbit
ro_E = 1.3; % radius of the earth
lambda_Rosario = -32*pi/180; % latitude of Rosario (radiants)
lambda_Roma = 41*pi/180; % latitude of Rome (radiants)
delta = 23.45*pi/180; % tilt angle
% the array of theta
theta(1) = 0; % winter solstice (21/22 December)
i_ws = 1;
day = 2*pi/365;
days = [1:1:366];
for i = 2:366
theta(i) = theta(i-1) + day;
if ( abs( theta(i) - (pi/2) ) <= day )
i_se = i; % spring equinox (20 March)
endif
if ( abs( theta(i) - pi ) <= day )
i_ss = i; % summer solstice (20/21 June)
endif
if ( abs( theta(i) - (3*pi/2) ) <= day )
i_ae = i; % autumn equinox (22/23 September)
endif
endfor
% the array of the radius (m)
for i=1:1:366
o_omega (i) = (10^3)/[ A + ( B*sin(theta(i) + delta ) ) ]; % m
endfor
% the array of the wavelength in micron
N = 471;
L(1) = 0.3;
L(N) = 5.0;
delta_L = ( L(N) - L(1) )/(N-1);
for j = 2:N-1
L (j) = L(j-1) + delta_L;
endfor
% the array of beta*L
load beta_int.mat S
% the array of L in metres
L_m = L*( 10^(-6) );
% angle psi
psi(1) = 0;
minute = pi/(12*60);
for i = 2:(24*60)+1
psi(i) = psi(i-1) + minute;
endfor
% -----------------------------------------------------------------------------
% Rosario
lambda = lambda_Rosario
% angle between n and r at noon in Rosario
for i= [i_ws, i_ae, i_ss, i_se]
for j=1:(24*60) + 1
% scalar product between n and r
scalar_p(j) = [cos(lambda)*sin(psi(j))*cos(delta) + sin(lambda)*sin(delta)]*( -cos(theta(i)) )+ [(-1)*cos(lambda)*cos(psi(j))]*( -sin(theta(i)) );
endfor
% value of psi at noon
for j=1:(24*60) + 1
if ( ( scalar_p(j) ) == ( max( scalar_p ) ) )
j_noon = j;
psi_noon (i) = psi(j);
endif
endfor
% angle between n and r at noon
cos_gamma (i) = scalar_p(j_noon);
endfor
% the array of the emissive power (W/(m^2)*micron) in Rosario
for i = i_se:i_se
for j=1:N
num = C_1*( (R_S)^2 );
den = ( (L_m(j)^5)*( (e^(C_2/( L_m(j)*T_S ))) - 1)*( (o_omega(i))^2 ) )*10^6;
power(j,i) = ( num/den )*( e^(-S(j)/cos_gamma (i)) );
endfor
% plotting
plot (L (1:N), power(1:N,i), '-r', "linewidth", 2)
xlabel('wavelenght ({\mu})');
ylabel('W/m^{2}{\mu}');
axis ([0.3,5,0,1500])
grid on
endfor
hold on
% -----------------------------------------------------------------------------
% Rome
lambda = lambda_Roma
% angle between n and r at noon in Rosario
for i= [i_ws, i_ae, i_ss, i_se]
for j=1:(24*60) + 1
% scalar product between n and r
scalar_p(j) = [cos(lambda)*sin(psi(j))*cos(delta) + sin(lambda)*sin(delta)]*( -cos(theta(i)) )+ [(-1)*cos(lambda)*cos(psi(j))]*( -sin(theta(i)) );
endfor
% value of psi at noon
for j=1:(24*60) + 1
if ( ( scalar_p(j) ) == ( max( scalar_p ) ) )
j_noon = j;
psi_noon (i) = psi(j);
endif
endfor
% angle between n and r at noon
cos_gamma (i) = scalar_p(j_noon);
endfor
% the array of the emissive power (W/(m^2)*micron) in Rosario
for i = [i_ae, i_ss]
for j=1:N
num = C_1*( (R_S)^2 );
den = ( (L_m(j)^5)*( (e^(C_2/( L_m(j)*T_S ))) - 1)*( (o_omega(i))^2 ) )*10^6;
power(j,i) = ( num/den )*( e^(-S(j)/cos_gamma (i)) );
endfor
endfor
hold on
plot (L (1:N), power(1:N,i_ae), '-k', "linewidth", 2)
plot (L (1:N), power(1:N,i_ss), '--k', "linewidth", 2)
legend ('spring equinox in Rosario', 'autumn equinox in Rome', 'summer solstice in Rome', "location",'NORTHEAST')
hold on
plot ([0.4,0.4], [0,1500], '--k', "linewidth", 1)
plot ([0.7,0.7], [0,1500], '--k', "linewidth", 1)

Immunosignature analysis of a ME/CFS patient. Part 1: viruses

Immunosignature analysis of a ME/CFS patient. Part 1: viruses

“Each hypothesis suggests its own criteria, its own means of proof, its own methods of developing the truth; and if a group of hypotheses encompass the subject on all sides, the total outcome of means and of methods is full and rich.”

Thomas Chrowder Chamberlain, “The Method of Multiple Working Hypotheses” (Download)

The purpose of the following analysis is to search for the viral epitopes that elicited – in a ME/CFS patient – IgGs against a set of 6 peptides, determined thanks to an array of 150.000 random peptides of 16 amino acids each. These peptides were used as query sequences in a BLAST search against viral proteins. No human virus was found. Three phages of bacterial human pathogens were identified, belonging to the classes Actinobacteria and γ-Proteobacteria. One of these bacteria, Serratia marcescens, was identified in a similar study on 21 ME/CFS cases.  

(a commentary in Dutch is available here)

1. The quest for a pathogen

Scientists have been speculating about an infectious aetiology of ME/CFS for decades, without ever being able to link the disease to a specific pathogen. The idea that the disease might be triggered and/or maintained by an infection is due to the observation that for most of the patients the onset occurs after an infectious illness (Chu, L. et al. 2019). It has also been observed that after a major infection (whether parasitic, viral or bacterial) about 11% of the population develops ME/CFS (Mørch K et al. 2013), (Hickie I. et al. 2006).

In recent years, the advent of new technologies for pathogen hunting has given renewed impulse to the search for ongoing infection in this patient population. A 2018 study, investigating the genetic profile of peripheral blood for prokaryotic and eukaryotic organisms reported that most of the ME/CFS patients have DNA belonging to the eukaryotic genera Perkinsus and Spumella and to the prokaryotic class β-proteobacteria (alone or in combination) and that these organisms are statistically more present in patients than in controls (Ellis J.E. et al. 2018). Nevertheless, a previous metagenomic analysis of plasma by another group revealed no difference in the content of genetic material from bacteria and viruses between ME/CFS patients and healthy controls (Miller R.R. et al. 2016). Moreover, metagenomic analysis pursued in various samples from ME/CFS patients by both Stanford University and Columbia University has come empty (data not published, R, R).

2. Immunological methods

Another way of investigating the presence of current and/or past infections that might be specific of this patient population is to extract the information contained in the adaptive immune response. This can be made in several ways, each of them having their own limits. One way would be to collect the repertoire of T cell receptors (TCRs) of each patient and see if they have been elicited by some particular microorganism. This is a very complex and time-consuming method that has been developed in recent years and that I have described in details going through all the recent meaningful publications (R). The main limitation of this method is that, surprisingly, TCRs are not specific for a single epitope (Mason DA 1998), (Birnbaum ME et al. 2014), so their analysis is unlikely to reveal what agent selected them. On the other hand, the advantage of this method is that T cell epitopes are linear ones, so they are extremely suited for BLAST searches against protein databases. An attempt at applying this method to ME/CFS is currently underway: it initially gave encouraging results (R), then rejected by further analysis.

Another possible avenue for having access to the information registered by adaptive immunity is to investigate the repertoire of antibodies. The use of a collection of thousands of short random peptides coated to a plate has been recently proposed as an efficient way to study the response of B cells to cancer (Stafford P. et al. 2014), infections (Navalkar K.A. et al. 2014), and immunization (Legutki JB et al. 2010). This same method has been applied to ME/CFS patients and it has shown the potential of identifying an immunosignature that can differentiate patients from controls (Singh S. et al. 2016), (Günther O.P. et al. 2019). But what about the antigens eliciting that antibody profile? Given a set of peptides one’s antibodies react to, a possible solution for interpreting the data is to use these peptides as query sequences in a BLAST search against proteins from all the microorganisms known to infect humans. This has been done for ME/CFS, and the analysis led to several matches among proteins from bacteria, viruses, endogenous retroviruses and even human proteins (in fact, both this method and the one previously described can detect autoimmunity as well) (Singh S. et al. 2016).  There are several problems with this approach, though. First of all, the number of random peptides usually used in these arrays is not representative of the variety of possible epitopes of the same length present in nature. If we consider the paper by Günther O.P. and colleagues, for instance, they used an array of about 10^5 random peptides with a length of 12 amino acids each, with the number of all the possible peptides of the same length being  20^12 ∼ 4·10^15. This means that many potential epitopes one has antibodies to are not represented in the array. Another important limitation is that B cell epitopes are mainly conformational ones, which means that they are assembled by the folding of the proteins they belong to (Morris, 2007), the consequence of this being that the subset of random peptides one’s serum react to are in fact linear epitopes that mimic conformational ones (they are often called mimotopes) (Legutki JB et al. 2010). This means that a BLAST search of these peptides against a library of proteins from pathogens can lead to completely misleading results.

Recently an array of overlapping peptides that cover the proteins for many know viruses has been successfully used for the study of acute flaccid myelitis (AFM). This technology, called VirScan, has succeeded in linking AFM to enteroviruses where metagenomic of the cerebrospinal fluid has failed (Shubert R.D. et al. 2019). This kind of approach is probably better than the one employing arrays of random peptides, for pathogen hunting. The reason being that a set of only 150.000 random peptides is unlikely to collect a significant amount of B cell epitopes from viruses, bacteria etc. Random peptides are more suited for the establishment of immunosignatures.

3. My own analysis

I have recently got access to the results of a study I was enrolled in two years ago. My serum was diluted and applied to an array of 150.000 peptides with a length of 16 random amino acids (plus four amino acids used to link the peptides to the plate). Residues Threonine (T), Isoleucine (I) and Cysteine (C) were not included in the synthesis of peptides. Anti-human-IgG Ab was employed as a secondary antibody. The set of peptides my IgGs reacted to has been filtered with several criteria, one of them being subtracting the immune response common to healthy controls, to have an immune signature that differentiates me from healthy controls. The end result of this process is the set of the following six peptides.

1 ALHHRHVGLRVQYDSG
2 ALHRHRVGPQLQSSGS
3 ALHRRQRVLSPVLGAS
4 ALHRVLSEQDPQLVLS
5 ALHVRVLSQKRPLQLG
6 ALHLHRHVLESQVNSL

Table 1. My immunosignature, as detected by an array of 150.000 random peptides 20-amino-acid long, four of which are used for fixing them to the plate and are not included here.

The purpose of the following analysis is to search for the viral epitopes that elicited this immune response. To overcome the limitations enumerated at the end of the previous paragraph I have decided to search within the database of viral proteins for exact matches of the length of 7 amino acids. Why this choice? A survey of a set of validated B cell epitopes found that the average B cell epitope has a linear stretch of 5 amino acids (Kringelum, et al., 2013); according to another similar work, the average linear epitope within a conformational one has a length of 4-7 amino acids (Andersen, et al., 2006). To filter the matches and to reduce the number of matches due to chance, I opted for the upper limit of this length. I excluded longer matches to limit the number of mimotopes for conformational epitopes. Moreover, I decided to look only for perfect matches (excluding the possibility of gaps and substitutions) so to simplify the analysis. It is worth mentioning that a study of cross-reactive peptides performed for previous work (Maccallini P. 2016), (Maccallini P. et al. 2018) led me to the conclusion that cross-reactive 7-amino-acid long peptides might often have 100% identity.

Sample
Figure 1. For each match, the matching protein and the organism it belongs to are reported. The protein ID has a link to the NCBI protein database, while the name of the organism has a link to the NCBI taxonomy browser. The host of the microorganism is also indicated, as well as its habitat, with links to further information.

So, to recap, I use the following method: BLAST search (blastp algorithm) against viral proteins (taxid 10239), a perfect match (100% identity) of at least 7-amino-acid peptides (≥43% query cover), max target sequences: 1000, substitution matrix: BLOSUM62.

4. Results

Table 2 is a collection of the matches I found with the method described above. You can look at figure 1 to see how to read the table.

ALHHRHVGLRVQYDSG (102_1_F_viruses)
9-LRVQYDS-15
QDP64279.1(29-35)
Prokaryotic dsDNA virus sp.
Archea, Ocean
8-GLRVQYD-14
AYV76690.1(358-364)
Terrestrivirus sp
Amoeba, forest soil
ALHRHRVGPQLQSSGS (102_2_F_viruses)
2-LHRHRVG-8
YP_009619965.1(63-69)
Stenotrophomonas phage vB_SmaS_DLP_5
Stenotrophomonas maltophilia (HP)
ALHRRQRVLSPVLGAS (102_3_F_viruses)
2-LHRRQRV-8
QHN71154.1 (288-294)
Mollivirus kamchatka
Protozoa (R)
8-VLSPVLG-14
QDB71078.1 (24-30)
Serratia phage Moabite
Serratia marcescens (HP)
ALHRVLSEQDPQLVLS (102_4_F_viruses)
7-SEQDPQL-13
BAR30981.1 (151-157)
uncultured Mediterranean phage uvMED
Archea and Bacteria, Med. sea
3-HRVLSEQ-9
AXS67723.1 (494-500)
Cryptophlebia peltastica nucleopolyhedrovirus
invertebrates
2-LHRVLSE-8
YP_009362111.1 (74-80)
Marco virus
Ameiva ameiva
ALHLHRHVLESQVNSL (102_6_F_viruses)
2-LHLHRHV-8
YP_009119106.1 (510-516)
Pandoravirus inopinatum
Acanthamoeba
4-LHRHVLE-10
ASZ74651.1 (61-67)
Mycobacterium phage Phabba
Mycobacterium smegmatis mc²155 (HP)

Table 2. Collection of the matches for the BLAST search of my unique set of peptides against viral proteins (taxid 10239). HP: human pathogen. See figure 1 for how to read the table.

5. Discussion

There are no human viruses detected by this search. There are some bacteriophages and three of them have as hosts bacteria that are known to be human pathogens. Bacteriophages (also known as phages) are viruses that use the metabolic machinery of prokaryotic organisms to replicate (figure 2). It is well known that bacteriophages can elicit specific antibodies in humans: circulating IgGs to naturally occurring bacteriophages have been detected (Dąbrowska K. et al. 2014) as well as specific antibodies to phages injected for medical or experimental reasons (Shearer WT et al. 2001), as reviewed here: (Jonas D. Van Belleghem et al. 2019). According to these observations, one might expect that when a person is infected by a bacterium, this subject will develop antibodies not only to the bacterium itself but also to its phages.

phage
Figure 2. Half of all viruses have an almost regular icosahedral shape, but several phages present an irregular icosahedral shape, with a prolate capsid (Luque and Reguera 2013). On the left a wrong representation of a phage. It is wrong because the capsid has 24 faces, instead of 20. On the right, the representation of a regular icosahedron made by Leonardo Da Vinci for De Divina Proportione, a mathematical book by Luca Pacioli.

If that is the case, we can use our data in table 2 to infer a possible exposure of our patient to the following bacterial pathogens: Stenotrophomonas maltophilia (HP), Serratia marcescens (HP), Mycobacterium smegmatis mc²155 (HP). In brackets, there are links to research about the pathogenicity for humans of each species. M. smegmatis belongs to the class Actinobacteria, while S. maltophila and S. marcescens are included in the class γ-Proteobacteria.

Interesting enough, Serratia marcescens was identified as one of the possible bacterial triggers for the immunosignature of a group of 21 ME/CFS patients, in a study that employed an array of 125.000 random peptides (Singh S. et al. 2016). This bacterium accounts for rare nosocomial infections of the respiratory tract, the urinary tract, surgical wounds and soft tissues. Meningitis caused by Serratia marcescens has been reported in the pediatric population (Ashish Khanna et al. 2013).

Mollivirus kamchatka is a recently discovered giant virus whose hosts are presumed to be protozoa that inhabit the soil of subarctic environment (Christo-Fourox E. et al. 2020). Not sure what the meaning might be in this context.

6. Next step

The next step will be to perform a similar BLAST search against bacterial proteins to see, among other things,  if I can find matches with the six bacteria identified by the present analysis. A further step will be to pursue an analogous study for eukaryotic microorganisms and for human proteins (in search for autoantibodies).

Historia magistra vitae?

Historia magistra vitae?

What is happening with the CCI-hypothesis in the ME/CFS community closely resembles what happened in Italy (mainly, but not only) with the CCVI-hypothesis of Multiple Sclerosis (MS). There was this new avenue, completely unexpected and very fascinating (to me, at least), that linked MS to a defect in the venous system of the neck, named Chronic cerebrospinal venous insufficiency (CCVI) by the Italian researcher Paolo Zamboni [1]. Several MS patients underwent surgery to correct one or more veins of the neck and described themselves as cured of MS thanks to this surgery. Among them also a prominent patient advocate, Pavarotti’s wife, who gave enormous publicity to this kind of technique [2].

The diagnosis of CCVI was somehow subjective, and only CCVI-literate doctors could do it properly. The same applied for the surgery. Several surgeons in private practice started doing the surgery on MS patients, earning a lot of money in a very short period of time.

Does this seem familiar?

After a decade and several well-designed studies, no correlation between CCVI and MS has been demonstrated [3], [4].

I am not saying that there is no correlation between CCI and ME/CFS. We don’t know yet. I personally find interesting these new hypotheses about the effect of abnormal mechanical strains on the functioning of the brainstem and the possible link to ME/CFS-like symptoms and I am trying to study this new field (see this blog-post), among all the other hypotheses about the aetiology of ME/CFS.

What I would like to point out with this post is that it is perfectly possible that several patients improve with this kind of surgery even in the absence of any link between CCI and ME/CFS. This is a weird (and fascinating) phenomenon that we have already seen in other diseases. It always has the same pattern: a somehow subjective diagnosis that only a few physicians can do, a surgery or a drug that many physicians are warning against, a huge amount of patients who say that they have recovered after the intervention.