“Each hypothesis suggests its own criteria, its own means of proof, its own methods of developing the truth; and if a group of hypotheses encompass the subject on all sides, the total outcome of means and of methods is full and rich.”
Thomas Chrowder Chamberlin, “The Method of Multiple Working Hypotheses”
The purpose of the following analysis is to search for the viral epitopes that elicited, in a ME/CFS patient, IgGs against a set of 6 peptides identified using an array of 150,000 random peptides of 16 amino acids each. These peptides were used as query sequences in a BLAST search against viral proteins. No human virus was found. Three phages of bacterial human pathogens were identified; their hosts belong to the classes Actinobacteria and γ-Proteobacteria. One of these bacteria, Serratia marcescens, was also identified in a similar study on 21 ME/CFS cases.
(a commentary in Dutch is available here)
1. The quest for a pathogen
Scientists have been speculating about an infectious aetiology of ME/CFS for decades, without ever being able to link the disease to a specific pathogen. The idea that the disease might be triggered and/or maintained by an infection stems from the observation that, in most patients, the onset occurs after an infectious illness (Chu, L. et al. 2019). It has also been observed that after a major infection (whether parasitic, viral or bacterial) about 11% of the population develops ME/CFS (Mørch K et al. 2013), (Hickie I. et al. 2006).
In recent years, the advent of new technologies for pathogen hunting has given renewed impetus to the search for ongoing infection in this patient population. A 2018 study that screened peripheral blood for genetic material from prokaryotic and eukaryotic organisms reported that most ME/CFS patients carry DNA belonging to the eukaryotic genera Perkinsus and Spumella and to the prokaryotic class β-proteobacteria (alone or in combination), and that these organisms are statistically more frequent in patients than in controls (Ellis J.E. et al. 2018). Nevertheless, a previous metagenomic analysis of plasma by another group revealed no difference in the content of genetic material from bacteria and viruses between ME/CFS patients and healthy controls (Miller R.R. et al. 2016). Moreover, metagenomic analyses of various samples from ME/CFS patients, performed at both Stanford University and Columbia University, have come up empty (data not published, R, R).
2. Immunological methods
Another way of investigating the presence of current and/or past infections that might be specific to this patient population is to extract the information contained in the adaptive immune response. This can be done in several ways, each with its own limits. One way would be to collect the repertoire of T cell receptors (TCRs) of each patient and see if they have been elicited by some particular microorganism. This is a very complex and time-consuming method that has been developed in recent years and that I have described in detail, going through all the recent meaningful publications (R). The main limitation of this method is that, surprisingly, TCRs are not specific for a single epitope (Mason DA 1998), (Birnbaum ME et al. 2014), so their analysis is unlikely to reveal what agent selected them. On the other hand, the advantage of this method is that T cell epitopes are linear, so they are well suited to BLAST searches against protein databases. An attempt at applying this method to ME/CFS is currently underway: it initially gave encouraging results (R), which were then rejected by further analysis.
Another possible avenue for accessing the information recorded by adaptive immunity is to investigate the repertoire of antibodies. The use of a collection of thousands of short random peptides attached to a plate has recently been proposed as an efficient way to study the response of B cells to cancer (Stafford P. et al. 2014), infections (Navalkar K.A. et al. 2014), and immunization (Legutki JB et al. 2010). The same method has been applied to ME/CFS patients, and it has shown the potential to identify an immunosignature that can differentiate patients from controls (Singh S. et al. 2016), (Günther O.P. et al. 2019). But what about the antigens eliciting that antibody profile? Given the set of peptides one’s antibodies react to, a possible way of interpreting the data is to use these peptides as query sequences in a BLAST search against proteins from all the microorganisms known to infect humans. This has been done for ME/CFS, and the analysis led to several matches among proteins from bacteria, viruses, endogenous retroviruses and even human proteins (in fact, both this method and the one previously described can detect autoimmunity as well) (Singh S. et al. 2016). There are several problems with this approach, though. First of all, the number of random peptides usually used in these arrays is not representative of the variety of possible epitopes of the same length present in nature. Günther O.P. and colleagues, for instance, used an array of about 10^5 random peptides with a length of 12 amino acids each, whereas the number of all possible peptides of that length is 20^12 ∼ 4·10^15. The array therefore samples at most about 2·10^-11 of the possible 12-mers, so many potential epitopes one has antibodies to are not represented in it.
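The coverage argument above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the 20 standard amino acids, takes the array size as exactly 10^5, and treats every peptide on the array as distinct, so the result is an upper bound. The function names are mine.

```python
# Fraction of the space of all possible 12-mers that a random array can sample.
ALPHABET_SIZE = 20    # standard amino acids (assumption: all 20 are used)
PEPTIDE_LENGTH = 12   # peptide length of the array discussed above
ARRAY_SIZE = 10**5    # approximate number of peptides on the array


def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct peptides of the given length."""
    return alphabet_size ** length


def max_coverage(array_size: int, alphabet_size: int, length: int) -> float:
    """Upper bound on the fraction of sequence space present on the array.

    The bound is reached only if every peptide on the array is distinct.
    """
    return array_size / sequence_space(alphabet_size, length)


total = sequence_space(ALPHABET_SIZE, PEPTIDE_LENGTH)
coverage = max_coverage(ARRAY_SIZE, ALPHABET_SIZE, PEPTIDE_LENGTH)

print(f"possible 12-mers: {total:.3e}")
print(f"maximum fraction covered by the array: {coverage:.3e}")
```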
Another important limitation is that B cell epitopes are mainly conformational, which means that they are assembled by the folding of the proteins they belong to (Morris, 2007). As a consequence, the subset of random peptides one’s serum reacts to consists in fact of linear epitopes that mimic conformational ones (they are often called mimotopes) (Legutki JB et al. 2010). This means that a BLAST search of these peptides against a library of proteins from pathogens can lead to completely misleading results.
Recently, an array of overlapping peptides that cover the proteins of many known viruses has been successfully used for the study of acute flaccid myelitis (AFM). This technology, called VirScan, has succeeded in linking AFM to enteroviruses where metagenomics of the cerebrospinal fluid had failed (Schubert R.D. et al. 2019). For pathogen hunting, this kind of approach is probably better than the one employing arrays of random peptides, because a set of only 150,000 random peptides is unlikely to contain a significant number of B cell epitopes from viruses, bacteria etc. Random peptides are better suited to the establishment of immunosignatures.
3. My own analysis
I have recently got access to the results of a study I was enrolled in two years ago. My serum was diluted and applied to an array of 150,000 peptides, each made of 16 random amino acids (plus four amino acids used to link the peptide to the plate). The residues Threonine (T), Isoleucine (I) and Cysteine (C) were not included in the synthesis of the peptides. An anti-human-IgG antibody was employed as a secondary antibody. The set of peptides my IgGs reacted to was filtered according to several criteria, one of them being the subtraction of the immune response common to healthy controls, in order to obtain an immune signature that differentiates me from healthy controls. The end result of this process is the following set of six peptides.
1 | ALHHRHVGLRVQYDSG |
2 | ALHRHRVGPQLQSSGS |
3 | ALHRRQRVLSPVLGAS |
4 | ALHRVLSEQDPQLVLS |
5 | ALHVRVLSQKRPLQLG |
6 | ALHLHRHVLESQVNSL |
Table 1. My immunosignature, as detected by an array of 150,000 random peptides, each 20 amino acids long; the four amino acids used for fixing the peptides to the plate are not included here.
The purpose of the following analysis is to search for the viral epitopes that elicited this immune response. To overcome the limitations enumerated at the end of the previous section, I decided to search the database of viral proteins for exact matches with a length of 7 amino acids. Why this choice? A survey of a set of validated B cell epitopes found that the average B cell epitope has a linear stretch of 5 amino acids (Kringelum, et al., 2013); according to another similar work, the average linear epitope within a conformational one has a length of 4-7 amino acids (Andersen, et al., 2006). To reduce the number of matches due to chance, I opted for the upper limit of this range. I excluded longer matches to limit the number of mimotopes for conformational epitopes. Moreover, I decided to look only for perfect matches (excluding gaps and substitutions) to simplify the analysis. It is worth mentioning that a study of cross-reactive peptides performed for previous work (Maccallini P. 2016), (Maccallini P. et al. 2018) led me to the conclusion that cross-reactive 7-amino-acid-long peptides might often have 100% identity.
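To illustrate why the upper limit of the 4-7 range reduces chance matches, here is a rough sketch of the arithmetic. It relies on a simplifying assumption that is mine and not part of the analysis: database residues are treated as independent and uniformly distributed over 20 amino acids, which real proteins are not. The quantity computed is the expected number of query windows (over all six 16-mers) that match one given database position purely by chance.

```python
ALPHABET_SIZE = 20   # assumption: uniform, independent residues in the database
QUERY_LENGTH = 16    # length of each peptide in the immunosignature
N_QUERIES = 6        # number of peptides in the immunosignature


def n_windows(query_length: int, k: int) -> int:
    """Number of contiguous k-mers contained in one query peptide."""
    return query_length - k + 1


def chance_probability(k: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Probability that a fixed k-mer equals the k-mer at one database position."""
    return alphabet_size ** -k


def expected_matches_per_position(k: int) -> float:
    """Expected chance matches, over all query windows, at one database position."""
    return N_QUERIES * n_windows(QUERY_LENGTH, k) * chance_probability(k)


for k in range(4, 8):
    windows = n_windows(QUERY_LENGTH, k)
    expected = expected_matches_per_position(k)
    print(f"k={k}: {windows} windows per peptide, "
          f"{expected:.2e} expected chance matches per database position")
```

Under this crude model, each additional residue in the match length lowers the chance-match rate by roughly a factor of 20, which is the rationale for choosing 7 rather than 4 or 5.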

So, to recap, I use the following method: a BLAST search (blastp algorithm) against viral proteins (taxid 10239), keeping only perfect matches (100% identity) of at least 7 amino acids (≥43% query cover, since 7/16 ≈ 43.75%), with max target sequences: 1000 and substitution matrix: BLOSUM62.
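The filtering criterion can be made concrete with a small sketch. Note that this is not blastp: it is a plain substring scan that reports every exact 7-mer shared by a query peptide and a subject protein, which is the kind of hit the filter above retains. The subject sequence in the example is made up for illustration, and the names `Hit`, `exact_kmer_hits` and `query_cover` are my own.

```python
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    kmer: str
    query_start: int    # 1-based, inclusive
    query_end: int
    subject_start: int  # 1-based, inclusive
    subject_end: int


def exact_kmer_hits(query: str, subject: str, k: int = 7) -> Iterator[Hit]:
    """Yield every exact k-mer of `query` that occurs in `subject`."""
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        j = subject.find(kmer)
        while j != -1:
            yield Hit(kmer, i + 1, i + k, j + 1, j + k)
            j = subject.find(kmer, j + 1)  # look for further occurrences


def query_cover(k: int, query_length: int) -> float:
    """Percentage of the query spanned by a match of length k."""
    return 100 * k / query_length


# Toy example: the subject below is a hypothetical sequence, not a real protein.
query = "ALHHRHVGLRVQYDSG"
subject = "MKKAALRVQYDSPPE"

hits = list(exact_kmer_hits(query, subject))
for hit in hits:
    print(hit)
print(f"query cover of a 7-mer in a 16-mer: {query_cover(7, 16):.2f}%")
```

The coordinates are 1-based and inclusive, so a hit can be written in the same start-fragment-end form used for the matches in the results.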
4. Results
Table 2 is a collection of the matches I found with the method described above. You can look at figure 1 to see how to read the table.
Match (query positions) | Protein accession (subject positions) | Virus | Host
ALHHRHVGLRVQYDSG (102_1_F_viruses)
9-LRVQYDS-15 | QDP64279.1 (29-35) | Prokaryotic dsDNA virus sp. | Archaea, ocean
8-GLRVQYD-14 | AYV76690.1 (358-364) | Terrestrivirus sp. | Amoeba, forest soil
ALHRHRVGPQLQSSGS (102_2_F_viruses)
2-LHRHRVG-8 | YP_009619965.1 (63-69) | Stenotrophomonas phage vB_SmaS_DLP_5 | Stenotrophomonas maltophilia (HP)
ALHRRQRVLSPVLGAS (102_3_F_viruses)
2-LHRRQRV-8 | QHN71154.1 (288-294) | Mollivirus kamchatka | Protozoa (R)
8-VLSPVLG-14 | QDB71078.1 (24-30) | Serratia phage Moabite | Serratia marcescens (HP)
ALHRVLSEQDPQLVLS (102_4_F_viruses)
7-SEQDPQL-13 | BAR30981.1 (151-157) | uncultured Mediterranean phage uvMED | Archaea and Bacteria, Med. sea
3-HRVLSEQ-9 | AXS67723.1 (494-500) | Cryptophlebia peltastica nucleopolyhedrovirus | invertebrates
2-LHRVLSE-8 | YP_009362111.1 (74-80) | Marco virus | Ameiva ameiva
ALHLHRHVLESQVNSL (102_6_F_viruses)
2-LHLHRHV-8 | YP_009119106.1 (510-516) | Pandoravirus inopinatum | Acanthamoeba
4-LHRHVLE-10 | ASZ74651.1 (61-67) | Mycobacterium phage Phabba | Mycobacterium smegmatis mc²155 (HP)
Table 2. Collection of the matches for the BLAST search of my unique set of peptides against viral proteins (taxid 10239). HP: human pathogen. See figure 1 for how to read the table.
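Each match in Table 2 is written as start-fragment-end, where start and end are 1-based positions within the query peptide. As a sanity check, the following sketch parses that notation and verifies that every fragment really sits at the stated positions of its peptide from Table 1. The helper name `check_match` and the regular expression are mine; the peptides and match strings are copied from the two tables.

```python
import re

# Peptides from Table 1, keyed by their number.
PEPTIDES = {
    1: "ALHHRHVGLRVQYDSG",
    2: "ALHRHRVGPQLQSSGS",
    3: "ALHRRQRVLSPVLGAS",
    4: "ALHRVLSEQDPQLVLS",
    5: "ALHVRVLSQKRPLQLG",
    6: "ALHLHRHVLESQVNSL",
}

# (peptide number, match notation) pairs from Table 2.
MATCHES = [
    (1, "9-LRVQYDS-15"), (1, "8-GLRVQYD-14"),
    (2, "2-LHRHRVG-8"),
    (3, "2-LHRRQRV-8"), (3, "8-VLSPVLG-14"),
    (4, "7-SEQDPQL-13"), (4, "3-HRVLSEQ-9"), (4, "2-LHRVLSE-8"),
    (6, "2-LHLHRHV-8"), (6, "4-LHRHVLE-10"),
]

NOTATION = re.compile(r"^(\d+)-([A-Z]+)-(\d+)$")


def check_match(peptide: str, notation: str) -> bool:
    """Return True if the fragment occupies the stated 1-based positions."""
    parsed = NOTATION.match(notation)
    if parsed is None:
        raise ValueError(f"unrecognised match notation: {notation!r}")
    start, fragment, end = int(parsed.group(1)), parsed.group(2), int(parsed.group(3))
    return peptide[start - 1:end] == fragment


results = [check_match(PEPTIDES[number], notation) for number, notation in MATCHES]
print(f"{sum(results)} of {len(results)} matches are consistent with Table 1")
```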
5. Discussion
There are no human viruses detected by this search. There are some bacteriophages and three of them have as hosts bacteria that are known to be human pathogens. Bacteriophages (also known as phages) are viruses that use the metabolic machinery of prokaryotic organisms to replicate (figure 2). It is well known that bacteriophages can elicit specific antibodies in humans: circulating IgGs to naturally occurring bacteriophages have been detected (Dąbrowska K. et al. 2014) as well as specific antibodies to phages injected for medical or experimental reasons (Shearer WT et al. 2001), as reviewed here: (Jonas D. Van Belleghem et al. 2019). According to these observations, one might expect that when a person is infected by a bacterium, this subject will develop antibodies not only to the bacterium itself but also to its phages.

If that is the case, we can use our data in table 2 to infer a possible exposure of our patient to the following bacterial pathogens: Stenotrophomonas maltophilia (HP), Serratia marcescens (HP), Mycobacterium smegmatis mc²155 (HP). In brackets, there are links to research about the pathogenicity for humans of each species. M. smegmatis belongs to the class Actinobacteria, while S. maltophilia and S. marcescens are included in the class γ-Proteobacteria.
Interestingly enough, Serratia marcescens was identified as one of the possible bacterial triggers for the immunosignature of a group of 21 ME/CFS patients, in a study that employed an array of 125,000 random peptides (Singh S. et al. 2016). This bacterium accounts for rare nosocomial infections of the respiratory tract, the urinary tract, surgical wounds and soft tissues. Meningitis caused by Serratia marcescens has been reported in the pediatric population (Ashish Khanna et al. 2013).
Mollivirus kamchatka is a recently discovered giant virus whose hosts are presumed to be protozoa that inhabit the soil of subarctic environments (Christo-Foroux E. et al. 2020). I am not sure what this match might mean in this context.
6. Next step
The next step will be to perform a similar BLAST search against bacterial proteins to see, among other things, if I can find matches with the three bacteria identified by the present analysis. A further step will be to pursue an analogous study for eukaryotic microorganisms and for human proteins (in search of autoantibodies).