Note. This blog post should be read on a laptop. With smaller screens, some of the tables cannot be correctly formatted.

Introduction: what has been found

The UK Biobank is a large-scale biomedical database that includes genetic data on 500,000 UK citizens (Sudlow, C et al. 2015). Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is among the phenotypes reported by participants. The total number of ME/CFS patients included is 1208 females and 451 males, with control groups of more than 150 k individuals each. For each individual 805,426 variants (both SNPs and short InDels) were genotyped. From them, up to 30 million variant can be imputed using haplotype maps. This is the biggest GWAS study on ME/CFS patients ever made, so far. Many research groups have analyzed this huge quantity of data and the conclusion, according to (Dibble J. et al. 2020), is that the only variants to be significantly different between patients and controls are included in gene SLC25A15, which encodes mitochondrial ornithine transporter I, particularly variant rs7337312 (position13:41353297:G:A on reference genome GRCh37, forward strand). But this difference is statistically significant only when we consider female patients vs female controls (Table 1). For a detailed discussion of this result, the best reference is the already mentioned (Dibble J. et al. 2020). I wrote an analysis myself on this same result and on its possible implications, available here. My focus, while writing that document, was also understanding the architecture of the metadata generated by using the UK Biobank, by Neale’s Lab (R), one of the research groups that performed the statistical analyses. It was not obvious, at the beginning, how I could derive for each variant genotyped a table like Table 1. In fact, it is the result of the elaboration of five files, four of them of considerable size. Once I realized how to read and manipulate these files, I manually derived the data I was interested in, at first. Then I decide to write a code that could do the work in an automated fashion (see below).

Table 1. Frequencies and allele counts for variant rs7337312 (gene SLC25A15 ) among 1208 CFS female patients, 451 CFS male patients, 192945 female controls, 166537 male controls, total UK Biobank sample, and general population (global genome AD). P values for patients vs controls have been calculated for both sexes. I derived this table from metadata by Neale’s Lab (R).

What has not been found

Now, even if we are certainly disappointed by the fact that the positive result of these analyses does not add much to what we already knew about ME/CFS, it is important to stress that negative results are also relevant. Why? Because we can rule out the huge amount of genetic associations made through the years between fatigue in general, and CFS in particular, and genetic variants (see for instance the encyclopedic review Whang T. et al. 2017). As an example, I will focus on the variants of TNF alpha gene collected in Table 2. In 2006, an Italian group found that the minor allele of variant rs1799724 (TNF-857) was significantly more frequent in 54 ME/CFS patients than in 224 controls ( Carlo-Stella N. et al. 2006). In 2020, a group from Germany reported a statistically significant difference for the same variant, but only when ME/CFS patients without infectious triggered onset were considered (Steiner S. et al. 2020). Two other variants of gene TNF alpha were associated with morning fatigue among oncology outpatients with breast, prostate, lung, or brain cancer ( Dhruva A. et al. 2015). Are these variants different between the 1659 ME/CFS patients and the about 300 k controls in the UK Biobank database?

VariantPosition on GRCh37 (fs) Disease p val Reference
rs1799724 ch6:31,542,482 ME/CFS
ME/CFS w/o ITO
0.0028
0.0430
Carlo-Stella N. et al. 2006
Steiner S. et al. 2020
rs1800629
rs3093662
ch6:31,543,031
ch6:31,544,189
Morning fatigue in oncology outpatients
with breast, prostate, lung, or brain cancer
0.0250
0.0040
Dhruva A. et al. 2015
Table 2. Variants on gene TNF associated with ME/CFS, with ME/CFS without Infectious Triggered Onset (ITO), and with morning fatigue in oncology outpatients with breast, prostate, lung, or brain cancer.

To answer this question I have selected a suitable region of chromosome 6 (ch6:31,539,768 – ch6:31,546,495) that includes not only all the variants in Table 2, but the whole TNF gene. I then analyzed this region with a code I wrote in GNU Octave, version 6.3.0. This code generates three .csv files (one for each of the following groups: females only, males only, both sexes) with information on each one of the variants genotyped within the region mentioned (see Supplementary Material). An extract from these files, with only the variants collected in Table 2, can be seen in Table 3, Table 4, and Table 5. None of these variants reaches the statistically significant threshold, which for this kind of studies is considered to be p<5×10^(-8), for variants with a minor allele frequency above 5% (even lower for less common variants) (Fadista J et al. 2016).

rsidminref altmin_AFp valalt
cases
alt
contr
1799724TCT0.0720.1500.0770.072
1800629AGA0.1970.7060.1950.197
3093662GAG0.0760.5900.0730.076
Table 3. Frequencies of the same variants of Table 2 from Neale’s Lab Metadata. This is the comparison between 1656 ME/CFS patients (females + males) and 359482 controls (females + males). min = minor allele; ref = reference allele; alt = alternate allele; min_AF = minor allele frequency. alt cases = frequency of alternative allele in cases; alt contr = frequency of alternate allele in controls. Elaboration made by the code TNF.m I wrote in Octave (see supplementary material).

rsidminrefaltmin_AFp valalt
cases
alt
contr
1799724TCT0.0720.1670.0780.071
1800629AGA0.1970.8570.1980.196
3093662GAG0.0760.6030.0730.076
Table 4. As in Table 3, but in this case the comparison is between 1208 female ME/CFS patients and 192945 female controls. min = minor allele; ref = reference allele; alt = alternate allele; min_AF = minor allele frequency. alt cases = frequency of alternative allele in cases; alt contr = frequency of alternate allele in controls. Elaboration made by the code TNF.m I wrote in Octave (see supplementary material).

rsidminrefaltmin_AFp valalt
cases
alt
contr
1799724TCT0.0720.6090.0750.072
1800629AGA0.1970.3140.1860.199
3093662GAG0.0760.8560.0740.075
Table 5. As in Table 3 and Table 4, but in this case the comparison is between 451 male ME/CFS patients and 166537 male controls. min = minor allele; ref = reference allele; alt = alternate allele; min_AF = minor allele frequency. alt cases = frequency of alternative allele in cases; alt contr = frequency of alternate allele in controls. Elaboration made by the code TNF.m I wrote in Octave (see supplementary material).

Another way to conveniently visualize the differences between patients and controls (if any) is to plot p values in function of the position of the variants they refer to. We actually plot the log10 of 1/p or, in other words, -log10(p). This kind of diagram is known as Manhattan plot and in Figure 1 you see the manhattan plot for region ch6:31,539,768 – ch6:31,546,495, in the case of females + males. As you can see, none of the variants has a p value below 0.01. In Figure 2 you find the same plot, this time for females only (left) and males only (right).

Figure 1. Manhattan plot for ch6:31,539,768 – ch6:31,546,495, in the case of female + male CFS patients vs controls. Red dot is rs1799724, blue dot is rs1800629, yellow dot is rs3093662. None of these three variants reaches a p value below 10^-1. Plotted by manhattan_plot.m (see supplementary material).
Figure 2. Manhattan plot for ch6:31,539,768 – ch6:31,546,495, in the case of female CFS patients vs female controls (left) and male CFS patients vs male controls (right). Red dot is rs1799724, blue dot is rs1800629, yellow dot is rs3093662. None of these three variants reaches a p value below 10^-1.

Conclusions

In conclusion, these codes of mine provide as output the frequency of both the minor and the major allele among 1659 ME/CFS patients, stratified by sex, for about 15 million variants (including SNPs and InDels). They also provide the p value (beta test) for the comparison of these patients with about 300 k controls, along with general information on the variants (such as position, minor allele frequency among 500 k UK citizens, predicted consequences, etc). These codes do so by analyzing the metadata generated by Neale’s Lab, a research group that worked on the UK Biobank dataset. In this blog post, I have shown how to use these data to test previously published genetic associations between ME/CFS and gene TNF alpha.

Supplementary material

The files that I used as input for my codes are the following ones (click on them to download):

variants_TNF

20002_1482.gwas.imputed_v3.both_sexes_TNF

20002_1482.gwas.imputed_v3.females_TNF

20002_1482.gwas.imputed_v3.males_TNF

The output generated by TNF.m (see below) consists in the following files (click on them to download) which were used to build tables from 3 to 5:

TNF_total_both_sexes

TNF_total_females

TNF_total_males

Details on these files can be found in this PDF. The code that generates the plots is pretty simple and can be downloaded here. The code that generates the three output files from the four input ones is the one that follows.

% file name = TNF
% date of creation = 25/08/2021
% it generates the file TNF_tot by using the package IO
% see https://octave.sourceforge.io/io/ for a description
clear all
close all
pkg load io
%-------------------------------------------------------------------------------
% Data source
%-------------------------------------------------------------------------------
% The collection of the analysis generated by Neale’s Lab with the UK Biobank
% data is available here:
% https://docs.google.com/spreadsheets/d/1kvPoupSzsSFBNSztMzl04xMoSC3Kcx3CrjVf4yBmESU/edit?ts=5b5f17db#gid=178908679
% In line 2 of that file, you can download an explanation of the labels (README).
% At lines 9 and 10 you can download a collection of the phenotypes and of some
% of the attributes of the relative sample, for females and males respectively
% (phenotypes.female.tsv, phenotypes.male.tsv). We are interested in phenotype 20002_1482
% (Non-cancer illness code, self-reported: chronic fatigue syndrome). In line 11
% you find a file that contains the details on each variant (variants.tsv). You
% must unzip the files before opening them. I have used File viewer plus for reading
% this file (Excel doesn’t seem to open it).
%-------------------------------------------------------------------------------
% Length and position of TNF
%-------------------------------------------------------------------------------
% The position of TNF goes from ch6:31,543,342 to ch6:31,546,113 on the reference
% genome GRCh37 (alias h19), according to NCBI (https://www.ncbi.nlm.nih.gov/gene/7124).
% The two variants we are interested in has a position ch6:31,542,482 (rs1799724, TNF-857)
% and ch6:31,543,031 (rs1800629), on GRCh37. So we must include them, along with
% the whole TNF gene. Since rs1799724 falls also in gene LTA, I include this gene
% as well (LTA goes from ch6:31,539,876 to ch6:31,542,101). Then a suitable interval
% in this dataset is ch6:31,539,768 to ch6:31,546,495.
%-------------------------------------------------------------------------------
% numbers of cases and controls
%-------------------------------------------------------------------------------
n_cases_f = 1208
n_controls_f = 192945
n_cases_m = 451
n_controls_m = 166537
n_cases_b = n_cases_f + n_cases_m
n_controls_b = n_controls_f + n_controls_m
%-------------------------------------------------------------------------------
% name of the gene
%-------------------------------------------------------------------------------
gene = 'TNF';
%-------------------------------------------------------------------------------
% number of variants = upper limit for range - 1
%-------------------------------------------------------------------------------
file_name = strcat('variants_',gene,'.xlsx')
[filetype, sh_names, fformat, nmranges] = xlsfinfo (file_name);
strn = char(sh_names(1,2));
strn = substr(strn,5) % upper limit for range as a string
n = str2num(strn) % upper limit for range as a number
%-------------------------------------------------------------------------------
% reading and storing variants_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('variants_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant = text;
% reading and storing the chromosome
range = strcat('B2:B',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
chr = num;
% reading and storing the position on chromosome
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pos = int32(num);
% reading and storing the ref allele
range = strcat('D2:D',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ref = text;
% reading and storing the alt allele
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
alt = text;
% reading and storing the rsid
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
rsid = raw;
% reading and storing the consequence
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
consequence = text;
% reading and storing the consequence category
range = strcat('I2:I',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
consequence_category = text;
% reading and storing the Alternate allele count (calculated using hardcall genotypes)
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC = int32(num);
% reading and storing the Alternate allele frequency (calculated using hardcall genotypes)
range = strcat('M2:M',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AF = double(num);
% reading and storing the Minor allele (equal to ref allele when AF > 0.5, otherwise equal to alt allele)
range = strcat('N2:N',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_allele = text;
% reading and storing the Minor allele frequency (calculated using hardcall genotypes)
range = strcat('O2:O',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF = double(num);
% reading and storing the Hardy-Weinberg p-value
range = strcat('P2:P',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
p_hwe = double(num);
% reading and storing the Number of samples with defined genotype at this variant
range = strcat('Q2:Q',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_called = int32(num);
% reading and storing the Number of samples without a defined genotype at this variant
range = strcat('R2:R',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_not_called = int32(num);
% testing by writing some of the data
i = 2
variant (i)
chr (i)
pos (i)
ref (i)
alt (i)
rsid (i)
consequence (i)
consequence_category (i)
AC (i)
AF (i)
minor_allele (i)
minor_AF (i)
p_hwe (i)
n_called (i)
n_not_called(i)
%-------------------------------------------------------------------------------
% reading and storing 20002_1482.gwas.imputed_v3.females_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.females_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_f = text;
% reading and storing the Minor allele frequency in n_complete_samples_f
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_f = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_f = raw;
% reading and storing the numeber of individuala genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_f = double(num);
% reading and storing the Alternate allele count within n_complete_samples_f
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_f = double(num);
% reading and storing the number of alternative alleles in n_cases_f
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_f = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_f = double(num);
% testing by writing some of the data
i = 2
variant_f(i)
minor_AF_f(i)
low_confidence_variant_f(i)
n_complete_samples_f(i)
AC_f(i)
ytx_f(i)
pval_f(i)
%-------------------------------------------------------------------------------
% reading and storing 20002_1482.gwas.imputed_v3.males_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.males_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_m = text;
% reading and storing the Minor allele frequency in n_complete_samples_f
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_m = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_m = raw;
% reading and storing the numeber of individuala genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_m = double(num);
% reading and storing the Alternate allele count within n_complete_samples_f
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_m = double(num);
% reading and storing the number of alternative alleles in n_cases_f
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_m = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_m = double(num);
% testing by writing some of the data
i = 2
variant_m(i)
minor_AF_m(i)
low_confidence_variant_m(i)
n_complete_samples_m(i)
AC_m(i)
ytx_m(i)
pval_m(i)
%-------------------------------------------------------------------------------
% reading and storing 20002_1482.gwas.imputed_v3.both_sexes_GENE.xlsx
%-------------------------------------------------------------------------------
file_name = strcat('20002_1482.gwas.imputed_v3.both_sexes_',gene,'.xlsx')
sheet = 'Sheet1'
% reading and storing chr:position:ref:alt
range = strcat('A2:A',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
variant_b = text;
% reading and storing the Minor allele frequency in n_complete_samples_f
range = strcat('C2:C',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
minor_AF_b = double(num);
% reading and storing the quality control
range = strcat('E2:E',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
low_confidence_variant_b = raw;
% reading and storing the numeber of individuala genotyped in cases + controls
range = strcat('F2:F',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
n_complete_samples_b = double(num);
% reading and storing the Alternate allele count within n_complete_samples_f
range = strcat('G2:G',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
AC_b = double(num);
% reading and storing the number of alternative alleles in n_cases_f
range = strcat('H2:H',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
ytx_b = double(num);
% reading and storing the p values between cases and controls
range = strcat('L2:L',strn)
[num,text,raw,limits] = xlsread (file_name, sheet, range);
pval_b = double(num);
% testing by writing some of the data
i = 2
variant_b(i)
minor_AF_b(i)
low_confidence_variant_b(i)
n_complete_samples_b(i)
AC_b(i)
ytx_b(i)
pval_b(i)
%-------------------------------------------------------------------------------
% checking for wrong alignments between these 4 files
%-------------------------------------------------------------------------------
error = 0.;
for i=1:n-1
a = variant(i);
b = variant_f(i);
c = variant_m(i);
d = variant_b(i);
% function strcmp (string_1, strin_2) returns 1 if the two strings are identical
% it resturns 0 if the two strings have at least a different character in the same position
test = strcmp(a,b);
if ( (test==0) );
error = error +1;
endif
test = strcmp(a,c);
if ( (test==0) );
error = error +1;
endif
test = strcmp(a,d);
if ( (test==0) );
error = error +1;
endif
endfor
error
%-------------------------------------------------------------------------------
% defining major and minor alleles among all the variants
%-------------------------------------------------------------------------------
for i=1:n-1
frequency = AF(i); %frequency of the alternative allele
if ( frequency <= 0.5 )
minor_allele2(i)=alt(i);
major_allele(i)=ref(i);
else
minor_allele2(i)=ref(i);
major_allele(i)=alt(i);
endif
endfor
%-------------------------------------------------------------------------------
% defining alternate allele frequencies in case groups
%-------------------------------------------------------------------------------
for i=1:n-1
alt_AF_f(i) = ytx_f(i)/(2n_cases_f); alt_AF_m(i) = ytx_m(i)/(2n_cases_m);
alt_AF_b(i) = ytx_b(i)/(2n_cases_b); endfor
%------------------------------------------------------------------------------- 
% defining alternate allele frequencies in control groups %------------------------------------------------------------------------------- 
for i=1:n-1 alt_AF_f_controls(i) = ( AC_f(i)- ytx_f(i) )/(2n_controls_f);
alt_AF_m_controls(i) = ( AC_m(i)- ytx_m(i) )/(2n_controls_m); alt_AF_b_controls(i) = ( AC_b(i)- ytx_b(i) )/(2n_controls_b);
endfor
%-------------------------------------------------------------------------------
% defining reference allele frequencies in case groups
%-------------------------------------------------------------------------------
for i=1:n-1
ref_AF_f(i) = 1-alt_AF_f(i);
ref_AF_m(i) = 1-alt_AF_m(i);
ref_AF_b(i) = 1-alt_AF_b(i);
endfor
%-------------------------------------------------------------------------------
% defining reference allele frequencies in control groups
%-------------------------------------------------------------------------------
for i=1:n-1
ref_AF_f_controls(i) = 1-alt_AF_f_controls(i);
ref_AF_m_controls(i) = 1-alt_AF_m_controls(i);
ref_AF_b_controls(i) = 1-alt_AF_b_controls(i);
endfor
%-------------------------------------------------------------------------------
% checking for errors by comparing minor_allele with minor_allele2
%-------------------------------------------------------------------------------
error2 = 0.;
for i=1:n-1
a = minor_allele(i);
b = minor_allele2(i);
% function strcmp (string_1, strin_2) returns 1 if the two strings are identical
% it resturns 0 if the two strings have at least a different character in the same position
test = strcmp(a,b);
if ( (test==0) );
error2 = error2 +1;
endif
endfor
error2
%-------------------------------------------------------------------------------
% writing of GENE_total_females.csv
%-------------------------------------------------------------------------------
columns = cell(n,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_f{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_f(i);
columns{i+1,15} = alt_AF_f(i);
columns{i+1,16} = alt_AF_f_controls(i);
columns{i+1,17} = ref_AF_f(i);
columns{i+1,18} = ref_AF_f_controls(i);
columns{i+1,19} = n_cases_f;
columns{i+1,20} = n_controls_f;
endfor
file_name = strcat(gene,'_total_females.csv')
cell2csv (file_name, columns," ")
%-------------------------------------------------------------------------------
% writing of GENE_total_males.csv
%-------------------------------------------------------------------------------
columns = cell(n,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_m{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_m(i);
columns{i+1,15} = alt_AF_m(i);
columns{i+1,16} = alt_AF_m_controls(i);
columns{i+1,17} = ref_AF_m(i);
columns{i+1,18} = ref_AF_m_controls(i);
columns{i+1,19} = n_cases_m;
columns{i+1,20} = n_controls_m;
endfor
file_name = strcat(gene,'_total_males.csv')
cell2csv (file_name, columns," ")
%-------------------------------------------------------------------------------
% writing of CLYBL_total_both_sexes.csv
%-------------------------------------------------------------------------------
columns = cell(1523,20);
columns{1,1} = "variant";
columns{1,2} = "chr";
columns{1,3} = "pos";
columns{1,4} = "rsid";
columns{1,5} = "minor_allele";
columns{1,6} = "major_allele";
columns{1,7} = "ref";
columns{1,8} = "alt";
columns{1,9} = "minor_AF";
columns{1,10} = "low_confidence_variant";
columns{1,11} = "p_hwe";
columns{1,12} = "consequence";
columns{1,13} = "consequence_category";
columns{1,14} = "pval";
columns{1,15} = "alt_AF_cases";
columns{1,16} = "alt_AF_controls";
columns{1,17} = "ref_AF_cases";
columns{1,18} = "ref_AF_controls";
columns{1,19} = "n_cases";
columns{1,20} = "n_controls";
for i=1:n-1
columns{i+1,1} = variant{i};
columns{i+1,2} = chr(i);
columns{i+1,3} = int32(pos(i));
columns{i+1,4} = rsid{i};
columns{i+1,5} = minor_allele{i};
columns{i+1,6} = major_allele{i};
columns{i+1,7} = ref{i};
columns{i+1,8} = alt{i};
columns{i+1,9} = minor_AF(i);
columns{i+1,10} = low_confidence_variant_b{i};
columns{i+1,11} = p_hwe(i);
columns{i+1,12} = consequence{i};
columns{i+1,13} = consequence_category{i};
columns{i+1,14} = pval_b(i);
columns{i+1,15} = alt_AF_b(i);
columns{i+1,16} = alt_AF_b_controls(i);
columns{i+1,17} = ref_AF_b(i);
columns{i+1,18} = ref_AF_b_controls(i);
columns{i+1,19} = n_cases_b;
columns{i+1,20} = n_controls_b;
endfor
file_name = strcat(gene,'_total_both_sexes.csv')
cell2csv (file_name, columns," ")
Advertisement

One thought on “A second glance at the UK Biobank Database

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s