A genetic perspective on the association between exercise and mental health in the era of genome-wide association studies


    • Triangulation across genetically informative designs supports causal effects of exercise behaviour on mental health.

    • These causal effects co-exist with genetic correlation between exercise behaviour and mental health.

    • Genetic moderation of positive mental health effects explains how causal effects co-exist with genetic pleiotropy.

    • Research strategies using genomic information are needed to improve the success of interventions on exercise behaviour.


Regular exercise is associated with mental health throughout the life course but the chain-of-causality underlying this association remains contested. I review results from genetically informative designs that examine causality, including the discordant monozygotic twin design, multivariate genetic models, Mendelian Randomization, and stratification on polygenic risk scores. Triangulation across the results from these and the standard designs for causal inference (RCT, prospective studies) in the extant literature supports the existence of causal effects of exercise on mental health as well as residual confounding by genetic factors that independently influence participation in regular exercise and mental health outcomes. I present an update of our earlier model for the genetic determinants of voluntary exercise behaviour. The model allows causal effects of regular exercise on mental health to co-exist with genetic pleiotropy through differences in the genetic sensitivity to the mental health benefits of exercise. The model encourages research on strategies that use genomic information to improve the success of interventions on regular exercise behaviour.

1. Introduction

In the first volume of the Mental Health and Physical Activity journal in 2008, we reviewed a number of large scale studies in monozygotic (MZ) and dizygotic (DZ) twins that showed how genetic pleiotropy and gene-by-exercise interaction contribute to the association between voluntary exercise behaviours and mental health in the population at large (de Geus & De Moor, 2008). Based on the available evidence at the time, we developed a model of the genetic determinants of exercise behaviour. This model challenged the tacit assumptions of many population-based intervention campaigns that (1) the exercise-wellbeing association mainly reflects causal effects of exercise, and that (2) exercise programs exert similar beneficial effects on all participants.

In the past decade, much progress has been made in the genetic epidemiology of complex behavioural traits. These studies no longer rely solely on twin-family designs, but employ arrays of directly measured single-nucleotide polymorphisms (SNPs) across the entire genome. Large international consortia performing meta-analyses of genome-wide association studies on hundreds of thousands of persons have uncovered sets of genetic variants influencing mental health (Baselmans et al., 2019; Wray et al., 2018) as well as sets of genetic variants influencing regular physical activity (Klimentidis et al., 2018). In parallel, the toolkit for inferring causality using genetically informative designs has greatly expanded (Davey-Smith & Hemani, 2014; Kong et al., 2018; Minica, Dolan, Boomsma, de Geus, & Neale, 2018). Here, I revisit the nature of the association between regular physical activity and mental health using these newly available datasets. I also update our earlier model of the genetic determinants of voluntary exercise behaviour.

2. Associations between complex traits

Regular physical activity and mental health both qualify as complex behavioural traits that are influenced by a myriad of factors acting at many levels. For physical activity this is aptly illustrated by a series of systematic literature reviews by the DEDIPAC consortium that identified 106 potential determinants in seven umbrella domains of biological, psychological, behavioural, physical (e.g. environmental), socio-cultural, socio-economic, and policy determinants (Carlin et al., 2017; Condello et al., 2017; Cortis et al., 2017; Jaeschke et al., 2017; O'Donoghue et al., 2018; Puggina et al., 2018). Although evidence supports a role for most of these determinants, their relative importance is far from equal in terms of the amount of explained variance. A large literature now supports ‘genetics’ as the single factor accounting for the largest share of the observed interindividual variation in physical activity (Lightfoot et al., 2017; Van der Zee & de Geus, 2019). Although this finding applies to all classes of physical activity, it is particularly well-established for all domains of voluntary exercise behaviour (van der Zee, Helmer, Boomsma, Dolan, & de Geus, 2020). Heritability can be as high as 80% in late adolescence and does not fall below 40% across the adult life span (de Geus, Bartels, Kaprio, Lightfoot, & Thomis, 2014; Huppertz et al., 2016; Lightfoot et al., 2017; Van der Zee & de Geus, 2019).

Like exercise behaviour, mental health is a complex multifactorial behavioural trait. Factors influencing mental health also broadly range from the genetic to the socio-economic and cultural domain. If we constrain mental health to the specific absence of recurring symptoms of anxiety and depression, we encounter a large number of risk and protective factors that co-determine the onset and course of anxious-depressive disorders. These include epigenetic modifications (Jovanova et al., 2018), HPA-axis functioning (Holsboer, 2000), immune system functioning (Raison & Miller, 2003), brain structures regulating emotion and responses to rewarding and aversive stimuli (Nestler et al., 2002; Savitz & Drevets, 2013), personality (Middeldorp et al., 2011), childhood trauma (Heim, Shugart, Craighead, & Nemeroff, 2010), stressful life events (Kendler, Karkowski, & Prescott, 1999), social network strength, and socioeconomic status (Lorant et al., 2003). Many of these factors, even those described as ‘environmental’, can be influenced by genetic variation between individuals (Vinkhuyzen, van der Sluis, de Geus, Boomsma, & Posthuma, 2010) as a consequence of active and evocative gene-environment correlation. Not surprisingly, therefore, anxiety and depressive disorders display substantial heritability, with estimates varying closely around 40% (Flint & Kendler, 2014; Hettema, Neale, & Kendler, 2001). When mental health is taken literally, i.e. not as the absence of anxious and depressive symptoms but as the presence of higher levels of psychological wellbeing, similar heritability estimates are encountered. Traits like happiness, satisfaction with life, and quality of life show heritability between 30% and 40% (Bartels, 2015; Stubbe, Posthuma, Boomsma, & de Geus, 2005; Weiss, Bates, & Luciano, 2008).

While there is no doubt about the existence of an association between exercise behaviour and mental health (Chekroud et al., 2018), we need to appreciate that their multifactorial determinants will cause some confounding to be the rule rather than the exception. Confounding can bias the reported effect sizes of the association, or even generate the association in the absence of a true causal effect. One approach in epidemiology to reduce the impact of confounding is to use a prospective design, where the exposure is measured at baseline and the outcome is measured at a distant follow-up. This design rules out ‘contemporary confounders’ that in cross-sectional designs can independently influence the exposure and outcome. For instance, going through a stressful phase of life can lead to feelings of unhappiness and at the same time be the cause of reduced opportunity/drive to exercise. Also, simultaneous recording of exercise activities and depressive symptoms may suffer from the consistency motif (Podsakoff, MacKenzie, & Podsakoff, 2012) where answers to depression items about energy, fatigue and behavioural motivation may be brought in line with exercise reporting. These types of confounding would be strongly attenuated in longitudinal follow-up, which is why prospective studies are greatly favoured in epidemiology. Prospective designs also address the potential problem of reverse causality. Reverse causality in the exercise-mental health association arises when mental health itself is a necessary condition to engage in regular exercise. Emotionally well-adjusted, outgoing, self-regulating and self-confident individuals with low levels of stress could be simply more attracted to sports and exercise, and only such persons may have the necessary energy and self-discipline to maintain an exercise regime. Prospective designs can demonstrate (or rule out) the existence of such reverse directional causation by adding a prediction of the exercise behaviour at follow-up by mental health at baseline.

Most of the prospective studies on the association between exercise and mental health reported that regular exercise at baseline was associated with less depression and anxiety at follow-up (Schuch et al., 2018; Stubbs et al., 2017). Findings are not uniform, however. Some studies find no significant longitudinal association (Birkeland, Torsheim, & Wold, 2009; Cooper-Patrick, Ford, Mead, Chang, & Klag, 1997; Kritz-Silverstein, Barrett-Connor, & Corbeau, 2001; Strohle et al., 2007; Weyerer, 1992) or an association limited to subgroups (Edman, Lynch, & Yates, 2014; Farmer et al., 1988) or to symptom counts but not extending to clinically diagnosed depression (Stavrakakis et al., 2013). Furthermore, various studies have demonstrated the existence of simultaneous reverse causal effects (Azevedo Da Silva et al., 2012; Jerstad, Boutelle, Ness, & Stice, 2010; Ku, Fox, Chen, & Chou, 2012; Lindwall, Larsman, & Hagger, 2011; Pinto Pereira, Geoffroy, & Power, 2014; Roshanaei-Moghaddam, Katon, & Russo, 2009; Stavrakakis, de Jonge, Ormel, & Oldehinkel, 2012).

The bidirectional causality shown in these studies provides support for both the ‘protection hypothesis’ and the ‘inhibition hypothesis’. The protection hypothesis maintains that regular exercise can decrease depressive symptoms through its biological (increased neuroplasticity, angiogenesis, lowered inflammation and cortisol levels, anti-oxidative effects) and psychosocial (self-esteem, social support) actions (Kandola, Ashdown-Franks, Hendrikse, Sabiston, & Stubbs, 2019). According to the ‘inhibition hypothesis’ the lack of energy, anhedonia, and social withdrawal, seen so prominently in depression, all exert a negative influence on regular exercise behaviour (Goodwin, 2003).

Given that both regular exercise and mental health are heritable traits, there is the alternative possibility of genetic confounding, which would mimic bidirectional causality. Part of the many genetic variants influencing these traits may overlap, creating horizontal genetic pleiotropy when they influence these traits through independent routes (Minica et al., 2018; Verbanck, Chen, Neale, & Do, 2018). Scoring low on extraversion and high on neuroticism might prevent the adoption of regular exercise behaviour while simultaneously putting the individual at higher risk for low mental health (De Moor & De Geus, 2018). This creates a thorny issue for inferring causality, even in prospective designs, as is illustrated in Fig. 1. In this simplified example, the ground truth is that the association between regular exercise behaviour at baseline and depressive symptoms at follow-up is partly causal and partly due to underlying genetic factors that have an opposite, but independent, effect on both traits. This means that the genetic variants that lead to decreased risk for depression may also influence voluntary exercise behaviour. If these genetic variants are expressed throughout the life course, their effects on exercise behaviour can precede their effects on mental health at a later time point. As a result, prospective studies will overestimate the beneficial effects of exercise. In the example in Fig. 1, the causal path coefficient β is overestimated twofold and explained variance fourfold (4% rather than the true 1%). Optimistic biases in the effect size of exercise on health outcomes can be detrimental to, e.g. cost-effectiveness analyses comparing interventions using ‘exercise as medicine’ to pharmacological or behavioural interventions.

Fig. 1. The thorny issue of causality.

Note: In the example depicted, heritability of MVPA at age 40 and depressive symptoms at age 50 is respectively 45% (direct genetic effect following path tracing rules: 0.67 × 1 × 0.67) and 32% (sum of the direct genetic effect: 0.545 × 1 × 0.545 and the indirect effect through MVPA's causal chain: 0.3 × 0.671 × −0.1). The correlation between the latent genetic factors (Rg) is −0.30 (horizontal pleiotropy) and the prospective correlation between MVPA and depression is −0.20. These values were chosen to closely resemble the empirical findings from a longitudinal twin study on exercise behaviour and anxious depressive symptoms (De Moor et al., 2008). True causal β (−0.10) is overestimated 2-fold and the amount of explained variance (1%) in depression by earlier MVPA levels is overestimated 4-fold.
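
The arithmetic behind the twofold and fourfold overestimation can be retraced with a few lines of Python. This is a minimal sketch, not the original computation: it assumes standardized variables and assumes that the only routes connecting MVPA and depressive symptoms are the causal path β and the path through the two correlated latent genetic factors. The variable and function names are illustrative.

```python
# Path-tracing sketch for the Fig. 1 example (names are illustrative).
A_MVPA = 0.671  # genetic path to MVPA at age 40
A_DEP = 0.545   # direct genetic path to depressive symptoms at age 50
R_G = -0.30     # correlation between the two latent genetic factors
BETA = -0.10    # true causal effect of MVPA on depressive symptoms


def heritability_mvpa():
    """Variance in MVPA explained by its latent genetic factor."""
    return A_MVPA * 1.0 * A_MVPA


def observed_correlation():
    """Prospective correlation: causal route plus the pleiotropic route."""
    pleiotropic_route = A_MVPA * R_G * A_DEP
    return BETA + pleiotropic_route


def bias_factors():
    """Overestimation of beta and of explained variance when the
    observed correlation is read as if it were fully causal."""
    r = observed_correlation()
    return r / BETA, (r ** 2) / (BETA ** 2)


if __name__ == "__main__":
    print(round(heritability_mvpa(), 2))
    print(round(observed_correlation(), 2))
    print(tuple(round(x, 1) for x in bias_factors()))
```

Under these assumptions the pleiotropic route contributes about as much to the observed correlation as the causal route does, which is why treating the full correlation as causal roughly doubles β and quadruples the explained variance.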

A particularly strong design to deal with such genetic confounding is the randomized controlled trial (RCT). Translated to our main question, this amounts to the deliberate manipulation of the exercise exposure in an intervention group compared to that of a control group, with group assignment being completely random. In sufficiently large samples, this results in a random distribution of the (genetic) confounders over control and experimental groups that differ systematically only by exposure to the exercise intervention. Confidence in the causal hypothesis is then bolstered by observing a larger pre-to-post assessment increase in mental health in the exercising group. Often cited reviews on this topic, including those with meta-analyses, have expressed cautious optimism about the efficacy of exercise interventions for the enhancement of mental health in the population at large (Lawlor & Hopker, 2001; Dunn et al., 2005; Daley, 2008; Cooney et al., 2013). However, a strong case has been made that such reviews may have underestimated the effect sizes because of poor choices in the meta-analytic methodology (Ekkekakis, Hall, & Petruzzello, 2008). Furthermore, small or inconsistent effect sizes across studies could simply reflect the short duration of the exercise interventions (weeks or months) in such RCTs, the at-best modest sample sizes used, and in particular the strong selection bias introduced in almost all these studies. In an attempt to increase the yield of the experimental exposure to exercise, exercise training studies typically exclude persons that are already engaged in some form of regular exercise at baseline. This (understandable) bias of including sedentary persons may inadvertently exclude exactly those individuals that reap the largest psychological benefits of exercise, thus underestimating the success of exercise in a true population-based sample (De Moor & De Geus, 2018).

More robust findings have been obtained in RCTs in individuals who had low initial levels of wellbeing at the start of the exercise program, like clinically depressed or clinically anxious patients (Schuch et al., 2016a; Schuch et al., 2016b; Trivedi et al., 2011). A meta-review of these studies reports beneficial psychological effects of exercise in e.g. depressive and bipolar disorder that match or even exceed those of pharmacological treatment (Stubbs et al., 2018). While a number of RCTs did not find antidepressant effects of exercise, and received high media exposure, these have been criticised for their poor design (Ekkekakis, Hartman, & Ladwig, 2018). However, the positive findings in patients able and willing to engage in exercise might overestimate the effects of exercise in the full spectrum of patients, as RCTs may suffer from hidden self-selection bias, expectation and social desirability effects and selective drop-out (De Moor & De Geus, 2018; Ekkekakis, 2008; Kruisdijk, Hendriksen, Tak, Beekman, & Hopman-Rock, 2018).

2.1. Using triangulation to address the thorny issue of causality

Given the difficulties in inferring causality, why is it that physical activity guideline committees around the world assure us with such great confidence that increasing our physical activity will boost our health and wellbeing? This confidence is based on well-wrought ‘triangulation’ across a variety of different research designs that together should overcome the limitations and assumptions of any single design (Lawlor, Tilling, & Smith, 2016). Specifically, the beneficial effects of physical activity on mental health are distilled from the convergence of results across (1) cross-sectional studies taking into account known and measured confounders as covariates, (2) prospective studies with a cross-lagged panel design, and (3) well-conducted RCTs. Even if exceptions remain, the majority of studies support the beneficial effect, whereas the reverse, a detrimental effect, is rarely encountered. Based on such triangulation, physical activity guideline committees conclude that a beneficial effect of physical activity on mental health is real (Stubbs et al., 2018; USDHS, 2018; WHO, 2010).

Even so, future guideline committees might be able to provide even more robust evidence and better effect size estimates if they systematically include more recent research designs in the triangulation for causality, in particular genetically informative designs like those based on twin studies (Fletcher & Lehrer, 2011; Kendler et al., 1999) or on Mendelian Randomization (Davey-Smith & Hemani, 2014; Pingault et al., 2018; Speed et al., 2019). The latter two designs are particularly useful to address the concerns raised above on genetic confounding. They exploit genetic information in the study design either through the expected degree of genetic similarity based on kinship (e.g. MZ and DZ twin or parent-offspring pairings) or by measuring genetic variation at DNA level, usually in the form of single nucleotide polymorphisms (SNPs).

3. Twin and genome-wide association studies

The most intuitive part of the twin design is the use of genetically identical monozygotic (MZ) twin pairs who are discordant for an exposure. For example, one twin may be a non-exerciser (e.g. less than 240 MET minutes weekly) and the genetically identical co-twin a vigorous exerciser (e.g. more than 1200 MET minutes weekly). If regular exercise has a true causal effect on mental health, and the association is not due to confounding by genetic factors or the family environment shared as children, one can expect that within these discordant MZ pairs the twin who exercises regularly has better mental health than the co-twin who does not exercise. In contrast, if the association is due to genetic confounding then the genetic protection against mental disorders should be as effective in the non-exercising twin as in the exercising twin, and both twins would be expected to have comparable mental health.
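
The logic of the discordant MZ design can be illustrated with a small simulation. The sketch below is a simplification under stated assumptions: instead of selecting categorically discordant pairs, it uses continuous within-pair differences, it lumps genes and shared family environment into one factor that is identical within a pair, and all path values (0.7, 0.5) and function names are hypothetical choices made for illustration only.

```python
import random


def simulate_pairs(n_pairs, causal_effect, seed=1):
    """Simulate MZ pairs; 'shared' stands for genes plus family environment."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)  # identical for both twins of a pair
        twins = []
        for _ in range(2):
            exercise = 0.7 * shared + rng.gauss(0, 1)
            mental = causal_effect * exercise + 0.5 * shared + rng.gauss(0, 1)
            twins.append((exercise, mental))
        pairs.append(twins)
    return pairs


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def population_slope(pairs):
    """Association when twins are treated as unrelated individuals."""
    xs = [twin[0] for pair in pairs for twin in pair]
    ys = [twin[1] for pair in pairs for twin in pair]
    return slope(xs, ys)


def within_pair_slope(pairs):
    """Association of within-pair differences; the shared factor cancels."""
    dx = [pair[0][0] - pair[1][0] for pair in pairs]
    dy = [pair[0][1] - pair[1][1] for pair in pairs]
    return slope(dx, dy)


if __name__ == "__main__":
    for effect in (0.0, 0.3):
        pairs = simulate_pairs(20000, effect)
        print(effect, round(population_slope(pairs), 2),
              round(within_pair_slope(pairs), 2))
```

With no causal effect the population-level association remains clearly positive because of the shared factor, whereas the within-pair association should be close to zero. With a causal effect the within-pair association should recover that effect.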

Another approach to test the causal hypothesis is the bivariate twin-family model (De Moor, Boomsma, Stubbe, Willemsen, & de Geus, 2008). The advantage of this model is that it uses the full data from both MZ and dizygotic (DZ) twins as well as any other family members (parents or siblings of twins). Bivariate twin models test the association between exercise behaviour and depression symptoms as a function of overlapping (‘common to both traits’) genetic, shared environmental and unique environmental factors. Even if they do not model a direct causal effect of one trait on the other,1 they are still informative on whether their association could be causal. The line of reasoning is as follows. If the observed association reflects a causal effect, all genetic and environmental factors that influence the causal agent (e.g. exercise behaviour) will, through the causal chain, carry over to the outcome (e.g. depressive symptoms). In other words, if exercise behaviour is found to be influenced by latent genetic and unique environmental factors, significant paths must be found in a bivariate twin analysis between these latent exercise factors and depressive symptoms. The absence of either a significant genetic correlation or a significant unique environmental correlation is not compatible with the causal hypothesis.

We previously reviewed the evidence from discordant MZ and bivariate twin studies that investigated the prospective associations between voluntary exercise activities in leisure time and indices of mental health in adults and adolescents like lack of anxiety, depression and internalizing problems, and high subjective wellbeing (de Geus & De Moor, 2008). This review quite consistently suggested that the ‘third underlying factor’ of shared genetics is of importance in explaining these associations. In particular, a significant genetic correlation between exercise behaviour and mental health has emerged as a very robust finding. At the same time these studies failed to support a causal effect of exercise behaviour on mental health.

A caveat in these twin studies was that they hinged on having sufficient statistical power to detect intrapair regression in change scores or the presence of a significant correlation between unique environmental factors. As these factors also contain (uncorrelated) measurement error, this may be problematic. Although the use of repeated measures of the exercise exposure in multivariate extensions of the bivariate design can partly address this problem (De Moor et al., 2008), this would still not overcome the bias that arises when the measurement error in exercise behaviour is correlated to the outcome, i.e. those low in mental health under- or over-reporting exercise activities. This touches on a second caveat in the reviewed twin studies: they all used self-reported exercise behaviour.

For total daily physical activity, self-report has been shown to be poorly correlated to objective recordings (Lee, Macfarlane, Lam, & Stewart, 2011; Prince et al., 2008). A somewhat alarming finding in this regard is that the two largest prospective studies relating accelerometer recordings to mental health both failed to detect an association between total physical activity or MVPA and depressive symptoms (Kandola, Lewis, Osborn, Stubbs, & Hayes, 2020; Toseeb et al., 2014). An extra complication is that there seems to be a larger discrepancy between self-report and accelerometers in those with depressive symptoms; they tend to report higher subjective levels of total physical activity than is encountered with accelerometers (Schuch et al., 2017).

In defence of the twin studies using self-report only: they did focus on voluntary exercise activities of moderate to vigorous intensity done in leisure time. This is possibly the subset of the MVPA that is least sensitive to recall bias. When self-report is limited to leisure time exercise activities, recall may be much easier due to the planned and structured nature of such activities. When self-report is expanded to other forms of voluntary engagement in regular MVPA, for instance active transportation, intensive gardening, do-it-yourself home improvements or dancing, it may become less reliable, which could explain the deviation from MVPA recorded by the accelerometer signal. In addition, accelerometers will pick up a host of non-voluntary physical activities related to work and school that are even harder to recall with any precision.

So far, MVPA has been measured with accelerometers in only a few twin studies, and none of these twin studies have addressed the link between accelerometer-derived MVPA and mental health directly (den Hoed et al., 2013; Schutte et al., 2020; Waller et al., 2018). Nonetheless, they did indicate that the concerns for genetic confounding also apply here. Substantial heritability, varying between 46% and 55%, has been found for accelerometer-derived MVPA, suggesting that the heritability of self-reported exercise behaviour is not a mere reflection of recall, comprehension, and social-desirability biases linked to heritable personality traits. Indeed, a direct comparison between self-reported and device-based measurements of MVPA showed higher heritability for device-based MVPA (Schutte et al., 2020) and this pattern was repeated in genome-wide association (GWA) testing in the UK biobank study (Doherty et al., 2018; Klimentidis et al., 2018).

The latter UK biobank study represents the largest GWA study of physical activity traits to date with various measures based on self-report (nmax = 377,234) including self-report of doing strenuous sports or other exercises (SSOE) for 2–3 days/week or more for a duration of 15–30 min or greater. In addition, wrist-worn accelerometers in a large subset (nmax = 91,084) allowed the objective assessment of total physical activity and MVPA. Using all SNPs tested, ‘chip heritability’ estimates for self-report PA measures were approximately 5% whereas estimates for the accelerometer-based measures were much higher, ranging from 10% for moderate intensity activity up to 21% for overall activity (Doherty et al., 2018; Klimentidis et al., 2018). This large GWA study not only confirmed a significant genetic contribution to physical activity behaviours using estimation methods that do not depend on a twin or family design,2 it also presents a number of unique advantages over twin and family studies by pinpointing the actual genetic loci involved in physical activity. These genetic loci can inform research elucidating the biophysiological pathways leading from genetic variation to behavioural variation. They are also a source of additional methods in our ‘triangulation toolkit’ to test the causal effects of exercise on mental health outcomes. Of these, the Mendelian Randomization (MR) technique has gained by far the most popularity in the past years as the method of choice for causal inference in observational research (Davey-Smith & Hemani, 2014; Lawlor, Harbord, Sterne, Timpson, & Davey-Smith, 2008).

4. Mendelian Randomization

Instead of correlating latent genetic and environmental factors that are thought to influence two traits as done in a twin or family design, MR is based on actual measured genetic variants. The logic for counterfactual testing of causality remains intact. A genetic instrumental variable, typically a single nucleotide polymorphism (SNP), that influences an exposure variable (such as exercise behaviour) should, through the causal chain, also predict the outcome (such as mental health). Failure of a genetic variant that causes the exposure to also influence the outcome is seen as a falsification of the causal hypothesis. Conversely, if the genetic variant does influence the outcome, and it is reasonable to assume this happens only through its effect on the exposure, this greatly increases our confidence in the causal hypothesis. An example of such a genetic instrument could be a gene variant that is exclusively expressed in muscle tissue and increases the individual's ability to perform intense exercise without large discomfort. If people carrying one or more of these variants are proportionally less likely to be depressed, then this is supportive of the hypothesis that exercise causally reduces the incidence of depression because a direct effect on depression of a muscle-expressed gene is less plausible (although also not impossible, e.g. through an effect on muscle IL-6 production).

MR is the experiment of nature that comes closest to an RCT in which participants are allocated to different exposure levels independently of confounding. The big advantage of MR is that it is based on measured genetic variants and can be applied to any large population-based cohort, whereas twin-based methods for causality testing rely on the estimation of effects of latent (unmeasured) genetic factors that can only be applied in cohorts based on twin registries. A second advantage of MR is that it captures prolonged exposures rather than the weeks or months of exposure within an RCT. A third advantage is that MR rules out reverse causality, even in cross-sectional designs, as lifetime exposure caused by DNA variants is essentially determined at birth (the label ‘MR’ honouring Mendel's laws of segregation and independent assortment during meiosis) and always precedes the outcome. The genetic instruments for exposure and outcome do not need to be confined to a single SNP. Regression of multiple SNP-outcome effects on SNP-exposure effects can be combined into a single effect size using the variance-weighting methods of meta-analysis.
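
The variance-weighted combination of multiple SNPs can be sketched as follows. This is a minimal illustration of a fixed-effect inverse variance weighted (IVW) estimator with first-order weights that ignore the uncertainty in the SNP-exposure effects; it is not the analysis pipeline of any study cited here. The function name and the three SNP effects in the usage example are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect of exposure on outcome.

    Each SNP yields a Wald ratio (SNP-outcome effect divided by
    SNP-exposure effect). Ratios are averaged with weights equal to the
    inverse of their approximate variance, (se_outcome / beta_exposure)^2.
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


if __name__ == "__main__":
    # Hypothetical summary statistics for three SNPs (not real data).
    bx = [0.10, 0.20, 0.05]       # SNP effects on the exposure
    by = [-0.03, -0.06, -0.015]   # SNP effects on the outcome (log odds)
    se = [0.01, 0.01, 0.01]       # standard errors of the outcome effects
    est, err = ivw_estimate(bx, by, se)
    print(round(est, 3), round(err, 3))
```

SNPs with a stronger effect on the exposure, or a more precisely estimated effect on the outcome, receive more weight. When the outcome effects are on the log-odds scale, exponentiating the estimate gives an odds ratio per unit increase in the exposure.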

If a genetic instrument is also available for the outcome (e.g. a SNP significantly associated with depression) bidirectional MR can explicitly test a possible reciprocal causal relationship between exposure and outcome (Davey-Smith & Hemani, 2014). MR does not need new data collection but can effectively reuse the available statistics from large-scale, non-overlapping GWA studies. The genetic instruments for both exposure and outcome can be extracted from the summary statistics of GWA meta-analyses on the exposure and outcome. Employing a bidirectional MR design, Choi et al. (2019) combined the results from the previously mentioned UK biobank GWA for accelerometer-based physical activity and that of the meta-analysis of GWA studies from the Psychiatric Genomics Consortium (Wray et al., 2018) on lifetime diagnosis of major depression. Using a genetic instrument consisting of 10 SNPs associated at p < 1 × 10⁻⁷ with device-based physical activity, they found a meta-analytic odds ratio of 0.74 (95%CI 0.59–0.92) for the risk of major depressive disorder per 1-SD unit increase in mean daily acceleration detected by the accelerometers. Of note, such causal effects on depression could not be detected for self-reported MVPA levels. In addition, no evidence was found for the reversed causal effect of depression on either self-reported or accelerometer-based physical activity.

This MR study constitutes the best available evidence to date for a causal effect of MVPA on mental health, based on epidemiological data. Even so, as with all methods, MR has to make some assumptions that, when violated, constitute a serious threat to deriving reliable causal effect sizes. MR assumes that the genetic instruments are strong (i.e. they explain substantial variance in the exposure), that there is no horizontal pleiotropy, and that intergenerational transmission through the shared (family) environment can be ignored. Due to the highly polygenic nature of complex behavioural traits like exercise and depression most causal SNPs have very small effect sizes, which makes them weak instrumental variables hampering MR unless very large samples are used. Weak instrument bias can be partly remedied by (inverse variance weighted, IVW) averaging across multiple SNPs but this then increases the risk of violating another important assumption of MR, the ‘no horizontal pleiotropy’ assumption. Horizontal pleiotropy occurs when SNPs are used as genetic instruments that have an effect on both exposure and outcome, other than through their causal path. For instance, a SNP could influence a trait like neuroticism, which can independently affect depression and exercise behaviour. A number of sensitivity analyses can be used to mitigate bias due to pleiotropy (reviewed in Burgess, Foley, & Zuber, 2018) although at the cost of reduced statistical power. Choi et al. (2019) used these solutions to ensure that their findings were robust to invalid instrument bias due to pleiotropy. They could not, however, rule out a possible violation of MR assumptions by dynastic effects.

Dynastic effects occur when parental genotypes affect the child via the environment that parents create for their child, also called ‘genetic nurture’ because genotypes affect the nurturing environment (Bates et al., 2018; Kong et al., 2018). This taps into the classical nature versus nurture complexity faced by behavioural genetics research. Non-genetic, cultural transmission of behavioural traits from parents to offspring co-exists with genetic transmission, and both are sources of parent-offspring resemblance. They are very hard to separate because they are intrinsically correlated. For MR, dynastic effects lead to a direct violation of the assumption that all effects of the exposure act directly on the outcome because the genetic instrument in the child is passively correlated with the environment created by the parents. Dynastic effects cannot be controlled for in MR on unrelated individuals, as was used by Choi et al. (2019). They can be addressed by embedding MR within family-based designs (Davies et al., 2019). The strongest application of such an approach is to combine MR with the direction-of-causation modelling in the classical twin design (Minica et al., 2018). This ‘MR-DoC’ method can simultaneously mitigate distortion by assortative mating, population stratification, and dynastic effects while explicitly modelling both pleiotropic and causal effects (Minica, Boomsma, Dolan, de Geus, & Neale, 2020). As far as I know, this promising method has not yet been applied to the field of mental health and physical activity.

5. Stratification on polygenic risk scores

When our basic concern is genetic confounding of the exercise – mental health relationship, the recent availability of genome-wide association statistics has enabled yet another strategy to deal with this. The basic idea is that if genetic risk for depression is causing (prospective) differences in exercise behaviour, then stratifying for the genetic risk for depression should lead to an attenuation of the exercise – depression relationship within each of the risk strata. In actuality, Choi et al. (2020) found the reverse. In 7968 individuals of European ancestry, a lifestyle survey was used to assess eight different types of recreational physical activity (walking/hiking, jogging, running, biking, racquet sports, swimming, aerobics and similar high-intensity exercise) and low-intensity exercise (e.g., yoga and stretching). They next used the electronic health records of the Partners HealthCare hospital system to identify incident episodes of depression based on two or more diagnostic billing codes for a depressive disorder within 2 years after the survey, and no such codes in the year prior. Higher amounts of time spent in recreational physical activity (including low-intensity exercise) were associated with reduced odds of incident depression across all levels of genetic vulnerability for depression. The latter was operationalised as the polygenic risk score (PRS) derived from a large GWA meta-analysis for major depression (Howard et al., 2019). The most important message of this work was that even individuals with high genetic vulnerability for depression could avoid new depressive episodes when they are sufficiently physically active (Choi et al., 2020).
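The logic of stratifying on polygenic risk can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis of Choi et al. (2020), who worked with continuous activity measures and adjusted models. It splits a sample into PRS tertiles and computes a crude odds ratio for depression in active versus inactive people within each stratum. All counts, field names, and helper functions are hypothetical, and a 0.5 continuity correction (Haldane-Anscombe) is added to each cell as a design choice to avoid division by zero.

```python
def odds_ratio(records):
    """Odds of depression in active relative to inactive individuals.

    Adds 0.5 to every cell of the 2x2 table (Haldane-Anscombe correction).
    """
    a = sum(1 for r in records if r["active"] and r["depressed"]) + 0.5
    b = sum(1 for r in records if r["active"] and not r["depressed"]) + 0.5
    c = sum(1 for r in records if not r["active"] and r["depressed"]) + 0.5
    d = sum(1 for r in records if not r["active"] and not r["depressed"]) + 0.5
    return (a / b) / (c / d)


def stratify_by_prs(records, n_strata=3):
    """Sort by polygenic risk score and cut into equally sized strata."""
    ordered = sorted(records, key=lambda r: r["prs"])
    size = len(ordered) // n_strata
    strata = []
    for i in range(n_strata):
        start = i * size
        end = (i + 1) * size if i < n_strata - 1 else len(ordered)
        strata.append(ordered[start:end])
    return strata


def make_group(prs, active, n_depressed, n_healthy):
    """Build toy records for one PRS level and one activity level."""
    cases = [{"prs": prs, "active": active, "depressed": True}] * n_depressed
    controls = [{"prs": prs, "active": active, "depressed": False}] * n_healthy
    return cases + controls


# Invented counts: depression risk rises with PRS, activity halves it.
sample = (
    make_group(-1.0, True, 2, 38) + make_group(-1.0, False, 4, 36)
    + make_group(0.0, True, 4, 36) + make_group(0.0, False, 8, 32)
    + make_group(1.0, True, 6, 34) + make_group(1.0, False, 12, 28)
)

strata = stratify_by_prs(sample)
stratum_odds_ratios = [odds_ratio(stratum) for stratum in strata]
```

If genetic risk fully confounded the association, the within-stratum odds ratios would move towards 1. In this toy sample they stay below 1 in every tertile, which is the qualitative pattern described in the text.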

6. Causality in the regular exercise – mental health association: where do we stand?

Given the evidence reviewed above, how do we answer a major question for the readership of this journal: do the decreased levels of anxiety and depression and the increased levels of psychological well-being found in exercisers truly reflect a causal effect of exercise? From triangulation across the various genetically informed methods described above (discordant MZ twin design, intrapair MZ regression, multivariate genetic modelling, Mendelian Randomization, and stratification on polygenic risk scores) I conclude that, in the population at large, regular exercise participation is associated with higher levels of life satisfaction and happiness and lower levels of anxiety and depression in part through a true causal effect of exercise on mental health. At the same time there is strong evidence from multiple sources to suggest that such causal effects co-exist with horizontal genetic pleiotropy, i.e. there are genetic factors that influence both exercise behaviour and well-being.

In short, the glass is both half-full and half-empty. On the one hand, the extant data overwhelmingly support the hypothesis that enhancing regular exercise behaviour can be an effective strategy to increase mental health, even when we acknowledge parallel pleiotropy and reverse causality. Many well-conducted RCTs specifically testify to its potential therapeutic effects in psychiatric settings. On the other hand, it remains a matter of considerable conjecture how large the causal effect of exercise is, and how much of a sustainable mental health benefit interventions could induce in the population at large. Given the evidence for genetic confounding, population-based associations, even if prospective, are very likely to overestimate the true magnitude of the effect of exercising on mental health. How large this overestimation is and how it varies across different types and intensity levels of exercise remains unknown. It does not help that all of the methods introduced above to triangulate the causal effect of exercise are known to perform less well under complex mechanisms of causality (e.g., a combination of common genetic factors and bidirectional causality). Yet, such a complex amalgam of causal mechanisms is likely to represent the true state of affairs.

Fortunately, a number of developments give rise to substantial optimism for future genetic research on the topic of exercise and mental health. An increasing number of large cohort studies using genetically informative designs are gathering accelerometer-based data on regular physical activity habits, not seldom in parallel to qualitative self-report on the type of activities, i.e. sports and exercise activities versus transportation or household activities (Bernhardsen et al., 2019; Mork, 2019; Waller et al., 2018). Large international GWA meta-analysis consortia are working to elucidate the genetic variants underlying the heritability of exercise behaviours and mental health traits, with some even using a within-family GWA approach (Davies et al., 2019). This will increase the validity of results based on MR and stratification on polygenic risk scores.

7. Bolstering epidemiological approaches by testable mechanistic models

The above approaches to test causality through observational data in genetically informative samples reflect a counterfactual framework (Pearl, 2010, 2014). Conclusions from such a framework gain credibility when they can be supported by a mechanistic model for the potential causal processes (Canali, 2019). In short, we need a testable model that is in line with the conclusion from our triangulation, and explains how regular exercise behaviour causes better mental health, and yet incorporates the possibility of genetic factors that influence both exercise behaviour and well-being.

We presented an early version of such a model before, in the context of explaining the heritability of exercise behaviour (de Geus & De Moor, 2008). Since that time substantial empirical evidence has been gathered to support the model (Huppertz et al., 2014; Schutte, Nederend, Bartels, & de Geus, 2019; Schutte et al., 2017a, 2017b) using a behaviour genetics approach. In parallel, our model has shown increasing overlap with data stemming from research approaches in exercise psychology, including the action control framework (Rhodes & de Bruijn, 2013) and dual-process theories (Brand & Ekkekakis, 2018) that address the well-known intention-behaviour gap in the adoption of exercise behaviours by appreciating the importance of the affective responses to exercise. Based on these developments I present an update of the model in Figs. 2 and 3 below.


Fig. 2. Pathways for a causal effect of exercise behaviour on mental health.

Note: The figure depicts how the affective response during and after exercise and perceived exercise ability influence regular voluntary exercise behaviour as well as mental well-being. Solid arrows depict causal pathways; round dotted arrows depict moderation.


Fig. 3. Genetic correlation between exercise behaviour and mental health.

Note: The higher order latent genetic factor in the oval on the left contains all sets of genetic variants that explain the heritability of regular voluntary exercise behaviour. The sets of variants that are relevant for the model (G1 through G8) are repeated in the figure close to the traits where they apply. By influencing the causal mechanisms through which exercise influences mental health, these genetic variants create a genetic correlation between exercise and mental health. This genetic pleiotropy is indicated by the large dashed arrows.

The basis of the model (Fig. 2) can be designated ‘behaviouristic’ or ‘hedonistic’, as it has a strong focus on two instrumental conditioning loops influencing the adoption and maintenance of regular exercise behaviour. One loop is related to the affective responses during and after exercise; the other loop is related to the rewarding effects of being able to perform well on a valued activity. I explicitly note the model has its starting point in people that engage or have engaged in exercise activities multiple times. All the socio-cultural and socio-economic determinants that could frustrate or facilitate such engagement are conveniently put into the box ‘determinants of engagement in exercise activities’. The model essentially only starts when an exercise activity takes place and looks at how the experience of exercising itself will impact on future exercise behaviour.

The top part of Fig. 2 focuses on the net affective responses, which occur during and shortly after those exercise activities. These consist of a dynamic mixture of appetitive and aversive effects. Although on average most people tend to feel bad during exercise and rather well afterwards, there are large individual differences in the affective responses during and after exercise, which can be further moderated by exercise intensity (Ekkekakis, Hall, & Petruzzello, 2005, 2008; Backhouse, Ekkekakis, Biddle, Foskett, & Williams, 2007; Ekkekakis, 2006; Ekkekakis, Parfitt, & Petruzzello, 2011; Katula, Blissmer, & McAuley, 1999) and setting (Dunton, Liao, Intille, Huh, & Leventhal, 2015; Focht, 2009). For ease of presentation, Fig. 2 divides the population into individuals where the appetitive effects of the total exercise experience outweigh aversive effects and individuals where, vice versa, the aversive effects outweigh the appetitive effects. In reality, this balance between the appetitive and aversive effects will be dynamic over time for each individual rather than fixed for life. The model maintains that if most of the time the exercise-induced affective response is positive for an individual, then there is positive reinforcement and such individuals are likely to maintain the behaviour and become regular exercisers. Vice versa, if the net affective response is negative most of the time, the individuals experiencing this repeated punishment are at high risk of dropping out and becoming non-exercisers.

The importance of affective responses for future exercise behaviour has received substantial empirical support as reviewed by Rhodes & Kates (2015). Positive affective experiences were seen to impact key motivational constructs like the affective judgement about future exercise (expectation of enjoyment, fun and pleasure) and the formation of self-efficacy which translated into more exercise behaviour at follow-up (Rhodes & Kates, 2015). The central role of affective responses could also account in part for the well-known associations of a number of psychological traits with regular exercise, including a limbic ‘activity drive’, personality, self-regulatory capacity and positive attitudes towards exercise. In our model, these operate by moderating the instrumental conditioning effect of the immediate aversive and appetitive responses during exercise as well as those that occur shortly after the exercise bout. Differences in ‘activity drive’ and personality may moderate the acute affective effects directly, i.e. making exercise and the induced arousal feel good rather than bad, whereas self-regulatory capacity and positive attitudes may act to increase the tolerance for any aversive effects.

The activity drive is a hypothesized phylogenetically old mechanism reflecting an innate drive to be physically active to maintain energy balance (Rowland, 2017). The fulfilment of this ‘activity drive’ could be intrinsically rewarding, just as relieving hunger or thirst (Garland et al., 2011; Kelly et al., 2010; Lightfoot, 2011). A substantial body of evidence shows regular exercisers to score higher on extraversion and sensation seeking (de Moor, Beem, Stubbe, Boomsma, & de Geus, 2006; Rhodes & Smith, 2006; Wilkinson et al., 2013; Wilson & Dishman, 2015). Both of these traits have been linked to individual differences in the functioning of the arousal systems that are activated in response to exercise. Exercise-induced arousal may simply be more rewarding to extraverts than introverts and provide the stimulation that sensation seekers crave (Eysenck, Nias, & Cox, 1982). One of the strongest genome-wide significant genes to emerge from the UK Biobank GWAS for regular physical activity was the CADM2 gene that is known to be associated with sensation seeking and extraversion (Boutwell et al., 2017; Sanchez-Roige et al., 2019).

The emphasis on the affective aspects of exercise does not rule out cognitive processing and goal setting as being important too. In contrast to theoretical models of behavioural change (e.g. Health Belief Model, Theory of Planned Behaviour, Transtheoretical stages of change Model, Social Cognitive Theory, Health Action Process Approach) that favour cognitive and reasoning processes as primary determinants of exercise behaviour, our model recasts the role of such cognitive processing as that of a moderator of the affective responses. Strong self-regulatory capacity may tip the balance between appetitive and aversive effects by down-weighting the latter through cognitive reinterpretation. This may be more salient in experienced exercisers as they can more easily rationalize exercise discomfort, exhaustion and fatigue since they better anticipate such feelings upfront, and can better predict the course of the recovery process from past experience. Finally, the affective response may also be moderated by attitudes on exercise, in that the experienced aversive effects during exercise are lower if the intrinsic value of the activity is considered to be high, for instance through a strong expectancy of health benefits or aesthetic body-shape benefits.

Apart from the short-term appetitive and aversive effects, longer-term reinforcement effects also weigh in to determine who becomes a regular exerciser, as depicted in the lower half of Fig. 2. The self-determination theory (Deci & Ryan, 1985) assumes experience of competence and self-worth to be one of our core psychological needs. The perception of competence when engaging in exercise activities will, therefore, lead to the repetition of exercise activities. Put succinctly, people generally like doing what they are good at, and will pursue those activities in leisure time as much as possible. Competence was indeed positively associated with exercise activities across many different samples (Teixeira, Carraca, Markland, Silva, & Ryan, 2012) and perceived self-efficacy remains the strongest correlate of regular exercise behaviour in the extant literature (Bauman et al., 2012; McAuley & Blissmer, 2000). In reverse, given the strong positive cultural attitudes towards exercise ability, those who do not perform as well as their peers may experience exercising as a threat to their self-worth. This will increasingly lead them to avoid it, as indicated by the punishment loop in the bottom of our model.

Perceived competence and physical self-efficacy build on the repeated experience of good exercise ability, which is often measured by comparison to the performance of others or the performance of past self. For a large part, perceived exercise ability will depend on the actual exercise ability. Such ability is influenced by skills specific to the exercise activity performed (particularly in sports), but a number of general characteristics including endurance capacity, balance, flexibility, and static and dynamic muscle strength are strong predictors of the ability to perform a variety of sports and exercise activities (Kenney, Wilmore, & Costill, 2020). This introduces a feedback loop into the model as the development of exercise ability depends in part on regular exercise behaviour. However, the effects of exercising on exercise ability are more nuanced than folk wisdom of the ‘10,000 h rule’ would have us believe (Ericsson, Krampe, & Tesch-Römer, 1993). Whereas partaking in exercise will act to increase exercise ability in all individuals, some clearly fare better than others, explaining why the relationship between exercise ability and regular physical activity is low to moderate in both human and animal studies (Lightfoot, 2013). This is caused by individual differences in ‘trainability’, which is the concept that some participants respond favourably to exercise, whereas others hardly respond at all. In studies carefully controlling for exercise dose and adherence, large differences in trainability have been reported for endurance capacity (Ross et al., 2019) and muscle strength (He et al., 2018; Pescatello, Devaney, Hubal, Thompson, & Hoffman, 2013).

Differences in trainability will further influence the positive reinforcement or punishment signals created by exercising. People who notice that they gain more or faster in performance than others that followed the same exercise regime will experience stronger feelings of competence and mastery. People who achieve lower levels of performance may feel disappointment and perhaps even shame at not measuring up to standards, even after substantial training. Such standards will again be often derived from comparison to peers, and are strongest when exercise is performed in a competitive context. The importance of self-to-other comparison may have been enhanced by the introduction of mass sharing of exercise performance through social media (e.g. the Strava app). In a large study of social contagion in exercise behaviours in more than a million regular runners (Aral & Nicolaides, 2017) comparison to others' exercise performance was seen to be a powerful motivator to change one's own exercise behaviour. Comparisons to those that perform better than we do, as well as comparisons to those that perform worse, can be motivators to change our own exercise behaviour. However, a much larger impact on the urge to increase exercise behaviour was found for downward comparisons than for upward comparisons, most strongly so in men (Aral & Nicolaides, 2017).

8. Exercise giving rise to mental health

The subjective perception of exercise ability and trainability may be important determinants of exercise-induced increases in physical self-esteem, which may translate to general self-esteem. Indeed, self-esteem is one of the earliest recognized correlates of regular exercise (Sonstroem & Morgan, 1989). When these repeated boosts in self-esteem are combined with the net ‘feel good’ affective response to exercise, a powerful mix is created to increase psychological wellbeing. Extensive interviews with persistent exercisers, recent adopters, fitness program dropouts and persistent sedentary individuals by Gauvin (1990) suggest that exercisers differ from individuals with less active lifestyles mainly in that they enjoyed the exercise itself and felt that something was missing in their life when they did not regularly exercise. At least for regular exercisers, exercise does seem to cause mental health. For the people populating the two punishment loops of our model, on the other hand, regular exercise may not be a stable source of mental health.

This of course leads to the important question of why only some of us end up being reinforced by exercise activities, whereas others seem to experience exercise mostly as aversive. Fig. 3 introduces genetic factors as an important source of these individual differences. In doing so, it assumes that heritable factors contribute to the affective response to exercise (G1) and its moderators (G2-G5), as well as the trainability (G6) and basic fitness traits (G7) that increase perceived exercise ability (G8). Evidence for the former, a significant genetic contribution to the affective response during and after exercise, comes from a large experimental study of young adult twins (Schutte et al., 2017a, 2017b, 2019). A set of twins and their siblings (N = 499) completed two submaximal exercise tests on a cycle ergometer and a treadmill and a maximal exercise test on a cycle ergometer. Genetic factors explained 15% of the individual differences in affective responses (assessed by the Feeling Scale) during the cycle ergometer test, as well as 29% and 35% of the individual differences in Borg's Rating of Perceived Exertion (RPE) during the cycle ergometer and treadmill tests, respectively. Post-exercise affect measured with the Activation-Deactivation Adjective Checklist (AD ACL) yielded heritability estimates ranging from 17% to 37% after submaximal exercise and from 12% to 37% after maximal exercise. Without exception, more positive affective responses were associated with higher amounts of regular exercise activity.
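For readers less familiar with how twin data yield a heritability estimate, the reasoning can be sketched with the classical Falconer approximation. This is a textbook shortcut, presented here only as an assumption-laden illustration: the studies cited above may well have relied on formal variance-components modelling of twin and sibling data rather than on this formula. The twin correlations in the example are hypothetical and do not come from any of the cited studies.

```python
def falconer_decomposition(r_mz, r_dz):
    """Split trait variance into additive genetic (h2), shared
    environmental (c2) and unique environmental (e2) proportions.

    Rests on the classical twin assumptions: MZ twins share all and DZ
    twins on average half of their segregating genes, both twin types
    share the family environment equally, and there is no non-additive
    genetic variance or assortative mating.
    """
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical twin correlations for an affective response measure.
parts = falconer_decomposition(r_mz=0.30, r_dz=0.15)
```

With an MZ correlation twice the DZ correlation, as in this invented example, all twin resemblance is attributed to additive genetic factors and none to the shared family environment.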

Genetic moderation of the affective response to exercise (labelled G1 in Fig. 3) may explain why regular exercisers report greater acute exercise-induced mood enhancement than non-exercisers (Hoffman & Hoffman, 2008). Although some first candidate genes have been nominated (Bryan, Hutchison, Seals, & Allen, 2007; Karoly et al., 2012; Lee, Emerson, Bohlen, & Williams, 2018) the exact genetic variants influencing the affective response to exercise remain to be uncovered. Genetic variants could impact on the psychological exercise response of opioid, dopaminergic, or monoaminergic systems during exercise (Chaouloff, 1997; Dishman, 1997; Dubreucq et al., 2013; van der Mee et al., 2017) or on the post-exercise reduction of the sympathetic nervous system activity (Halliwill, 2001; Yamamoto, Miyachi, Saitoh, Yoshioka, & Onodera, 2001) and parasympathetically mediated heart rate recovery (Nederend, Schutte, Bartels, Ten Harkel, & de Geus, 2016). In view of the often-expressed ‘stress-buffering’ effects of exercise, genetic differences in the exercise-induced physiological hyporeactivity to psychological stressors (Hamer, Taylor, & Steptoe, 2006) could also play a role.

A slew of studies testify to the heritability of the proposed moderators in the model of the affective response during and after exercise. Studies using cross-strain comparisons and selective breeding for spontaneous voluntary wheel running suggest a strong genetic contribution to the activity drive in rodents (Kelly et al., 2010; Lightfoot et al., 2017). In humans, anorexia nervosa (AN) is increasingly regarded to reflect an excessive activity drive (Casper, 2018) which may explain part of AN's substantial heritability (Dinkler et al., 2019). Studies on personality (de Moor, van den Berg, & Boomsma, 2013; Sanchez-Roige, Gray, MacKillop, Chen, & Palmer, 2018; Stoel, De Geus, & Boomsma, 2006; Vukasovic & Bratko, 2015) and self-control (Willems, Boesen, Li, Finkenauer, & Bartels, 2019) have amply demonstrated the importance of genetics for these traits. Perhaps more surprisingly we showed that genetic factors also contribute to the formation of attitudes on exercise. Attitudes are typically depicted as shaped mainly through the socio-cultural and family environment, but around half of the variance in the perceived benefits of exercise as well as the experienced barriers to exercise (lack of enjoyment, lack of energy, and embarrassment) could be attributed to heritable factors (Huppertz et al., 2014; Schutte et al., 2019).

What about genetic influences on the lower loop of the model in Fig. 3? Here too, a large body of evidence shows that general fitness characteristics like strength (Schutte et al., 2016a, 2016b) and endurance (Miyamoto-Mikami et al., 2018; Schutte et al., 2016a, 2016b) are highly heritable traits. In the HERITAGE study, Bouchard and colleagues have furthermore established that the individual differences in trainability appear to derive for a large part from genetic variation between the good and bad responders (Bouchard, 2012; Bouchard & Rankinen, 2001). Whereas actual exercise ability will be an important contributor to perceived exercise ability, the latter is credited by the model with the strongest rewarding effects of exercise behaviour. So far, only one twin study has targeted perceived exercise ability (Schutte et al., 2019), asking young adult participants to compare their own sport performance, endurance capacity and muscle strength to that of their peers and to indicate on a 10-point scale, ranging from “very bad” (1) to “really good” (10), how well they generally performed at exercise and sports activities. A substantial heritability estimate of 66% was found on a compound score of perceived exercise ability in this age group. Genetic effects on perceived exercise ability in different age cohorts remain to be established.

In summary, there is ample evidence to support a genetic effect on the affective response to exercise and its moderators (G1 to G5 in Fig. 3) as well as the trainability, basic fitness traits, and perceived exercise ability (G6 to G8 in Fig. 3). The strongest support for the model in Fig. 3, however, does not come from showing substantial univariate heritability of these traits. The model implies a necessary overlap between the genetic factors G1 to G8 and the genetic factors underlying the heritability of voluntary exercise behaviour. If the reinforcement loops shown are correct, the genetic factors influencing, e.g. the affective response (G1) or the perception of exercise capability (G8) should be part of the genetic factors causing the heritability of regular exercise behaviour. This is a strong assertion made by the model and is depicted by copying G1 to G8 and placing them on the left side of Fig. 3 as part of the full set of genetic factors that influence regular voluntary exercise behaviour.

This assertion can be empirically tested by computing the genetic correlation of regular voluntary exercise behaviour with the traits influenced by G1 to G8. This genetic correlation should be positive and significant. We have systematically confirmed this to be the case (Huppertz et al., 2014; Schutte, Bartels, & de Geus, 2017; Schutte et al., 2019). In a prospective twin study in adolescents and young adults we tested whether the genetic factors influencing personality, affective response to exercise, perceived benefits and barriers, objective exercise ability (strength and aerobic fitness), and subjective exercise ability at baseline were correlated to the genetic factor influencing exercise behaviour at a 3-year follow up (Schutte et al., 2019). At baseline, 29% of the variance in regular exercise behaviour could be explained by concurrent levels of extraversion, energy/vigour after submaximal exercise, perceived benefits and barriers, subjective ability and exercise-testing derived aerobic fitness. At the 3-year follow up, still 18% of the variance in regular exercise behaviour was explained by the baseline levels of these predictors. Importantly, the genetic correlations were all significant, indicating the predicted overlap in the genetic factors influencing all the predictors at baseline and the genetic factors influencing regular exercise behaviour at follow-up.

While independent replication by others is sorely needed, our empirical work has thus far supported the model presented in Fig. 3. I hasten to add that the model is far from complete, and biased towards genetics. Missing from the model are the potential effects of exercise on epigenetic regulation of DNA and on the composition and diversity of the microbiome (McGee & Hargreaves, 2019; Monda et al., 2017). Changes in DNA methylation and changes in the gut-brain axis can both impact on mental health (Oh et al., 2015; Rudzki & Maes, 2020). Furthermore, social and environmental determinants were all crammed into a single and rather unspecified ‘determinants of engagement in exercise activities’ block. Our own twin studies have corroborated the importance of non-genetic factors as strongly as that of genetic factors. We found that in childhood common environmental factors that are shared by siblings of the same family play a much larger role in exercise behaviour than genetic factors (Huppertz et al., 2016; Schutte, Bartels, & de Geus, 2017). This may largely reflect a positive effect of social support by parents and siblings on exercise behaviours, reported by many others (Dishman, Sallis, & Orenstein, 1985; Gustafson & Rhodes, 2006; Haidar, Ranjit, Archer, & Hoelscher, 2019; Sallis, Prochaska, & Taylor, 2000; Soto, Arredondo, Haughton, & Shakya, 2018). In adolescence, parental effects on exercise behaviour tend to wane, apart from a small father-to-son cultural transmission, but now the peers start to exert a significant influence on exercise behaviour (de Moor et al., 2011). Finally, in adulthood, the exercise behaviour of the partners with whom one shares a household becomes an important environmental determinant of exercise behaviours (van der Zee et al., 2020).

Even with regard to the genetic effects, the model is likely to be incomplete. Apart from the genetic factors influencing the rewarding effects of exercise, many other genetic factors can also play a role (abbreviated as “Gx” in Fig. 3). For instance, susceptibility to musculoskeletal injury has a clear genetic component (Collins, September, & Posthumus, 2015; Ryan-Moore, Mavrommatis, & Waldron, 2020) and repeated injury will obviously influence the enjoyment of exercise behaviour and even induce active aversion. Body composition is another example of a highly heritable trait (Hemani et al., 2013) that is still missing from the model, even though one of the genome-wide significant variants for physical activity (in the CADM2 gene) has also been associated with BMI (Locke et al., 2015). There is a clear association between exercise behaviour and overweight in adults, which is bidirectional (Richmond et al., 2014): low levels of exercise exert a negative influence on the energy balance, but being overweight may itself impede engagement in regular exercise or, through social stigma, hamper its enjoyment (Ball, Crawford, & Owen, 2000).

As a third shortcoming, I note that a substantial part of the data corroborating the model in Fig. 3 was entirely based on self-report of voluntary exercise activities of moderate to vigorous intensity done in leisure time. Whereas the self-report bias for this type of activity may be less severe than for other forms of MVPA as argued above, it will not be zero. In addition, active transportation, intensive gardening, do-it-yourself home improvements, dancing or non-voluntary physical activity related to work and school including light physical activities may actively influence mental health through routes that are independent from those shown in the model. Future studies employing both self-report and accelerometry are needed to clarify this. Self-report may be better suited to detect the type of exercise activities people engage in, as these are still hard to extract from accelerometer data. Accelerometers, in turn, do much better in measuring the true frequency, duration and intensity of the activities. In general, genetically informative studies combining self-report (type of exercise activity) with accelerometer recordings (objective quantification of frequency, duration and intensity) would be more informative than the currently available data.

Acknowledging these shortcomings, the model does have considerable strengths. Few, if any, other models of exercise behaviour in exercise psychology have attempted to address the elephant in the room: the substantial heritability of exercise behaviour. Genetics outperforms all other determinants by an order of magnitude when it comes to the amount of between-subject variance explained. The model accommodates this and provides a mechanistic account that is fully compatible with the simultaneous presence of genetic horizontal pleiotropy and true causality in the association between exercise and mental health. Improved mental health in regular exercisers can be attributed to the feel-good and self-esteem effects of exercise, but these effects are strongly moderated by genetic variation such that not all people reap these benefits to the same extent. Because it is exactly the psychological benefits of exercise that increase the likelihood of future exercise behaviour, the genetic variants involved in this moderation will act as pleiotropic genes and create the genetic correlation between regular exercise behaviour and mental health that we repeatedly encountered in twin studies. This pattern of causality in the presence of genetic confounding clearly arose from the review of the genetic epidemiology of the exercise–mental health link above.

The model makes testable predictions about the factors that put people at risk for failure to adopt and maintain regular exercise behaviour, which can be used to improve exercise interventions. A sad but common misunderstanding is that demonstrating the heritability of a trait removes the rationale for intervention on that trait (Plomin & Haworth, 2010). Does heritability of IQ imply that we abandon compulsory education? Does heritability of diabetes or any other medical condition imply we do not treat it? The answers to these questions are obviously in the negative. The category mistake underlying this misunderstanding of “genetics-as-predestination” is a mix-up of the effects on the mean and on the variance of a trait. When it comes to regular exercise behaviour, genetic factors explain a chunk of the variance around the mean population level. This is true before an exercise intervention program and it will remain true well after that program. Nevertheless, the expected pre- and post-intervention heritability in no way prevents an intervention from having a beneficial effect on the population mean. Given the large health benefits of increasing the mean exercise behaviour of the population, the key question is not whether the knowledge of the genetic moderation proposed above still allows for successful intervention (it does), but how such knowledge can help make future interventions more successful.
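The distinction between effects on the mean and on the variance can be made concrete with a small simulation. The sketch below is purely illustrative and is not taken from any of the reviewed studies: it assumes a standardised exercise phenotype built from independent genetic and environmental values, a heritability of 0.5, and an intervention that shifts everyone by the same amount. The function name `simulate_intervention` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_intervention(n=5000, h2=0.5, shift=1.0, seed=42):
    """Return (mean, genetic variance share) before and after an intervention."""
    rng = random.Random(seed)
    # Standardised phenotype: genetic variance h2, environmental variance 1 - h2.
    genetic = [rng.gauss(0.0, h2 ** 0.5) for _ in range(n)]
    environment = [rng.gauss(0.0, (1.0 - h2) ** 0.5) for _ in range(n)]
    pre = [g + e for g, e in zip(genetic, environment)]
    # Assumption: the intervention adds the same amount to every individual.
    post = [p + shift for p in pre]

    def summary(phenotype):
        mean = statistics.fmean(phenotype)
        # Ratio of genetic variance to total phenotypic variance.
        ratio = statistics.pvariance(genetic) / statistics.pvariance(phenotype)
        return mean, ratio

    return summary(pre), summary(post)


(pre_mean, pre_h2), (post_mean, post_h2) = simulate_intervention()
print(f"population mean: {pre_mean:.2f} -> {post_mean:.2f}")
print(f"genetic share of variance: {pre_h2:.2f} -> {post_h2:.2f}")
```

Under these assumptions the population mean rises by the size of the shift while the genetic share of the variance is unchanged. If the response to the intervention itself differed between individuals, as the genetic moderation in the model implies, the variance and its genetic share could change as well, but the mean could still improve.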

The core idea of personalised medicine is that people respond differently to different treatments - based in part on their genetic make-up - and that predicting their response can help in choosing the best treatment. Understanding the genetic pathways that lead to differences in voluntary exercise behaviours can likewise identify individual-specific biological and psychological determinants that would be solid targets for personalised intervention. Furthermore, polygenic risk scores could be used directly to improve intervention, by differentially predicting who is at increased risk for aversive affective responding to exercise, injury sensitivity, or low exercise ability/trainability. This is no longer ‘future medicine’. Polygenic risk scores for coronary artery disease already compete with the Framingham risk score or monogenic variants in predicting who should be prioritised for early intervention on obesity or coronary artery disease (Khera et al., 2018, 2019). As we have now entered the era of genome-wide association studies, personalised interventions employing polygenic risk for different causes of ‘exercise aversion’ may finally start to come to fruition.
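For readers less familiar with the construct, a polygenic risk score is in essence a weighted sum of an individual's risk-allele counts, with weights taken from GWA study effect sizes. The sketch below shows only this arithmetic. The SNP identifiers, weights and genotypes are invented for illustration, the function name `polygenic_score` is hypothetical, and treating missing genotypes as zero is a simplification that real pipelines would replace with imputation and quality control.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele counts (0, 1 or 2) over the SNPs in `weights`.

    Missing genotypes contribute zero here, which is a simplification.
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in weights.items())


# Hypothetical per-allele effect sizes for an 'exercise aversion' trait.
weights = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}

# Hypothetical genotypes: number of effect alleles carried at each SNP.
cohort = {
    "person_1": {"snp_a": 2, "snp_b": 1, "snp_c": 0},
    "person_2": {"snp_a": 0, "snp_b": 2, "snp_c": 2},
    "person_3": {"snp_a": 1, "snp_b": 0},  # snp_c not genotyped
}

scores = {pid: polygenic_score(geno, weights) for pid, geno in cohort.items()}

# Rank individuals from highest to lowest score, e.g. to prioritise support.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In practice such scores aggregate thousands to millions of variants and are evaluated against observed outcomes before being used to stratify individuals. The toy ranking only illustrates the principle.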


This work did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.

Declaration of competing interest

The author has no conflicts of interest for this work.

Given his role as Editorial Board Member, Eco de Geus had no involvement in the peer-review of this article and has no access to information regarding its peer-review. Full responsibility for the editorial process for this article was delegated to Ana Abrantes.

1Such direction-of-causation twin models do exist (Heath et al., 1993), but they require the variance components to be clearly different for the two traits and measurement errors to be minimal (or well estimated).

2This chip heritability based on common SNPs is substantially lower than the twin-based heritability. This difference, also known as ‘missing heritability’, does not point to a flaw in either the GWA study or the twin design. Instead, precise estimation of the effects of all causal SNPs requires an even larger GWA study than currently employed, with better tagging of the causal SNPs and ideally also adding the effects of rare variants and epistatic interactions.

3As an aside, I note that this study also nicely demonstrated how classical epidemiology could overestimate effect sizes by ~80% in comparison to an instrumental variable approach.