Replication cohorts in microbial GWAS

Replication in an independent cohort is of course the gold standard in GWAS, and many high-profile journals will now (quite rightly) not accept findings reporting a phenotype-genotype association without evidence of replication.

However, GWAS is no longer the domain of human genetics alone and is increasingly applied to microbial species that are pathogenic to humans, animals, and plants. Having read many of these studies (mostly in plant pathology), I can find no discussion of technical replication practices in their methods. In my own research I intend to conduct a GWA analysis between genotypes (SNPs and gene presence-absence, gathered by Illumina sequencing) and a virulence phenotype (a continuous quantitative variable) measured when a large population of a fungal pathogen is inoculated onto its host plant, wheat. I have inoculated 120 plants with 120 isolates of the pathogen, and I'm planning an additional replication cohort of 80 plants and isolates.

My question is, what is required to make these additional inoculations an 'independent' cohort?

The idea of independence in technical replicates is to correct for systematic bias present in the discovery cohort. I will likely grow these plants in a separate greenhouse bay, for example (although since the different host-isolate combinations are randomised within the greenhouse, even this may not be necessary). In a human GWAS, a different genotyping method would be ideal in the replication cohort because the SNP chip may introduce bias; in my own study I'm unsure whether this would be necessary, as variants in my data will undergo stringent filtering prior to genotype calling.

If anyone could help point out what my possible sources of systematic bias are, I'd be very grateful.
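As an illustration of the randomisation mentioned above, here is a minimal Python sketch of how isolate positions could be randomised separately within each greenhouse bay, with a recorded seed so the layout can be reported in the methods. The isolate labels, bay names and seeds are hypothetical placeholders, not part of the actual study.

```python
import random

def randomise_layout(isolates, bay, seed):
    """Return (bay, position, isolate) tuples with the isolates in random order."""
    rng = random.Random(seed)  # fixed seed so the layout can be regenerated and reported
    shuffled = list(isolates)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [(bay, position, isolate)
            for position, isolate in enumerate(shuffled, start=1)]

# Hypothetical isolate labels; a real study would use its own identifiers.
discovery_isolates = [f"D{i:03d}" for i in range(1, 121)]    # 120 discovery isolates
replication_isolates = [f"R{i:03d}" for i in range(1, 81)]   # 80 replication isolates

# Assumed design: each cohort in its own bay, each with its own seed.
discovery_layout = randomise_layout(discovery_isolates, bay="bay_A", seed=2024)
replication_layout = randomise_layout(replication_isolates, bay="bay_B", seed=2025)
```

Keeping the seed and the resulting table alongside the phenotype data makes it possible to test later for position or bay effects.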

Replication assessment of NUS1 variants in Parkinson's disease

The NUS1 gene was recently associated with Parkinson's disease (PD) in the Chinese population. Here, as part of the International Parkinson's Disease Genomics Consortium, we have leveraged large-scale PD case-control cohorts to comprehensively assess damaging NUS1 variants in individuals of European descent. Burden analysis of rare nonsynonymous damaging variants across case-control individuals from whole-exome and -genome data sets did not find evidence of NUS1 association with PD. Overall, single-variant tests for rare (minor allele frequency < 0.01) and common (minor allele frequency > 0.01) variants, including 15 PD-GWAS cohorts and summary statistics from the largest PD GWAS meta-analysis to date, also did not uncover any associations. Our results indicate a lack of evidence for a role of rare damaging nonsynonymous NUS1 variants in PD in unrelated case-control cohorts of European descent, suggesting that the previously observed association could be driven by extremely rare population-specific variants.
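The rare/common split by minor allele frequency (MAF) used above can be sketched as follows. This is an illustrative Python example only, assuming diploid genotypes coded as alternate-allele counts (0, 1 or 2) with `None` for missing calls; the function names are hypothetical, and placing a variant with MAF of exactly 0.01 in the "common" class is an assumption, since the abstract does not say how that boundary was handled.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts (0, 1, 2).

    Missing calls are given as None and are ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return min(alt_freq, 1 - alt_freq)          # the minor allele is the less frequent one

def frequency_class(maf, threshold=0.01):
    """Label a variant 'rare' (MAF < threshold) or 'common' (assumed: MAF >= threshold)."""
    return "rare" if maf < threshold else "common"
```

For example, `frequency_class(minor_allele_frequency([0, 0, 1, None, 2]))` classifies a toy variant observed in four called samples.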

Keywords: NUS1; Parkinson's disease; rare-variant burden.

Statistical correction of the Winner's Curse explains replication variability in quantitative trait genome-wide association studies

Genome-wide association studies (GWAS) have identified hundreds of SNPs responsible for variation in human quantitative traits. However, genome-wide-significant associations often fail to replicate across independent cohorts, in apparent inconsistency with their strong effects in discovery cohorts. This limited success of replication raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits from the NHGRI-EBI GWAS Database with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10^-14), even when adjusting for regression-to-the-mean of effect size between discovery and replication cohorts, termed the Winner's Curse (p < 10^-16). We show this is due in part to misreporting replication cohort size as a maximum number, rather than a per-locus one. In 39 studies accurately reporting per-locus cohort size for attempted replication of 707 loci in samples with similar ancestry, replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery (13 studies, 385 loci) cause the most highly-powered decile of loci to replicate worse than expected, due to differences in linkage disequilibrium.
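The regression-to-the-mean effect described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the method used in the study: a single true effect size, normally distributed estimation error with the same standard error in both cohorts, and a significance cutoff of |z| > 5.45 (roughly p < 5 × 10^-8, two-sided). All parameter values are hypothetical.

```python
import random
import statistics

def simulate_winners_curse(true_beta=0.05, se=0.02, z_threshold=5.45,
                           n_sims=200_000, seed=1):
    """Simulate discovery estimates and keep only those passing the threshold.

    For each locus that passes, an independent replication estimate is drawn
    from the same distribution (same true effect, same standard error).
    """
    rng = random.Random(seed)
    discovery, replication = [], []
    for _ in range(n_sims):
        beta_disc = rng.gauss(true_beta, se)        # noisy discovery estimate
        if abs(beta_disc) / se > z_threshold:       # selection on significance
            discovery.append(beta_disc)
            replication.append(rng.gauss(true_beta, se))  # unselected follow-up
    return discovery, replication

discovery, replication = simulate_winners_curse()
print(f"loci passing threshold:    {len(discovery)}")
print(f"mean discovery estimate:   {statistics.mean(discovery):.3f}")
print(f"mean replication estimate: {statistics.mean(replication):.3f}")
```

Because only estimates that happened to land above the threshold are kept, the selected discovery estimates are inflated relative to the true effect, while the replication estimates centre on it. Power calculations based on the raw discovery effect therefore overstate the expected replication rate.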

Conflict of interest statement

The authors have declared that no competing interests exist.


Fig 1. Expected and observed replication rate per publication, stratified by journal.

Fig 2. Expected and observed rates of replication in replication deciles.

A proportion of gut bacteria are heritable.

The impact of host genetics on the gut microbiome in humans is being revealed through genome-wide association studies.

The effect size of host genetics on the microbiome appears to be modest.

Several associations are found between the microbiome and genes associated with diet, innate immunity, vitamin D receptors, and metabolism.

A consistent genetic signal comes from pattern recognition receptor molecules, particularly C-type lectins.

The mammalian gut is colonized by trillions of microorganisms collectively called the microbiome. It is increasingly clear that this microbiome has a critical role in many aspects of health, including metabolism and immunity. While environmental factors such as diet and medications have been shown to influence the microbiome composition, the role of host genetics has only recently emerged in human studies and animal models. In this review, we summarize the current state of microbiome research with an emphasis on the effect of host genetics on the gut microbiome composition. We focus particularly on genetic determinants of the host immune system that help shape the gut microbiome and discuss avenues for future research.


Diabetes impacts approximately 200 million people worldwide [1], with microvascular and cardiovascular disease being the primary complications. Approximately 10% of cases are type 1 diabetes (T1D) sufferers, with a ∼3% increase in the incidence of T1D globally per year [2]. It was projected that the incidence would be 40% higher in 2010 than in 1998 [3].

T1D is a clear example of a complex trait that results from the interplay between environmental and genetic factors. There are many lines of evidence that there is a strong genetic component to T1D, primarily due to the fact that T1D has high concordance among monozygotic twins [4] and runs strongly in families, together with a high sibling risk [5].

Prior to the era of GWAS, only five loci had been fully established to be associated with T1D. However, the majority of the other reported associations in the pre-GWAS era [6]–[8] remain highly doubtful, where an initial report of association does not hold up in subsequent replication attempts by other investigative groups. This previous hazy picture of the genetics of T1D can be put down to the use of the only methodologies that were available at the time, which were much more limited than GWAS, i.e. the candidate gene approach (where genomic regions were studied based on biological reasoning) and family-based linkage methodologies. Inconsistent findings can also be attributed to small sample sizes, i.e. when power is low the false discovery rate tends to be high. GWAS per se has not improved consistency; rather, it has leveraged large, well-powered sample sizes combined with sound statistical analyses.

It has been long established that approximately half of the genetic risk for T1D is conferred by the genomic region harboring the HLA class II genes (primarily HLA-DRB1, -DQA1 and -DQB1 genes), which encode the highly polymorphic antigen-presenting proteins. Other established loci prior to the application of GWAS are the genes encoding insulin (INS) [9]–[12], cytotoxic T-lymphocyte-associated protein 4 (CTLA4) [13]–[16], protein tyrosine phosphatase, non-receptor type 22 (PTPN22) gene [17], [18], interleukin 2 receptor alpha (IL2RA) [19]–[21] and ubiquitin-associated and SH3 domain-containing protein A (UBASH3A) [22].

The application of genome wide association studies (GWAS) has robustly revealed dozens of genetic contributors to T1D [23]–[29], the results of which have largely been independently replicated [30]–[36]. The most recently reported meta-analysis of this trait identified in excess of forty loci [29], including 18 novel regions plus confirmation of a number of loci uncovered through cross-disease comparisons [34]–[36]. As such, the risks conferred by these additional loci are relatively modest compared to the ‘low-hanging fruit’ described in the first studies and could only be ultimately uncovered when larger sample sizes were utilized.

We sought to expand further on this mode of analysis by combining our cohort with all publicly released genome-wide SNP datasets to identify additional loci contributing to the etiology of T1D. Unfortunately, there is a relative paucity of control genotype data in these publicly available sources. To circumvent this problem, we combined individual-level data from each available cohort and then compared the cases with controls from two sources. We next separated all the individual-level data into two groups, characterized by the type of genotyping platform that was used to genotype the samples, which would later be recombined using inverse-variance meta-analysis. The 6,523 cases genotyped on an Illumina BeadChip included subjects from McGill University, The Children's Hospital of Philadelphia (CHOP), The Diabetes Control and Complications Trial – Epidemiology of Diabetes Interventions and Complications (DCCT-EDIC) cohort, and the Type 1 Diabetes Genetics Consortium (T1DGC), which in turn were compared with 6,648 similarly genotyped controls recruited at CHOP. The 3,411 cases genotyped on Affymetrix arrays included subjects from the Genetics of Kidneys in Diabetes Study (GoKinD) and the Wellcome Trust Case Control Consortium (WTCCC), which were then compared with 10,308 similarly genotyped controls, including controls derived from non-autoimmune disease related cases from the WTCCC, as well as from the British 1958 Birth Cohort and the UK National Blood Service [24].
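The inverse-variance meta-analysis step used to recombine the two platform groups can be sketched as below. This is a generic fixed-effect formulation in Python, offered as an illustration rather than the authors' actual pipeline; the function name and the example numbers are hypothetical. Each group contributes a per-SNP effect estimate (for example a log odds ratio) and its standard error, and is weighted by the inverse of its variance.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis for one SNP.

    betas: per-group effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z_score).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / se_i^2
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se

# Hypothetical per-platform estimates for a single SNP (not values from the study).
beta, se, z = inverse_variance_meta(betas=[0.10, 0.30], ses=[0.10, 0.20])
```

The group with the smaller standard error dominates the pooled estimate, which is why analysing each platform separately and combining afterwards does not discard information relative to the better-powered group alone.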

4. Genotyping Technologies

Genome-wide association studies were made possible by the availability of chip-based microarray technology for assaying one million or more SNPs. Two primary platforms have been used for most GWAS. These include products from Illumina (San Diego, CA) and Affymetrix (Santa Clara, CA). These two competing technologies have been recently reviewed [20] and offer different approaches to measure SNP variation. For example, the Affymetrix platform prints short DNA sequences as a spot on the chip that recognizes a specific SNP allele. Alleles (i.e. nucleotides) are detected by differential hybridization of the sample DNA. Illumina on the other hand uses a bead-based technology with slightly longer DNA sequences to detect alleles. The Illumina chips are more expensive to make but provide better specificity.

Aside from the technology, another important consideration is the SNPs that each platform has selected for assay. This can be important depending on the specific human population being studied. For example, it is important to use a chip that has more SNPs with better overall genomic coverage for a study of Africans than Europeans. This is because African genomes have had more time to recombine and therefore have less LD between alleles at different SNPs. More SNPs are needed to capture the variation across the African genome.

It is important to note that the technology for measuring genomic variation is changing rapidly. Chip-based genotyping platforms such as those briefly mentioned above will likely be replaced over the next few years with inexpensive new technologies for sequencing the entire genome. These next-generation sequencing methods will provide all the DNA sequence variation in the genome. It is time now to retool for this new onslaught of data.


We report the results of the largest GWAS for human eye color to date. In addition to confirming the association of SNPs in 11 previously known eye color genes (11, 13, 14, 17, 28), the identification of 50 novel eye color–associated genetic loci helps explain previously missing heritability of eye color variability in European populations. Moreover, because of the multiethnic design of our study, we demonstrate that several of the genetic loci discovered in Europeans also have an effect on eye color in Asians.

Eight of the genes in or near the loci newly associated with eye color in our study were previously reported for genetic associations with other pigmentation traits, such as hair and skin color, for instance, TPCN2, MITF, and DCT (27, 30, 32, 45). The commonality of associated DNA variants across the three pigmentation traits helps explain why the different pigmentation traits frequently (but not completely) intercorrelate in European populations. While many significant genetic associations are shared between iris color and other pigmentation traits, there are also notable differences. Although DNA variants within the MC1R gene are strongly associated with light skin and red hair color (27), no detectable association with eye color was found in our large GWAS, in line with previous albeit smaller-sized GWASs of more limited statistical power (11, 12, 14). Similarly, other DNA variants strongly associated with skin and hair color within genes, such as SILV, ASIP, and POMC (30), showed no statistically significant effect on eye color in this study, nor in previous studies. Moreover, we also identified 34 genetic loci that were significantly associated with eye color, but for which there is no report of significant association with hair and/or skin color. This is remarkable as the statistical power of the recent GWASs on hair color (31, 46) and sun sensitivity (32) were similar to that of our current eye color GWAS. Significant associations for SNPs in/near genes involved in iris structure, such as TRAF3IP1 and SEMA3A, suggest that they exert their effects with changes in Tyndall scattering, rather than through alterations of melanin metabolism. Overall, this demonstrates that although many genes overlap between eye, hair, and skin color, the different human pigmentation traits are not completely determined by the same genes as we showed.

The major strengths of our study compared with previous eye color GWASs arise from the larger sample size, which translated into increased statistical power and also the ability to lower the threshold of MAF for which sufficient power to detect association is available. Rare SNPs are often a source of considerable phenotypic variation (47). For instance, seven (6%) of the independently associated SNPs identified by conditional analysis in the discovery cohort had a MAF between 0.1 and 1%. Despite their low frequency, however, five (71%) of these rare SNPs were in the same region as other, more common conditional SNPs that did replicate. The remaining two loci (DAB2 and an intronic region on chromosome 4) that were not formally replicated should therefore be considered only as strong candidates with respect to their association with eye color, pending independent validation in future studies.

Another strength of this work is the inclusion of European and non-European populations. Non-European populations are underrepresented in the GWAS literature in general, including in pigmentation GWASs, but their study is important for the understanding of the genetic basis of human phenotypes (48). Although eye color variation is typically attributed to individuals of (at least partial) European descent, or those originating from areas nearby Europe, more subtle variation in brown eyes is also observed in Asian populations without European admixture (9). Our results from the Asian cohorts showed remarkable consistency in the genetic architecture of eye color among individuals of different continental ancestries, with Asian replication for the two major European genes OCA2 and HERC2. Moreover, our findings also suggest that while a single regulatory variant in HERC2 is responsible for most blue/brown variation in Europeans (16), many additional DNA variants across both OCA2 and HERC2 seem to have independent effects. This hypothesis is further supported by our conditional analysis in the European discovery cohort, identifying independent associations spanning 14 mbp across both genes rather than a concentrated cluster centered at HERC2 rs1129038. This is remarkable given the large eye color variation from the lightest blue to the darkest brown in Europeans, compared with the more limited variation within brown eye color in Asians.

In conclusion, our work has identified numerous novel genetic loci associated with human eye color in Europeans, of which a subset also shows effects in Asians, despite their largely reduced phenotypic eye color variation compared with Europeans. The genetic loci we identified explain the majority (53.2%) of eye color phenotypic variation (classified using a three-category scale) in Europeans and a large proportion of the previously noted missing heritability of eye color. Our findings clearly demonstrate that eye color is a genetically highly complex human trait, similar to hair (31) and skin color (32), as highlighted recently in large European GWASs. The large number of novel eye color–associated genetic loci identified here provide a valuable resource for future functional studies, aiming to understand the molecular mechanisms that explain their eye color association, and for future genetic prediction studies, aiming to improve DNA-based eye color prediction in anthropological and forensic applications.


We thank the participants of all the cohorts for agreeing to join the study and field staff for their contributions in sample collection and community work. The help of Dr Seema Bhaskar, K Radha Mani and Inder Deo Mali, CSIR-Centre for Cellular and Molecular Biology, Hyderabad in genomic DNA isolation from blood samples and in managing the DNA samples is sincerely acknowledged. We acknowledge major contributions by S Rao, S Hirve, P Gupta, D S Bhat, H Lubree, S Rege, P Yajnik and the invaluable community work contributed by T Deokar, S Chaugule, A Bhalerao and V Solat from the KEM Hospital Research Centre, Pune. We are grateful to Professor Oluf Pedersen, Professor Niels Grarup and collaborators, Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark for providing anonymized genotype data for our replication study. We also thank Prof T Tanaka, Translational Gerontology Branch, NIA at Harbor Hospital, Baltimore, USA and acknowledge the contribution of the data from three studies, Sardinia, BLSA and InCHIANTI.

Conflict of Interest statement. None declared.


Council of Scientific and Industrial Research (CSIR), Ministry of Science and Technology, Government of India, India (XII Five-Year Plan titled “CARDIOMED”). Wellcome Trust, London, UK, Medical Research Council, London, UK and Department for International Development, UK. Parthenon Trust, Switzerland and ICICI Bank, Social Initiatives Group. Funding to pay the Open Access publication charges for this article was provided by Council of Scientific and Industrial Research (CSIR), Ministry of Science and Technology, Government of India, India.

Understanding host–microbe interactions remains critical as growing antibiotic resistance, new outbreaks, and re-emerging pathogens put lives at risk.

Microbiologists and infectious disease researchers have leveraged and advanced the genomic technological breakthroughs of the last decade.

Advances in clustered regularly interspaced short palindromic repeats (CRISPR/Cas9)-mediated genome editing and reduced sequencing costs have made CRISPR screens the dominant loss-of-function and gain-of-function screening platform.

Advances in sequencing, high-throughput technologies, and resources for model organisms and humans have dramatically improved natural diversity screens. Such resources include consortia with repositories for electronic medical records and human cell lines, and model-organism diversity panels.

Integrating pathogen diversity into host resistance screens helps to further define the genetic landscape of host–pathogen interactions.

Humanity’s ongoing struggle with new, re-emerging and endemic infectious diseases serves as a frequent reminder of the need to understand host–pathogen interactions. Recent advances in genomics have dramatically advanced our understanding of how genetics contributes to host resistance or susceptibility to bacterial infection. Here we discuss current trends in defining host–bacterial interactions at the genome-wide level, including screens that harness CRISPR/Cas9 genome editing, natural genetic variation, proteomics, and transcriptomics. We report on the merits, limitations, and findings of these innovative screens and discuss their complementary nature. Finally, we speculate on future innovation as we continue to progress through the postgenomic era and towards deeper mechanistic insight and clinical applications.

Ethics declarations

Ethics approval and consent to participate

Study approval was granted by the Health Research Ethics Committee of the Lagos University Teaching Hospital, Lagos, Nigeria (ADM/DCST/HREC/APP/1669). Written informed consent was obtained from the parents or legal guardian of the study participants.

Consent for publication

Competing interests

The authors declare that they have no competing interests with respect to the authorship and/or publication of this article.
