PhD Scholarship in France on Recombination

This autumn (2017), the research group of Dr. Pierre Sourdille at the French National Institute for Agricultural Research (INRA – Institut National de la Recherche Agronomique) is looking for a PhD student to join the PredHaptor project.

To apply, please contact Dr. Sourdille or Dr. Sophie Bouchet and send your curriculum vitae, using the contact details below:

Dr. Pierre Sourdille:

Directeur de Recherche
Tel. : +33 (0)4 43 76 15 17
Fax : +33 (0)4 43 76 15 10
Mob. : +33 (0)7 77 37 08 28
Mail : pierre.sourdille@inra.fr

Dr. Sophie Bouchet:

Chargée de recherche, PhD
Tel. : +33 (0)4 43 76 15 09
Mobile : +33 (0)6 95 80 93 14

For more information about the scholarship, please contact the IBSG group by email: admin@ibsgacademic.com.

Source: Dr. Nguyễn Tấn Trung

Details of the research project are as follows:

Estimation and integration of historical recombination rates in genomic prediction models


Recombination (or crossover: CO) is a major process in species improvement, since it contributes to generating new allele combinations and thereby exploitable genetic diversity. However, the recombination rate is fairly low (about three COs per chromosome per meiosis) and COs are localized in small regions (hotspots) unevenly distributed along the chromosomes. This aspect is only poorly taken into account in the models predicting cross value. In the PredHaptor project, we propose to integrate historical recombination pattern data, derived from a whole-genome analysis, into cross value prediction for breeding programs. We will focus our analyses on wheat (bread and durum), pig, and small ruminants. We will estimate the patterns of ancestral recombination in these species (T1). We will compare the location of the recombination events between the different homoeologous genomes of bread wheat (ABD) and durum wheat (AB), as well as between these two related species (T2). We will develop a new utility criterion that takes into account the pan-genomic recombination pattern to predict cross value (T3). We will simulate breeding schemes with this new criterion to optimize the genetic gains over the mid to long term and with a fixed number of generations, and compare them with the gains obtained with more classical approaches (T4).

Scientific Context:

Increasing both animal and crop production worldwide has become an urgent task, in order to meet the growing demand resulting from an expanding population that will reach 9 billion people in 2050, as well as from new industrial uses such as biofuels. Moreover, this challenge has to be met while cultivated areas and natural resources are limited, and under ecological constraints that require reducing the environmental footprint of crop production (i.e. using less water, fertilizers, pesticides, fungicides…). Rapid development of new varieties that sustain crop yield, with improved quality and nutritional value, is thus necessary. This relies on the creation of new allele combinations through breeding processes, both by mixing alleles existing in elite germplasm and by introgression of new variability from genetic resources.

Genebanks provide desirable alleles for the genetic improvement of crops. However, the use of landraces is often hindered by unfavourable linkage between desirable and undesirable alleles. For a long time, this diversity has not been fully used because of a lack of genome information. Molecular markers such as SNPs have been used for Marker Assisted Selection (MAS), by identifying genes controlling the variation of key agronomic traits through Genome-Wide Association Mapping (GWAM) and by monitoring favorable alleles in the breeding program. Cheap high-throughput genotyping and genomic predictions enable the systematic exploration and extraction of targeted germplasm for complex trait improvement. Breeding values are molecular scores predicted for one trait or a combination of traits (Meuwissen et al. 2001). Methodological developments are in progress to use molecular scores to predict cross value and design new breeding schemes. The objective is to optimize mating decisions, i.e. to cross the parents with the highest chances of generating recombinant progenies with many desirable alleles. Instead of estimating the breeding values of individuals per se, which represent predictions of general combining ability (GCA), we want to predict specific combining ability (SCA) (van Berloo and Stam 1998; Han et al. 2017). This can help convert useful diversity into breeding-ready genetic resources and maintain the adaptation capacity of germplasm in rapidly changing environments.

The statistical model commonly used in genomic prediction (Genomic BLUP: GBLUP) simultaneously fits all allelic effects, treating them as random effects. In this model, effects are assumed to be additive and breeding values are predicted as the sum of allele effects (Meuwissen et al. 2001). To predict cross value, i.e. the probability that a pair of parents will produce a gamete with a maximum of desirable alleles, the simplest utility criterion is the sum of the best allele effects among both parents at each marker. Different metrics have been proposed and tested (Lado et al. 2017). When crossovers (COs) are assumed to be evenly distributed along the genetic map, the parent mean predicts the progeny mean well. It is more challenging to predict the variance and the best progeny, especially with a limited number of generations and a limited population size. The best allele combination is rarely reachable, especially when considering several traits that are sometimes negatively correlated.

We also know that recombination events (crossovers, COs) are not evenly distributed. They occur during the formation of the female and male gametes, in a process called meiosis. COs are more frequent in the distal parts of the chromosomes (towards the telomeres), while they can be almost absent from pericentromeric regions that sometimes cover several hundred megabases. Moreover, COs are located in small regions of a few hundred bases, called hotspots, that show extremely high recombination rates, while other regions carry only a few COs and show a much lower recombination intensity. Recombination rate has been shown to be correlated with gene density in wheat (Saintenac et al. 2011), with enrichment in promoters in sorghum (Bouchet et al. 2017). It is correlated with GC content in different species (Galtier et al. 2001; Birdsell 2002; Marais et al. 2004; Meunier and Duret 2004), with transposable element content (Duret et al. 2000; Rizzon et al. 2002; Wright et al. 2003) and with diversity (Hellmann et al. 2003; Bouchet et al. 2017). Recombination rate has been shown to be highly predictable in different species. It is correlated between different experimental crosses in maize (Bauer et al. 2013; Rodgers-Melnick et al. 2015) and sorghum (Bouchet et al. 2017). Historical recombination rates in global diversity panels have also been shown to be correlated with those observed in experimental populations in wheat (Darrier et al. 2017). This means that genetic mechanisms exist that control recombination and that we can predict recombination profiles.

To predict cross value, i.e. to estimate the probability of obtaining agronomically interesting recombinants knowing the parent genotypes, we can simulate progenies in silico and/or calculate metrics (utility criteria) that use the recombination profile. For wheat, as for other species with long chromosomes and a Rabl conformation (Choulet et al. 2014), we know that genome features and recombination rate are partitioned. Daetwyler et al. (2015) partitioned the wheat genome in simulations to predict the best haploid value. Predictions using chromosomes divided into three segments were the best. Zhong and Jannink (2007), Bonk et al. (2016) and Han et al. (2017) included recombination rates (linkage map) in their models. We propose to generalize those methodologies by using a utility criterion that takes whole-genome recombination rates as an input vector, estimated from the historical recombination events observed in diversity panels. We will test the efficiency of the model in predicting crosses that produce genetic gain over a fixed number of generations, for different trait architectures.
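The two simplest quantities discussed in this section, the parent mean of additive breeding values and the utility criterion defined as the sum of the best allele effects among both parents at each marker, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: parents are taken to be inbred lines coded 0/1 at each marker, marker effects are taken as already estimated (for example by a GBLUP-type model), and all function names are hypothetical rather than part of any PredHaptor tool.

```python
def breeding_value(genotype, effects):
    """Additive breeding value: sum over markers of allele code times effect."""
    return sum(g * e for g, e in zip(genotype, effects))


def parent_mean(parent1, parent2, effects):
    """Mean of the two parental breeding values (predictor of the progeny mean)."""
    return 0.5 * (breeding_value(parent1, effects) + breeding_value(parent2, effects))


def best_allele_criterion(parent1, parent2, effects):
    """Sum over markers of the best allele effect available in either parent.

    Assumption: inbred parents coded 0/1, so the contribution of a parent
    at a marker is simply code * effect.
    """
    return sum(max(g1 * e, g2 * e) for g1, g2, e in zip(parent1, parent2, effects))


# Hypothetical toy data: four markers, two inbred parents.
effects = [0.5, -0.2, 0.3, 0.4]
p1 = [1, 0, 1, 0]
p2 = [0, 1, 1, 0]

mean_value = parent_mean(p1, p2, effects)
best_value = best_allele_criterion(p1, p2, effects)
```

This criterion ignores where crossovers can occur: it gives the value of the ideal gamete, whether or not recombination makes that gamete likely. That gap is what a recombination-aware criterion is meant to address.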

Using coalescence theory and haplotype information will give better estimates of recombination rates. Reconstruction of the full ancestral recombination graph is challenging because the space of possible graphs is extremely large. Methodologies to identify ancestry segments are in progress and we are evaluating the most recent ones in the Haptitude project. We can jointly estimate haplotype sharing along the genome, recombination rate and Watterson’s mutation rate (Li and Stephens 2003; Cardin 2006) using approximate sampling methods such as Hidden Markov Models (HMM) (Lawson et al. 2012; Hellenthal et al. 2014) or, for instance, the Multiple Sequentially Markovian Coalescent (MSMC) HMM implemented in the Pairwise Sequential Markovian Coalescent (PSMC’) algorithm (Schiffels and Durbin 2014). Note that although Lawson’s HMM algorithm is implemented for SNP data, PSMC’ is implemented for sequence data (exome capture or full sequence) only.

We will use SNP and sequence data that are available for four species of agronomic interest (bread wheat, durum wheat, pig and sheep). Once identified, these ancestral recombination break-points will be compared between genomes and, for bread and durum wheat, between species, since these are two closely related polyploid species (ABD and AB genomes respectively). Such a comparative analysis has never been conducted in any species so far and is thus particularly original.

Project, Work plan, Experience, and Skills That Will Be Acquired by the PhD Student:

The project will build on Haptitude, which aims at modelling haplotypes along the genomes of wheat, grape and pig. In polyploid wheats, we will extend the modelling by building across-genome haplotypes using the synteny that exists between homoeologous chromosomes. This will lead to the production of a public across-genome HapMap database. An analysis pipeline will be developed to produce recombination rates along haplotypes. From this point of view, the PredHaptor project follows the previous SELGEN BO-DeLiRe program, which studied the relationships between recombination and linkage disequilibrium in bread wheat and sheep. In the BO-DeLiRe project, we successfully applied a coalescent strategy to a limited set of bread wheat contigs (Darrier et al. 2017) and we showed that ancestral and present-day recombination mainly co-localize on the genome. In the PredHaptor project, we will use the same strategy, evaluate new methods to estimate ancestral recombination rates, and apply them at the whole-genome scale.

The project is thus based on the identification of these ancestral recombination break-points in the four species: bread wheat, durum wheat, pig and sheep. This relies on coalescent analysis, which combines high-density genotyping data with whole-genome assembly sequences.

Regarding genomic prediction, an R package will be developed that computes a utility criterion taking the recombination pattern into account. Another R package, which predicts crossing schemes that optimize genetic gain over a fixed number of generations, will also be developed.

The PredHaptor project will benefit from various projects that have generated (or will very soon generate) many useful and essential data. For bread wheat, the BREEDWHEAT project, which aims to develop tools and methodologies for genomic breeding in wheat, has genotyped 5000 exotic or synthetic lines from the world diversity and 2000 elite lines with an array of ~420,000 SNPs distributed on all chromosomes. For durum wheat, 480 wheat varieties including wild species have been genotyped using the same 420K array in the course of the SELGEN CropDL project. Both species have (or will soon have) a high-standard, fully annotated sequence of their respective reference genomes.

The project will be organized into four main tasks. In Task 1 (T1), we will carefully collect and analyze the data produced in the projects described above. For the four species, we will develop core collections representative of the available diversity, which will be used for the analysis of ancestral recombination break-points. We will analyze each full collection to identify its structure and the number of groups. Then, we will select a set of lines representative of each group, in proportion to the number of lines in each group, to avoid unbalanced datasets. Several software packages are already available to apply coalescent theory (Choi et al. 2013; Hellsten et al. 2013) to a SNP data set. We have already used PHASE 2.1.1 (Li and Stephens 2003; Crawford et al. 2004) to estimate the background recombination rate parameter, ρ, and to infer hotspot positions between pairs of SNPs using lambda (λ) (Darrier et al. 2017). From the software output, we extracted the posterior distribution of λ in each interval. We used the median of this posterior distribution as an estimate of the interval-specific recombination intensity. However, other software packages such as LDhelmet (Chan et al. 2012), LDhat (McVean and Auton 2011), LDhot (Auton et al. 2014) or Chromopainter (Lawson et al. 2012) could be used. The best model to estimate the whole-genome recombination rate in a diversity panel may be the Pairwise Sequential Markovian Coalescent (PSMC’) extension of Schiffels and Durbin (2014). Because PHASE and PSMC’ are time-consuming algorithms, we could compare their estimates locally with other whole-genome estimates.
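The summary step described in Task 1, taking the median of the posterior distribution of λ in each SNP interval as the interval-specific recombination intensity, can be sketched in Python as follows. This is only a sketch: it assumes the posterior samples have already been extracted into a dictionary keyed by interval, it does not parse any actual PHASE output file, and the function name and interval labels are hypothetical.

```python
from statistics import median


def interval_intensity(posterior_samples):
    """Return the posterior median of lambda for each SNP interval.

    posterior_samples: dict mapping an interval label to a list of
    posterior samples of lambda for that interval (hypothetical layout).
    """
    return {interval: median(samples)
            for interval, samples in posterior_samples.items()}


# Hypothetical toy values: a background interval and a hotspot-like interval.
samples = {
    "snp1-snp2": [1.0, 1.2, 0.9],
    "snp2-snp3": [10.0, 30.0, 20.0, 40.0],
}
intensity = interval_intensity(samples)
```

The median is less sensitive than the mean to the long right tail that posterior distributions of λ can show around hotspots.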

In the second task (T2), we will compare the recombination pattern between the homoeologous chromosomes of wheat. This will be based on the synteny that exists between the chromosomes that derive from a common diploid ancestor. We will identify syntenic blocks using the coding sequences that are well conserved between homoeologous chromosomes (IWGSC 2014; Glover et al. 2015). We will thus align recombination break-points between the A, B and D homoeologous chromosomes. Similarly, we will compare recombination between the same homoeologous chromosomes from bread wheat (ABD) and durum wheat (AB). Moreover, the comparative analysis could be completed with data derived from barley (H genome; data available from The James Hutton Institute), a related diploid species that diverged from the diploid ancestors of wheat (T. monococcum A genome, Ae. speltoides S genome related to the B genome, Ae. tauschii D genome) about 10 MYA. We will compare both recombination location and intensity, and we will classify break-points according to their age to estimate their possible evolution rate. We will compare historical recombination profiles with the recombination profiles of the last 20 years obtained on breeder material (INRA-AO) and pre-breeder material (synthetic populations derived from crosses between tetraploids or hexaploids and Ae. tauschii) for which we know the pedigrees.

In the third task (T3), we will integrate the recombination profile from T1 into genomic prediction models of individual breeding values or cross values. We will focus on individual gametes to predict the individual values of reproductive animals or crop parental lines, and on F1 gametes derived from crosses to predict cross values, in crop breeding schemes in particular. We will develop a utility criterion, i.e. a criterion that ranks individuals or crosses according to the expected value of selection candidates at generation n, and that integrates recombination rate information along the genome. The idea is not to reason with the expected mean of the progeny but with that of future candidates at generation n+1. It is an extension of the propositions of Zhong and Jannink (2007), which targeted crosses between inbred lines. With such a utility criterion, the rank of a candidate (individual or cross) at generation n will be correlated with the probability that it produces gametes that recombine in regions enriched in QTLs in repulsion phase.

Our approach will be to assess the value of this new utility criterion compared to other classical metrics, and to measure the sensitivity of this comparison to various parameters characterizing breeding schemes (population size, genetic architecture of traits, the way genomic selection is implemented). For that purpose, we will take two different approaches:

  • the mathematical definition of a utility criterion corresponding to the proposed theory,
  • progeny simulations from known genotyped parents. Crossovers will not be positioned randomly on the genetic map but according to a recombination profile estimated on a diversity panel.
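As a rough illustration of the second approach, the following Python sketch simulates one gamete from two parental haplotypes, placing crossovers according to a per-interval recombination profile rather than uniformly. It is a minimal sketch under simplifying assumptions that are not taken from the project description: crossovers occur independently in each marker interval (no interference), the profile is given directly as a recombination probability per interval, and all names are hypothetical.

```python
import random


def simulate_gamete(hap_a, hap_b, interval_rates, rng):
    """Simulate one gamete from two parental haplotypes.

    interval_rates[i] is the assumed probability of a crossover between
    marker i and marker i + 1 (one value per interval, so its length is
    len(hap_a) - 1). Crossovers are drawn independently per interval.
    """
    # Start on one of the two parental haplotypes at random.
    current, other = (hap_a, hap_b) if rng.random() < 0.5 else (hap_b, hap_a)
    gamete = [current[0]]
    for i, rate in enumerate(interval_rates):
        if rng.random() < rate:
            # A crossover in this interval switches the haplotype being copied.
            current, other = other, current
        gamete.append(current[i + 1])
    return gamete


# Hypothetical profile: a single hotspot between the third and fourth markers.
rng = random.Random(2017)
parent_a = [0, 0, 0, 0, 0]
parent_b = [1, 1, 1, 1, 1]
hotspot_profile = [0.0, 0.0, 1.0, 0.0]
gamete = simulate_gamete(parent_a, parent_b, hotspot_profile, rng)
```

Repeating the call many times with a profile estimated on a diversity panel would give a simulated progeny in which recombinants arise mostly around hotspots, which is the behaviour a uniform crossover model cannot reproduce.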

Finally, in Task 4 (T4), we will use this latter criterion to select individuals and/or crosses in simulations of several generations of selection, in order to optimize genetic gain in the short and/or long term. This is based on the hypothesis that some combinations have a low probability of occurrence even when large progenies are considered. Rare recombination events will be taken into account in long-term crossing schemes only. We will identify the crosses that exhibit the highest probability of generating a maximal genetic gain for a fixed number of generations.

The PhD student will produce whole-genome ancestral recombination rate data, which has rarely been done in the literature. This will be the first comparative analysis of break-points between species. The Triticeae tribe is a relevant model for this pioneering study, as its genomes are highly conserved. He/she will produce analysis and decision tools that will help plant and animal breeders optimize their breeding schemes and monitor diversity levels in their germplasm.

During this project, the PhD student will acquire experience in population genetics and quantitative genetics, with both theoretical and applied aspects. He/she will develop skills in bioinformatics and sequence / synteny analyses, algorithm, pipeline and R package development.

P Sourdille and S Bouchet at INRA UMR GDEC will supervise the PhD student, who will work at the interface of the two teams (GeCO & DGS). Both are already involved in different projects related to the PredHaptor project, and having a PhD student is a great opportunity to improve the interactions between the two teams and develop future prospects. Moreover, the PredHaptor project was not initially planned in any of the projects led by P Sourdille and S Bouchet or by the teams they belong to. This is thus an opportunity to exploit the data that will be produced in the ongoing programs. Otherwise, these data will be released in the public domain and could thus be exploited by external laboratories. The PhD student will also benefit from the experience of the lab in quantitative genetics and bioinformatics, since UMR GDEC has a dedicated bioinformatics platform for high-throughput sequence and data analyses. This will also be the opportunity to develop and distribute new bioinformatic tools.

The PredHaptor project is a collaborative project involving different teams working on wheat (bread and durum), pig, sheep and bioinformatics. This will thus be the opportunity for the student to participate in regular meetings where the main results of his/her PhD will be discussed and compared to those obtained on different species. This will undoubtedly add value to the subject and improve its perspectives and outcomes. The competences of the partners are diverse and complementary in quantitative genetics, population genetics, genomics and bioinformatics. Each partner will bring its own data set of genotypes, sequences and phenotypes, on which the research questions will be evaluated. The conclusions from the project should allow a better exploitation of recombination information to predict crossing schemes for any species of interest, and not only for those studied in the PredHaptor project.


  1. Auton A, Myers S, McVean G (2014) Detect recombination hotspots using population genetic data. http://arxiv.org/abs/1403.4264
  2. Bauer E., Falque M., Walter H., Bauland C., Camisan C., et al., 2013 Intraspecific variation of recombination rate in maize. Genome Biol. 14: R103.
  3. Berloo R. van, Stam P., 1998 Marker-assisted selection in autogamous RIL populations: a simulation study. Theor. Appl. Genet. 96: 147–154.
  4. Birdsell J. A., 2002 Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol. Biol. Evol. 19: 1181–1197.
  5. Bonk S., Reichelt M., Teuscher F., Segelke D., Reinsch N., 2016 Mendelian sampling covariability of marker effects and genetic values. Genet. Sel. Evol. 48: 36.
  6. Bouchet S., Olatoye M. O., Marla S. R., Perumal R., Tesso T., et al., 2017 Increased power to dissect adaptive traits in global sorghum diversity using a nested association mapping population. Genetics 206: 573–585.
  7. Cardin N., 2006 Approximating the coalescent with recombination. University of Oxford.
  8. Choi K., Zhao X., Kelly K. A., Venn O., Higgins J. D., et al., (2013) Arabidopsis meiotic crossover hot spots overlap with H2A.Z nucleosomes at gene promoters. Nat. Genet. 45: 1327–1336.
  9. Choulet F., Alberti A., Theil S., Glover N., Barbe V., et al., 2014 Structural and functional partitioning of bread wheat chromosome 3B. Science 345: 1249721.
  10. Crawford D. C., Bhangale T., Li N., Hellenthal G., Rieder M. J., et al., (2004) Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. 36: 700–706.
  11. Darrier B., Rimbert H., Balfourier F., Pingault L., Josselin A.-A., et al., 2017 High-Resolution Mapping of Crossover Events in the Hexaploid Wheat Genome Suggests a Universal Recombination Mechanism. Genetics: genetics-116.
  12. Duret L., Marais G., Biémont C., 2000 Transposons but not retrotransposons are located preferentially in regions of high recombination rate in Caenorhabditis elegans. Genetics 156: 1661–1669.
  13. Esch, E., Szymaniak, J. M., Yates, H., Pawlowski, W. P., & Buckler, E. S. (2007). Using crossover breakpoints in recombinant inbred lines to identify quantitative trait loci controlling the global recombination frequency. Genetics, 177(3), 1851-1858.
  14. Galtier N., Piganeau G., Mouchiroud D., Duret L., 2001 GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159: 907–911.
  15. Glover N, Daron J, Pingault L, Vandepoele K, Paux E, Feuillet C, Choulet F (2015) Small-scale gene duplications played a major role in the recent evolution of wheat chromosome 3B. Genome Biol. 16: 188-196
  16. Han Y., Cameron J. N., Wang L., Beavis W. D., 2017 The Predicted Cross Value for Genetic Introgression of Multiple Alleles. Genetics 205: 1409–1423.
  17. Hellenthal G., Busby G. B., Band G., Wilson J. F., Capelli C., et al., 2014 A genetic atlas of human admixture history. Science 343: 747–751.
  18. Hellmann I., Ebersberger I., Ptak S. E., Pääbo S., Przeworski M., 2003 A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72: 1527–1535.
  19. Hellsten U., Wright K. M., Jenkins J., Shu S., Yuan Y., et al., (2013) Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. Proc. Natl. Acad. Sci. 110: 19478–19482.
  20. Holtz Y, Ardisson M, Ranwez V, Besnard A, Leroy P, Poux G, et al. (2016) Genotyping by Sequencing Using Specific Allelic Capture to Build a High-Density Genetic Map of Durum Wheat. PLoS ONE 11(5): e0154609. https://doi.org/10.1371/journal.pone.0154609.
  21. International Wheat Genome Sequencing Consortium (IWGSC) (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345: 1251788.
  22. Jordan KW, Wang SC, Chao S, Lun Y, Paux E, Sourdille P, Sherman J, Akhunova A, Blake NK, King R, Phillips AL, Uauy C, Dubcovsky J, Talbert L, Akhunov E (2017) Unraveling the genetic basis of recombination rate variation in wheat by nested-association mapping and reverse genetic scans. Nat Comm submitted
  23. Lado B., Battenfield S., Guzman C., Guincke M., Singh R.P., Dreisigacker S., Pen A.J., Fritz A., Silva P., Poland J., Gutierrez L, 2017 Strategies to select crosses using genomic prediction in two wheat breeding programs. Plant Gen.
  24. Lawson D. J., Hellenthal G., Myers S., Falush D., 2012 Inference of population structure using dense haplotype data. PLoS Genet. 8: e1002453.
  25. Li N., Stephens M., 2003 Modeling linkage disequilibrium and identifying recombination hotspots using Single-Nucleotide Polymorphism data. Genetics 165: 2213–2233.
  26. McVean G, Auton A (2011) LDhat 2.2: A package for the population genetic analysis of recombination. http://ldhat.sourceforge.net
  27. Marais G., Charlesworth B., Wright S., 2004 Recombination and base composition: the case of the highly self-fertilizing plant Arabidopsis thaliana. Genome Biol. 5: R45.
  28. Meunier J., Duret L., 2004 Recombination drives the evolution of GC-content in the human genome. Mol. Biol. Evol. 21: 984–990.
  29. Meuwissen T., Hayes B., Goddard M., 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819.
  30. Nachman, M. W. (2002). Variation in recombination rate across the genome: evidence and implications. Current opinion in genetics & development, 12(6), 657-663.
  31. Petit M., Astruc J.M., Sarry J., Drouilhet L., Fabre S, Moreno C., Servin B. Insights into the genetic determinism and evolution of recombination rates from combining multiple genome-wide datasets in Sheep. 2017. bioRxiv 104976; doi:https://doi.org/10.1101/104976.
  32. Rizzon C., Marais G., Gouy M., Biémont C., 2002 Recombination rate and the distribution of transposable elements in the Drosophila melanogaster genome. Genome Res. 12: 400–407.
  33. Rodgers-Melnick E., Bradbury P. J., Elshire R. J., Glaubitz J. C., Acharya C. B., et al., 2015 Recombination in diverse maize is stable, predictable, and associated with genetic load. Proc. Natl. Acad. Sci. 112: 3823–3828.
  34. Saintenac C., Faure S., Remay A., Choulet F., Ravel C., et al., 2011 Variation in crossover rates across a 3-Mb contig of bread wheat (Triticum aestivum) reveals the presence of a meiotic recombination hotspot. Chromosoma 120: 185–198.
  35. Schiffels S., Durbin R., 2014 Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46: 919–925.
  36. Simianer, H., Szyda, J., Ramon, G., & Lien, S. (1997). Evidence for individual and between-family variability of the recombination rate in cattle. Mammalian Genome, 8(11), 830-835.
  37. Vaissayre, L., Ardisson, M., Borries, C., Santoni, S., David, J., & Roumet, P. (2012). Elite durum wheat genetic map and recombination rate variation in a multiparental connected design. Euphytica, 185(1), 61-75.
  38. Wright S. I., Agrawal N., Bureau T. E., 2003 Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res. 13: 1897–1903.
  39. Zhong S., Jannink J.-L., 2007 Using quantitative trait loci results to discriminate among crosses on the basis of their progeny mean and variance. Genetics 177: 567–576.