BIOTEKNOLOGI-BIOTEKNOLOGI

KSP: Cataloging and utilizing structural variants in DNA to improve sustainability of Norwegian livestock production

Alternative title: Beskrivelse og anvendelse av strukturelle varianter i DNA for økt bærekraft i norsk husdyrproduksjon (CAUSATIVE)

Awarded: NOK 9.8 mill.

The process of domesticating cattle and pigs has accelerated markedly in recent decades as specialized breeding companies provide farmers with sperm from individuals judged as “superior” based on observable performance (e.g. growth rate, fertility). These traits are now being supplemented with more subtle quantitative measures such as milk and meat composition, disease resistance, animal welfare and behavior. Breeding companies now use advanced genetic methods to measure a multitude of genetic markers linked to desirable traits and to guide matings that enhance these traits within the population. The most widely adopted analytical approach involves measuring thousands of tiny variations in DNA called single nucleotide polymorphisms (SNPs) that together reflect some of an animal’s genetic value. Unfortunately, this well-established approach fails to capture genetic variation contained in structural variations (SVs). This class of variants includes large deletions, insertions or rearrangements of genetic information, which can radically impact an animal’s biology. Further, the lab tools used to gather SNP data were developed based on information from breeds popular in the US or UK and overlook genetic characteristics specific to Norwegian breeds. Put more simply, the unique genetic makeup of Norwegian pig and cattle breeds is currently not accounted for when choosing sires for future, sustainable animal production. CAUSATIVE seeks to address these limitations by developing representations of the DNA in Norwegian cattle and pig breeds that capture SVs from multiple individuals. These data will improve accuracy and precision when calculating genetic value based on SNPs, allow breeders to incorporate knowledge about SVs in their evaluations (something that has been completely overlooked before now) and improve our understanding of the way an animal’s DNA code affects its biology and health.
Our activities began with the collection of biological samples from 15 boars (Landrace) and 17 bulls (Norwegian Red; NRF) that showed little family relatedness and together represent a genetic cross-section of the breeds. After purification, their fragmented chromosomal DNA was translated from genetic code to computer code using long-read sequencing technologies. Once the data were available as files, a high-performance computer cluster reassembled the 2.5 trillion characters of code into multiple continuous DNA sequences representing chromosomes from each individual. DNA from a single bull and a single boar was subjected to additional sequencing and assembly to produce a high-quality gapless reference genome for each of these Norwegian breeds. This immediately allowed us to incorporate 90 million bases (Mb) of new sequence into the Landrace assembly and 136 Mb into the NRF assembly, and to refine the gene content by including hundreds of previously missing genes and correctly relocating genes that were misplaced in the US/UK references. A comprehensive search for sequence representing centromeres and telomeres (the hard-to-assemble middles and ends of chromosomes) reveals that these new, bespoke reference genomes are gapless (i.e. continuous sequence) and largely complete, missing only some sequence at the chromosome ends. These references (which are publicly available and soon to be described in scientific journals) will serve as foundations for the construction of genome graphs (also known as pan-genomes). A genome graph’s main advantage over a single arbitrary reference genome is that it simultaneously represents multiple genomes with their genetic variation and gives a much more realistic overview of a breed or population. This complex but accurate representation of genomes is gaining traction among livestock breeders globally. Our unique data and perspectives on Norwegian breeds have led us to become contributors to and participants in both the international bovine and porcine pan-genome initiatives.
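To make the genome graph idea concrete, here is a minimal toy sketch in Python. It is not the project's pipeline: the node sequences, the path names and the helper functions `spell` and `skipped_nodes` are all hypothetical and chosen only for illustration. Each node holds a stretch of sequence and each path through the nodes spells out one animal's genome, so a deletion shows up as a path that bypasses a node.

```python
# Toy genome graph (illustrative assumption, not real data):
# nodes hold sequence segments, each path spells one genome.
nodes = {1: "ACGT", 2: "TTGACA", 3: "GGC"}
paths = {
    "reference": [1, 2, 3],
    "animal_with_deletion": [1, 3],  # bypasses node 2
}


def spell(path):
    """Concatenate the node sequences along a path."""
    return "".join(nodes[n] for n in path)


def skipped_nodes(ref_path, alt_path):
    """Nodes on the reference path that the other path never visits."""
    visited = set(alt_path)
    return [n for n in ref_path if n not in visited]


reference_seq = spell(paths["reference"])
deleted_seq = spell(paths["animal_with_deletion"])
deleted_nodes = skipped_nodes(paths["reference"], paths["animal_with_deletion"])
```

In this toy case the second path lacks the six bases stored in node 2, which is how a single graph can hold both the reference sequence and a structural deletion carried by another individual.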
Using these genome graphs, we were able to reveal >70,000 previously unidentified SVs in NRF and >100,000 in Landrace. A significant proportion of these (between one third and one half) occur within gene regions, implying that they could affect gene expression and, in turn, traits. Our modelling predicts that these lists include the bulk of the most frequent SVs in these breeds, and that additional sequencing will reveal additional rare variants. The graphs are now being used to re-analyse existing sequencing data from many hundreds of animals provided by project partners. Armed with a high-quality representation of the genomes from NRF and Landrace, and comprehensive catalogues of SNPs (almost 20M) and SVs (almost 200K), we are now beginning the process of predicting (imputing) SVs and SNPs into many hundreds of thousands of individuals for which we have low-resolution (but high-accuracy) SNP signature information. Imputation results will be used in the remaining project period to search for associations between SVs and traits related to sustainability.
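The proportion of SVs falling within gene regions is, at its core, an interval-overlap count. The sketch below shows one simple way such a proportion could be computed; it is an assumption for illustration, not the method used in the project. The `Interval` type, both functions and all coordinates are hypothetical, and a real analysis would read variant and annotation files and use an indexed search rather than this all-against-all comparison.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_in_genes(svs: list[Interval], genes: list[Interval]) -> float:
    """Fraction of SVs that overlap at least one gene region."""
    if not svs:
        return 0.0
    hits = sum(1 for sv in svs if any(overlaps(sv, g) for g in genes))
    return hits / len(svs)


# Made-up coordinates, for illustration only.
genes = [Interval("1", 1_000, 5_000), Interval("2", 20_000, 30_000)]
svs = [
    Interval("1", 4_500, 6_000),    # overlaps the gene on chromosome 1
    Interval("1", 8_000, 9_000),    # between genes
    Interval("2", 10_000, 15_000),  # between genes
]
genic_fraction = fraction_in_genes(svs, genes)
```

With these made-up intervals, one of the three SVs overlaps a gene.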

For centuries, farmers bred superior livestock by observing how particular individuals, or their offspring, perform in terms of growth, fertility etc. In recent decades, breeding companies have addressed the challenge of producing superior animals by implementing elaborate systems to methodically record production traits. By considering this information within a known pedigree structure, it was possible to ensure that animals with superior potential were maintained and used for breeding. This success rests on the fact that genetic variation explains (to varying degrees) trait variation. Over the last 10 years, breeding companies have transitioned towards genetic testing, so-called SNP genotyping, as a strategy to measure individual genetic variation with unprecedented accuracy. Combined with extensive measurement recordings and classical methodology, this has allowed them to significantly improve multiple traits simultaneously, and today thousands of cattle and pigs are genetically tested each year and their genetic value calculated. Unfortunately, the entire approach is founded on testing one specific type of genetic variation and disregards the important class of structural variants (SVs) present in all genomes. Moreover, all analysis is founded on a gold-standard reference genome representing a single individual from a non-Norwegian breed. In CAUSATIVE, scientists at NMBU and from Norwegian breeding companies (Geno and Norsvin) will use state-of-the-art sequencing and bioinformatic tools to build novel reference genomes for Norwegian breeds that capture all structural variations present in the population. Breeding companies will use these resources to improve their ability to calculate breeding values and to find ways to select for SVs that until now have been invisible in SNP genotyping data. Finally, we will use our new understanding of genome architecture to identify SVs associated with more sustainable production.

Publications from Cristin

No publications found

Funding scheme:

BIOTEKNOLOGI-BIOTEKNOLOGI