# Knowledge Based Non-Stationary Modeling

#### Awarded: NOK 9.2 mill.

Project Number:

250362

Project Period:

2016 - 2020

Challenge 1: Use statistical models to estimate annual runoff. Annual runoff is a measure of the amount of water that flows through a river during a year. Knowledge about annual runoff is important when planning a new drinking water reservoir or a hydropower plant. For most rivers in the world there are no measurements of runoff, because it is too expensive and time-consuming to take measurements in every river. In such rivers we must therefore estimate runoff. This can be done by using data from nearby rivers, precipitation data and other relevant observations.

Often it is of interest to know the long-term behaviour of a river, e.g. how much water it carries over a 30-year period. This is a measure of how much water is available, e.g. for hydropower production. In some rivers there are only 1-5 years of measurements when we are interested in 30 years. Often these short data records are simply omitted from the runoff estimation, because they are regarded as too uncertain for estimating the average runoff over a 30-year period.

In this project, we have developed statistical models that are able to exploit the short data records. The models use the correlation between rivers: in terms of streamflow generation, two rivers that are located close to each other probably have more in common than rivers that are far apart. Further, we use the fact that runoff patterns repeat themselves over time. For example, on average it rains more in Bergen than in Oslo, and more in Oslo than in Skjåk, and if one year is particularly rainy in Oslo, there is probably still more rain in Bergen. By building these properties into our statistical models, we are able to exploit the short data records. Our results show that we get considerably better estimates of the 30-year runoff in Norway when only one annual observation is available than when none is available. This shows that a lot of information is lost when the short data records are omitted from the runoff estimation. In our work we have also explored how different data types can be combined to improve our runoff estimates.

Challenge 2: Use statistical models to predict quantitative traits from pedigrees and gene data. In the field of quantitative genetics, scientists are interested in identifying genes and regions in the DNA that affect various physical traits. They are also interested in estimating how much of an observed trait is due to genetics and how much is due to the environment. This is especially useful for plant and animal breeders: with such estimates, we can select individuals with high genetic values for traits of interest and use them for breeding. In this way, we can improve characteristics in plants and animals over time. It is, for example, possible to develop new wheat varieties that produce higher grain yields than the varieties we have today.

With biotechnology, it is possible to extract genetic markers from the DNA of plants and animals. These markers can be used to predict genetic values for individuals that have not yet developed the traits we want to improve, so that selection in the breeding process can be performed at a much earlier stage than in traditional breeding without genetic markers.

It is usually not straightforward to estimate how much of a trait is due to genetics and how much is due to the environment. Statistical trials, models and methods are necessary, and researchers are continuously working to improve their methods. This project contributes to these efforts by proposing statistical models for the estimation of genetic values in agriculture. Some of the models we use are also relevant for studies of wild animal populations and of hereditary diseases in humans.

The models focus on including the prior knowledge that we have about the underlying processes we study. This knowledge concerns both the environmental process, such as the environmental dependency between farms that lie close together, and the genetic processes, such as known genes that may be important for the genetic markers, or the dependency between genetic markers that arises over several generations of mutations. We fit the models to both real and simulated data, and the results show that including knowledge about dependency in the environment, or about marker type and dependency, can improve estimates and predictions of genetic values.
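Returning to Challenge 1, the intuition for why a single year of data can help may be illustrated with a toy simulation. This is only a minimal sketch under strong assumptions and not the statistical model developed in the project: two hypothetical catchments share the same year-to-year weather effect, each has its own long-term mean, and all numbers (`clim_neighbour`, `clim_target`, the noise levels) are invented for illustration.

```python
import random
import statistics


def simulate(n_years=30, seed=1):
    """Simulate annual runoff (mm/year) for a gauged neighbour and a target river.

    Assumption: both rivers share a common year effect (wet or dry year),
    which is what makes the spatial pattern repeat over time.
    """
    rng = random.Random(seed)
    clim_neighbour, clim_target = 2000.0, 1400.0  # hypothetical long-term means
    year_effect = [rng.gauss(0.0, 300.0) for _ in range(n_years)]
    neighbour = [clim_neighbour + x + rng.gauss(0.0, 50.0) for x in year_effect]
    target = [clim_target + x + rng.gauss(0.0, 50.0) for x in year_effect]
    return neighbour, target


def estimate_without_short_record(neighbour):
    """Baseline: ignore the target's data and reuse the neighbour's long-term mean."""
    return statistics.mean(neighbour)


def estimate_with_one_year(neighbour, target_obs, year):
    """Shift the neighbour's long-term mean by the difference seen in one shared year."""
    return statistics.mean(neighbour) + (target_obs - neighbour[year])


if __name__ == "__main__":
    neighbour, target = simulate()
    truth = statistics.mean(target)  # the 30-year mean we want to recover
    print("error without short record:",
          abs(estimate_without_short_record(neighbour) - truth))
    print("error with one observed year:",
          abs(estimate_with_one_year(neighbour, target[0], 0) - truth))
```

Because the difference between the two rivers is stable from year to year in this toy setting, one shared year is enough to remove most of the error. The project's models account for such dependence, and for its uncertainty, in a full statistical framework rather than with a simple shift.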

The project's main contribution is new classes of methods for obtaining good analyses when different data sources are used, and for making it possible to incorporate knowledge about dependency structures. The greatest effect will come from later use of these methods, both in research and more directly as a knowledge base for decisions. One example of an effect that may be of great importance is the use of short runoff time series in runoff maps, which in turn are used in, for example, the planning and dimensioning of infrastructure. Based on the results of the project, NVE is now considering using the developed methodology as it prepares new runoff maps. Another example of impact is improved breeding on farms with few animals, typically in developing countries, based on genetic data and spatial structures.

The project is based on collaboration between researchers in statistics, quantitative genetics and hydrology. From a statistician's point of view, the goal is to develop statistical methods and knowledge for non-stationary processes. We have chosen two important challenges, one in quantitative genetics and one in hydrology. Quantitative genetics challenge: predicting breeding values and identifying quantitative trait loci from SNP-panel data and pedigree information. Hydrology challenge: the problem of ungauged basins, i.e. the challenge of estimating streamflow variables for locations where no streamflow observations are available.

Formulating models as solutions to stochastic partial differential equations (SPDEs) has been demonstrated to enable fast inference, as INLA (integrated nested Laplace approximations) can be used. The SPDE formulation also provides a flexible framework for non-stationary models, and we therefore chose to focus on what we call non-stationary latent SPDE models. We first formulate existing useful models as latent SPDE models. Next, these models are extended to non-stationary models using the SPDE formulation.

For non-stationary models, identifiability challenges have been found. We develop tools, experience and knowledge to tackle these challenges. We further investigate how the identifiability of non-stationary parameters can be improved in ways that are realistic from the point of view of the selected challenges. When non-stationary models are evaluated on predictive performance, substantial improvement is seldom observed. We develop methodology for evaluating predictions, and investigate in which settings that are realistic and interesting for the selected challenges predictive performance improves when non-stationary models are used.

To make the developed methods available to users, we provide software, documentation and courses, as well as journal articles and presentations at conferences. Further, meetings for potential new users are organized.
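For readers unfamiliar with the SPDE approach, the following is a sketch of the formulation commonly used in the SPDE literature. The symbols $\kappa$, $\tau$, $\alpha$, $b_k$ and $\theta$ are not defined in this project description, and the exact parameterisation used in the project may differ.

```latex
% Stationary case: a Gaussian field u(s) on R^d defined as the solution of
\[
  (\kappa^2 - \Delta)^{\alpha/2} \bigl(\tau\, u(s)\bigr) = \mathcal{W}(s),
  \qquad s \in \mathbb{R}^d ,
\]
% where \Delta is the Laplacian and \mathcal{W} is Gaussian white noise.
% The solution has a Matern covariance with smoothness \nu = \alpha - d/2;
% \kappa controls the spatial range and \tau the marginal variance.

% Non-stationary extension: let the parameters vary in space,
\[
  \bigl(\kappa(s)^2 - \Delta\bigr)^{\alpha/2} \bigl(\tau(s)\, u(s)\bigr) = \mathcal{W}(s),
\]
% for example through log-linear expansions in known basis functions b_k(s):
\[
  \log \kappa(s) = \theta^{\kappa}_0 + \sum_{k} \theta^{\kappa}_k\, b_k(s),
  \qquad
  \log \tau(s) = \theta^{\tau}_0 + \sum_{k} \theta^{\tau}_k\, b_k(s).
\]
```

In this form, the identifiability questions mentioned above concern how well the coefficients $\theta^{\kappa}_k$ and $\theta^{\tau}_k$ can be separated from one another given the available data.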