NAERINGSPH-Nærings-phd

Tuning the Model Factory

Alternative title: Forskningsbasert forbedring av modellfabrikken til DNB

Awarded: NOK 1.8 mill.

Project Number:

313047

Application Type:

Project Period:

2020 - 2024

Funding received from:

Organisation:

Location:

DNB has for several years used a framework for generating prediction models, for example to predict customers’ propensity to buy a product. Such models enable personalization of digital communication and more sales with fewer calls at the customer center. Customers are used to receiving high-quality personalized content from tech giants such as Facebook and Netflix, which constantly raises the bar for what is considered good, relevant marketing. New algorithms and analysis platforms are rapidly increasing the modelling opportunities. A central part of the modelling process is choosing the settings used in the algorithm, often referred to as hyperparameter tuning. Methods to do so exist, but the most common ones are not systematic, require many runs, are not suited for high-dimensional spaces and do not yield information about how the hyperparameters affect the functionality of the algorithm. In addition, the methods have not been thoroughly compared. The original goal of this project was therefore to test a more methodical approach to hyperparameter tuning, using design of experiments to find the hyperparameters that affect the predictive power of the algorithm the most. In this way the number of hyperparameters to test can be narrowed down, and the effect of each hyperparameter can be estimated. To provide a solid theoretical foundation, new methods within design of experiments were also to be developed. In the applied part of the project, a new kind of marketing model was tested, to expand the framework for prediction models and thereby yield increased benefits to DNB. The new model type, called uplift models, aims to identify the customers that are most positively affected by marketing. Despite gathering and analyzing data from a variety of campaigns, the resulting uplift models were weak, and suited neither for practical use nor for publication.
Meanwhile, a well-functioning framework for hyperparameter optimization had been implemented. It was therefore decided to let the project focus mainly on design of experiments. The final thesis thus consists of four works within that field, in addition to a description of potential use cases within DNB. Methodology from design of experiments may be useful in a wide variety of areas relevant to DNB, for instance online controlled experiments, conjoint analysis, online active learning and the previously mentioned tuning of hyperparameters.
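
The idea of using design of experiments to screen hyperparameters can be illustrated with a minimal sketch. It is not the method developed in the project: the hyperparameter names, their levels and the `toy_score` function are all hypothetical stand-ins (in practice the score would come from, for example, cross-validation). The sketch builds a two-level half-fraction design for four hyperparameters in eight runs and estimates each main effect as the difference between the mean score at the high and the low level.

```python
from itertools import product

# Hypothetical hyperparameters, each with a low (-1) and a high (+1) level.
LEVELS = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (2, 8),
    "subsample": (0.5, 1.0),
    "min_child_weight": (1, 10),
}

def half_fraction(n_base):
    """Full factorial in n_base factors plus one generated factor
    equal to the product of the others (a 2^(k-1) design)."""
    runs = []
    for base in product((-1, 1), repeat=n_base):
        generated = 1
        for x in base:
            generated *= x
        runs.append(base + (generated,))
    return runs

def decode(run):
    """Map coded levels (-1/+1) to the actual hyperparameter values."""
    return {name: LEVELS[name][(x + 1) // 2] for name, x in zip(LEVELS, run)}

def toy_score(run):
    """Assumed stand-in for a model score; not real data."""
    lr, depth, sub, mcw = run
    return 0.70 + 0.05 * lr + 0.02 * depth + 0.0 * sub - 0.01 * mcw

def main_effects(design, responses, names):
    """Mean response at the high level minus mean response at the low level."""
    effects = {}
    for j, name in enumerate(names):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects

names = list(LEVELS)
design = half_fraction(3)                       # 8 runs instead of 16
responses = [toy_score(run) for run in design]  # replace with real model scores
effects = main_effects(design, responses, names)
ranked = sorted(names, key=lambda n: abs(effects[n]), reverse=True)

for name in ranked:
    print(f"{name:>18}: {effects[name]:+.3f}")
```

With a real scoring function, the hyperparameters with the smallest estimated effects would be candidates to fix at a default value, leaving fewer to tune in follow-up runs. Note that in a half-fraction design each main effect is aliased with a three-factor interaction, so the estimates rely on such interactions being negligible.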

The original plan was to use design of experiments for hyperparameter tuning, in order to find settings for the machine learning algorithms used for personalization at DNB. Some way into the project, it was decided to extend the plan to also cover testing hyperparameter tuning on a new type of marketing model, uplift models. These models aim to identify the customers with the largest increase in purchase probability as a result of a communication, and are therefore highly relevant to DNB, which focuses on cost-effective personalized marketing. By testing the methodology on uplift models, DNB could gain a new model type in the framework and thereby greater benefit from the project. Data for modelling was collected in three different settings, but none of the tests resulted in models with sufficient predictive power to be useful in practice. This is probably due to weak signals in the data and to data collection that could have been done better. This has been discussed internally and was presented at the Math meets industry conference in Trondheim in 2022. The uplift approach has been made known in relevant communities, and the code is available for further testing should relevant use cases arise. A well-functioning automated procedure for hyperparameter tuning was implemented in the framework while the candidate was on parental leave, so when the project resumed it was decided that the focus for the remainder of the project period would be on design of experiments. The thesis therefore consists of four works within design of experiments, along with an overview of use cases related to design of experiments that may be relevant to the bank. The use cases include conjoint analysis, online controlled experiments, online active learning and hyperparameter tuning. The results in the papers are general and can be useful in many settings where the use of screening designs is considered.
This is particularly relevant in industrial settings where running experiments is costly. Two of the works concern running experiments under varying conditions (blocking); the other two focus on the analysis of screening designs.
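
The difference between an uplift model and an ordinary propensity model can be made concrete with a small sketch. It is a simplified illustration under stated assumptions, not the models tested in the project: the campaign log, the segment names and the function `uplift_by_segment` are invented, and uplift is estimated simply as the purchase rate among contacted customers minus the purchase rate among a control group, per segment.

```python
from collections import defaultdict

# Hypothetical campaign log: (segment, contacted, bought). Invented for illustration.
records = [
    ("young", True, True), ("young", True, True),
    ("young", True, False), ("young", True, False),
    ("young", False, True), ("young", False, False),
    ("young", False, False), ("young", False, False),
    ("senior", True, True), ("senior", True, True),
    ("senior", True, True), ("senior", True, False),
    ("senior", False, True), ("senior", False, True),
    ("senior", False, True), ("senior", False, False),
]

def uplift_by_segment(records):
    """Purchase rate among contacted minus purchase rate among control, per segment.
    Assumes every segment has at least one contacted and one control customer."""
    # counts[segment][contacted] = [number of buyers, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, contacted, bought in records:
        cell = counts[segment][contacted]
        cell[0] += int(bought)
        cell[1] += 1
    uplift = {}
    for segment, cells in counts.items():
        treated_rate = cells[True][0] / cells[True][1]
        control_rate = cells[False][0] / cells[False][1]
        uplift[segment] = treated_rate - control_rate
    return uplift

uplift = uplift_by_segment(records)
for segment, value in uplift.items():
    print(f"{segment}: uplift = {value:+.2f}")
```

In this invented data the "senior" segment buys most often but is unaffected by the contact, while the "young" segment buys less often but responds to it. A propensity model would favour the first segment; an uplift model would favour the second.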

Design of Experiments (DoE) is a field within statistics in which the goal is to plan the collection of data to accommodate analysis and possibly optimization of a process. Typically, one wants to find the variables impacting the response, model the relationships and suggest settings for maximizing the response. This can be done using sequential testing. In the first step, called screening, a large number of variables are tested in a limited number of runs, trying to identify the few important ones. Then follow-up runs are used to further investigate the relationship between the selected variables and the response. If all runs cannot be performed under the same conditions, they must be divided into blocks, and the impact of the blocks must be considered when performing the analysis. In this thesis, the emphasis is on improving the screening phase by suggesting new methods for blocking and analyzing popular screening designs. This methodology is general and can be useful in a wide variety of settings where the data collection can be planned and gathering information is costly. One of the objectives of the thesis is to review how DoE can be applied within the field of machine learning, especially considering use cases within finance.
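
Blocking can be sketched with a textbook construction; this is not one of the new methods proposed in the thesis. The sketch assumes a two-level full factorial in three factors whose eight runs must be split into two blocks of four. Runs are assigned to blocks by the sign of the three-factor interaction, so the block difference is confounded with that interaction but not with the main effects. The response function and the size of the block shift are hypothetical.

```python
from itertools import product

def full_factorial(k):
    """All 2^k combinations of coded levels -1 and +1."""
    return list(product((-1, 1), repeat=k))

def assign_blocks(design):
    """Block 0 if the product of all factor levels is -1, otherwise block 1.
    This confounds the highest-order interaction with the block effect."""
    blocks = []
    for run in design:
        sign = 1
        for x in run:
            sign *= x
        blocks.append(0 if sign == -1 else 1)
    return blocks

def main_effect(column, responses):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for x, y in zip(column, responses) if x == 1]
    low = [y for x, y in zip(column, responses) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)

BLOCK_SHIFT = 5.0  # hypothetical difference in conditions between the two blocks

def toy_response(run, block):
    """Assumed response: effects of A and B, none of C, plus a block shift."""
    a, b, c = run
    return 10.0 + 2.0 * a - 1.0 * b + 0.0 * c + BLOCK_SHIFT * block

design = full_factorial(3)
blocks = assign_blocks(design)
responses = [toy_response(run, blk) for run, blk in zip(design, blocks)]
effects = [main_effect([run[j] for run in design], responses) for j in range(3)]

print("block sizes:", blocks.count(0), blocks.count(1))
print("main effects A, B, C:", effects)
```

Because each factor is at its high level in exactly half of the runs in each block, the block shift cancels out of every main-effect estimate, even though it is as large as the effects themselves.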

Publications from Cristin

No publications found

Funding scheme:

NAERINGSPH-Nærings-phd