HELSEVEL: Good and effective health, care and welfare services

Breast cancer is the most common cancer among women in Norway and worldwide. Preventing breast cancer is difficult, but early detection through mammographic screening is an effective way to reduce breast cancer deaths. The standard screening procedure in the Norwegian Breast Cancer Screening Program (NBCSP) takes X-ray images of each breast (mammograms) from two different angles. Two radiologists independently read all mammograms. If either radiologist identifies suspicious findings, a consensus meeting is held to decide whether the woman should be recalled for further assessment. Most women attending screening have no signs of breast cancer: 93% of the screening mammograms show none. As a result, radiologists today spend a substantial amount of their time reading normal mammograms. Recent advances in artificial intelligence, more specifically machine learning (ML), offer a potential to improve the NBCSP. The main aim of this project was to develop a model that can be used to read mammograms in screening. ML was used to develop automatic systems that pick out images that most likely show no signs of breast cancer. This may contribute to more breast cancer cases being detected at an early stage, fewer people being diagnosed with breast cancer between two screening rounds, and radiologists spending more time on women who show signs of breast cancer. The project received a "pilot dataset" from the University Hospital of Northern Norway in 2018. In 2020, mammograms and screening information from St. Olavs hospital and Møre and Romsdal hospital trust were transferred to the Cancer Registry and the project, and in 2021 the project received data from the University Hospital of Northern Norway and from four hospital trusts in Helse Sør-Øst.
Preparing for data collection and receiving the data were more time-consuming than expected because of legal aspects and the procedures for extracting data at some of the health trusts. Earlier in the project we had to rely on a pre-trained model for testing and development. The data received during 2020 and 2021 made it possible to train models from scratch on Norwegian data. The model has been tested on a dataset not used for development of the algorithm. The results showed that implementing ML in mammographic screening has great potential to increase the sensitivity of screening by detecting more breast cancers from findings on screening mammograms, to reduce the number of interval cancers, and to reduce the workload of the radiologists. We also expect artificial intelligence to reduce the burden of false positive screening results, overdiagnosis and thus overtreatment, and eventually to reduce breast cancer mortality. We have also worked on plans for how artificial intelligence can be used in the NBCSP, focusing particularly on describing the characteristics and requirements of the methods and of the screening service that will affect the choices that can be made. In this project we have contributed to filling large knowledge gaps related to artificial intelligence in radiology and screening. We consider the preliminary results very promising and look forward to further studies that will provide a solid knowledge base for implementing ML in the NBCSP.

The machine learning (ML) model developed in this project was based on deep convolutional neural networks trained on more than 1.5 million images from more than 300 000 mammographic exams from the Norwegian Breast Cancer Screening Program (NBCSP). Because accessing images was a time-consuming process, about 1 million additional images have only now become available; they will be included in future development of the model. While most models in this field are trained from pixel-level annotations, which are time-consuming to produce, the present model was trained from image-level diagnoses only. The architecture used a two-stage process. The first stage was a relatively standard convolutional network (ResNet-101) trained to classify down-sampled versions of the images as positive (having cancer) or negative. Once this Holistic model was fully trained, a method from explainable AI known as ‘layered Grad-CAM’ was used to identify the part of the image that contributed most to a higher output score. This part was defined as the Holistic model’s region of interest (ROI). In the second stage, a separate ResNet-101 model, called Attention, was trained to classify the ROIs as positive or negative using the full image resolution without down-sampling. The model was tested on a subset of data not used in training and evaluated by the area under the curve (AUC) metric. An ensemble consisting of the Holistic model and three parallel Attention models reached an AUC of 0.960 for screen-detected cancers and an AUC of 0.917 for all cancers (screen-detected and interval cancers). These results were far better than those from a pre-trained external model and comparable to values derived from the double radiologist reading scores, which gave an AUC of 0.984 for screen-detected cancers and 0.893 for all cancers.
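The two-stage idea described above can be illustrated with a minimal sketch in plain Python. It is not the project's implementation: the networks themselves are omitted, a small grid of numbers stands in for the Grad-CAM output, and the function names (`roi_box`, `ensemble_score`, `roc_auc`), the fixed ROI size, and the unweighted averaging of ensemble scores are all assumptions made for illustration. The sketch also assumes that AUC refers to the area under the ROC curve.

```python
from statistics import mean


def roi_box(saliency, full_h, full_w, roi_h, roi_w):
    """Map the peak of a low-resolution saliency map to a crop box in the
    full-resolution image. Returns (top, left, bottom, right).
    Assumes roi_h <= full_h and roi_w <= full_w."""
    rows, cols = len(saliency), len(saliency[0])
    # Locate the most salient cell of the down-sampled map.
    r, c = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda ij: saliency[ij[0]][ij[1]],
    )
    # Centre of that cell in full-resolution pixel coordinates.
    cy = int((r + 0.5) * full_h / rows)
    cx = int((c + 0.5) * full_w / cols)
    # Clamp so the crop box stays inside the image.
    top = min(max(cy - roi_h // 2, 0), full_h - roi_h)
    left = min(max(cx - roi_w // 2, 0), full_w - roi_w)
    return top, left, top + roi_h, left + roi_w


def ensemble_score(holistic, attention_scores):
    """Unweighted mean of one holistic score and several attention scores
    (the combination rule is an assumption, not taken from the project)."""
    return mean([holistic, *attention_scores])


def roc_auc(scores, labels):
    """ROC AUC computed as the probability that a randomly chosen positive
    case scores higher than a randomly chosen negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: a 4x4 saliency map whose peak lies in the top-right cell.
toy_saliency = [
    [0.1, 0.0, 0.2, 0.9],
    [0.0, 0.1, 0.3, 0.4],
    [0.0, 0.2, 0.1, 0.0],
    [0.1, 0.0, 0.0, 0.1],
]
box = roi_box(toy_saliency, full_h=400, full_w=400, roi_h=100, roi_w=100)
combined = ensemble_score(0.2, [0.4, 0.6, 0.8])
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In this toy case the crop box is clamped to the top-right corner of the full-resolution image, which shows why the second stage can look at fine detail that is lost when the whole mammogram is down-sampled.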
The results showed that implementing ML in screening has great potential to improve the quality of the NBCSP, both by increasing sensitivity through detecting, at screening, more of the cancers that would otherwise present as interval cancers, and by reducing the workload of the radiologists. We also expect ML to reduce the burden of false positive screening results, overdiagnosis and the resulting overtreatment, and, in the long run, to reduce breast cancer mortality. The use of ML can also increase screening capacity. Today the NBCSP targets women aged 50-69. The European Commission Initiative on Breast Cancer recently reported conditional evidence for screening women aged 45-49 and 70-74. If the target group is expanded, increased capacity will be needed. Demand for population-based screening is also expected to rise for more forms of cancer. Hence, increasing capacity using ML can save lives and create new opportunities for improved health services.

Breast cancer is the most common cancer among women in Norway and worldwide. Since the cause of breast cancer is not known, mammographic screening is offered as secondary prevention, aimed at reducing mortality from the disease. About 500 000 women have participated in the Norwegian Breast Cancer Screening Program every second year since the program became nationwide in 2005. The radiologists spend a substantial amount of time interpreting screening mammograms of healthy women: about 7% of the exams are discussed at consensus and 3-4% of the women are recalled for further assessment. About 20% of those recalled, or 0.6% of the attending women, are diagnosed with breast cancer, and an additional 0.17% are diagnosed before the next screening. By exploiting machine learning in this process, the aim is to reduce the recall rate and the rate of missed screen-detected and interval breast cancers, and to obtain knowledge that can help us reduce overdiagnosis and overtreatment, which in turn will reduce disease-specific mortality. By achieving this goal, we will be able to reduce the human and financial burden of mammographic screening. A realistic ambition is that 100 women will get a breast cancer diagnosis 1-4 years earlier. An on-the-fly control of the image quality may reduce the number of recalls of 1 200 women annually and also improve the image quality in the further assessment. The project takes advantage of three main factors. First, there has been a revolution in machine learning, including for medical images, where machine learning combined with expert readers outperforms human expertise alone. Second, our database of mammograms is at least 20 times larger than that of any published study, which is critical for machine learning. Third, we will focus on questions that are relevant for the Norwegian Breast Cancer Screening Program. The project will build world-leading competence that is also valuable for other screening programs and other medical applications.
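The rates quoted above can be cross-checked with simple arithmetic. The sketch below converts them into counts for a hypothetical cohort of 100 000 attending women; the cohort size and the function name `screening_funnel` are illustrative assumptions, not figures or code from the project.

```python
def screening_funnel(cohort=100_000, consensus_pct=7.0, recall_pct=3.0,
                     screen_detected_pct=0.6, interval_pct=0.17):
    """Convert the percentages quoted in the text into counts for a
    hypothetical cohort of attending women."""

    def to_count(pct):
        return round(cohort * pct / 100)

    recalled = to_count(recall_pct)
    detected = to_count(screen_detected_pct)
    interval = to_count(interval_pct)
    return {
        "consensus": to_count(consensus_pct),
        "recalled": recalled,
        "screen_detected": detected,
        "interval": interval,
        # Share of recalled women who are diagnosed with breast cancer.
        "detected_share_of_recalled": round(detected / recalled, 3),
        # Share of all cancers that surface between screening rounds.
        "interval_share_of_all_cancers": round(interval / (detected + interval), 3),
    }


low_recall = screening_funnel(recall_pct=3.0)
high_recall = screening_funnel(recall_pct=4.0)
```

With a 3% recall rate, the 0.6% of attending women diagnosed at screening corresponds to 20% of those recalled, matching the figure in the text; at a 4% recall rate the same 0.6% corresponds to 15% of those recalled. On these figures, roughly one in five of all cancers in the cohort is an interval cancer.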